[Chugalug] ┌∩┐(◣_◢)┌∩┐ JAVA Swap Death

flushy at flushy.net
Tue Oct 29 15:09:44 UTC 2013

Quoting Mike Harrison <cluon at geeklabs.com>:

> Ok, forget the nice things I said about the Oracle MySQL Enterprise Monitor.
> On a RHEL system doing NOTHING else... with 2 dual-cores and 16gb of ram...
> It hit swap death in < 24 hours since a reboot.

Java is not some magical beast that consumes random memory.

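The JVM is actually very structured and controlled in how much memory
it consumes. You define it via startup parameters (-Xmx for the maximum
heap, for example). With no arguments, the default depends on the JVM
version and the machine: older client JVMs defaulted to as little as
64Megs of Heap, while HotSpot server JVMs pick a fraction of physical
RAM. Either way, the default is rarely the right size for a non-trivial
application, so set it explicitly.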
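To see what limits a given JVM actually ended up with, something like
the following can help. It is only a minimal sketch using the standard
java.lang.Runtime API; HeapReport is a hypothetical class name and has
nothing to do with the monitoring app under discussion.

```java
// Minimal sketch: report the heap limits the running JVM actually got.
// HeapReport is a hypothetical name used for illustration only.
class HeapReport {
    static final long MB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently reserved from the OS, in megabytes.
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    // Heap currently in use (committed minus free), in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):       " + maxHeapMb());
        System.out.println("committed heap (MB): " + committedHeapMb());
        System.out.println("used heap (MB):      " + usedHeapMb());
    }
}
```

Running it with and without an explicit limit (for example
java -Xmx256m HeapReport) shows what the defaults are on a particular
box. The reported maximum may differ slightly from the -Xmx value,
depending on the collector in use.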

If you have 16GB in your machine, you should NEVER tell Java to use
16GB of heap. You'll just kill your machine. You need to answer
several questions:

1) how much Heap/PermGen does your app require?
2) how much RAM do your regular user apps require?
3) how much RAM does your OS require?
4) how much RAM does your machine have?

So, you take:

$Left_Over_Ram = $4 - $3 - $2 - $1;
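The subtraction above can be sketched in Java. The figures are
placeholders, not measurements: only the 16GB total comes from the
quoted message, while the OS, user-app, and JVM numbers are assumptions
for illustration. MemoryBudget is a hypothetical class name.

```java
// Minimal sketch of the RAM budget: machine - OS - user apps - JVM.
// All values are in megabytes; MemoryBudget is a hypothetical name.
class MemoryBudget {

    // RAM left over after the OS, the regular user apps, and the JVM.
    static long leftoverMb(long machineMb, long osMb, long userAppsMb, long jvmMb) {
        return machineMb - osMb - userAppsMb - jvmMb;
    }

    // A negative leftover means the box is overcommitted and will swap.
    static boolean fits(long machineMb, long osMb, long userAppsMb, long jvmMb) {
        return leftoverMb(machineMb, osMb, userAppsMb, jvmMb) >= 0;
    }

    public static void main(String[] args) {
        long machineMb = 16 * 1024; // 16GB box, from the quoted message
        long osMb = 1024;           // assumed OS footprint
        long userAppsMb = 2048;     // assumed footprint of other apps
        long jvmMb = 256 + 128;     // assumed Heap + PermGen settings

        long left = leftoverMb(machineMb, osMb, userAppsMb, jvmMb);
        System.out.println("left over (MB): " + left);
        System.out.println("fits in RAM:    " + fits(machineMb, osMb, userAppsMb, jvmMb));
    }
}
```

Keep in mind a JVM process uses more than Heap + PermGen (thread
stacks and native memory, for instance), so leave some headroom in the
JVM figure.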

From the discussion about this monitoring app, it appears that you
probably only need about 256megs of heap. If it's creating graphs or
processing large datasets of monitoring data, then you might need 1G
of Heap. It really depends upon the workload of that monitor app, what
it presents to you, what it does on the backend, etc.

Here are some of my examples:

* Oracle SQL Developer (for Oracle databases) *
I have it configured to use 640Megs of Heap + 128Megs of PermGen.  
That's being generous... sometimes I run big queries or export them to  
spreadsheets or text files. That sucks up some memory to hold some of  
the data structures to build the spreadsheet when there are a lot of  
columns or rows. Most of the time it's only using about 400 megs of RAM.

* MyEclipse IDE *
This is configured for 1GB of Heap and between 128 and 512 Megs of
PermGen. Why so much PermGen? Because when I debug an app, it has to
suck in all the classes, the javadocs, the debug symbols, etc. Most
of the time, however, 128M of PermGen is fine.

* Java Unit Test Environment *
This is a generic environment I have set up to run the unit tests or
integration tests. I have it configured for 2GB of Heap and 512Megs of
PermGen. Why? Well, some of my apps deal with pre-caching data from
databases or files, or build indices from data streams, or various
other memory intensive things. Not all do, but some, and I run the
tests with a memory configuration that allows me this flexibility.

* Production Middleware Orchestration Web Service *
(not its real name, I name things after tragic Greek characters -- my
management hasn't caught on yet)
This is an in-house web service tier that orchestrates other services,  
databases, custom feeds from 3rd parties, and billing systems into a  
cohesive layer that allows us to integrate with other teams or  
companies. The data feeds are heavily cached, highly monitored, and  
require extreme uptimes. This is configured for 14GB of Heap with  
about 192Megs of PermGen. The VM has 16GB of RAM, leaving ~2GB for the  
OS and file caching.

We determined these values from application and OS profiling. I know
this app can use up to 80% of the Heap for stretches of about 30
seconds. This is mostly due to garbage collection latency -- which is
OK as it allows for better performance.
The PermGen size was calculated the same way, and it sits at 40% usage
-- again OK, as PermGen usage is determined by the code/classes that
are loaded at runtime. This won't change until we perform a code push.
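For anyone wanting to watch those percentages from inside the JVM, here
is a minimal sketch using the standard java.lang.management API. It is
not the tooling we used for the profiling above; PoolUsage is a
hypothetical class name. Pool names depend on the JVM and collector
(the PermGen pool may show up as "PS Perm Gen" or "CMS Perm Gen", for
instance), so the sketch just lists every pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: print how full each JVM memory pool is.
// PoolUsage is a hypothetical name used for illustration only.
class PoolUsage {

    // Percentage of the pool's maximum in use, or -1 if no maximum is defined.
    static double percentUsed(MemoryUsage usage) {
        long max = usage.getMax();
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * usage.getUsed() / max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            double pct = percentUsed(usage);
            String shown = pct < 0 ? "n/a" : String.format("%.1f%%", pct);
            System.out.printf("%-30s %-16s %s%n", pool.getName(), pool.getType(), shown);
        }
    }
}
```

A one-off reading like this is only a snapshot. Sizing decisions need
samples taken over time and under real load.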
The remaining ~2GB for the OS and other services came from profiling
which services are actually running (rsyslog, sysedge, zabbix, cron,
etc.) as well as how much RAM is typically used for file caching,
averaged over a week.

We've had uptimes of several months, only requiring app restarts for  
code pushes, and one VM reboot due to a storage subsystem outage --  
and that reboot wasn't really required, except we like having logs  
written to disk.

I have several Java front-end servers that have uptimes of almost a
year, with application uptimes the same.

TL;DR: It's not Java. It's the app's code or the implementer ;-)

