[Helma-user] Problems with helma Swarm and jgroups

Franz Philipp Moser philipp.moser at chello.at
Fri Jan 11 12:13:32 CET 2008


Hannes Wallnoefer wrote:
> Hi Philipp,

Hi,

> first, an administrative note: your message was held back by the
> mailing list software because of the attachment. The list is
> configured for a max message size of 40kb. I accepted your message for
> this time, next time please post attachments on the web and just
> include a link!

Sorry about that. Next time I will put the pictures and log files on one
of our servers.

> 2008/1/10, Franz Philipp Moser <philipp.moser at chello.at>:
>> Hi list,
>>
>> I recently experienced some OutOfMemory exceptions when using the current
>> helmaswarm. With a profiler I was able to find 35824 org.jgroups.Message
>> objects that consume a total of 300,000,000 bytes of memory (see
>> screenshot, retained size). This only happens on the app server that is
>> NOT the Swarm master.
> 
> I think this is probably a problem with the JGroups configuration
> stack. A google search for "jgroups memory leak" revealed this:
> 
> http://osdir.com/ml/java.javagroups.general/2006-12/msg00022.html
>
> This looks very much like your problem. A few questions:
> 
> - Can you find out by which class the retained Message objects are
> referenced? This should be possible to find out with Yourkit profiler
> (don't ask me how).

That's a good idea, and I found the following:
http://static.brandnews.at/pm/images/profiler1.png

It's pbcast.NAKACK, thanks for the hint. I think a lot of messages get
lost between our servers, but we have no problems with concurrency.

> - Which JGroups stack do you use? Helmaswarm's standard UDP stack does
> contain pbcast.STABLE, but it doesn't have
> "discard_delivered_msgs="true"" in its NAKACK config.

We are using the TCP stack. Here is our swarm.conf (IPs masked):

{{{
<helmaswarm>
     <jgroups-stack name="tcp">
         <!-- TCP based JGroups protocol stack -->
         TCP(start_port=7800;
             bind_addr=X.X.X.X;
             discard_incompatible_packets=true;
             max_bundle_size=64000;
             max_bundle_timeout=30;
             recv_buf_size=500000;
             send_buf_size=150000;
             down_thread=false;
             up_thread=false):
         TCPPING(initial_hosts=X.X.X.X[7800],X.X.X.X[7800];
             port_range=2;
             timeout=3000;
             num_initial_members=2;
             down_thread=false;
             up_thread=false):
         MERGE2(min_interval=5000;
             max_interval=10000;
             down_thread=false;
             up_thread=false):
         FD_SOCK(down_thread=false;
             up_thread=false):
         FD(timeout=10000;
             max_tries=5;
             shun=true;
             down_thread=false;
             up_thread=false):
         VERIFY_SUSPECT(timeout=1500;
             down_thread=false;
             up_thread=false):
         pbcast.NAKACK(gc_lag=50;
             retransmit_timeout=300,600,1200,2400,4800;
             down_thread=false;
             up_thread=false):
         pbcast.STABLE(desired_avg_gossip=20000;
             down_thread=false;
             up_thread=false):
         VIEW_SYNC(avg_send_interval=60000;
             down_thread=false;
             up_thread=false):
         pbcast.GMS(join_timeout=5000;
             join_retry_timeout=2000;
             shun=false;
             print_local_addr=true;
             down_thread=false;
             up_thread=false):
         FRAG2(frag_size=8192;
             down_thread=false;
             up_thread=false):
         pbcast.STATE_TRANSFER(down_thread=false;
             up_thread=false)
     </jgroups-stack>
</helmaswarm>
}}}

According to
http://www.jgroups.org/javagroupsnew/docs/manual/html/user-advanced.html#d0e2522
we should also set discard_delivered_msgs=true in the TCP stack. You
helped us a lot.
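
As a sketch only (untested, and assuming our JGroups 2.4.1-SP3 accepts
the option), the NAKACK entry from the stack above would then read:

{{{
<!-- untested: only discard_delivered_msgs=true is new here -->
pbcast.NAKACK(gc_lag=50;
    retransmit_timeout=300,600,1200,2400,4800;
    discard_delivered_msgs=true;
    down_thread=false;
    up_thread=false):
}}}

Everything else in the stack would stay as it is.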

Maybe I should dig a little bit into the docs.

> I also found that the current JGroups release is 2.6.1, whereas
> helmaswarm currently comes with 2.4.1-SP3. Updating JGroups might be
> worth a try, too.

OK, we will try that if nothing else works.

> hannes
<snip />

One thing still makes me nervous: messages of 7 MB. But I think that's
an application problem ;)

I will let you know whether this solves our problems.

THX for your help.

cu Philipp

