[Helma-user] Problems with helma Swarm and jgroups
Franz Philipp Moser
philipp.moser at chello.at
Fri Jan 11 12:13:32 CET 2008
Hannes Wallnoefer wrote:
> Hi Philipp,
Hi,
> first, an administrative note: your message was held back by the
> mailing list software because of the attachment. The list is
> configured for a max message size of 40kb. I accepted your message for
> this time, next time please post attachments on the web and just
> include a link!
Sorry about that. Next time I will put the pictures and log files on one of
our servers.
> 2008/1/10, Franz Philipp Moser <philipp.moser at chello.at>:
>> Hi list,
>>
>> I have recently experienced some OutOfMemory exceptions when using the current
>> helmaswarm. With a profiler I was able to find 35824 org.jgroups.Message
>> objects that consume a total of 300,000,000 bytes of memory (see
>> screenshot, retained size). This only happens on the app server that is
>> NOT the Swarm Master.
>
> I think this is probably a problem with the JGroups configuration
> stack. A google search for "jgroups memory leak" revealed this:
>
> http://osdir.com/ml/java.javagroups.general/2006-12/msg00022.html
>
> This looks very much like your problem. A few questions:
>
> - Can you find out which class references the retained Message
> objects? This should be possible with the YourKit profiler
> (don't ask me how).
That's a good idea, and I found the following:
http://static.brandnews.at/pm/images/profiler1.png
It's pbcast.NAKACK, thanks for the hint. I think a lot of messages get
lost between our servers, but we have no problems with concurrency.
> - Which JGroups stack do you use? Helmaswarm's standard UDP stack does
> contain pbcast.STABLE, but it doesn't have
> discard_delivered_msgs="true" in its NAKACK config.
We are using the TCP stack. Here is our swarm.conf (IPs masked):
{{{
<helmaswarm>
<jgroups-stack name="tcp">
<!-- TCP based JGroups protocol stack -->
TCP(start_port=7800;
bind_addr=X.X.X.X;
discard_incompatible_packets=true;
max_bundle_size=64000;
max_bundle_timeout=30;
recv_buf_size=500000;
send_buf_size=150000;
down_thread=false;
up_thread=false):
TCPPING(initial_hosts=X.X.X.X[7800],X.X.X.X[7800];
port_range=2;
timeout=3000;
num_initial_members=2;
down_thread=false;
up_thread=false):
MERGE2(min_interval=5000;
max_interval=10000;
down_thread=false;
up_thread=false):
FD_SOCK(down_thread=false;
up_thread=false):
FD(timeout=10000;
max_tries=5;
shun=true;
down_thread=false;
up_thread=false):
VERIFY_SUSPECT(timeout=1500;
down_thread=false;
up_thread=false):
pbcast.NAKACK(gc_lag=50;
retransmit_timeout=300,600,1200,2400,4800;
down_thread=false;
up_thread=false):
pbcast.STABLE(desired_avg_gossip=20000;
down_thread=false;
up_thread=false):
VIEW_SYNC(avg_send_interval=60000;
down_thread=false;
up_thread=false):
pbcast.GMS(join_timeout=5000;
join_retry_timeout=2000;
shun=false;
print_local_addr=true;
down_thread=false;
up_thread=false):
FRAG2(frag_size=8192;
down_thread=false;
up_thread=false):
pbcast.STATE_TRANSFER(down_thread=false;
up_thread=false)
</jgroups-stack>
</helmaswarm>
}}}
According to
http://www.jgroups.org/javagroupsnew/docs/manual/html/user-advanced.html#d0e2522
we should also set discard_delivered_msgs="true" for NAKACK in the TCP stack.
You helped us a lot.
Maybe I should dig a little deeper into the docs.
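A sketch of how the NAKACK entry might then look in our stack. This assumes
the property goes into the protocol string in the same key=value form as the
other settings (without the quotes used in the XML examples in the manual).
All other values are unchanged from the swarm.conf above, and I have not
tested it yet:
{{{
<!-- untested sketch: only discard_delivered_msgs is new; drop this comment when pasting into the stack -->
pbcast.NAKACK(gc_lag=50;
retransmit_timeout=300,600,1200,2400,4800;
discard_delivered_msgs=true;
down_thread=false;
up_thread=false):
}}}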
> I also found that the current JGroups release is 2.6.1, whereas
> helmaswarm currently comes with 2.4.1-SP3. Updating JGroups might be
> worth a try, too.
OK, we will try that if nothing else works.
> hannes
<snip />
One thing still makes me nervous: messages of 7 MB. But I think that's
an application problem ;)
I will let you know whether this solves our problems.
Thanks for your help.
cu Philipp