[Helma-user] file read and processing pretty slow. ideas?
Maksim Lin for technical support mailing lists
maksim_lin at ngv.vic.gov.au
Fri Jul 13 01:34:10 CEST 2007
Hi Joshua,
Since the word file is pretty small (400 KB), I can't see why the file I/O
would have much impact on the speed of your code. I see that your code
not only does the file I/O but also processes the data as it reads it
from the file, and I suspect the split(), filter(), sort(), join(), etc.
functions create a lot of objects and then drop the references to them,
leaving a lot of used memory for the garbage collector to clean up
whenever memory fills up.
So...
The first thing that comes to mind to check is the memory settings
for the JVM.
Helma is very memory efficient (especially in comparison to anything in
Java J2EE land), so it ships with a start.bat that uses the default
(conservative) memory settings for the JVM. But this will lead
to quite a bit of garbage collection if your application needs a
fair bit of memory.
So if you're running on Windows, I'd recommend editing the start.bat file
and uncommenting the line that reads:
rem set JAVA_OPTIONS=-server -Xmx128m
And try setting it to something like:
set JAVA_OPTIONS=-server -Xms64m -Xmx128m
That will tell the JVM to allocate 64 MB of heap space initially, since
on JDK 1.6 on Windows XP it seems to default to around 11 MB.
See here for all the different options:
http://java.sun.com/javase/6/docs/technotes/tools/windows/java.html
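One way to confirm that the new settings actually took effect is to ask the JVM from inside Helma. This is only a minimal sketch: it assumes the Rhino `java` global is reachable from your application code, and `heapSummary` is a made-up helper name, not part of Helma.

```javascript
// Hypothetical helper (not a Helma API): summarise the JVM heap in MB.
// Assumes Rhino's `java` global is available; returns null when it is not
// (for example when the script runs outside a JVM).
function heapSummary() {
    if (typeof java === 'undefined') {
        return null;
    }
    var rt = java.lang.Runtime.getRuntime();
    var mb = 1024 * 1024;
    // maxMemory() reflects -Xmx; totalMemory() is the heap currently
    // reserved, which starts out near -Xms.
    return 'max ' + Math.round(rt.maxMemory() / mb) + ' MB, ' +
           'total ' + Math.round(rt.totalMemory() / mb) + ' MB, ' +
           'free ' + Math.round(rt.freeMemory() / mb) + ' MB';
}
```

Calling something like res.writeln(heapSummary()) before and after the loop should give a rough idea of how close the application gets to the limit.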
BTW, it would be interesting to know what PHP's memory usage is while it
runs this algorithm.
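To test the guess that the per-word processing rather than the file reading is the slow part, the grouping step can be pulled out and timed on its own against an in-memory array. The following is just a sketch under that assumption: `anagramKey` and `groupAnagrams` are names invented here, the word list is a tiny stand-in for the real file, and join('') is used instead of join() so the keys come out without commas.

```javascript
// Hypothetical helper: reduce a word to its anagram key by lowercasing,
// dropping characters below '0', and sorting what is left.
function anagramKey(word) {
    return word.toLowerCase().split('')
        .filter(function (c) { return c >= '0'; })
        .sort()
        .join('');
}

// Hypothetical helper: group an array of words by their anagram key.
function groupAnagrams(words) {
    var hash = {}, i, key;
    for (i = 0; i < words.length; i++) {
        key = anagramKey(words[i]);
        if (!hash[key]) hash[key] = [words[i]];
        else hash[key].push(words[i]);
    }
    return hash;
}

// Time only the processing; no file I/O is involved here.
var words = ['bud', 'dub', 'gnat', 'tang', "can't", 'cat'];
var start = (new Date()).getTime();
var groups = groupAnagrams(words);
var elapsedMs = (new Date()).getTime() - start;
```

If reading the whole file into an array first and then running groupAnagrams over it takes about as long as the original loop, the I/O is probably not the bottleneck.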
Maks.
> -----Original Message-----
> From: helma-user-bounces at helma.org
> [mailto:helma-user-bounces at helma.org] On Behalf Of Joshua Paine
> Sent: Friday, 13 July 2007 06:49
> To: Helma User Mailing List
> Subject: [Helma-user] file read and processing pretty slow. ideas?
>
> I'm coming to Helma from PHP. I solved the kata at
> <http://codekata.pragprog.com/2007/01/kata_six_anagra.html>
> in PHP and in JavaScript. I tremendously prefer the
> JavaScript code (that's why I'm coming to Helma), but it was
> pretty slow compared to PHP. One source of slowness was
> certainly IO, especially reading.
>
> Here's my anagrams.hac code:
>
> var f, w, key, hash, start, a;
> start = (new Date()).getTime();
> (f = helma.File('c:/temp/wordlist.txt')).open();
> hash = {};
> while (w = f.readln()) {
>     // key: lowercased word, characters below '0' dropped, rest sorted
>     key = w.toLowerCase().split('')
>         .filter(function(c) { return c >= '0'; })
>         .sort().join();
>     if (!hash[key]) hash[key] = [w];
>     else hash[key].push(w);
> }
> f.close();
> res.contentType = 'text/plain';
> for (a in hash) if (hash[a].length > 1)
>     res.writeln(hash[a].join(' '));
> writeln('found anagrams in ' +
>     (((new Date()).getTime() - start) / 1000).toFixed(3) + ' seconds');
>
> This takes about 3x as long to run as the equivalent (though
> longer and uglier) PHP on my system. Actually not quite equivalent,
> because in PHP handling a text file line by line is actually
> quickest done by reading the whole file into a string in one
> go with file_get_contents and then exploding it on "\n". I
> tried a similar technique in Helma using
> helma.File.readAll().split("\n"), but that proved to be
> slower than using helma.File.readln.
>
> Is there a faster way to read and go through a large file? A
> faster way to do the rest?
>
> Briefly, it works by stripping non-alphanumeric characters from
> each word, lowercasing it, and sorting the rest of the
> characters. The sorted, stripped, lowercase word becomes a
> key in an object (being used merely as a hash table) which
> stores an array of words that reduce to that key,
> e.g.:
>
> {
> 'bdu':['bud','dub'],
> 'agnt':['gnat','tang'],
> ...
> }
>
> It then loops through the keys and prints out the ones that have
> more than one entry.