[Helma-user] file read and processing pretty slow. ideas?

Joshua Paine joshua at papercrown.org
Fri Jul 13 03:26:09 CEST 2007


Maksim Lin for technical support mailling lists wrote:
> I can't see why the file i/o
> would have much impact on the speed of running your code.

If I cut all actual processing out and just read the file into a 
variable a line at a time (replacing the previous value each line), it 
still takes ~0.45 sec to run (as indicated by the in-code timer). 
Earlier tests indicated that doing everything (including the loop 
that goes through the results) except actually writing the output saved 
~0.3 seconds. So Helma spends about 0.75 sec on IO alone, almost as much 
time as PHP takes (~0.9 sec) to run the whole thing.

> I suspect the split(),
> filter(), sort(), join(), etc functions would be creating a lot of
> objects, which they would then drop references to, hence leaving a lot
> of used memory for the garbage collector to clean up if memory fills 

Sounds likely. Nothing comes to my mind as a preferable way of doing 
this, though. Am I missing something? I certainly like the compactness 
of this version (long lines of code don't bother me--my main monitor is 
1680px wide).

> So if you're running on Windows I'd recommend editing the start.bat file
> and uncommenting the line that reads:
> rem set JAVA_OPTIONS=-server -Xmx128m
> 
> And try setting it to something like:
> set JAVA_OPTIONS=-server -Xms64m -Xmx128m

As I discovered early in my Helma adventures and just confirmed, the 
-server option doesn't work for me and Java won't start with it (am I 
using the wrong kind of JRE? how do I get the right one?), but I've 
changed the memory settings. It's jre1.5.0_10, btw.

Just starting Helma without loading anything, Java uses ~39 MB (I have 
jackrabbit and h2 running in the VM also, though not yet doing anything 
either).

Before, the code was running in 2.9-3.16 seconds (different averages on 
different days for some reason).

With the new memory settings, after a few outliers, I'm consistently down 
to ~2.78 seconds, but Java memory usage climbs ~10MB with each request 
until it stops at ~97MB. Task manager also shows that the script maxes 
out one core and sometimes uses a bit of the other during requests. 
LATER: with nothing going on, memory usage has eventually dropped to 17MB.

> BTW it would be interesting to know what PHP's mem usage is while it
> runs this algorithm?

PHP, as I have it configured, uses 8.3MB running sleep(5). It finishes too 
quickly (~0.9 sec) for the task manager to show its usage very well, but 
the highest number I've seen for memory usage was 19MB and CPU usage 
seems to max out at ~70% of one core.

The PHP implementation is quite a bit different, though, as PHP doesn't 
have objects for files or strings or arrays. Looks like this:

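<?php
$start = microtime(true);
// Read the whole word list at once and drop the last element
// (empty when the file ends with a newline).
$words = explode("\n",file_get_contents('c:/temp/wordlist.txt'));
array_pop($words);
print(count($words)." words\n");
$hash = array();
$count = 0;
foreach($words as $word)
{
	++$count;
	// Key = the word's lowercased characters in sorted order.
	$key = str_split(strtolower($word));
	sort($key);
	// Strip leading characters that sort below '0' (apostrophes, hyphens, etc.).
	// The emptiness check avoids looping forever on a word made only of such characters.
	while($key && $key[0]<'0') array_shift($key);
	$key = implode('',$key);
	if(!isset($hash[$key])) $hash[$key] = array();
	$hash[$key][] = $word;
}
// Write each set of two or more anagrams on its own line.
$out = fopen('anagrams.txt','w');
foreach($hash as $set) if(count($set)>1) fwrite($out,implode(" ",$set)."\r\n");
fclose($out);
$time = microtime(true) - $start;
print "solved in $time seconds";
?>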
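
For comparison, here is a rough plain-JavaScript sketch of the same grouping 
logic as the PHP above. It is not the Helma code discussed earlier in the 
thread: anagramKey and groupAnagrams are names made up for illustration, and 
file reading and writing are left out because that part is Helma-specific.

```javascript
// Sketch only: mirrors the PHP grouping logic, not the actual Helma code.
// anagramKey and groupAnagrams are hypothetical names.

// Build the lookup key: lowercased characters in sorted order,
// with leading characters that sort below '0' stripped.
function anagramKey(word) {
  var chars = word.toLowerCase().split('');
  chars.sort();
  var i = 0;
  while (i < chars.length && chars[i] < '0') i++;
  return chars.slice(i).join('');
}

// Group words by key and return one space-joined line per set
// that contains more than one word.
function groupAnagrams(words) {
  var has = Object.prototype.hasOwnProperty;
  var groups = {};
  for (var i = 0; i < words.length; i++) {
    var key = anagramKey(words[i]);
    if (!has.call(groups, key)) groups[key] = [];
    groups[key].push(words[i]);
  }
  var lines = [];
  for (var k in groups) {
    if (has.call(groups, k) && groups[k].length > 1) {
      lines.push(groups[k].join(' '));
    }
  }
  return lines;
}
```

This still builds a temporary array per word for the key, so it would likely 
create about as much garbage as the split()/sort()/join() chain; it only 
shows the logic side by side with the PHP.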

-Joshua

