[Helma-user] file read and processing pretty slow. ideas?
Joshua Paine
joshua at papercrown.org
Fri Jul 13 03:26:09 CEST 2007
Maksim Lin for technical support mailling lists wrote:
> I can't see why the file i/o
> would have much impact on the speed of running your code.
If I cut all actual processing out and just read the file into a
variable a line at a time (replacing the previous value each line), it
still takes ~0.45 sec to run (as indicated by the in-code timer).
Earlier tests indicated that if I did everything (including the loop
that goes through the results) except actually write the output I saved
~0.3 seconds. So Helma spends almost as much time just on IO as PHP does
running the whole thing.
> I suspect the split(),
> filter(), sort(), join(), etc. functions would be creating a lot of
> objects, which they would then drop references to, hence leaving a lot
> of used memory for the garbage collector to clean up if memory fills
Sounds likely. Nothing comes to my mind as a preferable way of doing
this, though. Am I missing something? I certainly like the compactness
of this version (long lines of code don't bother me--my main monitor is
1680px wide).
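
For reference, the kind of chained key computation being discussed looks roughly like the sketch below. This is a hypothetical reconstruction, not the actual Helma code from this thread: the name `anagramKey` is made up, and the `>= "0"` filter is assumed from the PHP version's handling of punctuation. The comments mark where each step allocates a new object that becomes garbage once the key is built.

```javascript
// Hypothetical helper; the name anagramKey is not from the original code.
// Builds a sorted-letter key so that anagrams share the same key.
function anagramKey(word) {
  return word
    .toLowerCase()                                // new string
    .split("")                                    // new array of 1-char strings
    .filter(function (ch) { return ch >= "0"; })  // new array, punctuation below "0" dropped
    .sort()                                       // sorts in place, no new array
    .join("");                                    // new string (the key)
}
```

Under these assumptions, `anagramKey("Listen")` and `anagramKey("Silent")` produce the same key, so each word costs roughly two temporary strings and two temporary arrays.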
> So if you're running on windows I'd recommend editing the start.bat file
> and uncommenting the line that reads:
> rem set JAVA_OPTIONS=-server -Xmx128m
>
> And try setting it to something like:
> set JAVA_OPTIONS=-server -Xms64m -Xmx128m
As I discovered early in my Helma adventures and just confirmed, the
-server option doesn't work for me and Java won't start with it (am I
using the wrong kind of JRE? how do I get the right one?), but I've
changed the memory settings. It's jre1.5.0_10, btw.
Just starting Helma without loading anything, Java uses ~39 MB (I have
Jackrabbit and H2 running in the VM also, though not yet doing anything
either).
Before, the code was running in 2.9-3.16 seconds (different averages on
different days for some reason).
With the new memory settings, after a few outliers, I'm consistently down
to ~2.78 seconds, but Java memory usage climbs ~10MB with each request
until it stops at ~97MB. Task Manager also shows that the script maxes
out one core and sometimes uses a bit of the other during requests.
LATER: with nothing going on, memory usage has eventually dropped to 17MB.
> BTW it would be interesting to know what PHP's mem usage is while it
> runs this algorithm?
PHP as I have it configured uses 8.3MB running sleep(5). It finishes too
quickly (~0.9 sec) for Task Manager to show its usage very well, but
the highest number I've seen for memory usage was 19MB, and CPU usage
seems to max out at ~70% of one core.
The PHP implementation is quite a bit different, though, as PHP doesn't
have objects for files or strings or arrays. It looks like this:
<?php
$start = microtime(true);
$words = explode("\n",file_get_contents('c:/temp/wordlist.txt'));
array_pop($words);
print(count($words)." words\n");
$hash = array();
$count = 0;
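foreach($words as $word)
{
    ++$count;
    // Key = the word's lowercased characters in sorted order
    $key = str_split(strtolower($word));
    sort($key);
    // Drop leading characters that sort before '0' (punctuation);
    // the $key check stops the loop once nothing is left
    while($key && $key[0]<'0') array_shift($key);
    $key = implode('',$key);
    if(!@$hash[$key]) $hash[$key] = array();
    $hash[$key][] = $word;
}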
$out = fopen('anagrams.txt','w');
foreach($hash as $set) if(count($set)>1) fwrite($out,implode(" ",$set)."\r\n");
fclose($out);
$time = microtime(true) - $start;
print "solved in $time seconds";
?>
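
For comparison, here is a minimal JavaScript sketch of the same grouping step, leaving out the file reading and writing. It is not the Helma code from this thread: `groupAnagrams` is an illustrative name, the input is assumed to be an array of words that has already been read, and the function returns the output lines instead of writing them to a file.

```javascript
// Sketch only: mirrors the grouping logic of the PHP version above.
// groupAnagrams is an illustrative name, not part of Helma or the original code.
function groupAnagrams(words) {
  var hash = {};
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    // Key = lowercased characters, punctuation below "0" removed, sorted
    var key = word.toLowerCase().split("")
      .filter(function (ch) { return ch >= "0"; })
      .sort().join("");
    // Own-property check so inherited names on the object are never hit
    if (!Object.prototype.hasOwnProperty.call(hash, key)) hash[key] = [];
    hash[key].push(word);
  }
  // Keep only the sets that contain more than one word
  var sets = [];
  for (var k in hash) {
    if (hash[k].length > 1) sets.push(hash[k].join(" "));
  }
  return sets;
}
```

Called as `groupAnagrams(["listen", "silent", "apple"])`, this returns a single line holding the two anagrams and drops the word that has no partner.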
-Joshua