[Helma-user] using built-in db
Maksim Lin for technical support mailing lists
maksim_lin at ngv.vic.gov.au
Wed Jul 25 02:10:19 CEST 2007
> I've gotten only vague and conflicting information about this for
> Linux--ext3 in my case (where I'll be running the production version).
> Nothing like just trying it, I suppose.
My prod env is all Linux and ext3 too, so running the tests there
gives:
created 1000 books in 0.078s scanned 1000 books in 0.044s
created 5000 books in 0.525s scanned 5000 books in 0.07s
created 10000 books in 1.438s scanned 10000 books in 0.118s
and pushing a bit further:
created 25000 books in 17.081s scanned 25000 books in 0.312s
created 50000 books in 90.35s scanned 50000 books in 1.044s
Of course it's hardly a fair comparison, given that this is with -Xms256 on
a dual Xeon server with RAIDed 10k disks...
but it does go to show that throwing more hardware at the problem helps
a bit :-)
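For a rough sense of how the create cost grows, the figures above can be turned into an average per-book cost. This is only arithmetic on the numbers reported in this mail; the helper name perBookMs is made up for illustration.

```javascript
// Results reported above (Linux, ext3): [books, createSeconds, scanSeconds]
var results = [
  [1000, 0.078, 0.044],
  [5000, 0.525, 0.07],
  [10000, 1.438, 0.118],
  [25000, 17.081, 0.312],
  [50000, 90.35, 1.044]
];

// Hypothetical helper: average cost per book in milliseconds.
function perBookMs(count, seconds) {
  return (seconds * 1000) / count;
}

// Build one summary line per test run.
var lines = [];
for (var i = 0; i < results.length; i++) {
  var r = results[i];
  lines.push(r[0] + " books: create " + perBookMs(r[0], r[1]).toFixed(3) +
             " ms/book, scan " + perBookMs(r[0], r[2]).toFixed(3) + " ms/book");
}

// Print where a console exists (e.g. Node); inside Helma one would
// use res.writeln(lines.join("\n")) instead.
if (typeof console !== "undefined") {
  console.log(lines.join("\n"));
}
```

On these numbers the average create cost rises from roughly 0.08 ms per book at 1000 books to about 1.8 ms per book at 50000, while the scan cost stays below 0.05 ms per book throughout.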
> > created 1000 books in 0.06s scanned 1000 books in 0.05s
> > created 5000 books in 0.741s scanned 5000 books in 0.14s
> > created 10000 books in 7.231s scanned 10000 books in 0.26s
>
> That's pretty wild how create time gets a bit pathological by
> 10k, but read time is better than linear. This does give me
> some nice ballpark numbers which may be completely
> non-applicable to my situation but will help me feel better anyway :-)
I suspect the read time is helped by having a "warm" OS file cache and
Helma's object cache providing buffering to make the update operation
faster.
The code I've been using is included at the end of this email.
> If that's the only obstacle to using the XML DB on large
> datasets, though, I would think it would be easy enough to
> fix in Helma--just have a tiny bit of code to split
> predictably among n directories.
I had exactly the same thought!
Git (the Linux kernel developers' source control system) stores its files in
folders named after the start of each file name,
e.g. if a file has an id of "ff2acbefd....." it goes into folder "ff", file
"2b3aacbefd....." goes into folder "2b", etc...
I guess we could do the same by creating 100 folders named 00 to 99 and
then limiting the files in each folder to be from 000 to 999, though
that would be setting a "hard" limit of 100k objects - though I think
that if you were going to larger data sets, starting to use an SQL DB
would really make more sense.
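A minimal sketch of that 100 x 1000 layout, assuming object ids are plain non-negative integers. The function names pad and shardPath and the ".xml" suffix are assumptions made for illustration; this is not how Helma currently lays out its db folder.

```javascript
// Hypothetical helper: left-pad a number with zeros to a fixed width.
function pad(n, width) {
  var s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Hypothetical sketch: map a numeric id (0..99999) to a two-level path,
// i.e. 100 folders ("00".."99") holding 1000 files ("000".."999") each.
function shardPath(id) {
  if (id < 0 || id > 99999 || Math.floor(id) !== id) {
    throw new Error("id out of range for 100 x 1000 layout: " + id);
  }
  var folder = Math.floor(id / 1000); // leading two digits pick the folder
  var file = id % 1000;               // trailing three digits name the file
  return pad(folder, 2) + "/" + pad(file, 3) + ".xml";
}
```

With this mapping, shardPath(12345) gives "12/345.xml" and shardPath(7) gives "00/007.xml". Sequential ids fill one folder before moving to the next; picking the folder with id % 100 instead would spread new objects evenly across all folders from the start.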
> > and hema will recreate a new lcena db folder for the app!
>
> I've done this once already--nice indeed. What does "lcena" mean?
> (Google results suggest it was a typo, but I'm not trying to
> be obtuse--I really don't know what you meant.)
Sorry, my typing is terrible! Yes, it was supposed to read: "clean" db
folder for the app.
Maks.
====================
if (req.data.count) {
    // Request parameters typically arrive as strings, so convert once.
    var count = parseInt(req.data.count, 10);
    var b1;

    // Phase 1: create `count` books and attach them to root.books.
    var startTime = new Date();
    for (var i = 0; i < count; i++) {
        b1 = new Book();
        b1.title = "title 1";
        b1.barcode = i;
        b1.onLoan = false;
        root.books.add(b1);
        /*
        if (i > 500) {
            writeln("books:"+root.books.size());
            res.abort();
        }
        */
    }
    var stopTime = new Date();
    res.writeln("created " + i + " books in " +
        (stopTime.getTime() - startTime.getTime()) / 1000 + "s");
    //writeln("books:"+root.books.size());

    // Phase 2: fetch each book by index and update it (the "scan").
    startTime = new Date();
    for (var j = 0; j < count; j++) {
        b1 = root.books.get(j);
        if (b1 == null) {
            res.writeln("FOUND NULL BOOK!");
            res.abort();
        }
        b1.onLoan = true;
    }
    stopTime = new Date();
    res.writeln("scanned " + j + " books in " +
        (stopTime.getTime() - startTime.getTime()) / 1000 + "s");
}