[Helma-user] managing helma collections

Joshua Paine joshua at papercrown.org
Tue Aug 7 04:54:58 CEST 2007


Maksim Lin for technical support mailling lists wrote:
> can this be implemented in the current persistence subsystem?

I am approximately nobody to Helma, but I've mapped out (in my head) a 
fairly memory-efficient way to do [most of] getOrderedView on 
arbitrarily large SQL-mapped collections (i.e., have the DB do the 
ordering but still use Helma's object cache and retrieve the rows as 
their proper type, etc., as though they had been retrieved directly from 
their Hop parent). This could be done just in JavaScript with the 
interfaces we have now, though some minor fiddling with the Java DB 
stuff instead of helma's DB class would make it more efficient.

getOrderedView returns a collection which has these properties (in the 
logical sense):
1) contains the same set of objects as the _children of the collection 
(typically a HopObject) that called it--no more nor less
2) its children may be in a different order from that in which they are 
mapped in the creator object
3) has [almost] all the usual methods of HopObject
4) but you can't extend it, b/c it's actually a Bean (or something) and 
created in Java code, not JavaScript code
5) stays in sync with its creator collection in that it sees deletes and 
adds in real time

It would be pretty easy to get 1, 2 and 3, improve on 4, and lose 5. 
Property 5 is nice in principle, but not all that important for what 
appears to be the most common use case (get a subset of a sorted list of 
the children so we can easily display them).

The first step is to create (in a new method, because the current method 
has its place and shouldn't be broken by changing the meaning of its 
arguments) a simple object that knows its creator and, for every method 
called on it, calls the same method with the same arguments on the 
creator and returns the result. This useless proxy object already 
implements 1 and 3 and improves on 4.
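
A minimal sketch of such a forwarding proxy, in plain JavaScript. The 
helper name makeForwardingProxy, the explicit list of method names, and 
the fakeCollection stand-in are all assumptions for illustration; none 
of them is Helma API, and a real version would forward whatever methods 
the creator HopObject actually has.

```javascript
// makeForwardingProxy is a hypothetical helper, not part of Helma.
// It builds a plain object that remembers its creator and forwards each
// listed method call, with the same arguments, to that creator.
function makeForwardingProxy(creator, methodNames) {
  var proxy = { creator: creator };

  function forward(name) {
    proxy[name] = function () {
      // Same method, same arguments, called on the creator.
      return creator[name].apply(creator, arguments);
    };
  }

  for (var i = 0; i < methodNames.length; i++) {
    forward(methodNames[i]);
  }
  return proxy;
}

// Stand-in for a HopObject collection, only for this example.
var fakeCollection = {
  items: ["a", "b", "c"],
  size: function () { return this.items.length; },
  get: function (index) { return this.items[index]; }
};

var view = makeForwardingProxy(fakeCollection, ["size", "get"]);
```

Because the proxy is an ordinary JavaScript object, methods can be added 
to or replaced on it afterwards, which is the sense in which it improves 
on 4.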

Now we just need 2. To implement it, the only method that need return 
anything different between the proxy and the original collection is 
get(). And there are three different invocations of get() depending on 
the argument type:

string: get the child with the accessname property equal to arg
number-as-string: get the child with _id equal to arg
number: get the arg-th child

Since it's only the order that changes between the orderedView/proxy and 
the creator, the proxy can just proxy for get(string) and 
get(number-as-string) as with all the other methods. It only has to do 
something different for get(number).
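
The dispatch could look roughly like the sketch below. The names 
makeOrderedGet, lookupByPosition and demoParent are made up for the 
example, and the stand-in parent only imitates the three kinds of get() 
described above; it is not how Helma itself is implemented.

```javascript
// makeOrderedGet is a hypothetical helper. It returns a get() that
// handles only true numbers itself and forwards everything else.
function makeOrderedGet(creator, lookupByPosition) {
  return function get(arg) {
    if (typeof arg === "number") {
      // Positional access: this is the only case where ordering matters.
      return lookupByPosition(arg);
    }
    // Strings (accessname or _id given as a string) go straight through.
    return creator.get(arg);
  };
}

// Stand-in for the creator collection, only for this example.
var demoParent = {
  byId: { "17": "child #17" },
  byName: { "about": "child named about" },
  get: function (arg) {
    if (typeof arg === "number") {
      return "child at creator position " + arg;
    }
    return this.byId[arg] || this.byName[arg] || null;
  }
};

var orderedGet = makeOrderedGet(demoParent, function (n) {
  return "child at ordered position " + n;
});
```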

To do something useful with get(number), it can simply query the db for 
the column which maps to _id for the children with the select ordered as 
indicated when the proxy was created and an appropriate limit clause. 
Once it gets the _id value for the arg-th child under its sort order, it 
calls get(valueOf_idAsString) on its creator and returns the result.
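
As a rough sketch of that lookup: the code below builds the id-only 
query and hands the result back to the creator. Everything named here is 
an assumption for illustration: buildIdQuery, getNthOrdered, the 
runIdQuery callback (imagined to return an array of id values), the 
table and column names, and the LIMIT ... OFFSET syntax, which works in 
MySQL, PostgreSQL and SQLite but not in every database.

```javascript
// Hypothetical helper: builds a query that returns only the id column,
// ordered as requested. Table and column names are expected to come
// from the type mapping, never from user input.
function buildIdQuery(opts) {
  var sql = "SELECT " + opts.idColumn + " FROM " + opts.table;
  if (opts.where) {
    sql += " WHERE " + opts.where;
  }
  sql += " ORDER BY " + opts.orderBy;
  sql += " LIMIT " + Math.floor(opts.limit) +
         " OFFSET " + Math.floor(opts.offset);
  return sql;
}

// Hypothetical helper: finds the _id of the n-th child under the given
// ordering, then fetches the object through the creator by id string.
function getNthOrdered(creator, runIdQuery, mapping, n) {
  var ids = runIdQuery(buildIdQuery({
    idColumn: mapping.idColumn,
    table: mapping.table,
    where: mapping.where,
    orderBy: mapping.orderBy,
    offset: n,
    limit: 1
  }));
  if (ids.length === 0) {
    return null; // no child at that position
  }
  return creator.get(String(ids[0]));
}

// Stand-ins so the sketch can be run without a database.
var seenSql = [];
var demoCreator = {
  get: function (id) { return "HopObject with _id " + id; }
};
var demoRunIdQuery = function (sql) {
  seenSql.push(sql);
  return [42];
};
var demoMapping = {
  table: "story",
  idColumn: "story_id",
  where: "story_site_id = 1",
  orderBy: "story_created DESC"
};
var thirdChild = getNthOrdered(demoCreator, demoRunIdQuery, demoMapping, 2);
```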

Obviously you don't want to do 100 DB queries to show 100 sorted items 
on the page, so the first time someone asks the proxy for its nth child, 
it gets a list of that and the next 100 or 1000 _id values. Since these 
are only numbers, they won't take up much memory. Using the built-in 
helma.Database functionality would require each number to be wrapped in 
an object, but it still should be tolerable memory-wise since we're only 
getting the primary key value and only 1000 at a time. The list can be 
cached in the proxy's .cache, and so an ordered view done this way would 
involve one-or-few extra DB queries in normal cases, little extra 
memory, and the speed overhead of only an array index lookup and an 
extra function call per object read. And since when it comes down to it 
you're actually getting the HopObjects through the standard methods of 
their parent anyway, all the caching, etc. still works.
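
A small sketch of the batching idea, assuming a fetchIds(offset, limit) 
callback that returns the next slice of _id values in sort order. The 
helper name makeBatchedIdLookup and the callback are hypothetical, and 
the batch is kept in a closure here rather than in the proxy's .cache 
only to keep the example self-contained.

```javascript
// Hypothetical helper: returns a function idAt(n) that gives the _id of
// the n-th child in sort order, fetching ids one batch at a time.
function makeBatchedIdLookup(fetchIds, batchSize) {
  var start = 0;  // position of the first id in the cached batch
  var ids = null; // cached batch; null until the first fetch

  return function idAt(n) {
    var missing = ids === null || n < start || n >= start + ids.length;
    if (missing) {
      // One query fetches the requested position and the ones after it.
      start = n;
      ids = fetchIds(n, batchSize);
    }
    // Plain array index lookup; undefined when n is past the end.
    return ids[n - start];
  };
}

// Stand-in for the database: ids already in the desired sort order.
var sortedIds = [42, 7, 19, 3, 88];
var queryCount = 0;
var idAt = makeBatchedIdLookup(function (offset, limit) {
  queryCount++;
  return sortedIds.slice(offset, offset + limit);
}, 3);
```

With a batch size of 3, reading positions 0 through 4 in order costs two 
queries in this stand-in rather than five. The cached ids go stale when 
children are added or removed, which is how property 5 gets lost.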

I don't actually need this at all myself ATM, and I'm not even using a 
SQL db on my first project, so I can't justify implementing it myself 
now. I'm happy to help others, though, or if you're patient and no one 
beats me to it I'll probably need it and write it myself eventually.

> Actually I'd also like to add that it would be great if this could be
> supported for both the sql *and* xml db's as that would really bring the
> xml db implementation up to par with the sql, in terms of features if not
> performance.

You can already do getOrderedView on the XML DB, but of course it has to 
read all the objects--there's no way around that without adding indexing 
to the XML DB, which probably increases the complexity level rather too 
much. If it's only a couple ordered views you need, you can maintain 
them yourself in other custom collections, but it's hard to think of a 
case where that would be sensible and just using the getOrderedView that 
exists now wouldn't be.

-- 
Joshua Paine
Chief Tower Builder
LetterBlock Software
http://letterblock.com/

