05-31-2009 02:23 PM - edited 05-31-2009 02:32 PM
I did some tests and I have quite different results:
Test is performed using JDE 4.6.1 and BB 8900 simulator. doTest method is called from menu item and there is no other thread that might aquire look. I used loop to 100 to obtain some non-zero percent results. _cache is prefilled with 99 different objecs with keys from 1 to 99.
Do you have the same byte-code? I'm not sure that it is possible but doesn't compiler optimize your code by moving synchronization further?
Added:
I have one test I'd like you to perform. Could you add
entry.created = now;
line after
entry.accessed = now;
Will it take as long as the first assignment? Now I think that the issue is cache miss or something like this.
05-31-2009 02:48 PM
@SergGr: I appreciate your attempt to reproduce the behavior! Here is a screenshot showing the full bytecode as the Profiler shows it:

As you can see, the Profiler is clearly accounting the bulk of the clock ticks to the "lputfield" opcode. 6MM clock ticks for 8900 calls is ~680K ticks/call for "lputfield" alone - I sure hope that this is merely a flaw in the Profiler wrongly attributing where the real time is being spent and not what is really happening inside the simulator's VM, and I really hope this doesn't ever actually happen on real hardware.
05-31-2009 03:12 PM
SergGr wrote:I have one test I'd like you to perform. Could you add
entry.created = now;
line after
entry.accessed = now;
Will it take as long as the first assignment? Now I think that the issue is cache miss or something like this.
OK, here's what happens:

First assignment takes the bulk of the time, the second one is minimal.
The "cache" object is a LongHashtable of CacheEntry objects that has been stored and retrieved from PersistentStore and LongCache.get is being invoked by code that is enumerating over the entire hashtable using LongEnumeration ...
I don't think this is a cache miss (seriously, MMU cache miss on BB hardware is THIS slow? no way ...) but maybe VM (virtual memory, not virtual machine) paging in/out from RAM to [wherever].
Ugh ... :-)
05-31-2009 04:09 PM
Wow- did you see my earlier post on this re IA-32? It may be worth looking at the RIM hardware.
I'm doing a lot with persistent store and so far hadn't seen this be an issue except for commits
taking up a lot of time. But, also, I'm usually IO bound in any case. I'll have to see if I can produce something
like this, if in fact the ARM cache is not that severe a penalty. You really can't under estimate the neumann bottleneck-
going off chip can be a big deal but it is still surprising...
I have a recurring tirade on the boost list about memory accesses but with java you don't have a lot of control over
many aspects of memory allocation.
05-31-2009 05:25 PM
@marchywka: So, in an application that does heavy I/O and holds onto a significant amount of RAM, is there anything that can be done?
I wonder if I should use RecordStore instead of PersistentStore? Hmm.
Actually, the only reason I ever enumerate over the entire cache is to expire old entries. Maybe, instead of my Hashtable and CacheEntry approach, I should implement my own proper Cache interface and TimedCache, SizedCache and TimedSizedCache classes. Of course, doing so in Java might drive me to gouge my own eyes out with my fingers ...
I _so_ hate Java.
05-31-2009 06:13 PM
I can't really comment without knowing more about the hardware.
However, I eliminated all non-debug hash iterators in favor of my b-tree classes. Indexing things is now
pretty automatic. And, sure, I'm still playing with it and need to drop in some reorg code but profiling so far
as not been all that suprising. The interface doesn't hide everything but live and learn...
I'll have to play with some of these issues with simple test cases. The writes to persistent objects have not
been obvious time sinks in the emulator but it is possible I just didn't look in the right places. I have a number of speed issues
and profiled my b-trees under j2se but again nothing gross came up in the JDE profiler.My app in the emulator is probably
limited by RIM rendering classes rather than IO but in any case I'll have to give it more thought.
06-01-2009 01:44 AM
Really doubt this has anything to do with von neumann bottleneck scenario as the device memory bus and cpu aren't nearly fast enough...
I think it really has more to do with how the VM handles loading object tables before variable assignments. Note the segor was always using the same key so that table was already loaded in. But I'm just guessing...
06-01-2009 06:04 AM
Quite likely to be a jvm issue, again it all depends on details. I guess the lack of programmer control over memory
usage discouraged me from looking at the hardware but it may be worth looking at jvm too. For that matter,
what would profiler do with a last_pointer_to_large_thing=null if it ever provoked gc? Or would it really reflect
all the extra gc-time from things like string+=s2+s3+s4.... ?
The JVM is a big issue notably due to gc and other housekeeping.