Developer
LMcRae
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

VM Byte code optimizations

I have been doing some tests using JDE 5.0 to browse the generated code.  I am trying to find some best practices for fast code and I am kind of surprised by some results.  Since I don't have a ton of Java experience, I am wondering if I am mistaken.  I get the byte code by debugging the cod, selecting View->VM Byte Code and then using Window->Copy Window to Clipboard.  BTW, is there an easier offline way to get this?

Here are two functions.  The first is some code I found on the net and the second one just has the intermediate variables removed.

 

 

    static float InvSqrt1( float x )
    {
        float xhalf = 0.5f * x;

        int i = Float.floatToIntBits( x );

        i = 0x5f3759df - (i >> 1);
        x = Float.intBitsToFloat(i);
        x = x * (1.5f - xhalf * x * x);

        return x;
    }
    static float InvSqrt2( float x )
    {
        final float v = Float.intBitsToFloat( 0x5f3759df - (Float.floatToIntBits(x) >> 1) );

        return v * (1.5f - (0.5f * x) * v * v);
    }

I am using NetBeans but I am compiling with rapc.  The only switch I use is

warnkey=0x52424200;0x52525400;0x52435200.

Look at the difference in byte code.  The second one is 12 instructions shorter.  Shouldn't the compiler be smart enough to do what I did?  Am I missing an optimization switch?

 

 

static float InvSqrt1( float x )	42 instructions
1f:bdc enter
1f:bdd isreal
1f:bde iipush 00 00 00 3F
1f:be3 isreal
1f:be4 iload_0
1f:be5 fmul
1f:be7 isreal
1f:be8 istore_1
1f:be9 isreal
1f:bea iload_0
1f:beb invokestaticqc_lib 01 1A 68 50
1f:bf0 istore_2
1f:bf1 iipush DF 59 37 5F
1f:bf6 iload_2
1f:bf7 iconst_1
1f:bf8 ishr
1f:bf9 isub
1f:bfa istore_2
1f:bfb iload_2
1f:bfc invokestaticqc_lib 01 1A 75 50
1f:c01 isreal
1f:c02 istore_0
1f:c03 isreal
1f:c04 iload_0
1f:c05 isreal
1f:c06 iipush 00 00 C0 3F
1f:c0b isreal
1f:c0c iload_1
1f:c0d isreal
1f:c0e iload_0
1f:c0f fmul
1f:c11 isreal
1f:c12 iload_0
1f:c13 fmul
1f:c15 fsub
1f:c17 fmul
1f:c19 isreal
1f:c1a istore_0
1f:c1b isreal
1f:c1c iload_0
1f:c1d isreal
1f:c1e ireturn

static float InvSqrt2( float x ) 30 instructions
1f:c2d enter
1f:c2e iipush DF 59 37 5F
1f:c33 isreal
1f:c34 iload_0
1f:c35 invokestaticqc_lib 01 1A 68 50
1f:c3a iconst_1
1f:c3b ishr
1f:c3c isub
1f:c3d invokestaticqc_lib 01 1A 75 50
1f:c42 isreal
1f:c43 istore_1
1f:c44 isreal
1f:c45 iload_1
1f:c46 isreal
1f:c47 iipush 00 00 C0 3F
1f:c4c isreal
1f:c4d iipush 00 00 00 3F
1f:c52 isreal
1f:c53 iload_0
1f:c54 fmul
1f:c56 isreal
1f:c57 iload_1
1f:c58 fmul
1f:c5a isreal
1f:c5b iload_1
1f:c5c fmul
1f:c5e fsub
1f:c60 fmul
1f:c62 isreal
1f:c63 ireturn

 

 

I realize that a lower instruction count doesn't always translate into faster code, but when both versions have the same instructions and the longer one only adds extra loads and stores, it must be slower.  In fact I have measured it on device and it is.

 

For the record, this fast InvSqrt( x ) is slower than 1.0f / (float)Math.sqrt( x ).  Using the Fixed32.sqrt is 10-15% faster even with the conversions to and from fixed.
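A minimal sketch for checking the two variants off-device, assuming a desktop JVM rather than the BlackBerry VM. It only verifies that both functions return the same bits and shows how far the approximation is from 1.0f / (float)Math.sqrt( x ); it says nothing about on-device timing. The class name InvSqrtCheck and the helper maxRelativeError are made up for this illustration.

```java
final class InvSqrtCheck {
    // Variant with intermediate variables, as in InvSqrt1 from the post.
    static float invSqrt1(float x) {
        float xhalf = 0.5f * x;
        int i = Float.floatToIntBits(x);
        i = 0x5f3759df - (i >> 1);
        x = Float.intBitsToFloat(i);
        x = x * (1.5f - xhalf * x * x);
        return x;
    }

    // Variant with the intermediates folded away, as in InvSqrt2 from the post.
    static float invSqrt2(float x) {
        final float v = Float.intBitsToFloat(0x5f3759df - (Float.floatToIntBits(x) >> 1));
        return v * (1.5f - (0.5f * x) * v * v);
    }

    // Hypothetical helper: largest relative error of invSqrt2 against 1/sqrt over the samples.
    static float maxRelativeError(float[] samples) {
        float worst = 0.0f;
        for (int k = 0; k < samples.length; k++) {
            float exact = 1.0f / (float) Math.sqrt(samples[k]);
            float err = Math.abs(invSqrt2(samples[k]) - exact) / exact;
            if (err > worst) {
                worst = err;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        float[] samples = { 0.25f, 1.0f, 2.0f, 4.0f, 100.0f };
        for (int k = 0; k < samples.length; k++) {
            float x = samples[k];
            boolean sameBits =
                Float.floatToIntBits(invSqrt1(x)) == Float.floatToIntBits(invSqrt2(x));
            System.out.println("x=" + x + " approx=" + invSqrt2(x)
                + " exact=" + (1.0f / (float) Math.sqrt(x)) + " sameBits=" + sameBits);
        }
        System.out.println("max relative error: " + maxRelativeError(samples));
    }
}
```

Both variants perform the same float operations in the same order, so any speed difference comes from the extra loads and stores, not from the arithmetic.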

 

 

 

 

 

Developer
rcmaniac25
Posts: 1,805
Registered: ‎04-28-2009
My Device: Z10 (STL100-4)-10.2.1.3253, Z10 (STL100-3)-10.3.1.634 Dev OS, Z30 (STA100-5)-10.3.1.634 Dev OS, Passport (SQW100-1)-10.3.0.1418, PlayBook (16GB)-2.1.0.1917

Re: VM Byte code optimizations

First: wow. People have asked how to disassemble a COD, and this is probably the most direct and "official" way to do it. I wonder why no one seems to have mentioned it before. If they did, I never found it.

 

Compilers don't need to be smart and if the compiler sees the value assigned to a variable then it will assign the variable even if it's not needed.

---Spends time in #blackberrydev on freenode (IRC)----
Three simple rules:
1. Please use the search bar before making new posts.
2. "Like" posts that you find helpful.
3. If a solution has been found for your post, mark it as solved.
--I code too much. Well, too bad.
Developer
LMcRae
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

Re: VM Byte code optimizations

 


rcmaniac25 wrote:

Compilers don't need to be smart and if the compiler sees the value assigned to a variable then it will assign the variable even if it's not needed.


 

 

Sorry but I can't agree with that.  For the most part with compilers I leave it up to them to do the right thing and they do.  Only in the tight inner loops and other special cases do I look at the asm the compiler is producing to see if something could be better.

 

We are talking about variables that are PODs and local variables within a static function, so there shouldn't be any issues with threads, aliased objects or other stuff.  If the compiler can't skip the extra loads and stores, then that is scary.  So I am assuming that either this is an unoptimized cod or I just don't have enough of an understanding of how Java and the JVM work.  Both are very possible.

 

 

Developer
LMcRae
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

Re: VM Byte code optimizations

I found this link that talks about how rapc has optimizations on by default:

 

http://supportforums.blackberry.com/t5/Java-Development/How-do-I-do-a-release-build-with-Optimizatio...

 

I did a few more tests, some easy ones.

 

 

    public static final float       PI      = 3.1415926535897932384626433832795f;

    static float TestD( float fValue )
    {
        final float fTwoPI  = PI * 2.0f;

        return fTwoPI * fValue;
    }
    static float TestE( float fValue )
    {
        final float fTwoPI  = 3.1415926535897932384626433832795f * 2.0f;

        return fTwoPI * fValue;
    }
    static float TestF( float fValue )
    {
        final float fTwoPI  = 6.283185307179586476925286766559f;

        return fTwoPI * fValue;
    }
    static float TestG( float fValue )
    {
        return 6.283185307179586476925286766559f * fValue;
    }

 

static float TestD( float fValue ), TestE and TestF are the same
1f:d42 enter_narrow
1f:d43 isreal
1f:d44 iipush DB 0F C9 40
1f:d49 isreal
1f:d4a istore_1
1f:d4b isreal
1f:d4c iipush DB 0F C9 40
1f:d51 isreal
1f:d52 iload_0
1f:d53 fmul
1f:d55 isreal
1f:d56 ireturn

static float TestG( float fValue )
1f:d9c enter_narrow
1f:d9d isreal
1f:d9e iipush DB 0F C9 40
1f:da3 isreal
1f:da4 iload_0
1f:da5 fmul
1f:da7 isreal
1f:da8 ireturn

 

 

This is shocking considering optimizations are on.  Tell me I am wrong, please. 
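A small sketch for reading the iipush operands in the listings, assuming the four operand bytes are stored least significant byte first (an assumption, but one that matches every constant in this thread). Under that reading, DB 0F C9 40 is the float 2*PI, which suggests the constant expression was already folded at compile time (Java treats expressions such as PI * 2.0f as compile-time constants); what remains in TestD, TestE and TestF is the unused store to the local. The class and method names are made up for this illustration.

```java
final class IipushDecode {
    static final float PI = 3.1415926535897932384626433832795f;

    // Reassembles four operand bytes, assumed to be listed least significant byte first.
    static int littleEndianToInt(int b0, int b1, int b2, int b3) {
        return (b0 & 0xFF) | ((b1 & 0xFF) << 8) | ((b2 & 0xFF) << 16) | ((b3 & 0xFF) << 24);
    }

    public static void main(String[] args) {
        // Operand of "iipush DB 0F C9 40" from TestD..TestG.
        int twoPiBits = littleEndianToInt(0xDB, 0x0F, 0xC9, 0x40);
        System.out.println(Float.intBitsToFloat(twoPiBits));
        System.out.println(twoPiBits == Float.floatToIntBits(PI * 2.0f));

        // Operands of "iipush 00 00 00 3F" and "iipush 00 00 C0 3F" from InvSqrt1/InvSqrt2.
        System.out.println(Float.intBitsToFloat(littleEndianToInt(0x00, 0x00, 0x00, 0x3F)));
        System.out.println(Float.intBitsToFloat(littleEndianToInt(0x00, 0x00, 0xC0, 0x3F)));
    }
}
```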

 

 

Developer
ydaraishy
Posts: 562
Registered: ‎09-30-2009
My Device: Not Specified

Re: VM Byte code optimizations

[ Edited ]

I think it might have something to do with the fact that the first function modifies its argument. I can't quite do testing on this myself right now, so any thoughts I might have would be just guessing.

 

ed. this is for your first samples.

Developer
rcmaniac25
Posts: 1,805
Registered: ‎04-28-2009
My Device: Z10 (STL100-4)-10.2.1.3253, Z10 (STL100-3)-10.3.1.634 Dev OS, Z30 (STA100-5)-10.3.1.634 Dev OS, Passport (SQW100-1)-10.3.0.1418, PlayBook (16GB)-2.1.0.1917

Re: VM Byte code optimizations

[ Edited ]

It's OK if you don't agree; it's my opinion. I, for whatever reason, find low-level code such as x64 ASM, ByteCode, IL, etc. fascinating and "study" it.

One thing I learned is that compilers, no matter how powerful, professional, and well made they are, can still take the obvious, such as what you are most likely thinking the output should be, and return pretty "unclean" code.

I myself have written secondary compilers that take the output of "efficient" optimizing compilers and optimize it a second time, and I still can't get it as good as doing it by hand. Or at least not to the point that I like.

 

My personal hatred is of low level code that basically says:

Line 10: //do something

Line 11: goto line 12

Line 12: //do something

 

If line 11 was to be removed it would save space, reduce clock cycles, etc. Pretty obvious and yet professional compilers still do that.
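The kind of cleanup described here is usually called a peephole optimization. Below is a minimal sketch of one such pass; the Insn representation is hypothetical and is not the format of any real compiler or of the BlackBerry VM. It drops every goto whose target is the very next instruction and renumbers the remaining jump targets.

```java
import java.util.ArrayList;
import java.util.List;

final class PeepholeDemo {
    // Hypothetical instruction: an opcode name plus a jump target index (-1 if it does not jump).
    static final class Insn {
        final String op;
        final int target;
        Insn(String op, int target) {
            this.op = op;
            this.target = target;
        }
    }

    // Single pass: removes each "goto" that targets the next instruction and fixes up targets.
    static List<Insn> removeRedundantGotos(List<Insn> code) {
        int n = code.size();
        boolean[] drop = new boolean[n];
        // newIndex[i] is where instruction i ends up; a dropped goto maps to the instruction after it.
        int[] newIndex = new int[n + 1];
        int kept = 0;
        for (int i = 0; i < n; i++) {
            Insn in = code.get(i);
            drop[i] = in.op.equals("goto") && in.target == i + 1;
            newIndex[i] = kept;
            if (!drop[i]) {
                kept++;
            }
        }
        newIndex[n] = kept;

        List<Insn> out = new ArrayList<Insn>();
        for (int i = 0; i < n; i++) {
            if (drop[i]) {
                continue;
            }
            Insn in = code.get(i);
            out.add(in.target < 0 ? in : new Insn(in.op, newIndex[in.target]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the example above: do something, jump to the next line, do something.
        List<Insn> code = new ArrayList<Insn>();
        code.add(new Insn("work", -1));
        code.add(new Insn("goto", 2));
        code.add(new Insn("work", -1));
        for (Insn in : removeRedundantGotos(code)) {
            System.out.println(in.op);
        }
    }
}
```

A single pass can leave a goto that only becomes redundant after another one is removed, so a real optimizer would repeat the pass until nothing changes.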

 

Edit on the second post:

Maybe the CPU on the BlackBerry has issues with floating point numbers and needs to constantly check to make sure that it is a real number (assuming that is what you are shocked about in the second post).

Developer
ydaraishy
Posts: 562
Registered: ‎09-30-2009
My Device: Not Specified

Re: VM Byte code optimizations

I think what the OP is shocked about is that the extra variable isn't being optimized away even though it's constant.

 

The thing that puzzles me is the istore_1 but an iload_0 is used later. My understanding of bytecode is rusty, however...

Developer
klyubin
Posts: 1,474
Registered: ‎04-14-2009
My Device: Not Specified

Re: VM Byte code optimizations

Keep in mind that the JDE uses a standard Java compiler (javac) to transform source code into .class files. RAPC then takes these .class files and transforms them into .cod files.

 

Now, javac doesn't optimize much on purpose (I'm not sure, but I think this is because it would hide too much information from the Just-in-Time (JIT) compiler otherwise). RAPC can't optimize much either as it starts with .class files (relatively low-level JVM byte code with a level of abstraction very similar to BlackBerry JVM byte code). RAPC can find unused methods and skip them, but that's not really the optimization you're interested in.

Developer
rcmaniac25
Posts: 1,805
Registered: ‎04-28-2009
My Device: Z10 (STL100-4)-10.2.1.3253, Z10 (STL100-3)-10.3.1.634 Dev OS, Z30 (STA100-5)-10.3.1.634 Dev OS, Passport (SQW100-1)-10.3.0.1418, PlayBook (16GB)-2.1.0.1917

Re: VM Byte code optimizations

If memory serves me right (from something I read by one of the Sun Java developers), javac up to and including JDK 4 did optimizations; javac v5 and above just keeps the switch for backwards compatibility but it doesn't actually do anything. I don't really believe it does nothing; more likely computers are so fast today that no one can really find a difference.

Developer
LMcRae
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

Re: VM Byte code optimizations

[ Edited ]

 


klyubin wrote:

Keep in mind that the JDE uses a standard Java compiler (javac) to transform source code into .class files. RAPC then takes these .class files and transforms them into .cod files.


 

I noticed that when I had a compile error, it would show the command line passed to javac.  It does use -g -O, which enable debugging info and optimizations.  They do warn about using the two together, but the warning is about code that could go missing.  So I guess optimizations are on; there might be some concern with -g being used, but I doubt it.

 


klyubin wrote:

Now, javac doesn't optimize much on purpose (I'm not sure, but I think this is because it would hide too much information from the Just-in-Time (JIT) compiler otherwise). RAPC can't optimize much either as it starts with .class files (relatively low-level JVM byte code with a level of abstraction very similar to BlackBerry JVM byte code). RAPC can find unused methods and skip them, but that's not really the optimization you're interested in.


 

Now this makes very good sense.  As I said, I don't have a deep understanding of Java bytecode and the JVM itself.  The video I watched about the Android compiler being so good is now starting to make sense.  I just wish I could get the compiler to get rid of local variables that are only used once.  I hate having to write these crazy long lines of code that are hard to maintain and debug.

 

 

 

 
