Welcome!

Welcome to the official BlackBerry Support Community Forums.

This is your resource to discuss support topics with your peers, and learn from each other.


Java Development

Developer
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

Here is my small optimization guide.


Here is a list of little rules I have compiled over the months. Since I haven't contributed much lately, here is something.

These have all been checked by running the code in the JDE debugger, logging the bytecode from the JVM, and comparing the results.

----------------------------------------------------------------------------------------------------
1. Don't multiply by -1 to change sign.
----------------------------------------------------------------------------------------------------

fValue * -1.0f

generates more code than

-fValue
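A minimal sketch of the two forms side by side (the class and method names are made up for illustration). Compiling it with a desktop javac and running `javap -c SignFlip` is one way to compare them: javac typically emits a constant load (`ldc`) plus `fmul` for the first method and a single `fneg` for the second. This only shows javac's output; what rapc does with it afterwards may differ. Both forms give the same bits for ordinary values and for signed zeros.

```java
public class SignFlip {
    // Multiply by a constant: javac emits a constant load plus fmul.
    static float negateByMultiply(float fValue) {
        return fValue * -1.0f;
    }

    // Unary minus: javac emits a single fneg.
    static float negate(float fValue) {
        return -fValue;
    }

    public static void main(String[] args) {
        float[] samples = { 3.5f, -2.0f, 0.0f, -0.0f };
        for (int i = 0; i < samples.length; i++) {
            float a = negateByMultiply(samples[i]);
            float b = negate(samples[i]);
            // Compare bit patterns so that 0.0f and -0.0f are told apart.
            System.out.println(samples[i] + " -> " + a + " / " + b + ", same bits: "
                    + (Float.floatToIntBits(a) == Float.floatToIntBits(b)));
        }
    }
}
```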


----------------------------------------------------------------------------------------------------
2. Compiler won't remove variables for you
----------------------------------------------------------------------------------------------------

float  foo = (bar * CONST_VALUE) / PI;
if ( foo < 0.0f )
    return foo;

generates more code than

if ( ((bar * CONST_VALUE) / PI) < 0.0f )
    return foo;
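A sketch of #2 with the comparison returned as a boolean, so neither version needs the variable afterwards. `CONST_VALUE` and `PI` are hypothetical stand-ins for the constants in the snippet above. With javac, the version with a local adds an `fstore`/`fload` pair, while the inlined version keeps the value on the operand stack. If the value itself is needed after the test, inlining means computing it twice, and the local is the better choice.

```java
public class InlineLocal {
    // Hypothetical values standing in for CONST_VALUE and PI in the guide.
    static final float CONST_VALUE = 2.0f;
    static final float PI = 3.14159265f;

    // With a local: javac stores the result (fstore) and reloads it (fload).
    static boolean isNegativeWithLocal(float bar) {
        float foo = (bar * CONST_VALUE) / PI;
        return foo < 0.0f;
    }

    // Inlined: the intermediate value stays on the operand stack.
    static boolean isNegativeInlined(float bar) {
        return ((bar * CONST_VALUE) / PI) < 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(isNegativeWithLocal(-1.0f) + " " + isNegativeInlined(-1.0f));
    }
}
```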

----------------------------------------------------------------------------------------------------
3. Chained assignment generates less code.
----------------------------------------------------------------------------------------------------

x[0]  = 0.0f;
x[5]  = 0.0f;
x[10] = 0.0f;

generates more code than

x[0] = x[5] = x[10] = 0.0f;
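A runnable sketch of both forms (hypothetical names). In the chained form the constant is loaded once and duplicated on the operand stack (javac uses `dup_x2` for this), whereas the separate statements reload it for each store. Running `javap -c ChainedAssign` is one way to check what your own toolchain emits.

```java
import java.util.Arrays;

public class ChainedAssign {
    // Three statements: each one reloads the array, the index and the constant.
    static void clearSeparately(float[] x) {
        x[0] = 0.0f;
        x[5] = 0.0f;
        x[10] = 0.0f;
    }

    // Chained: the constant is loaded once and duplicated for each store.
    static void clearChained(float[] x) {
        x[0] = x[5] = x[10] = 0.0f;
    }

    public static void main(String[] args) {
        float[] a = new float[11];
        float[] b = new float[11];
        Arrays.fill(a, 1.0f);
        Arrays.fill(b, 1.0f);
        clearSeparately(a);
        clearChained(b);
        System.out.println(Arrays.equals(a, b)); // prints true
    }
}
```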

----------------------------------------------------------------------------------------------------
4. Computing the reciprocal ("one over") and storing it in a variable isn't a win. It just doesn't pay
   to introduce a new variable to turn a divide into a multiply.
----------------------------------------------------------------------------------------------------

final float ood = 1.0f / vRayDir[i];
float       t1  = (vMin[i] - vRayPos[i]) * ood;
float       t2  = (vMax[i] - vRayPos[i]) * ood;

generates more code than

float   t1  = (vMin[i] - vRayPos[i]) / vRayDir[i];
float   t2  = (vMax[i] - vRayPos[i]) / vRayDir[i];
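A sketch of the two forms for a single axis, with the arrays passed in as parameters (the class and method names are hypothetical). Multiplying by a rounded reciprocal is not guaranteed to be bit-identical to dividing, so `t1` and `t2` can differ in the last bit between the two versions; the comparison in `main` therefore allows a small tolerance.

```java
public class SlabAxis {
    // Guide's preferred form: two divides by the same direction component.
    static float[] entryExitDivide(float[] vMin, float[] vMax,
                                   float[] vRayPos, float[] vRayDir, int i) {
        float t1 = (vMin[i] - vRayPos[i]) / vRayDir[i];
        float t2 = (vMax[i] - vRayPos[i]) / vRayDir[i];
        return new float[] { t1, t2 };
    }

    // Alternative: one divide for the reciprocal ("one over"), then two multiplies.
    static float[] entryExitReciprocal(float[] vMin, float[] vMax,
                                       float[] vRayPos, float[] vRayDir, int i) {
        final float ood = 1.0f / vRayDir[i];
        float t1 = (vMin[i] - vRayPos[i]) * ood;
        float t2 = (vMax[i] - vRayPos[i]) * ood;
        return new float[] { t1, t2 };
    }

    public static void main(String[] args) {
        float[] vMin = { -1.0f, -1.0f, -1.0f };
        float[] vMax = { 1.0f, 1.0f, 1.0f };
        float[] vRayPos = { -5.0f, 0.5f, 0.25f };
        float[] vRayDir = { 3.0f, 1.0f, 1.0f };
        float[] d = entryExitDivide(vMin, vMax, vRayPos, vRayDir, 0);
        float[] r = entryExitReciprocal(vMin, vMax, vRayPos, vRayDir, 0);
        for (int k = 0; k < 2; k++) {
            // Allow a few ulps: the reciprocal is rounded before the multiply.
            boolean close = Math.abs(d[k] - r[k]) <= 4 * Math.ulp(d[k]);
            System.out.println(d[k] + " vs " + r[k] + ", close: " + close);
        }
    }
}
```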

----------------------------------------------------------------------------------------------------
5. I am not convinced that using final for PODs has any effect.
----------------------------------------------------------------------------------------------------
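A small sketch (hypothetical names) of the one case where `final` on a primitive local does change javac's output: when the local is initialized with a constant expression, it becomes a compile-time constant and javac folds it into the expression. For a `final` local computed at run time, the modifier is not recorded in the bytecode at all, which is consistent with the observation above.

```java
public class FinalLocals {
    // Constant initializer: SCALE is a compile-time constant, so javac
    // folds SCALE * 4 into the literal 12.
    static int constantFinal() {
        final int SCALE = 3;
        return SCALE * 4;
    }

    // Same arithmetic without final: javac loads the local and emits imul.
    static int nonFinal() {
        int scale = 3;
        return scale * 4;
    }

    // Run-time value: final here is a source-level check only and
    // leaves no trace in the generated bytecode.
    static int runtimeFinal(int x) {
        final int scaled = x * 4;
        return scaled + 1;
    }

    public static void main(String[] args) {
        System.out.println(constantFinal() + " " + nonFinal() + " " + runtimeFinal(2));
    }
}
```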

----------------------------------------------------------------------------------------------------
6. Vector3f.distanceSquared() is 30% faster than my own Java version.
----------------------------------------------------------------------------------------------------
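The guide doesn't show the hand-written version that was timed. A plausible baseline on plain `float[3]` arrays might look like the sketch below (names hypothetical). It is only the kind of code the built-in method was presumably compared against, not the author's actual implementation, and it does not call the BlackBerry `Vector3f` class.

```java
public class DistanceSquared {
    // Hypothetical hand-written squared distance between two 3-component points.
    static float distanceSquared(float[] a, float[] b) {
        float dx = a[0] - b[0];
        float dy = a[1] - b[1];
        float dz = a[2] - b[2];
        return dx * dx + dy * dy + dz * dz;
    }

    public static void main(String[] args) {
        float[] p = { 1.0f, 2.0f, 3.0f };
        float[] q = { 1.0f, 4.0f, 4.0f };
        System.out.println(distanceSquared(p, q));
    }
}
```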

----------------------------------------------------------------------------------------------------
7. Declaring a temporary variable inside a loop doesn't affect anything.
----------------------------------------------------------------------------------------------------
float   fDiffX;
float   fDiffY;
float   fDiffZ;

for( int index = 0; index < 100; ++index )
{
    fDiffX = RayPos[0] - RayDir[0];
    fDiffY = RayPos[1] - RayDir[1];
    fDiffZ = RayPos[2] - RayDir[2];

    fSum += ((fDiffX * fDiffX) + (fDiffY * fDiffY) + (fDiffZ * fDiffZ));
}

generates the same bytecode as

for( int index = 0; index < 100; ++index )
{
    final float fDiffX = RayPos[0] - RayDir[0];
    final float fDiffY = RayPos[1] - RayDir[1];
    final float fDiffZ = RayPos[2] - RayDir[2];

    fSum += ((fDiffX * fDiffX) + (fDiffY * fDiffY) + (fDiffZ * fDiffZ));
}
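A self-contained version of the two loops, with `RayPos`/`RayDir` turned into parameters and `fSum` declared locally (both assumptions, since the snippet leaves them undeclared). Comparing the `javap -c LoopTemps` output for the two methods, the instruction sequences are expected to match apart from local-variable slot numbering, because the declaration order changes which slots the compiler assigns; nothing extra is emitted per iteration for the inner declarations.

```java
public class LoopTemps {
    // Temporaries declared once, before the loop.
    static float sumOutside(float[] rayPos, float[] rayDir) {
        float fSum = 0.0f;
        float fDiffX;
        float fDiffY;
        float fDiffZ;

        for (int index = 0; index < 100; ++index) {
            fDiffX = rayPos[0] - rayDir[0];
            fDiffY = rayPos[1] - rayDir[1];
            fDiffZ = rayPos[2] - rayDir[2];

            fSum += ((fDiffX * fDiffX) + (fDiffY * fDiffY) + (fDiffZ * fDiffZ));
        }
        return fSum;
    }

    // Temporaries declared final inside the loop body.
    static float sumInside(float[] rayPos, float[] rayDir) {
        float fSum = 0.0f;

        for (int index = 0; index < 100; ++index) {
            final float fDiffX = rayPos[0] - rayDir[0];
            final float fDiffY = rayPos[1] - rayDir[1];
            final float fDiffZ = rayPos[2] - rayDir[2];

            fSum += ((fDiffX * fDiffX) + (fDiffY * fDiffY) + (fDiffZ * fDiffZ));
        }
        return fSum;
    }

    public static void main(String[] args) {
        float[] pos = { 1.0f, 2.0f, 3.0f };
        float[] dir = { 0.0f, 0.0f, 0.0f };
        System.out.println(sumOutside(pos, dir) + " " + sumInside(pos, dir));
    }
}
```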
----------------------------------------------------------------------------------------------------
8. Don't test a boolean for truth with == true.
----------------------------------------------------------------------------------------------------
boolean    bTrue = false;

if ( bTrue == true )
{
}

generates more code than

if ( bTrue )
{
}
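A sketch of the two tests as methods (hypothetical names). With javac, `bTrue == true` typically compiles to a load, an `iconst_1` and an `if_icmpne` comparison, while `if (bTrue)` compiles to a load and a single `ifeq`.

```java
public class BooleanTest {
    // Explicit comparison against the literal true.
    static int compareWithTrue(boolean bTrue) {
        if (bTrue == true) {
            return 1;
        }
        return 0;
    }

    // Testing the boolean directly.
    static int testDirectly(boolean bTrue) {
        if (bTrue) {
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareWithTrue(false) + " " + testDirectly(false));
    }
}
```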

Contributor
Posts: 34
Registered: ‎03-14-2010
My Device: N/A
My Carrier: Fledge

Re: Here is my small optimization guide.

I don't understand how creating a local variable in a loop is any different than creating a local variable within a method.  

Do you think the amount of byte code generated has an impact on performance?

It's a shame that we have to sacrifice clear & concise coding style for performance.  

Developer
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

Re: Here is my small optimization guide.

Assuming you're referring to #7: I am pretty sure I have seen other guides, or people, suggesting that introducing a local variable within the scope of a loop is a bad thing. I assume they think it incurs some kind of initialization cost on every iteration of the loop. From what I have seen, there is no difference in the bytecode, so there is likely no cost.

I think my conclusions from these tests are likely correct, since I diff the bytecode. I am not just comparing the raw number of instructions, as there is no way of telling how long an instruction takes to execute.

If function B has all the same bytecode as function A plus a couple of extra instructions, I assume that B is going to run slower. How much slower is hard to tell. Really, most of these tips are for your tight inner loops.

With all this said, I see Android has added a JIT to its runtime. People should understand that javac's optimization switch (-O) has done nothing for several versions now. Sun decided that individual platforms can optimize better on their own, which makes sense. So the only optimizations you get are whatever rapc does.

Developer
Posts: 1,305
Registered: ‎01-21-2009
My Device: Not Specified

Re: Here is my small optimization guide.

Lots of things can be optimized. You seem to be focused only on byte code size. The other big one is, of course, execution speed, which in most cases, I think, is more important. Other things that one might want to optimize (perhaps not the right word any more) are maintainability, ease of coding, reusability, generality, etc.

These objectives usually work against each other. Unrolling loops, for instance, and maybe even replacing arrays (and the associated subscripting) with multiple variables, is often a huge win in speed, at the expense of bigger source and byte code. Likewise, specialized code is usually smaller, more efficient, easier to write, and easier to use than generalized code. But perhaps one general method that replaces many specialized methods is better for a particular application.
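To make the trade-off concrete, here is a sketch (hypothetical names) of a 3-element dot product written three ways: as a loop, unrolled, and with the arrays replaced by separate variables. The unrolled form drops the loop counter update, compare and branch on each pass; the scalar form also drops the array loads and their bounds checks. Which one is fastest on a given device is something to measure rather than assume.

```java
public class Unroll {
    // Loop form: counter update, compare and branch on every pass.
    static float dotLoop(float[] a, float[] b) {
        float sum = 0.0f;
        for (int i = 0; i < 3; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Unrolled: no loop overhead, but more bytecode.
    static float dotUnrolled(float[] a, float[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    // Arrays replaced by separate variables: no subscripting at all.
    static float dotScalars(float ax, float ay, float az,
                            float bx, float by, float bz) {
        return ax * bx + ay * by + az * bz;
    }

    public static void main(String[] args) {
        float[] a = { 1.0f, 2.0f, 3.0f };
        float[] b = { 4.0f, 5.0f, 6.0f };
        System.out.println(dotLoop(a, b) + " " + dotUnrolled(a, b) + " "
                + dotScalars(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f));
    }
}
```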

Of course, when you can improve everything at once (like your #s 2, 3 and 8), that's golden.

A couple of specific comments:

#1: Is this backwards?

#2: Alternative 2 returns what, exactly? ;-)

#4: The first alternative generates more code, but it probably executes faster, particularly on processors without floating point hardware. It's 2 divides versus 1 divide + 2 multiplies + a few extra instructions. The timing specs for many IEEE 754 floating point libraries rate single precision divide time at more than twice that of multiply. If one were replacing more than two divides, the speed advantage would likely be with the first alternative for all libraries. (Beyond that, I would give strong consideration to using the Fixed32 class and avoiding floating point altogether.)
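A sketch of the case where the reciprocal clearly pays off: one ray tested against several boxes, so the three divides happen once per ray and each box test uses only multiplies. This is a standard slab test written for illustration (the names are hypothetical), and it assumes every direction component is non-zero.

```java
public class RayBox {
    // Three divides per ray. Assumes no component of dir is zero.
    static float[] inverse(float[] dir) {
        return new float[] { 1.0f / dir[0], 1.0f / dir[1], 1.0f / dir[2] };
    }

    // Slab test using only multiplies; true if the ray (t >= 0) hits [min, max].
    static boolean hits(float[] pos, float[] invDir, float[] min, float[] max) {
        float tNear = 0.0f;
        float tFar = Float.POSITIVE_INFINITY;
        for (int i = 0; i < 3; i++) {
            float t1 = (min[i] - pos[i]) * invDir[i];
            float t2 = (max[i] - pos[i]) * invDir[i];
            if (t1 > t2) { // ray travels in the negative direction on this axis
                float tmp = t1;
                t1 = t2;
                t2 = tmp;
            }
            if (t1 > tNear) tNear = t1;
            if (t2 < tFar) tFar = t2;
            if (tNear > tFar) return false; // slab intervals do not overlap
        }
        return true;
    }

    public static void main(String[] args) {
        float[] pos = { -5.0f, 0.0f, 0.0f };
        float[] invDir = inverse(new float[] { 1.0f, 0.1f, 0.1f }); // once per ray
        float[][] mins = { { -1f, -1f, -1f }, { 2f, 2f, 2f }, { -10f, -1f, -1f } };
        float[][] maxs = { { 1f, 1f, 1f }, { 3f, 3f, 3f }, { -8f, 1f, 1f } };
        for (int b = 0; b < mins.length; b++) {
            System.out.println("box " + b + ": " + hits(pos, invDir, mins[b], maxs[b]));
        }
    }
}
```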




Solved? click "Accept as solution". Helpful? give kudos by clicking on the star.
Contributor
Posts: 28
Registered: ‎03-15-2010
My Device: none
My Carrier: none

Re: Here is my small optimization guide.

I don't want to be the illiterate one, but what is a POD?

New Developer
Posts: 19
Registered: ‎07-14-2008
My Device: Not Specified

Re: Here is my small optimization guide.

Kudos Brother

/Hulk Hogan voice

Developer
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

Re: Here is my small optimization guide.

POD == Plain Old Data.  Basically anything that isn't an Object.

Contributor
Posts: 28
Registered: ‎03-15-2010
My Device: none
My Carrier: none

Re: Here is my small optimization guide.

Thank you.

Developer
Posts: 163
Registered: ‎04-16-2009
My Device: Not Specified

Re: Here is my small optimization guide.

In reply to Ted_Hopp:

You're correct about #1, thank you. I have made the change.

Agreed that #2 is pretty contrived, but it illustrates the point. Personally, I will always introduce a local if it makes the code easier to read and maintain. For a hot spot such as collision detection, it pays to fold it all into a single expression.

True enough for #4, but it's hard to say for sure. The extra code could evict other code from the instruction cache. If it's executed often enough, maybe it's a win; hard to say. Either way, it's a good point.

As for Fixed32: I have done some tests, again somewhat contrived since they basically just do work in a loop. Still, I have found that unless your data is already in fixed point, it's not always a win. If you do use fixed point, I would do the conversions yourself instead of calling the library functions, since each call is an extra cost.
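A sketch of hand-rolled conversions and arithmetic in 16.16 fixed point, in plain Java (names hypothetical). The 16.16 layout is an assumption here; check it against the Fixed32 documentation before mixing these helpers with Fixed32 values. The multiply and divide widen to `long` so the intermediate values cannot overflow.

```java
public class Fixed16 {
    // Assumed layout: 16 integer bits, 16 fractional bits.
    static final int ONE = 1 << 16;

    // Float to fixed; the (int) cast truncates toward zero.
    static int toFixed(float f) {
        return (int) (f * ONE);
    }

    static float toFloat(int fx) {
        return fx / (float) ONE;
    }

    // Widen to long for the 32x32-bit product, then drop 16 fraction bits.
    static int mul(int a, int b) {
        return (int) (((long) a * b) >> 16);
    }

    // Pre-shift the dividend so the quotient keeps 16 fraction bits.
    static int div(int a, int b) {
        return (int) (((long) a << 16) / b);
    }

    public static void main(String[] args) {
        int x = toFixed(1.5f);
        int y = toFixed(2.0f);
        System.out.println(toFloat(mul(x, y)) + " " + toFloat(div(y, x)));
    }
}
```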

Developer
Posts: 1,305
Registered: ‎01-21-2009
My Device: Not Specified

Re: Here is my small optimization guide.

You need to use final PODs if you want to do something like this (say, from a worker thread):

synchronized void updateStatus(final LabelField f) {
    final int status = getStatus(); // might return a different value later
    Application.getApplication().invokeLater(new Runnable() {
        public void run() { f.setText("Status=" + status); }
    });
}

Without those 'final' qualifiers, this won't compile. Also, I think the compiler replaces references to final variables with their values if the values are compile-time constants.
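To try this capture rule without the BlackBerry UI classes, here is a plain-Java sketch (hypothetical names): a `StringBuffer` stands in for the `LabelField`, and calling `run()` directly stands in for `invokeLater`. The anonymous class gets a copy of `status` taken when the `Runnable` is created, which is why a later change is not seen. Java 8 and later compilers also accept locals that are effectively final, but older source levels require the keyword.

```java
public class CaptureFinal {
    // Hypothetical status source that changes over time.
    static int currentStatus = 1;

    static int getStatus() {
        return currentStatus;
    }

    // Snapshot the status in a final local; the anonymous class captures a copy.
    static Runnable makeUpdater(final StringBuffer label) {
        final int status = getStatus();
        return new Runnable() {
            public void run() {
                label.setLength(0);
                label.append("Status=").append(status);
            }
        };
    }

    public static void main(String[] args) {
        StringBuffer label = new StringBuffer();
        Runnable r = makeUpdater(label);
        currentStatus = 2;           // changes after the Runnable was created
        r.run();                     // run later, as invokeLater would
        System.out.println(label);   // prints Status=1
    }
}
```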