01-23-2013 02:07 AM
Hi, I am trying to compress a file that is being written through GZIPOutputStream.
My code was generating successive files of 56 KB each. But now the compressed files it generates are of
random sizes: some are 64 KB, others 38 KB, 49 KB, or 59 KB.
I am not able to understand why.
Could I get some help?
Here is my code:
ByteArrayOutputStream os = new ByteArrayOutputStream();
GZIPOutputStream gz = new GZIPOutputStream(os, GZIPOutputStream.COMPRESSION_BEST);
01-23-2013 03:52 AM
01-23-2013 04:53 AM
Simmons' comment is definitely true. GZIP's compression algorithm depends on repeated sequences of bytes and on the relative frequency of bytes or byte sequences. If you are compressing text data, these are seen frequently; there will be loads of spaces, for example. If you are compressing binary data, this is less likely, and if you are compressing data that is already compressed or otherwise optimized, you can actually enlarge the file because of the overhead associated with GZIP.
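To see how much the output size depends on the input, here is a minimal sketch. It assumes standard Java SE and java.util.zip.GZIPOutputStream rather than the BlackBerry class used in the question, and the class and method names (GzipSizeDemo, gzipSize, repetitiveText, randomBytes) are made up for illustration. It compresses two inputs of the same length, one repetitive and one random:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

class GzipSizeDemo {
    // Compresses the input in memory and returns the compressed size in bytes.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(os)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZIP trailer has been written.
        return os.size();
    }

    // Builds text-like data by repeating a short phrase with many spaces.
    static byte[] repetitiveText(int length) {
        byte[] pattern = "some text with    lots of   spaces "
                .getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = pattern[i % pattern.length];
        }
        return data;
    }

    // Builds data with no useful repetitions, similar to already compressed content.
    static byte[] randomBytes(int length, long seed) {
        byte[] data = new byte[length];
        new Random(seed).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        int length = 56 * 1024;
        System.out.println("input length:    " + length);
        System.out.println("repetitive text: " + gzipSize(repetitiveText(length)));
        System.out.println("random bytes:    " + gzipSize(randomBytes(length, 42L)));
    }
}
```

The repetitive input should shrink to a small fraction of its length, while the random input should come out slightly larger than it went in. Real files fall somewhere in between, which would explain compressed sizes that vary from file to file.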
You should change your call to make sure that GZIP uses the maximum window space - this is the space it uses when looking for repetitions, i.e. code:
GZIPOutputStream gz = new GZIPOutputStream(os,GZIPOutputStream.COMPRESSION_B
If this still enlarges your file, and you must send a GZIP file, then I suggest you actually repackage the data using
COMPRESSION_NONE
This will produce output that is a little longer than the original, but it will be quick and is very easy to decompress.
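For comparison, here is a rough sketch of the same idea in standard Java SE, which is an assumption on my part since the thread is about the BlackBerry API. java.util.zip.GZIPOutputStream has no level argument in its constructor, so the sketch sets Deflater.NO_COMPRESSION through the protected def field of an anonymous subclass. The names StoredGzipDemo and gzipStored are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

class StoredGzipDemo {
    // Wraps the input in a GZIP container without compressing it.
    static byte[] gzipStored(byte[] input) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(os) {
            {
                // 'def' is the protected Deflater inherited from DeflaterOutputStream.
                def.setLevel(Deflater.NO_COMPRESSION);
            }
        }) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = new byte[56 * 1024];
        new Random(1L).nextBytes(input);
        byte[] output = gzipStored(input);
        System.out.println("input length:  " + input.length);
        System.out.println("output length: " + output.length);
    }
}
```

The output is still a valid GZIP stream, so any GZIP reader can unpack it. It just carries the header, the trailer, and a few bytes of block overhead on top of the original data.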
Try one of the many packages that will create GZIP output and test your actual files on it to see if you see similar variations in sizing.
If you want to investigate this further, I would first 'round-trip' a file on the BlackBerry to make sure that you do get back the original data.
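A round-trip check could look something like the sketch below. It is written against standard Java SE (java.util.zip), which is an assumption since the BlackBerry classes live in a different package, and the names GzipRoundTrip, compress, decompress, and roundTripMatches are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Compresses the input into a GZIP byte array.
    static byte[] compress(byte[] input) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(os)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    // Decompresses a GZIP byte array back into the original bytes.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gz.read(buffer)) != -1) {
                os.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    // Returns true if compressing and then decompressing gives back the same bytes.
    static boolean roundTripMatches(byte[] original) {
        return Arrays.equals(original, decompress(compress(original)));
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + roundTripMatches(original));
    }
}
```

If the round trip fails on a real file, the problem is in how the data is written or read rather than in the compression ratio, so it is worth ruling that out before looking at sizes.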
Hope this helps.
01-23-2013 06:33 AM
01-23-2013 06:33 AM
01-23-2013 06:57 AM
01-23-2013 07:09 AM
Agree with Simon.
To answer the question:
"However I am not clear with MAX_LOG2_WINDOW_LENGTH, what length is it talking about."
I thought I answered this in my original post with this comment:
"You should change your call to make sure that GZIP uses the maximum window space - this is the space it uses when looking for repetitions"
By setting MAX_LOG2_WINDOW_LENGTH, you are telling GZIP to search as far as it can for repetitions. From memory, and it depends on the implementation, GZIP will usually search only about 4 K bytes of the data it has already seen. The maximum the format allows is 32K bytes, because the offset value that points back to a repetition is a restricted length. Searching the full window takes longer (a lot longer) to compress files because it compares more bytes looking for the repetitions.
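The effect of the window can be seen with a small experiment. This is only a sketch in standard Java SE: java.util.zip does not expose a window length setting, so it relies on the fixed 32 KB limit of the DEFLATE format instead of MAX_LOG2_WINDOW_LENGTH, and the names WindowDemo, gzipSize, and repeatedBlock are hypothetical. It builds a random block followed by an exact copy of itself, once with the copy inside the window and once with the copy too far back to be found:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

class WindowDemo {
    // Compresses the input in memory and returns the compressed size in bytes.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(os)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.size();
    }

    // Returns a random block of blockLength bytes followed by an exact copy of it,
    // so the only repetition sits exactly blockLength bytes back.
    static byte[] repeatedBlock(int blockLength, long seed) {
        byte[] block = new byte[blockLength];
        new Random(seed).nextBytes(block);
        byte[] data = new byte[2 * blockLength];
        System.arraycopy(block, 0, data, 0, blockLength);
        System.arraycopy(block, 0, data, blockLength, blockLength);
        return data;
    }

    public static void main(String[] args) {
        byte[] near = repeatedBlock(16 * 1024, 1L); // copy is 16 KB back, inside the window
        byte[] far = repeatedBlock(40 * 1024, 1L);  // copy is 40 KB back, outside the window
        System.out.println("16 KB block twice: " + near.length + " -> " + gzipSize(near));
        System.out.println("40 KB block twice: " + far.length + " -> " + gzipSize(far));
    }
}
```

The first input should compress to roughly half its length because the second copy is found in the window. The second should not shrink at all, even though it is just as repetitive, because the repetition is further back than the compressor can look.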
Hope that is clear,
01-23-2013 11:05 PM