04-24-2009 12:37 AM - edited 04-24-2009 01:38 PM
I am developing a dictionary application on BlackBerry and have run into some problems. What is the best solution for each of the problems below?
1) Compressing a file
2) When I compress each block of a file, how do I get the size of the block after compression?
3) Reading and decompressing a portion (or block) of a compressed file
4) Random access (reading) within a file
Please share your solutions with me! Thanks very much!
04-25-2009 07:06 AM
OK, I have solved my problems, but I want to ask some other questions:
1) GZIP vs. ZLIB: which should I use?
2) I have solved the random-access problem by calling reset() to return to the beginning of the file, then skip(nBytes) to move to byte n. Is there a better solution? Something like this rough sketch (assuming the file has already been read into a byte array, so reset() is supported):
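import java.io.ByteArrayInputStream;
import java.io.IOException;

// Minimal sketch of the reset()/skip() approach described above.
// Assumes the whole file has been read into a byte array, since
// ByteArrayInputStream supports reset() without a prior mark().
public class RandomAccessSketch {
    private final ByteArrayInputStream in;

    public RandomAccessSketch(byte[] fileData) {
        this.in = new ByteArrayInputStream(fileData);
    }

    // Read 'length' bytes starting at absolute 'offset' in the file.
    public byte[] readAt(long offset, int length) throws IOException {
        in.reset();                  // back to the beginning
        long remaining = offset;
        while (remaining > 0) {      // skip() may skip fewer bytes than asked
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                throw new IOException("Unexpected end of stream");
            }
            remaining -= skipped;
        }
        byte[] buf = new byte[length];
        int pos = 0;
        while (pos < length) {       // read() may also return short counts
            int n = in.read(buf, pos, length - pos);
            if (n < 0) {
                throw new IOException("Unexpected end of stream");
            }
            pos += n;
        }
        return buf;
    }
}

Note the loops: on a general InputStream, skip() and read() are allowed to process fewer bytes than requested, so a single call is not safe.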
04-25-2009 10:36 AM
I'm not entirely sure what you are doing, but you probably want a data structure, not a compression algorithm. For example, I have
a string-indexed B-tree structure designed for highly redundant string data (things with lots of common prefixes, like http://domain-blah-blah URLs).
This is vaguely similar to what many string-compression systems do (create a dictionary of common strings), except that
the common strings are just fixed-length chunks, as common or uncommon as they may be. So, if I have 100 strings with the same
first 8 bytes (this only works with ASCII due to some implementation issues), I store only one long that describes
those 800 bytes (a 100:1 compression factor for that data, minus some space for indexing overhead, which I needed anyway).
If you do this carefully (which I have not), you could have very compact non-terminal nodes (your character segment
and left/right pointers) and special additional fields in terminal nodes (such as linked-list pointers that make sequential access
faster than tree traversal). To make the idea concrete, here is a toy sketch (not my actual implementation) that stores each shared fixed-length prefix once:
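import java.util.Hashtable;
import java.util.Vector;

// Toy illustration of the prefix-sharing idea: store each 8-character
// (ASCII) prefix once, with a Vector of suffixes hanging off it.
// 100 words sharing a prefix store those 8 chars only one time.
public class PrefixTable {
    private static final int CHUNK = 8;
    private final Hashtable groups = new Hashtable(); // prefix -> Vector of suffixes

    public void add(String word) {
        String prefix = word.length() > CHUNK ? word.substring(0, CHUNK) : word;
        String suffix = word.length() > CHUNK ? word.substring(CHUNK) : "";
        Vector suffixes = (Vector) groups.get(prefix);
        if (suffixes == null) {
            suffixes = new Vector();
            groups.put(prefix, suffixes);
        }
        suffixes.addElement(suffix);
    }

    public boolean contains(String word) {
        String prefix = word.length() > CHUNK ? word.substring(0, CHUNK) : word;
        String suffix = word.length() > CHUNK ? word.substring(CHUNK) : "";
        Vector suffixes = (Vector) groups.get(prefix);
        return suffixes != null && suffixes.contains(suffix);
    }
}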
The best "compression" is to remove redundancy based on what you know about the data. Often this means storing model
parameters instead of the specifics, so your algorithm is your model (like ACELP with a vocal-tract model for speech). A
simple example is a sine wave: sure, you can (and sometimes do) store one as a data file,
but you can also generate it (more slowly) algorithmically. A toy version of that point, where three parameters regenerate any number of samples:
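// Trivial illustration of "store the model, not the data": instead of
// shipping a file full of samples, keep three parameters and regenerate.
public class SineModel {
    public static double[] generate(double amplitude, double frequency,
                                    int sampleRate, int numSamples) {
        double[] samples = new double[numSamples];
        for (int i = 0; i < numSamples; i++) {
            samples[i] = amplitude
                    * Math.sin(2.0 * Math.PI * frequency * i / sampleRate);
        }
        return samples;
    }
}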
You could probably get source code for something like aspell if you are building a spell checker, etc.
04-26-2009 06:34 AM
I've asked the random-access question a few times with no answer. skip() would work, I guess, but it is normally implemented
as sequential stream reads. You could also just use the file system and make a lot of short files named for their starting
"address" or key. While I am targeting older devices, I've had pretty good luck with the persistent store alone. I get the
impression files are treated like sectors or blocks; you are better off buffering yourself than trying to get random access into one. For example, a minimal persistent-store sketch (the store key below is an arbitrary made-up constant; generate your own unique long):
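import net.rim.device.api.system.PersistentObject;
import net.rim.device.api.system.PersistentStore;

// Minimal sketch of keeping a pre-buffered block of dictionary data in
// the persistent store instead of a file. The store key is an arbitrary
// made-up constant; use your own unique long in a real application.
public class DictionaryStore {
    private static final long STORE_KEY = 0x8a2c51f39d04e67bL; // hypothetical

    public static void saveBlock(byte[] block) {
        PersistentObject po = PersistentStore.getPersistentObject(STORE_KEY);
        synchronized (po) {
            po.setContents(block);   // byte[] is a persistable type
            po.commit();
        }
    }

    public static byte[] loadBlock() {
        PersistentObject po = PersistentStore.getPersistentObject(STORE_KEY);
        synchronized (po) {
            return (byte[]) po.getContents();
        }
    }
}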
You can grep the rendered Javadocs for "compress", I suppose, but I don't think you will find additional magic methods.
You may be able to get open-source zip or gzip code and adapt the formats for your purposes, but if you have specific data
types and access patterns in mind, this may be a difficult approach (why not just use JPEG or something suited to an even less
related data type?).
These class docs reference the word "compress"
04-26-2009 09:10 AM - edited 04-26-2009 01:56 PM
To answer the compression question: there is no difference in compression between GZIP and ZLIB. If you take some data and compress it with both, you will get roughly the same compressed size. They both use the DEFLATE compression algorithm, which is, a few patented tweaks aside, the same as the original PKZIP compression.
To get more information, I suggest you review the GZIP, ZLIB and DEFLATE compression RFCs, which are RFC 1952 (GZIP), RFC 1950 (ZLIB) and RFC 1951 (DEFLATE) respectively.
I find that GZIP is much more frequently implemented and is available in a number of add-ons to things like web servers. And most people understand it because they can get a GZIP tool for their PC. As a result, I would recommend GZIP in preference to ZLIB for most applications. If you want to see the size equivalence for yourself, here is a quick sketch (assuming the device's net.rim.device.api.compress package, which wraps DEFLATE in both flavours; adjust to whatever deflate wrapper you have) that compresses the same bytes both ways and prints the sizes:
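import java.io.ByteArrayOutputStream;
import java.io.IOException;
import net.rim.device.api.compress.GZIPOutputStream;
import net.rim.device.api.compress.ZLibOutputStream;

// Quick experiment: compress the same bytes with both wrappers and
// compare sizes. Both use DEFLATE underneath, so the payloads come out
// nearly identical; GZIP just adds a slightly larger header/trailer.
public class CompareCompression {
    public static void compare(byte[] data) throws IOException {
        System.out.println("gzip: " + gzipSize(data)
                + " zlib: " + zlibSize(data)
                + " raw: " + data.length);
    }

    private static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(baos);
        gz.write(data);
        gz.close();
        return baos.size();
    }

    private static int zlibSize(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ZLibOutputStream zl = new ZLibOutputStream(baos);
        zl.write(data);
        zl.close();
        return baos.size();
    }
}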
Edit: following marchwka's comments below, I reviewed the original post, rather than just answering the question about the comparison of GZIP and ZLIB.
In this case, if you intend to search the dictionary, I agree that this form of compression is unlikely to be appropriate for your application. You would not easily be able to search the compressed data, nor is it possible to start decompression at an arbitrary point midway through the file. This is a characteristic of the DEFLATE compression algorithm: it expects to compress and decompress an entire data stream, and does not expect the user to want a part of the stream in isolation.
It might be possible to design a data structure knowing these restrictions and working around them, but this is not something I would immediately consider. That said, here is one workable sketch, again assuming the net.rim.device.api.compress GZIP streams: compress the dictionary in fixed-size blocks, each one an independent GZIP stream, and keep an index of where each compressed block starts. The size of each compressed block (your original question 2) is then just the distance to the next offset, and reading one block (questions 3 and 4) means seeking to its offset and decompressing only that block.
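import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Vector;
import net.rim.device.api.compress.GZIPInputStream;
import net.rim.device.api.compress.GZIPOutputStream;

// Sketch of one way around the whole-stream restriction: compress the
// dictionary in fixed-size blocks, each an independent GZIP stream, and
// keep an index of compressed offsets so any block can be located and
// decompressed in isolation.
public class BlockCompressor {
    public static final int BLOCK_SIZE = 4096; // uncompressed bytes per block

    // Compresses 'data' block by block. offsetsOut receives the start
    // offset of each compressed block; the compressed size of a block is
    // the next offset minus this one (or total length for the last block).
    public static byte[] compress(byte[] data, Vector offsetsOut)
            throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (int pos = 0; pos < data.length; pos += BLOCK_SIZE) {
            offsetsOut.addElement(new Integer(file.size()));
            int len = Math.min(BLOCK_SIZE, data.length - pos);
            GZIPOutputStream gz = new GZIPOutputStream(file);
            gz.write(data, pos, len);
            gz.close(); // completes this block's independent GZIP stream
        }
        return file.toByteArray();
    }

    // Decompresses the single block starting at 'offset'.
    public static byte[] readBlock(byte[] file, int offset, int compressedLen)
            throws IOException {
        ByteArrayInputStream in =
                new ByteArrayInputStream(file, offset, compressedLen);
        GZIPInputStream gz = new GZIPInputStream(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        int n;
        while ((n = gz.read(buf)) > 0) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}

You pay some compression ratio for this (each block starts with an empty DEFLATE history and its own header), so there is a trade-off between block size and seek granularity.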
04-26-2009 09:37 AM
I guess I would also reiterate that a dictionary is no more an archive than it is a picture. Compression relies on finding various
redundancies in the data and often does things like adding dictionaries. I guess you could lift some of the gzip dictionary
code, but it may be similar to your interests in name only. You can play with zip yourself on small files
and see what happens. I'd be very surprised if this is a good way to go without substantial modification. The reason I mention
the B-tree thing I just did is that it is good for a dictionary in some ways. It is difficult to reconstruct the key, but how often
do you need to ask, "what word am I looking up right now?" LOL. Further, if you actually have a dictionary and know
it to have certain orderings, those are likely to be exploitable both for compression and for access.