01-04-2009 04:32 PM
I'm working on an app that will use a large data file of 10MB. Since it's all text data (the file is a plain .txt file) I imagine it would compress quite well. But, a few questions about this:
1) How can I compress/uncompress the data? Any example somewhere? I know nothing about the raw ZIP algorithms, and can't code one directly by myself..
2) If the data is compressed, how can I search inside it? I'd need at least to read a specific line... but I guess I can't unzip everything into memory, it would consume too much memory. So how can this be done?
01-04-2009 04:53 PM
Interesting questions - however I think the second might make the first irrelevant.
If you look at the API, the BB supports two compression methods out of the box: GZIP and ZLIB. These are really just different wrappers around the same underlying method, DEFLATE, which is (a few patentable tweaks aside) also the standard method used by ZIP. It works by replacing repeated 'strings' with references to earlier occurrences. There are numerous other compression algorithms, but if you don't want to write code yourself, you are stuck with this method.
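To make that concrete, here is a minimal sketch of a DEFLATE round trip using the standard java.util.zip classes, which are the desktop analogue of the BB's ZLIB support (class name, buffer size, and the sample text are just illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class DeflateDemo {

    // Compress a byte array with DEFLATE (same underlying method as GZIP/ZLIB/ZIP).
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DeflaterOutputStream dout =
                new DeflaterOutputStream(out, new Deflater(Deflater.BEST_COMPRESSION));
        dout.write(data);
        dout.close(); // flushes the final compressed block
        return out.toByteArray();
    }

    // Inflate a DEFLATE-compressed byte array back to the original bytes.
    static byte[] decompress(byte[] data) throws IOException {
        InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

Because DEFLATE exploits repeated strings, repetitive text shrinks dramatically, while already-random data barely shrinks at all.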
However this method will make it impossible for you to search the data without decompressing it, which presumably removes the value of compressing it in the first place.
Also, if you need to search frequently, the decompression overhead might be significant, especially if you decompress for each search.
Fortunately, the recent BB devices all have SDCard memory, on which 10MB is not significant. So I would consider putting your data there and using the FileConnection API to read and search your uncompressed data.
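The search this implies is a simple line-by-line scan. On a device the stream would come from FileConnection.openInputStream() on a path such as file:///SDCard/data.txt (that path is hypothetical); the sketch below uses standard Java streams, but the logic is the same:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class LineSearch {

    // Scans an input stream line by line and returns the first line
    // containing the query, or null if no line matches. Only one line
    // is held in memory at a time, so a 10MB file is no problem.
    static String findLine(InputStream in, String query) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.indexOf(query) >= 0) {
                return line;
            }
        }
        return null;
    }
}
```

The scan stops as soon as a match is found, so on average only half the file is read per successful search.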
Hope this helps.
01-05-2009 01:11 AM
For zipping data, use this code (ZLibOutputStream is in net.rim.device.api.compress):

byte[] doZipped(String strData) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
        ZLibOutputStream zout = new ZLibOutputStream(out, false, 15, 1);
        zout.write(strData.getBytes());
        zout.close();
        return out.toByteArray();
    } catch (IOException e) {
        return "Error".getBytes();
    }
}
01-05-2009 08:03 AM
Someone else asked this, as have I in the past: what random access facilities are available for large
files? I think someone tried skip(), which is often implemented internally as repeated read()s — so are there any true random access facilities?
Presumably your text file would be read into some data structure designed according to your intended usage.
All the string-based compression systems build some kind of dictionary and try to encode based on frequency
of occurrence. You may or may not be able to do as well, given what you know about your [ single ] file, as you
don't need a general compression approach. For example, if your text is very repetitive, you can build your own
"dictionary", which could double as the index your hypothetical app needs anyway, and achieve a smaller memory footprint
while giving you a useful data structure.
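A minimal sketch of such a hand-built dictionary (the class name, sample words, and code strings are all invented for illustration): frequent words are swapped for short codes on the way in and restored on the way out, using the Hashtable and StringBuffer classes that are available on J2ME:

```java
import java.util.Hashtable;

public class DictionaryCoder {
    // Forward and reverse maps; codes must be strings that never occur
    // as real words in the text.
    private final Hashtable encode = new Hashtable();
    private final Hashtable decode = new Hashtable();

    void addEntry(String word, String code) {
        encode.put(word, code);
        decode.put(code, word);
    }

    // Token-by-token substitution, splitting on single spaces by hand
    // since J2ME lacks String.split().
    private String transform(String text, Hashtable table) {
        StringBuffer out = new StringBuffer();
        int start = 0;
        while (start <= text.length()) {
            int end = text.indexOf(' ', start);
            if (end < 0) end = text.length();
            String token = text.substring(start, end);
            Object mapped = table.get(token);
            out.append(mapped != null ? (String) mapped : token);
            if (end < text.length()) out.append(' ');
            start = end + 1;
        }
        return out.toString();
    }

    String compress(String text) { return transform(text, encode); }
    String expand(String text) { return transform(text, decode); }
}
```

The same Hashtable that drives the substitution can serve as the app's word index, which is the dual-use point made above.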
If your text is static, I'd recommend a code generator, or at least some offline processing with tools like sed/awk,
to reformat your text to suit your program. Even if the data has some volatility, these approaches could help.
01-05-2009 08:25 AM
I like the replies from peter and archywka. I'd just like to ask what kind of data you are going to compress/decompress. I ask because I've written some algorithms for this same kind of problem, but the main issue is that they are data dependent. If you feel free to, you can forward the data to me, so that I can analyse it and try to fit it to my data structure, or create a new module for you if possible.