08-23-2010 10:56 PM
Hey. I'd like to learn best practices for removing HTML tags and converting entity references (ERs) like &amp;amp; to &amp; or their equivalent chars. My goal is legible text from an HTML page.
- First, if there is a utility I'm not aware of which would be best to do this
- Second, which is faster, using e.g.
indexOf("<div", index) == index or
substring(index, index + 4).equals("<div")
- Third, anything which reduces work for the processor would be great
I haven't yet gotten away from the basic way of jumping from one ampersand/less-than to the next and checking over and over for HTML tags or &amp;quot; and other ERs. It's inefficient. Rather than start from my current effort, please give advice on how to redo this.
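For the second question, a third option may be worth measuring alongside the two above: String.startsWith(prefix, offset). The sketch below is only illustrative; the class and method names (PrefixCheck, viaIndexOf, viaStartsWith) are made up for this example. Note that indexOf returns an absolute position in the string (or -1), so the comparison has to be against index rather than 0, and when the tag is not at index it keeps scanning the rest of the string. substring allocates a new String and throws if index + 4 runs past the end.

```java
class PrefixCheck {
    // Hypothetical helper: true if "<div" starts exactly at position index.
    static boolean viaIndexOf(String s, int index) {
        // indexOf returns an absolute position (or -1), so compare with index.
        // On a miss at index it goes on searching the remainder of the string.
        return s.indexOf("<div", index) == index;
    }

    // Hypothetical helper: same check, but it only looks at the four
    // characters starting at index and allocates nothing.
    static boolean viaStartsWith(String s, int index) {
        // Returns false (no exception) when index is out of range
        // or fewer than four characters remain.
        return s.startsWith("<div", index);
    }
}
```

Both helpers give the same answer; the startsWith form just does less work when there is no match at that position.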
08-24-2010 01:33 AM
08-24-2010 02:19 AM
08-24-2010 08:05 AM
Just a quick suggestion that may (or may not) help.
Why not parse the information on a web server (there are plenty of free hosts) and then pull the already-parsed data from there? That also reduces data traffic, since you only pull the information you need.
I've written a few parsers for BlackBerry and they work, but they do take a few seconds depending on the size of the file.
08-24-2010 05:09 PM
08-24-2010 05:17 PM
But if they constantly change, would it not be better to update the web app that strips the excess data, rather than having to release an app update to fix changes to the RSS feed?
I could have completely misunderstood you there, but I'm just curious.
I use a PHP-based web app and have it format the information it pulls from a database before it is read by the BB; that way, if I want to reformat the info in any way, I can do so from the web app.
Again, this solution may not work for you, but I thought it would be a good suggestion.
As for writing your own I can suggest the following and hope it works out.
Try reading character by character:
if you encounter a <, start ignoring characters until you reach a >;
otherwise append the character to a StringBuffer, and repeat until you reach the end of the document.
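The steps above can be sketched roughly as follows. This is a minimal sketch, not a full HTML parser: the class name HtmlToText, the method toText, and the small ENTITIES table are all assumptions made for illustration. The entity handling goes beyond the steps above and is included only because the original question asks about it. It does not handle comments, script/style contents, numeric entities, or a > inside an attribute value.

```java
class HtmlToText {
    // Hypothetical table of a few common entity references;
    // real pages use many more, so extend as needed.
    private static final String[][] ENTITIES = {
        {"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}
    };

    // Single pass: skip everything between < and >, decode the known
    // entities, and copy every other character to the output buffer.
    static String toText(String html) {
        int n = html.length();
        StringBuffer out = new StringBuffer(n);
        boolean inTag = false;
        int i = 0;
        while (i < n) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') inTag = false;   // tag ends here
                i++;
                continue;
            }
            if (c == '<') {                    // tag starts: ignore until >
                inTag = true;
                i++;
                continue;
            }
            if (c == '&') {
                boolean matched = false;
                for (int k = 0; k < ENTITIES.length; k++) {
                    if (html.startsWith(ENTITIES[k][0], i)) {
                        out.append(ENTITIES[k][1]);
                        i += ENTITIES[k][0].length();
                        matched = true;
                        break;
                    }
                }
                if (matched) continue;         // unknown entities are copied as-is
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

Because each character is visited once and nothing is re-scanned, this avoids the repeated searching described in the original question.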
Hope this helps.
08-24-2010 05:56 PM
08-24-2010 10:26 PM
The compiler basically converts String addition to StringBuffer appends. If you're doing all your appending in one expression, there's no difference in speed. On the other hand, if you're doing something like this:
String x = a + b;
String y = x + c;
String z = y + d;
then using your own explicit StringBuffer is more efficient. The reason is that the compiler will create a new StringBuffer for each of the above statements.
I should point out that performance gains at this level are going to be minuscule unless you're doing this thousands of times (such as in a loop of some sort).
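To make the comparison concrete, here is a small sketch of the two styles side by side. The class and method names (ConcatDemo, joinWithPlus, joinWithBuffer) are made up for this example, and it assumes a compiler that translates + into buffer appends as described above.

```java
class ConcatDemo {
    // Three separate statements: each + typically gets its own temporary
    // buffer plus an intermediate String (x and y) that is thrown away.
    static String joinWithPlus(String a, String b, String c, String d) {
        String x = a + b;
        String y = x + c;
        String z = y + d;
        return z;
    }

    // One explicit StringBuffer, sized up front, reused for every append.
    static String joinWithBuffer(String a, String b, String c, String d) {
        StringBuffer sb = new StringBuffer(
                a.length() + b.length() + c.length() + d.length());
        sb.append(a).append(b).append(c).append(d);
        return sb.toString();
    }
}
```

Both methods return the same string; the second just avoids the intermediate objects, which only starts to matter inside a loop.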