Developer
Posts: 135
Registered: ‎04-25-2009
My Device: Z30
My Carrier: AT&T

Displaying String taken from HTML

Hey all,

 

So I have my app. I pull the HTML source from a webpage, then search it with String methods to find the specific part I want to show in my app. The only issue is the formatting. For instance...

 

Original string taken from HTML:

"European bee-eaters in M&#225;laga province, Andalusia, Spain"

 

This should be shown as:

"European bee-eaters in Málaga province, Andalusia, Spain"

 

So what can I do to the original string to make it appear in a Label component like above ^?

 

Thanks!

Theodore

Developer
Posts: 19,636
Registered: ‎07-14-2008
My Device: Not Specified

Re: Displaying String taken from HTML

Not sure about the ampersand and ";" as I don't think they are standard HTML, but this code is what you need to 'decode' the #nnn back to a single byte:

 

http://dev.telnic.org/trac/browser/apps/blackberry/trunk/blackberry/src/org/not/java/net/URLDecoder....

 

Here is the wiki article that explains its use:

http://en.wikipedia.org/wiki/Percent-encoding

 

I am surprised that the HTML you are being given contains this - I thought that the bytes are usually sent UTF-8 encoded and so this is not needed.

Developer
Posts: 135
Registered: ‎04-25-2009
My Device: Z30
My Carrier: AT&T

Re: Displaying String taken from HTML

Thanks!

As for that URLDecoder class, do I run it against the entire HTML text, or can I run it on just the snippet I posted? And that *should* replace the Unicode char with a regular letter?
Developer
Posts: 19,636
Registered: ‎07-14-2008
My Device: Not Specified

Re: Displaying String taken from HTML

You run it against the snippet you have extracted.

 

But there is more to this.

 

Normally when you request an HTML page, the server should return the data UTF-8 encoded.  The page tells the browser this with a meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

 

Now if your web page had a meta tag like this, then it should not need to convert those funny characters in the way you are seeing. 
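To act on that declaration in code, one option is to pull the charset name out of the Content-Type value, whether it comes from the meta tag or from the HTTP response header. This is only a sketch: the class name, the method name and the caller-supplied fallback are all made up for illustration, and it sticks to basic String methods.

```java
// Hypothetical helper, not part of any BlackBerry API.
final class ContentTypeCharset {

    private ContentTypeCharset() {}

    // Extracts the charset from a value such as "text/html; charset=UTF-8".
    // Returns the caller's fallback when no charset parameter is present.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String key = "charset=";
        int idx = contentType.toLowerCase().indexOf(key);
        if (idx < 0) {
            return fallback;
        }
        int start = idx + key.length();
        int end = contentType.indexOf(';', start);
        if (end < 0) {
            end = contentType.length();
        }
        String value = contentType.substring(start, end).trim();
        // Strip optional surrounding double quotes.
        if (value.length() >= 2 && value.charAt(0) == '"'
                && value.charAt(value.length() - 1) == '"') {
            value = value.substring(1, value.length() - 1);
        }
        return value.length() == 0 ? fallback : value;
    }
}
```

For example, charsetOf("text/html; charset=UTF-8", "ISO-8859-1") gives back "UTF-8", and a bare "text/html" falls through to the fallback.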

 

So I think there is something weird with the way you are asking for this page.  If you asked for it a different way, for example by specifying some header, I suspect you would get back different data that perhaps did not need to be decoded in this way. 

 

So I would look at your complete processing. Dump out all the data you see returned to the BB program, and compare that with the same data you see when a browser requests it (do a 'view source' to see the raw data).  Check the bytes you get, and make sure you convert these to a String correctly. 
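A rough sketch of that check in Java, assuming the response really is UTF-8 and that you already have the raw bytes in an array. The class and method names here are invented for the example. The hex dump lets you compare byte for byte against what the browser received, and the explicit "UTF-8" argument avoids relying on the platform default encoding.

```java
// Hypothetical helper for inspecting and decoding a response body.
final class ResponseBytes {

    private ResponseBytes() {}

    // Decodes the raw bytes as UTF-8 instead of the platform default.
    static String utf8ToString(byte[] raw) {
        try {
            return new String(raw, "UTF-8");
        } catch (java.io.UnsupportedEncodingException e) {
            // UTF-8 is expected to be available; treat its absence as fatal.
            throw new RuntimeException("UTF-8 not supported: " + e.getMessage());
        }
    }

    // Renders the bytes as space-separated two-digit hex values.
    static String hexDump(byte[] raw) {
        StringBuffer sb = new StringBuffer(raw.length * 3);
        for (int i = 0; i < raw.length; i++) {
            int b = raw[i] & 0xFF;
            if (i > 0) {
                sb.append(' ');
            }
            if (b < 0x10) {
                sb.append('0');
            }
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }
}
```

If the page sends a real "á" as UTF-8 you would see the two bytes c3 a1 in the dump. If it sends a character reference instead, you would see the plain ASCII bytes for the ampersand, hash, digits and semicolon.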

 

Perhaps you do not need to do this funny processing. 

Developer
Posts: 135
Registered: ‎04-25-2009
My Device: Z30
My Carrier: AT&T

Re: Displaying String taken from HTML

So that snippet I posted in the beginning was taken straight from a browser source view. I compared it to what I got in Java, and it's the same, and both have that header.

 

So now I'll look into that class you provided.

Developer
Posts: 135
Registered: ‎04-25-2009
My Device: Z30
My Carrier: AT&T

Re: Displaying String taken from HTML

Tried that class, didn't change anything. Any other ideas?

Developer
Posts: 19,636
Registered: ‎07-14-2008
My Device: Not Specified

Re: Displaying String taken from HTML

Yes.  What I told you was complete tosh.....  Ignore it.

 

You are seeing Unicode code points encoded in HTML.  For a description see this:

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

 

This is actually standard XML encoding.  So the question is how do you decode it?

 

I have been looking round for some code to point you at, but can't find anything and now have to do something else, so may be some time.  So I thought I would pass this on so you can have a look.

 

You could write a variation on the code I have already pointed you at: have it look for "&#", extract the number that follows (up to the closing ";"), convert that number to a character, and append the character to the output.
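Something along these lines might do as a starting point. It is a sketch, not an official API: the class and method names are made up, it only handles numeric references (decimal like &#225; and hex like &#xE1;), and named entities such as &amp;amp; are left untouched. It uses StringBuffer and basic String methods only.

```java
// Hypothetical decoder for numeric character references; not a BlackBerry API.
final class NumericEntityDecoder {

    private NumericEntityDecoder() {}

    static String decode(String input) {
        StringBuffer out = new StringBuffer(input.length());
        int pos = 0;
        while (pos < input.length()) {
            int start = input.indexOf("&#", pos);
            if (start < 0) {
                out.append(input.substring(pos)); // no more references
                break;
            }
            out.append(input.substring(pos, start));
            int end = input.indexOf(';', start + 2);
            int codePoint = (end < 0) ? -1 : parse(input.substring(start + 2, end));
            if (codePoint < 0) {
                // Not a valid reference: keep "&#" literally and carry on.
                out.append("&#");
                pos = start + 2;
            } else {
                appendCodePoint(out, codePoint);
                pos = end + 1;
            }
        }
        return out.toString();
    }

    // Returns the code point, or -1 if the text is not a usable number.
    private static int parse(String digits) {
        try {
            boolean hex = digits.length() > 1
                    && (digits.charAt(0) == 'x' || digits.charAt(0) == 'X');
            int value = hex
                    ? Integer.parseInt(digits.substring(1), 16)
                    : Integer.parseInt(digits, 10);
            return (value > 0 && value <= 0x10FFFF) ? value : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Appends one code point, using a surrogate pair above U+FFFF.
    private static void appendCodePoint(StringBuffer out, int cp) {
        if (cp < 0x10000) {
            out.append((char) cp);
        } else {
            int v = cp - 0x10000;
            out.append((char) (0xD800 + (v >> 10)));
            out.append((char) (0xDC00 + (v & 0x3FF)));
        }
    }
}
```

You would call NumericEntityDecoder.decode(snippet) on the extracted text before handing it to the Label. Anything that does not parse as a number is passed through unchanged, so running it over text with no references should be harmless.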

 

But I'm sure someone will find an official way to do it.