Welcome to the Official BlackBerry® Support Community Forums.

Java Development

Regular Contributor
rahul_kalidindi
Posts: 67
Registered: 06-30-2010
My Carrier: Airtel

Parsing HTML content from RSS url

Hi,

I have an RSS URL and I am parsing it with a SAX parser. The feed also contains HTML content embedded within the XML data, which includes image paths, etc. The problem is that I can't find a way to parse the HTML content and display only the attributes.

I have searched the forums but didn't find a conclusive answer. Can anyone please provide some samples or tutorials on how to do this?

Developer
peter_strange
Posts: 14,611
Registered: 07-14-2008

Re: Parsing HTML content from RSS url

Sorry, I don't know the RSS format, but I think there are two issues:

a) Extracting the HTML from the XML. Given that they both use similar tags, how are you doing this?

b) Parsing the HTML once you have extracted it. Given that HTML is typically not well-formed, at least not well enough for XML parsers, this can be a problem.

Which of these two are you having problems with?
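To illustrate point b), here is a small sketch that checks whether a snippet is well-formed XML. It uses the desktop Java SE JAXP classes (javax.xml.parsers) so it can be tried quickly; I'm assuming the parser on the device is just as strict. The class name WellFormedCheck is hypothetical.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if a strict (non-validating) XML parser accepts the markup.
    static boolean isWellFormedXml(String markup) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content and rethrows fatal errors,
            // so the only thing we learn is whether parsing succeeded
            parser.parse(new ByteArrayInputStream(markup.getBytes("UTF-8")), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // Typically a SAXParseException: the input is not well-formed XML
            return false;
        }
    }
}
```

With this, isWellFormedXml("<p>Hello<br/>world</p>") returns true, while ordinary HTML such as "<p>Hello<br>world</p>" (unclosed br) or "<p>caf&eacute;</p>" (an entity XML does not define) returns false.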

Regular Contributor
rahul_kalidindi
Posts: 67
Registered: 06-30-2010
My Carrier: Airtel

Re: Parsing HTML content from RSS url

Frankly speaking, I have problems with both, because I can't figure out a way to extract the HTML content or to parse it.

Developer
peter_strange
Posts: 14,611
Registered: 07-14-2008

Re: Parsing HTML content from RSS url

OK, let's start with that.

Basically, if you are embedding HTML (or even XML) content inside an XML document, you have to encode that content before including it, so that it can be extracted without ambiguity. For example, you have to escape double quotes (as &quot;) in any text that you put into an attribute value.

This encoding should, I believe, be done automatically by whatever is writing the XML, though I have done it manually for hand-crafted XML. Documentation on it can be found here:

http://en.wikipedia.org/wiki/XML#Characters_and_escaping
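To make the encoding side concrete, this minimal sketch escapes the five characters that have special meaning in XML (&, <, >, " and ') before an HTML fragment is placed in element text or an attribute value. The class name XmlEscape is hypothetical; StringBuffer is used rather than StringBuilder so the same code should also compile on J2ME/CLDC.

```java
class XmlEscape {
    // Escapes text so it can be embedded in XML element content or in a
    // single- or double-quoted attribute value without confusing the parser.
    static String escape(String text) {
        StringBuffer out = new StringBuffer(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // '&' starts an entity reference
                case '<':  out.append("&lt;");   break; // '<' starts a tag
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break; // needed inside "..." attributes
                case '\'': out.append("&apos;"); break; // needed inside '...' attributes
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, escape("<img src=\"a.jpg\">") returns &lt;img src=&quot;a.jpg&quot;&gt;, which can safely sit inside an element such as an RSS description.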

 

Once your data (or attribute) has been encoded, it should in theory not confuse the XML parser, and you can parse the document correctly. When you get your characters (or attributes) back from the parser, it should already have converted the escaped characters back to their original form. That leads on to problem b): well-formed HTML.
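On the reading side, a SAX handler along these lines collects the text of each target element; because the parser has already un-escaped the entities (and passes CDATA content through unchanged), each collected string is the original HTML fragment, ready for problem b). This is a sketch: the element name "description" is an assumption based on RSS 2.0 items, which commonly carry their HTML there, the class name is hypothetical, and it uses the javax.xml.parsers/org.xml.sax classes, which as far as I know JSR 172 devices also provide. Note that characters() can be called several times for one element, so the text has to be accumulated.

```java
import java.io.ByteArrayInputStream;
import java.util.Vector;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Collects the text content of every element named targetTag.
// By the time characters() is called the parser has already turned
// &lt; &gt; &amp; &quot; back into < > & ", so each string is plain HTML.
class EmbeddedHtmlExtractor extends DefaultHandler {
    private final String targetTag;
    private final Vector fragments = new Vector(); // raw Vector: CLDC has no generics
    private StringBuffer current;                  // non-null only while inside targetTag

    EmbeddedHtmlExtractor(String targetTag) {
        this.targetTag = targetTag;
    }

    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals(targetTag)) {
            current = new StringBuffer();
        }
    }

    // May be called several times for a single element, so always append
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(targetTag) && current != null) {
            fragments.addElement(current.toString());
            current = null;
        }
    }

    // Parses a whole XML document and returns the decoded text of each targetTag element
    static Vector extract(String xml, String targetTag) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EmbeddedHtmlExtractor handler = new EmbeddedHtmlExtractor(targetTag);
        parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return handler.fragments;
    }
}
```

For example, extracting "description" from an item containing &lt;p&gt;Hi &lt;img src=&quot;a.jpg&quot;&gt;&lt;/p&gt; yields the string <p>Hi <img src="a.jpg"></p>.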

 

Anyway, does this help?

Developer
carlostheone
Posts: 149
Registered: 01-20-2010
My Carrier: Telefonica

Re: Parsing HTML content from RSS url

To be honest, I don't know the RSS format.

If you want to parse XML, you can use this:

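import java.io.IOException;
import java.io.InputStream;
import javax.microedition.io.Connector;
import javax.microedition.io.StreamConnection;
import net.rim.device.api.xml.parsers.DocumentBuilder;
import net.rim.device.api.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public void parse() {
    StreamConnection conn = null;
    InputStream in = null;
    String URL = "www...../";
    String filename = "file.xml";
    String fName = URL + filename;
    String _node = null;
    String _element = null;
    String _node2 = null;
    String _element2 = null;
    String _node3 = null;
    String _element3 = null;
    String _node4 = null;
    String _element4 = null;

    try {
        // Only reading is needed, so open the connection read-only
        conn = (StreamConnection) Connector.open(fName, Connector.READ);
        in = conn.openInputStream();

        // The next few lines parse the stream into a DOM tree and extract
        // the required data: the elements, their node names and the
        // text value of each element
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(in);
        doc.getDocumentElement().normalize();

        // getElementsByTagName() takes an element (tag) name, not an attribute
        // name; "name of the attribute" is a placeholder for your own tag
        NodeList list1 = doc.getElementsByTagName("name of the attribute");
        NodeList list2 = doc.getElementsByTagName("nickname");
        NodeList list3 = doc.getElementsByTagName("email");
        NodeList list4 = doc.getElementsByTagName("photo");

        for (int i = 0; i < list1.getLength(); i++) {
            // The lists may differ in length, so every lookup is null-safe
            _node = nameAt(list1, i);
            _element = textAt(list1, i);
            _node2 = nameAt(list2, i);
            _element2 = textAt(list2, i);
            _node3 = nameAt(list3, i);
            _element3 = textAt(list3, i);
            _node4 = nameAt(list4, i);
            _element4 = textAt(list4, i);

            // .....
        }
    } catch (Exception e) {
        // IOException, SAXException or a parser configuration error
        System.out.println("XML parsing failed: " + e);
    } finally {
        try { if (in != null) in.close(); } catch (IOException ignored) { }
        try { if (conn != null) conn.close(); } catch (IOException ignored) { }
    }
}

// Node name of the i-th element, or null if the list has no i-th item
private static String nameAt(NodeList list, int i) {
    Node element = list.item(i);
    return (element == null) ? null : element.getNodeName();
}

// Text of the i-th element (all Text and CDATA children joined), or null if missing
private static String textAt(NodeList list, int i) {
    Node element = list.item(i);
    if (element == null) return null;
    StringBuffer text = new StringBuffer();
    NodeList children = element.getChildNodes();
    for (int c = 0; c < children.getLength(); c++) {
        Node child = children.item(c);
        short type = child.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            text.append(child.getNodeValue());
        }
    }
    return text.toString();
}

P.S. Sorry for my English.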

Regular Contributor
rahul_kalidindi
Posts: 67
Registered: 06-30-2010
My Carrier: Airtel

Re: Parsing HTML content from RSS url

Hi,

I am able to parse the XML content well, but the main problem is with the HTML data embedded in it. I can't really find a way to parse it, since the HTML tags are not always well-formed. Can anyone suggest a way to do it?

Developer
simon_hain
Posts: 10,780
Registered: 07-29-2008
My Carrier: O2 Germany

Re: Parsing HTML content from RSS url

Parsing HTML is not supported by the BlackBerry APIs, and I don't know of any libraries for it.

You can display it using a BrowserField (version 1 or 2) or the browser directly.

Other than that, you can only analyze it manually, which is obviously a pain, especially since so much HTML is malformed.
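If you do go the manual route, a deliberately lenient scanner is often enough for pulling out values such as image paths: it never builds a tree, so unclosed tags, uppercase names, and single-quoted or unquoted attribute values do not break it. The sketch below (the class name HtmlAttributeScanner is hypothetical; it uses only String and Vector, since CLDC has no java.util.regex) returns one attribute from every matching start tag. It is not a real HTML parser: comments, scripts, and a '>' inside a quoted attribute value will confuse it.

```java
import java.util.Vector;

class HtmlAttributeScanner {

    // Returns the value of attrName on every <tagName ...> start tag in html.
    static Vector attributeValues(String html, String tagName, String attrName) {
        Vector values = new Vector();
        int n = html.length();
        int pos = 0;
        while (pos < n) {
            int lt = html.indexOf('<', pos);
            if (lt == -1) break;
            int nameEnd = lt + 1 + tagName.length();
            // Case-insensitive tag-name match that must end at whitespace, '/' or '>'
            if (html.regionMatches(true, lt + 1, tagName, 0, tagName.length())
                    && (nameEnd == n || isDelimiter(html.charAt(nameEnd)))) {
                int gt = html.indexOf('>', nameEnd);
                int tagEnd = (gt == -1) ? n : gt;   // tolerate an unterminated last tag
                String value = attributeIn(html, nameEnd, tagEnd, attrName);
                if (value != null) values.addElement(value);
                pos = tagEnd;
            } else {
                pos = lt + 1;
            }
        }
        return values;
    }

    private static boolean isDelimiter(char c) {
        return c == '>' || c == '/' || c <= ' ';
    }

    // Scans name[=value] pairs in html[i, end); values may be "double", 'single' or unquoted.
    private static String attributeIn(String html, int i, int end, String attrName) {
        while (i < end) {
            // Skip whitespace and stray slashes between attributes
            while (i < end && (html.charAt(i) <= ' ' || html.charAt(i) == '/')) i++;
            int nameStart = i;
            while (i < end && html.charAt(i) > ' ' && html.charAt(i) != '=' && html.charAt(i) != '/') i++;
            String name = html.substring(nameStart, i);
            while (i < end && html.charAt(i) <= ' ') i++;
            String value = null;
            if (i < end && html.charAt(i) == '=') {
                i++;
                while (i < end && html.charAt(i) <= ' ') i++;
                if (i < end && (html.charAt(i) == '"' || html.charAt(i) == '\'')) {
                    char quote = html.charAt(i++);
                    int close = html.indexOf(quote, i);
                    if (close == -1 || close > end) close = end; // unbalanced quote: take the rest
                    value = html.substring(i, close);
                    i = close + 1;
                } else {
                    int valueStart = i;
                    while (i < end && html.charAt(i) > ' ') i++;
                    value = html.substring(valueStart, i);
                }
            }
            if (name.length() > 0 && name.equalsIgnoreCase(attrName)) {
                return (value != null) ? value : ""; // attribute present without a value
            }
            if (name.length() == 0 && value == null) i++; // guarantee progress on odd input
        }
        return null;
    }
}
```

For example, calling attributeValues(html, "img", "src") on the HTML extracted from a feed item gives a Vector of image paths, which can then be handed to another class.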

----------------------------------------------------------
feel free to press the like button on the right side to thank the user that helped you.
please mark posts as solved if you found a solution.

peter_strange wrote:
"This process should happen traumatically for you in both JDE and Eclipse."
Regular Contributor
rahul_kalidindi
Posts: 67
Registered: 06-30-2010
My Carrier: Airtel

Re: Parsing HTML content from RSS url

Showing the HTML content in the browser is not a solution to my problem, because I have to pass the extracted values to another class. Can I use any third-party libraries to achieve this?

Developer
simon_hain
Posts: 10,780
Registered: 07-29-2008
My Carrier: O2 Germany

Re: Parsing HTML content from RSS url

I don't know of any, and previous posters with similar problems never found one, so I assume there is no such library for J2ME.

Developer
peter_strange
Posts: 14,611
Registered: 07-14-2008

Re: Parsing HTML content from RSS url

I agree with Simon; I would look for a lightweight HTML parser.

Alternatively, you might find this useful, not for display but for extraction:

Edit: This time with the correct link.

http://supportforums.blackberry.com/t5/Java-Development/Simple-HTMLTextField-implementation/m-p/4543...
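The linked implementation is not reproduced here. As a separate, hypothetical illustration of extraction without display, this sketch drops everything between '<' and '>' and decodes a few common entities, leaving plain text that can be passed to another class. It is intentionally crude: a bare '<' in the text hides everything after it, and only the listed named entities plus numeric ones in the basic multilingual plane are decoded. The class name HtmlToText is an assumption.

```java
class HtmlToText {
    // Removes tags and decodes a few entities; unknown entities are kept as-is.
    static String strip(String html) {
        StringBuffer out = new StringBuffer(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') inTag = false;            // end of tag: resume copying
            } else if (c == '<') {
                inTag = true;                           // start of tag: stop copying
            } else if (c == '&') {
                int semi = html.indexOf(';', i);
                // Only treat short "&...;" runs as entities
                String decoded = (semi == -1 || semi - i > 8) ? null : decode(html.substring(i + 1, semi));
                if (decoded != null) {
                    out.append(decoded);
                    i = semi;                           // skip past the ';'
                } else {
                    out.append(c);                      // not a known entity: keep the '&'
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Decodes an entity name (without '&' and ';'), or returns null if unknown
    private static String decode(String name) {
        if (name.equals("amp"))  return "&";
        if (name.equals("lt"))   return "<";
        if (name.equals("gt"))   return ">";
        if (name.equals("quot")) return "\"";
        if (name.equals("apos")) return "'";
        if (name.equals("nbsp")) return " ";            // mapped to a plain space
        if (name.startsWith("#")) {
            try {
                int code = (name.startsWith("#x") || name.startsWith("#X"))
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1));
                if (code < 0 || code > 0xFFFF) return null; // outside the BMP: leave as-is
                return String.valueOf((char) code);
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null;
    }
}
```

For example, strip("<p>Tom &amp; Jerry</p>") returns Tom & Jerry.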
