07-14-2010 08:37 AM
Hi,
I have an RSS URL that I am parsing with a SAX parser. The XML data also contains embedded HTML content, which holds the image paths etc. The problem is that I can't find a way to parse the HTML content and display only the attributes.
I have searched the forums but didn't find a conclusive answer. Can anyone please provide some samples or tutorials on how to do it?
07-14-2010 08:57 AM
Sorry, I don't know the RSS format, but I think there are two issues:
a) Extracting the HTML from the XML. Given that they both use similar tags, how are you doing this?
b) Parsing the HTML once you have extracted it. Given that HTML is typically not well formed, at least not well enough for XML parsers, this can be a problem.
Which of these two are you having problems with?
07-14-2010 09:09 AM
Frankly speaking, I have problems with both, because I can't figure out a way to extract the HTML content or to parse it.
07-14-2010 10:00 AM - last edited on 07-14-2010 11:18 AM
OK, let's start with that.
Basically, if you are embedding HTML (or even XML) content inside an XML document, you have to encode the content in some way before including it, so that it can be extracted without ambiguity. For example, you have to escape double quotes in any text that you include in an attribute.
This encoding should, I believe, be done automatically by whatever is writing the XML, though I have done it manually for hand-crafted XML. Documentation on it should be found here:
http://en.wikipedia.org/wiki/XML#Characters_and_es
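As a rough illustration of the manual encoding mentioned above, here is a minimal sketch of an escaping helper. The names XmlEscaper and escape are made up for this example and are not part of any BlackBerry or J2ME API. It only replaces the five characters for which XML predefines entities.

```java
// Hypothetical helper for illustration; not part of any platform API.
final class XmlEscaper {
    private XmlEscaper() {}

    /** Replaces the five characters that have predefined entities in XML. */
    static String escape(String raw) {
        StringBuffer out = new StringBuffer(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, XmlEscaper.escape("<img src=\"a.png\">") returns &lt;img src=&quot;a.png&quot;&gt;, which can sit inside an element or attribute without confusing the XML parser.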
Once your data (or attribute) has been encoded, in theory it should not confuse the XML parser, and you can parse the document correctly. When you get your characters (or attributes) from the parser, it should have converted the escaped characters back to their original form. So that leads on to problem 2, well-formed HTML.
Anyway, does this help?
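To illustrate the point about the parser converting escaped characters back, here is a small sketch of a SAX handler that collects the text of a description element. It assumes the HTML sits in an element named description (check your own feed) and uses the standard Java SE javax.xml.parsers classes; on a BlackBerry the equivalent parser classes may live in a different package, so the imports would need adjusting. DescriptionCollector and extract are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps the text of the most recent <description> element.
final class DescriptionCollector extends DefaultHandler {
    private final StringBuffer text = new StringBuffer();
    private boolean inDescription = false;

    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("description".equals(qName)) {
            inDescription = true;
            text.setLength(0); // start fresh for each description element
        }
    }

    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (inDescription) {
            text.append(ch, start, length);
        }
    }

    public void endElement(String uri, String localName, String qName) {
        if ("description".equals(qName)) {
            inDescription = false;
        }
    }

    String getHtml() {
        return text.toString();
    }

    /** Parses the given XML string and returns the last description text seen. */
    static String extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DescriptionCollector handler = new DescriptionCollector();
            parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
            return handler.getHtml();
        } catch (Exception e) {
            throw new RuntimeException("XML parsing failed: " + e);
        }
    }
}
```

The string returned by getHtml() is the embedded HTML with the entities already turned back into < and >, so it can be handed to whatever does the HTML analysis. In a real feed with several items you would store the text in endElement instead of keeping only the last one.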
07-14-2010 10:18 AM
To be honest, I don't know the RSS format.
If you want to parse XML, you can use this:
// Assumed imports for the BlackBerry JDE (omitted in the original post):
import javax.microedition.io.Connector;
import javax.microedition.io.StreamConnection;
import net.rim.device.api.xml.parsers.DocumentBuilder;
import net.rim.device.api.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public void parse(){
    StreamConnection conn=null;
    String URL = "www...../";
    String filename = "file.xml";
    String fName=URL+filename;
    String _node=null;
    String _element=null;
    String _node2=null;
    String _element2=null;
    String _node3=null;
    String _element3=null;
    String _node4=null;
    String _element4=null;
    try{
        conn=(StreamConnection) Connector.open(fName, Connector.READ_WRITE);
        // The next few lines open a stream, parse it into a DOM
        // document and extract the required data: the element
        // names and the text value of each element.
        DocumentBuilderFactory docBuilderFactory=DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder=docBuilderFactory.newDocumentBuilder();
        Document doc=docBuilder.parse(conn.openInputStream());
        doc.getDocumentElement().normalize();
        // getElementsByTagName expects an element (tag) name;
        // the first argument is a placeholder to replace.
        NodeList list1=doc.getElementsByTagName("name of the element");
        NodeList list2=doc.getElementsByTagName("nickname");
        NodeList list3=doc.getElementsByTagName("email");
        NodeList list4=doc.getElementsByTagName("photo");
        // Assumes all four lists have the same length and that
        // every element has a text node as its first child.
        for (int i=0;i<list1.getLength();i++){
            Node value=list1.item(i).getChildNodes().item(0);
            Node value2=list2.item(i).getChildNodes().item(0);
            Node value3=list3.item(i).getChildNodes().item(0);
            Node value4=list4.item(i).getChildNodes().item(0);
            _node=list1.item(i).getNodeName();
            _element=value.getNodeValue();
            _node2=list2.item(i).getNodeName();
            _element2=value2.getNodeValue();
            _node3=list3.item(i).getNodeName();
            _element3=value3.getNodeValue();
            _node4=list4.item(i).getNodeName();
            _element4=value4.getNodeValue();
            .....
P.S. Sorry for my English.
07-15-2010 02:51 AM
Hi,
I am able to parse the XML content well. The main problem is the HTML data embedded in it: I can't find a way to parse it, since the HTML tags are not always well formed. Can anyone suggest a way to do it?
07-15-2010 03:07 AM
Parsing HTML is not supported through the APIs, and I don't know of any libraries for it.
You can display it using a BrowserField (1/2) or the browser directly.
Other than that, you can only analyze it manually, which is obviously a pain, especially with so much HTML being malformed.
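As a rough idea of what analyzing it manually could look like, here is a minimal sketch that scans a string for img tags and pulls out their src attribute using only basic String operations, so it does not care whether the rest of the HTML is well formed. ImgSrcScanner, findImageSources and attributeValue are hypothetical names. The sketch assumes that no attribute value contains a > character, it ends unquoted values at the first whitespace, and it does not decode entities such as &amp; inside the values.

```java
import java.util.Vector;

// Hypothetical scanner for illustration; tolerant of malformed HTML around the tags.
final class ImgSrcScanner {
    private ImgSrcScanner() {}

    /** Returns a Vector of String holding the src value of every img tag found. */
    static Vector findImageSources(String html) {
        Vector sources = new Vector();
        int pos = 0;
        while ((pos = indexOfIgnoreCase(html, "<img", pos)) != -1) {
            int tagEnd = html.indexOf('>', pos);
            if (tagEnd == -1) tagEnd = html.length(); // unterminated tag: read to the end
            String src = attributeValue(html.substring(pos, tagEnd), "src");
            if (src != null) sources.addElement(src);
            pos = tagEnd;
        }
        return sources;
    }

    /** Returns the value of the named attribute inside one tag, or null if absent. */
    static String attributeValue(String tag, String name) {
        int at = 0;
        while ((at = indexOfIgnoreCase(tag, name, at)) != -1) {
            // Require whitespace before the name so "data-src" does not match "src".
            boolean startsWord = at > 0 && isSpace(tag.charAt(at - 1));
            int i = skipSpaces(tag, at + name.length());
            if (startsWord && i < tag.length() && tag.charAt(i) == '=') {
                i = skipSpaces(tag, i + 1);
                if (i >= tag.length()) return null;
                char quote = tag.charAt(i);
                int end;
                if (quote == '"' || quote == '\'') {
                    i++;
                    end = tag.indexOf(quote, i);
                    if (end == -1) end = tag.length(); // missing closing quote
                } else {
                    end = i;
                    while (end < tag.length() && !isSpace(tag.charAt(end))) end++;
                }
                return tag.substring(i, end);
            }
            at += name.length();
        }
        return null;
    }

    private static int skipSpaces(String s, int i) {
        while (i < s.length() && isSpace(s.charAt(i))) i++;
        return i;
    }

    private static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }
}
```

The resulting Vector can then be passed on to another class. The same attributeValue helper can be reused for other attributes such as alt or width by changing the name argument.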
07-15-2010 04:46 AM
Showing the HTML content in a browser is not a solution to my problem, because I have to pass the extracted values to another class. Can I use any third-party libraries to achieve my task?
07-15-2010 04:53 AM
I don't know of any, and previous posters with similar problems have never found one, so I assume there is no such library for J2ME.
07-15-2010 05:16 AM - last edited on 07-15-2010 05:18 AM
Agree with Simon, I would look for a lightweight HTML parser.
Alternatively you might find this useful, not for the display but for the extraction:
Edit: This time with the correct link....