09-18-2013 12:27 PM
I'm having difficulty parsing an XML feed that has a bare ampersand '&' in the text of an element:
<title>I am hungry & tired</title>
QXmlStreamReader reports the following error:
Expected '#' or '[a-zA-Z]', but got ' '.
It then spins forever because it can't continue parsing the file.
Has anyone worked around this problem? It certainly seems to pop up here and there on stackoverflow and the nokia developer site. No solutions yet though.
09-18-2013 01:25 PM - edited 09-18-2013 01:25 PM
09-18-2013 01:36 PM - edited 09-18-2013 01:38 PM
Are you in control of the feed?
If yes, then search each string for a bare & and replace it with &amp; when encoding/writing it out.
If no then either report invalid XML or write your own XML reader.
09-18-2013 01:49 PM
A quick Google search shows that people have written XML readers that can cope with non-compliance...
One popular example (in Java though)
09-18-2013 01:52 PM
I could write a new XML parser but that stinks of effort. Any good programmer knows effort just means more bugs.
I'm going to intercept the data between the time it's downloaded and the time it's parsed and replace all non-compliant characters. This seems like the easier way of going about it.
09-18-2013 02:03 PM - edited 09-18-2013 02:04 PM
I was going to suggest that if you had the option, but even this isn't simple, as you would need to tell apart the ampersands that need replacing from the ones that are already part of a valid reference.
i.e. don't replace an existing &amp; with &amp;amp;
... and there are lots of valid encodings you would need to check; it's not just a find and replace.
In terms of effort there is probably not a lot in it.
The no effort solution is make the user do the work and just report "Invalid XML".
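To make the "not just a find and replace" point concrete, here is a minimal sketch of the check in standard C++. It is an illustration only, not code from anyone's app: the function names escapeBareAmpersands and isReferenceAt are made up for this example, it works on a std::string rather than a QByteArray, and it assumes the only references worth preserving are the five predefined XML entities (amp, lt, gt, apos, quot) and numeric character references such as &#38; or &#x26;. Anything else starting with & gets escaped.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the '&' at `pos` starts a reference that an
// XML parser accepts without a DTD (predefined entity or numeric reference).
static bool isReferenceAt(const std::string& s, std::size_t pos) {
    // References are short, so only look a few characters ahead for the ';'.
    const std::size_t limit = std::min(s.size(), pos + 12);
    std::size_t end = pos + 1;
    while (end < limit && s[end] != ';') ++end;
    if (end == limit) return false;  // no terminating ';' nearby

    const std::string body = s.substr(pos + 1, end - pos - 1);
    if (body == "amp" || body == "lt" || body == "gt" ||
        body == "apos" || body == "quot") {
        return true;
    }

    // Numeric character reference: "#" digits, or "#x" hex digits.
    if (body.size() < 2 || body[0] != '#') return false;
    const bool hex = (body[1] == 'x');
    const std::size_t start = hex ? 2 : 1;
    if (start >= body.size()) return false;
    for (std::size_t i = start; i < body.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(body[i]);
        if (hex ? !std::isxdigit(c) : !std::isdigit(c)) return false;
    }
    return true;
}

// Hypothetical helper: replaces every '&' that does not start a recognised
// reference with "&amp;" and leaves everything else untouched.
std::string escapeBareAmpersands(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '&' && !isReferenceAt(in, i)) {
            out += "&amp;";
        } else {
            out += in[i];
        }
    }
    assert(out.size() >= in.size());  // escaping never shrinks the input
    return out;
}
```

With this sketch, "I am hungry & tired" becomes "I am hungry &amp; tired", while text that already contains &amp; or &#38; is left alone. Two caveats: an HTML-style entity such as &nbsp; is escaped to &amp;nbsp; (so it parses, but shows up literally), and ampersands inside a CDATA section are escaped too even though they are legal there, so a real feed may need extra handling.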
09-18-2013 02:10 PM
09-19-2013 07:48 PM
My RSS/Atom reader copes with this type of thing by loading the raw XML with a QNetworkRequest, copying the reply into a QByteArray, substituting all the invalid/illegal characters and codes in the QByteArray, then loading the QByteArray into an XmlDataAccess with XmlDataAccess::loadFromBuffer().
Don't know if this is feasible in your situation.
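The post above doesn't say which characters get substituted, so the following is only a guess at one part of such a clean-up pass, sketched in standard C++ on a std::string instead of a QByteArray. The function name stripIllegalControlChars is hypothetical. It assumes the buffer is UTF-8 (or another ASCII-compatible encoding) and drops the control bytes below 0x20 that XML 1.0 does not allow, keeping tab, line feed and carriage return.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: removes bytes that XML 1.0 forbids in a document
// (control characters below 0x20 other than tab, LF and CR).
// Bytes >= 0x80 are kept so UTF-8 multi-byte sequences pass through intact.
std::string stripIllegalControlChars(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        const bool allowed = c >= 0x20 || c == '\t' || c == '\n' || c == '\r';
        if (allowed) {
            out += static_cast<char>(c);
        }
    }
    assert(out.size() <= in.size());  // stripping never grows the input
    return out;
}
```

In a Qt app the same loop could run over the bytes of the downloaded reply before the buffer is handed to the parser. Whether to drop such bytes or replace them with a space is a judgment call; dropping is just the simplest choice for a sketch.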