03-20-2013 05:13 PM
I'm searching for an easy way to convert the pubDate from an RSS source to a QDateTime object so that I can reformat it later in my view.
How I do it at the moment is as follows:
QString pubDate("Wed, 20 Mar 2013 17:17:00 GMT");
pubDate = pubDate.remove(" GMT", Qt::CaseInsensitive);
QDateTime pub = QDateTime::fromString(pubDate, "ddd, dd MMM yyyy hh:mm:ss");
In the example the pubDate is hardcoded, but in my program it is loaded from an RSS feed. Is there an easier way to do this? The pubDate.remove() part in particular is of course not a very clean way of working...
03-21-2013 12:35 AM
Sorry to be the bearer of bad news, but converting RSS pubDates to real dates is a big ugly can of worms. I have written news reader apps for the Windows Rainmeter utility, and for BB10, and I found RSS date processing to be one of the most frustrating parts.
The issue is that RSS dates are supposed to be in RFC 822 format as updated by RFC 1123 (which makes the year four digits), but in the real world many publishers cheat and use their own date format. RFC 1123 specifies a format like:
Thu, 21 Mar 2013 04:02:10 GMT
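...and the safest convention is for the time zone to ALWAYS be GMT. (Strictly, RFC 822 also permits UT, a handful of North American zone names, and numeric offsets, but nothing else.) Unfortunately, while a few publishers stick to GMT, many (most?) use one of two other variants instead.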
The first variant is a time zone abbreviation other than GMT. The big problem with this is that time zone abbreviations are not unique around the world, with many duplications. For instance, AMT could be "Amazon Time" which is UTC -4, or "Armenia Time" which is UTC +4, a full 8 hour disparity. Likewise, CDT could mean "Central Daylight Time" in North America, which is UTC -5, "Central Daylight Time" in Australia, which is UTC +10:30, or even "Cuba Daylight Time" (UTC -4). Many of the biggest RSS publishers cheat this way by specifying non-GMT time zones. I have not found a library function on any development platform I have used which can reliably parse a date with an arbitrary zone abbreviation. For more info on time zone abbreviations see this link.
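To make the ambiguity concrete, here is a minimal sketch in plain standard C++ (no Qt, so it can be compiled anywhere). The function names candidateOffsets and isUnambiguous are made up for this illustration, and the table deliberately holds only the abbreviations mentioned in this post; a real table would be much larger.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns every UTC offset (in minutes east of UTC) known for an
// abbreviation. Hypothetical helper; the table is illustrative only
// and holds just the zones discussed in this thread.
std::vector<int> candidateOffsets(const std::string& abbr)
{
    static const std::multimap<std::string, int> table = {
        {"GMT", 0},
        {"AMT", -4 * 60},       // Amazon Time
        {"AMT", 4 * 60},        // Armenia Time
        {"CDT", -5 * 60},       // Central Daylight Time (North America)
        {"CDT", 10 * 60 + 30},  // Central Daylight Time (Australia)
        {"CDT", -4 * 60},       // Cuba Daylight Time
    };
    std::vector<int> result;
    const auto range = table.equal_range(abbr);
    for (auto it = range.first; it != range.second; ++it)
        result.push_back(it->second);
    return result;
}

// True only when exactly one offset is known for the abbreviation.
bool isUnambiguous(const std::string& abbr)
{
    return candidateOffsets(abbr).size() == 1;
}
```

With a lookup like this, a parser can at least detect that "CDT" has three candidates and refuse to guess, rather than silently picking one.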
If this isn't bad enough, many other publishers specify the offset from UTC using the email offset syntax (e.g. -0500). This is actually legal under RFC 822, and the nice thing about it is that it is unambiguous compared to time zone abbreviations, but again I have not found library functions on the platforms I use that will parse a date like this.
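Handling the numeric form yourself is not much work. A minimal sketch in standard C++, assuming the offset has already been split off from the rest of the date; parseNumericOffset is a hypothetical name, not a library function:

```cpp
#include <cctype>
#include <string>

// Parses an email-style offset such as "-0500" or "+0530" into minutes
// east of UTC. Returns false if the text does not have that exact shape.
// Hypothetical helper written for this illustration.
bool parseNumericOffset(const std::string& text, int& minutesOut)
{
    if (text.size() != 5 || (text[0] != '+' && text[0] != '-'))
        return false;
    for (std::size_t i = 1; i < 5; ++i)
        if (!std::isdigit(static_cast<unsigned char>(text[i])))
            return false;

    const int hours = (text[1] - '0') * 10 + (text[2] - '0');
    const int minutes = (text[3] - '0') * 10 + (text[4] - '0');
    if (hours > 23 || minutes > 59)
        return false;

    const int total = hours * 60 + minutes;
    minutesOut = (text[0] == '-') ? -total : total;
    return true;
}
```

Subtracting the resulting minutes from the parsed local time gives UTC, so "12:17:00 -0500" becomes 17:17:00 UTC.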
For these reasons I have had to roll my own parser for pubDates, which correctly resolves the date in almost all cases. For those situations where a publisher uses an ambiguous time zone abbreviation and my parser gets the conversion wrong, I provide my users with a "UTC offset override" feature which just reads the raw datetime, ignores the time zone suffix, and instead uses the user-specified offset.
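This is not my actual parser, but a hand-rolled one with an override along those lines could look roughly like the sketch below. It is plain standard C++ and all names (daysFromCivil, resolveZone, parsePubDate) are invented for the example. It assumes English three-letter month names, a four-digit year, and a seconds field, and it rejects every zone abbreviation except GMT/UT/Z unless an override is supplied.

```cpp
#include <cctype>
#include <cstdio>
#include <cstring>
#include <string>

// Days since 1970-01-01 for a Gregorian calendar date.
long long daysFromCivil(int y, int m, int d)
{
    y -= (m <= 2) ? 1 : 0;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const int yoe = static_cast<int>(y - era * 400);  // year of era, 0..399
    const int doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Resolves a zone suffix to minutes east of UTC. Only GMT/UT/Z and
// "+hhmm"/"-hhmm" are accepted; any other abbreviation is rejected.
bool resolveZone(const std::string& z, int& minutesOut)
{
    if (z == "GMT" || z == "UT" || z == "Z") { minutesOut = 0; return true; }
    if (z.size() != 5 || (z[0] != '+' && z[0] != '-')) return false;
    for (int i = 1; i < 5; ++i)
        if (!std::isdigit(static_cast<unsigned char>(z[i]))) return false;
    const int total = ((z[1] - '0') * 10 + (z[2] - '0')) * 60
                    + (z[3] - '0') * 10 + (z[4] - '0');
    minutesOut = (z[0] == '-') ? -total : total;
    return true;
}

// Parses e.g. "Wed, 20 Mar 2013 17:17:00 GMT" into seconds since the
// Unix epoch (UTC). If overrideMinutes is non-null, the zone suffix is
// ignored and *overrideMinutes is used as the offset from UTC instead.
bool parsePubDate(const std::string& text, long long& epochOut,
                  const int* overrideMinutes = nullptr)
{
    // Skip the optional "Wed, " day-of-week prefix.
    const std::size_t comma = text.find(',');
    const char* p = text.c_str() + (comma == std::string::npos ? 0 : comma + 1);

    char mon[4] = {0};
    char zone[8] = {0};
    int day = 0, year = 0, hh = 0, mm = 0, ss = 0;
    const int n = std::sscanf(p, "%d %3s %d %d:%d:%d %7s",
                              &day, mon, &year, &hh, &mm, &ss, zone);
    if (n < 6) return false;

    // Month names are matched case-sensitively against English names.
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    const char* hit = std::strstr(months, mon);
    if (std::strlen(mon) != 3 || hit == nullptr || (hit - months) % 3 != 0)
        return false;
    const int month = static_cast<int>((hit - months) / 3) + 1;
    if (day < 1 || day > 31 || hh > 23 || mm > 59 || ss > 60) return false;

    int offset = 0;  // minutes east of UTC
    if (overrideMinutes != nullptr)
        offset = *overrideMinutes;
    else if (n < 7 || !resolveZone(zone, offset))
        return false;

    epochOut = daysFromCivil(year, month, day) * 86400LL
             + hh * 3600 + mm * 60 + ss - offset * 60LL;
    return true;
}
```

In a Qt program the resulting epoch value could then be handed to QDateTime::fromTime_t() or QDateTime::fromMSecsSinceEpoch() (after multiplying by 1000) to get a QDateTime for display.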
Just in case you are planning to handle Atom feeds too, things are a little easier there, since the Atom format is much more rigidly specified, but some publishers still manage to deviate from the RFC 3339 format it mandates. RFC 3339 is nice since it is already in sortable format:
2013-03-21T04:02:10Z
...where the Z signifies UTC, and the T is optional but usually included (if not, it is replaced with a space). Almost all Atom feeds I've seen follow the rules, and for a while I thought I could just use the library date parsers that handle this format (RFC 3339) well, but after a while I began to see a few that omitted the Z (omitting it is supposed to signify local time instead of UTC) and instead added an email style UTC offset. RFC 3339 itself does allow a numeric offset, but written with a colon (e.g. -05:00). As you might guess I had to alter my algorithm to handle this.
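A tolerant Atom timestamp parser that copes with these variations could be sketched as follows. Again this is plain standard C++ and the names (ZoneKind, AtomStamp, parseAtomStamp) are invented for the example. It accepts "T" or a space as the separator, optional fractional seconds, and a zone of "Z", "+hh:mm", email style "+hhmm", or nothing at all, and it reports which of those it found so the caller can decide what a missing zone should mean.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

enum class ZoneKind { Utc, FixedOffset, Unspecified };

struct AtomStamp {
    int year, month, day, hour, minute, second;
    ZoneKind kind;
    int offsetMinutes;  // minutes east of UTC; 0 unless kind == FixedOffset
};

// Hypothetical helper: parses timestamps such as "2013-03-21T04:02:10Z".
// Field ranges are not validated; this is a sketch, not a full parser.
bool parseAtomStamp(const std::string& text, AtomStamp& out)
{
    char sep = 0;
    int used = 0;
    const int n = std::sscanf(text.c_str(), "%4d-%2d-%2d%c%2d:%2d:%2d%n",
                              &out.year, &out.month, &out.day, &sep,
                              &out.hour, &out.minute, &out.second, &used);
    if (n != 7 || (sep != 'T' && sep != 't' && sep != ' '))
        return false;

    // Skip optional fractional seconds such as ".25".
    std::size_t i = static_cast<std::size_t>(used);
    if (i < text.size() && text[i] == '.') {
        ++i;
        while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
            ++i;
    }

    const std::string zone = text.substr(i);
    out.offsetMinutes = 0;
    if (zone.empty()) { out.kind = ZoneKind::Unspecified; return true; }
    if (zone == "Z" || zone == "z") { out.kind = ZoneKind::Utc; return true; }
    if (zone[0] != '+' && zone[0] != '-') return false;

    // Collect the four offset digits, allowing one colon between hh and mm.
    std::string digits;
    for (std::size_t k = 1; k < zone.size(); ++k) {
        if (zone[k] == ':' && k == 3) continue;
        if (!std::isdigit(static_cast<unsigned char>(zone[k]))) return false;
        digits += zone[k];
    }
    if (digits.size() != 4) return false;

    const int total = ((digits[0] - '0') * 10 + (digits[1] - '0')) * 60
                    + (digits[2] - '0') * 10 + (digits[3] - '0');
    out.kind = ZoneKind::FixedOffset;
    out.offsetMinutes = (zone[0] == '-') ? -total : total;
    return true;
}
```

Returning ZoneKind::Unspecified instead of assuming UTC keeps the "no Z means local time" case visible, which is where a user-supplied offset override would plug in.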
All in all, date handling with news feeds is nothing short of ugly-Betty, but by inspecting as many different feeds as you can lay your hands on to learn what publishers are doing, and then parsing the dates yourself, it can be done in most cases. Just make sure you test... test... test... test...
03-21-2013 02:22 AM
Thanks for the information. The good thing in my case is that I only read an RSS feed from one data source, and luckily they always use the format I specified above. And I guess that's the only correct format for RSS?
I'm not building an RSS reader (and after reading your post, I'm happy about that). It's just that I need that specific data from the RSS feed in my bigger application.
Thanks for the info though, it will come in handy one day, I'm sure of that.