Mark Pilgrim on parsing Bad RSS
Mark Pilgrim's new article, Parsing RSS At All Costs, is up at xml.com.
"On average, at any given time, about 10% of all RSS feeds are not well-formed XML. Some errors are systemic, due to bugs in publishing software. It took Movable Type a year to properly escape ampersands and entities, and most users are still using old versions or new versions with old buggy templates. Other errors are transient, due to rough edges in authored content that the publishing tools are unable or unwilling to fix on the fly."
I think using a real XML parser, plus a few regexps to fix up common problems, might be a better approach, although Mark's ultra-liberal RSS parser is short enough that maybe it's a moot point.
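Here is a rough sketch of what I mean, in Python. It is not Mark's parser and not a complete fix-up list: the function names `repair` and `parse_feed` and the tiny entity table are my own placeholders, and I am assuming the two most common breakages are bare ampersands and HTML entities that plain XML does not define. The idea is to try the strict parser first and only reach for the regexps when it fails.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does NOT start a character or entity reference.
BARE_AMP = re.compile(r"&(?!(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)")

# HTML entities that plain XML does not define, mapped to numeric
# references. Illustrative subset only; a real table would be longer.
HTML_ENTITIES = {"nbsp": "#160", "copy": "#169", "mdash": "#8212"}
HTML_ENTITY = re.compile(r"&(%s);" % "|".join(HTML_ENTITIES))


def repair(text):
    """Apply regexp fix-ups for two common well-formedness errors."""
    text = BARE_AMP.sub("&amp;", text)
    return HTML_ENTITY.sub(lambda m: "&%s;" % HTML_ENTITIES[m.group(1)], text)


def parse_feed(text):
    """Parse with a real XML parser; retry once on repaired text."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # Still raises ParseError if the fix-ups were not enough.
        return ET.fromstring(repair(text))


bad = "<rss><channel><title>Tom & Jerry&nbsp;News</title></channel></rss>"
title = parse_feed(bad).find("channel/title").text
```

Well-formed feeds never touch the regexps, so they cannot be damaged by them. The obvious weakness is that the fix-ups are blind to context: an ampersand inside a CDATA section would get escaped too, which is the kind of edge case that makes Mark's all-regexp approach look less crazy.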
07:25 AM, 23 Jan 2003 by Jeff Davis
