Mark Pilgrim on Parsing Bad RSS

Mark Pilgrim's new article, Parsing RSS At All Costs, is up at xml.com.

On average, at any given time, about 10% of all RSS feeds are not well-formed XML. Some errors are systemic, due to bugs in publishing software. It took Movable Type a year to properly escape ampersands and entities, and most users are still using old versions or new versions with old buggy templates. Other errors are transient, due to rough edges in authored content that the publishing tools are unable or unwilling to fix on the fly.

I think a better approach might be to use a real XML parser and, when it rejects a feed, apply a few regexps that fix up the common problems before trying again. That said, Mark's ultra-liberal RSS parser is short enough that maybe it's a moot point.
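A minimal sketch of that idea, assuming Python and its standard-library `xml.etree.ElementTree` parser. The function name `parse_feed` and the two fix-ups (bare ampersands, HTML-only named entities) are illustrative choices of mine, not anything taken from Mark's article or his parser:

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML itself predefines.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}

# An ampersand that does not start a character or entity reference.
BARE_AMP = re.compile(r"&(?!#\d+;|#x[0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)")
# Any named entity reference, e.g. &nbsp; or &amp;
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def _fix_entity(match):
    """Turn HTML-only entities into numeric references XML can parse."""
    name = match.group(1)
    if name in XML_ENTITIES:
        return match.group(0)                  # already valid XML
    if name in name2codepoint:
        return "&#%d;" % name2codepoint[name]  # e.g. &nbsp; -> &#160;
    return "&amp;" + name + ";"                # unknown name: escape it


def parse_feed(text):
    """Try the strict parser first; on failure, fix up and retry once."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        repaired = BARE_AMP.sub("&amp;", text)
        repaired = NAMED_ENTITY.sub(_fix_entity, repaired)
        # Still raises ParseError if the feed is broken in some other way.
        return ET.fromstring(repaired)


if __name__ == "__main__":
    broken = "<rss><channel><title>Tom & Jerry&nbsp;news</title></channel></rss>"
    print(repr(parse_feed(broken).find("channel/title").text))
```

Well-formed feeds never touch the regexps, so the fix-ups cannot damage a feed that was already fine. The trade-off is that this only handles the errors you thought to write a pattern for; anything else (unclosed tags, bad encodings) still fails.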

07:25 AM, 23 Jan 2003 by Jeff Davis

XML
