It just keeps mutating! Ben Hammersley had the idea and implemented it in Perl, I made a ColdFusion version, Robby Lansaw came up with not one but two improvements (see the comments on my post), and now Bill Rawlinson has made an uber-version, complete with multiple output formats.
I’m still thinking about how to make a crawling version… I played around with the W3C Link Validator – unfortunately it doesn’t return XML. I did some screen scraping and parsing, but my biggest handicap is that I suck at regular expressions. I have read some of Ben Forta’s regex book, but it’s not working yet. If anyone wants to give me a hand by showing how I can read a string and stick everything that’s between h2 tags into an array, please feel free to add a comment. If you’d prefer to make me work it out myself, that’s OK too, but a hint wouldn’t hurt!
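For anyone tempted to answer, one way to approach the h2 question is a non-greedy regular expression. Here is a minimal sketch in Python rather than ColdFusion; the function name `extract_h2` and the sample string are made up for illustration, and it assumes the h2 elements are not nested or malformed.

```python
import re

# Match an opening h2 tag (with optional attributes), then lazily capture
# everything up to the nearest closing h2 tag.
# IGNORECASE handles <H2>; DOTALL lets "." span line breaks.
H2_PATTERN = re.compile(r"<h2\b[^>]*>(.*?)</h2\s*>", re.IGNORECASE | re.DOTALL)


def extract_h2(html):
    """Return a list holding the inner content of every h2 element in html."""
    # findall returns the captured group for each match, in document order.
    return [inner.strip() for inner in H2_PATTERN.findall(html)]


# Hypothetical scraped page fragment, just for demonstration.
sample = '<h1>Report</h1><h2>Broken links</h2><p>...</p><H2 class="x">\nRedirects\n</H2>'
print(extract_h2(sample))  # ['Broken links', 'Redirects']
```

The important part is the lazy `.*?`: a plain `.*` would run from the first opening tag all the way to the last closing tag and return one giant match. That idea should carry over to any regex flavor that supports lazy quantifiers. Regexes are still fragile against messy HTML, so a real parser would be sturdier if the scraped pages turn out to be unpredictable.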