I have been looking at XML and HTML libraries on rubyforge for a simple way to pull data out of a web page. For example if I want to parse a user page on stackoverflow how
Something I ran into trying to do this before is that few web pages are well-formed XML documents. Hpricot may be able to deal with that (I haven't used it) but when I was doing a similar project in the past (using Python and its library's built in parsing functions) it helped to have a pre-processor to clean up the HTML. I used the python bindings for HTML Tidy as this and it made life a lot easier. Ruby bindings are here but I haven't tried them.
Good luck!