What is the best way to parse a web page in Ruby?

佛祖请我去吃肉 2020-12-24 08:41

I have been looking at XML and HTML libraries on RubyForge for a simple way to pull data out of a web page. For example, if I want to parse a user page on Stack Overflow how

6 answers
  •  庸人自扰
    2020-12-24 09:10

    One thing I ran into when trying this before is that few web pages are well-formed XML documents. Hpricot may be able to cope with that (I haven't used it), but when I did a similar project in the past (using Python and its library's built-in parsing functions), it helped to have a pre-processor to clean up the HTML first. I used the Python bindings for HTML Tidy for this, and it made life a lot easier. Ruby bindings for HTML Tidy are available too, but I haven't tried them.
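To illustrate the well-formedness problem, here is a minimal sketch that checks whether a piece of markup parses as XML. It assumes Ruby's bundled REXML library is available; the helper name `well_formed_xml?` and the two sample strings are made up for this example and are not from the original answer. It only detects the problem; it does not clean the HTML the way HTML Tidy would.

```ruby
require 'rexml/document'

# Hypothetical helper: true when the markup parses as well-formed XML,
# false when REXML rejects it.
def well_formed_xml?(markup)
  REXML::Document.new(markup)
  true
rescue REXML::ParseException
  false
end

# Every tag is closed, so a strict XML parser accepts it.
strict = '<html><body><p>Hello</p></body></html>'

# Typical real-world HTML: <br> and <p> are never closed.
sloppy = '<html><body><p>Hello<br></body></html>'

puts well_formed_xml?(strict)  # true
puts well_formed_xml?(sloppy)  # false
```

If a page fails a check like this, that is the point where a cleanup pass or a parser that tolerates broken HTML would come in.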

    Good luck!
