What is the best way to parse a web page in Ruby?

佛祖请我去吃肉 2020-12-24 08:41

I have been looking at XML and HTML libraries on RubyForge for a simple way to pull data out of a web page. For example, if I want to parse a user page on Stack Overflow, how

6 answers

庸人自扰 (OP)

2020-12-24 09:10

Something I ran into when I tried this before is that few web pages are well-formed XML documents. Hpricot may be able to deal with that (I haven't used it), but when I did a similar project in the past (in Python, using its standard library's built-in parsing functions), it helped to run a pre-processor that cleaned up the HTML first. I used the Python bindings for HTML Tidy for this, and it made life a lot easier. Ruby bindings are here, but I haven't tried them.
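To make the well-formedness problem concrete in Ruby, here is a minimal sketch using REXML, the strict XML parser that ships with Ruby (as a bundled gem since Ruby 3.0). It is not the Python/Tidy setup described above. It only shows that a strict XML parser rejects ordinary HTML such as an unclosed `<br>`, which is why a lenient HTML parser or a Tidy clean-up pass helps. The helper name `well_formed_xml?` and the two sample strings are hypothetical, chosen for illustration.

```ruby
require "rexml/document"

# Hypothetical helper: returns true if the markup parses as well-formed XML
# under REXML's strict parser, false if REXML raises a parse error.
def well_formed_xml?(markup)
  REXML::Document.new(markup)
  true
rescue REXML::ParseException
  false
end

# XHTML-style markup: every element, including <br/>, is explicitly closed.
tidy_page = "<html><body><p>Hello<br/></p></body></html>"

# Typical hand-written HTML: <br> is a void element with no closing tag,
# so a strict XML parser sees </p> while it still expects </br>.
sloppy_page = "<html><body><p>Hello<br></p></body></html>"

puts well_formed_xml?(tidy_page)   # => true
puts well_formed_xml?(sloppy_page) # => false
```

Running real pages through Tidy (or parsing them with a forgiving HTML parser) is what turns the second kind of input into something the first kind of parser can handle.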

Good luck!
