Parsing web pages

伪装坚强ぢ 2021-01-07 07:14

I have a question about parsing HTML pages, specifically forums. I want to parse a forum or thread containing posts that match certain criteria. I haven't defined the algorithm yet, si

3 answers
  •  无人及你
    2021-01-07 08:04

    1 / Yes.

    2 / Use a compact language such as Python or Ruby for prototyping.

    • For Python, there is a neat library for HTML/XML parsing called beautifulsoup

    • For Ruby, you could try nokogiri or hpricot

    3 / A Java tool to consider: htmlparser

    4 / If you are interested only in some particular text or some specific classes, a regular expression might be sufficient. But as soon as you want to dig deeper into the structure of the content, you'll need some kind of model to hold your data, and hence a parser that, in the best case, can cope with the inconsistencies that occur in real-world HTML.
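
To make the regex-versus-parser trade-off in point 4 concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the markup in `SAMPLE`, the `post` class, and the `data-author` attribute are all hypothetical and not taken from any real forum. It also uses the standard library's `html.parser` instead of beautifulsoup, purely so that it runs without extra installs. The regex pulls out a flat attribute, while the small parser keeps each post's author and text together even though the `<p>` tags are never closed.

```python
import re
from html.parser import HTMLParser

# Hypothetical forum markup. The <p> tags are deliberately left unclosed,
# as often happens on real pages.
SAMPLE = """
<div class="post" data-author="alice"><p>How do I parse HTML?</div>
<div class="post" data-author="bob"><p>Use a <b>parser</b>, not a regex.</div>
"""

# Regex approach: adequate when only one flat value is needed.
authors = re.findall(r'data-author="([^"]+)"', SAMPLE)


class PostCollector(HTMLParser):
    """Collects author and text for every <div class="post"> element."""

    def __init__(self):
        super().__init__()
        self.posts = []   # one dict per post: {"author": ..., "text": ...}
        self._depth = 0   # <div> nesting depth inside the current post

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if self._depth:
            # A nested <div> inside a post: track it so we know when the post ends.
            self._depth += 1
        elif "post" in (attrs.get("class") or "").split():
            self._depth = 1
            self.posts.append({"author": attrs.get("data-author"), "text": ""})

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Only text that appears inside a post is kept.
        if self._depth:
            self.posts[-1]["text"] += data


def parse_posts(html):
    """Return a list of posts with whitespace-normalized text."""
    collector = PostCollector()
    collector.feed(html)
    collector.close()
    return [
        {"author": p["author"], "text": " ".join(p["text"].split())}
        for p in collector.posts
    ]


posts = parse_posts(SAMPLE)
# Example criterion: keep only the posts that mention "parser".
matching = [p for p in posts if "parser" in p["text"]]
```

Once the posts are held in a simple model like this list of dictionaries, any post criterion becomes an ordinary filter over it, as in the last line. The same structure could be filled by beautifulsoup or another parser instead.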
