How can I see all notes of a Tumblr post from Python?

旧巷少年郎 2020-12-19 03:27

Say I look at the following Tumblr post: http://ronbarak.tumblr.com/post/40692813…
It (currently) has 292 notes.

I'd like to get all the above n

4 answers
  •  夕颜 (OP)
    2020-12-19 04:14

    As Fabio implies, it is better to use the API.
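
    A minimal sketch of the API route, assuming the Tumblr API v2 posts endpoint with its `notes_info` parameter and an API key you have registered yourself. The helper names, the blog name, and the post id below are placeholders for illustration, and the response may contain only part of a long note list, so check the official documentation for current limits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.tumblr.com/v2/blog"


def build_posts_url(blog, post_id, api_key):
    """Build the request URL for one post, asking for note metadata."""
    query = urlencode({"api_key": api_key, "id": post_id, "notes_info": "true"})
    return f"{API_ROOT}/{blog}/posts?{query}"


def extract_notes(payload):
    """Collect the note dicts from a decoded API response.

    Assumes the shape {"response": {"posts": [{"notes": [...]}]}}.
    """
    notes = []
    for post in payload.get("response", {}).get("posts", []):
        notes.extend(post.get("notes", []))
    return notes


def fetch_notes(blog, post_id, api_key, timeout=10):
    """Call the API and return whatever notes the response contains."""
    with urlopen(build_posts_url(blog, post_id, api_key), timeout=timeout) as resp:
        return extract_notes(json.load(resp))
```

    Calling `fetch_notes("example.tumblr.com", 123, "YOUR_API_KEY")` would then return a list of note dictionaries, one per like or reblog that the API chose to include.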

    If for whatever reason you cannot, the right tool depends on what you want to do with the data in the posts.

    • for a data dump: urllib will return the raw contents of the page you request
    • looking for a specific section in the HTML: lxml is pretty good
    • looking for something in unruly HTML: definitely BeautifulSoup
    • looking for a specific item in a section: BeautifulSoup, lxml, or plain text parsing will do
    • need to put the data in a database/file: use Scrapy
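
    To make the first options in the list above concrete, here is a rough sketch that fetches a page with urllib and pulls the text out of one section. To stay dependency-free it uses the standard library's `html.parser` instead of lxml or BeautifulSoup, which are more convenient on real pages. The class name `notes` in the usage line is a hypothetical example, not Tumblr's actual markup.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_html(url, timeout=10):
    """Data dump: return the page at `url` as one string."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ClassTextCollector(HTMLParser):
    """Collect text found inside elements carrying a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, class_name):
    """Return the non-empty text fragments inside elements with `class_name`."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

    Usage would look like `extract_text(fetch_html(post_url), "notes")`, where `post_url` is the address of the post you are interested in.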

    Tumblr's URL scheme is simple: url/scheme/1, url/scheme/2, url/scheme/3, and so on, until you reach the end of the posts and the server simply stops returning data.

    So if you are going to brute-force the scraping, you can easily have your script dump every page to your hard drive until, say, the contents tag comes back empty.

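    One last word of advice: please remember to put a short pause in your script, for example time.sleep(1), so that you do not put unnecessary stress on Tumblr's servers. Note that Python's time.sleep takes seconds, so sleep(1000) would wait almost 17 minutes between requests.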
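
    The brute-force loop with a polite pause could be sketched roughly as follows. The `base_url` argument stands for the "url/scheme" part above, and the emptiness check is an assumption: here a page counts as empty when the body is blank or the server answers with an HTTP error, so swap in a check for the contents tag if that is what your pages expose.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Return the page body as text, or "" if the server reports an HTTP error."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError:
        return ""


def is_blank(html):
    """Default stop condition: the page has no visible content at all."""
    return not html.strip()


def crawl_pages(base_url, fetch=fetch_page, is_empty=is_blank,
                delay=1.0, max_pages=1000):
    """Fetch base_url/1, base_url/2, ... until a page comes back empty."""
    pages = []
    for number in range(1, max_pages + 1):
        html = fetch(f"{base_url}/{number}")
        if is_empty(html):
            break
        pages.append(html)
        time.sleep(delay)  # seconds, to go easy on the server
    return pages
```

    The `fetch` and `is_empty` parameters are there so you can replace the download step or the stop condition without touching the loop, and `max_pages` guards against a site that never returns an empty page.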
