Scrapy, hash tag on URLs

半阙折子戏 2020-12-21 05:10

I'm in the middle of a scraping project using Scrapy.

I noticed that Scrapy strips everything in the URL from the hash sign (#) to the end.

Here's the output from the shell:

3 answers

情书的邮戳 (original poster)

2020-12-21 06:00

This isn't something Scrapy itself can change. The portion of the URL following the hash is the fragment identifier, which is handled by the client (Scrapy here, usually a browser) and is never sent to the server in the HTTP request.
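To see the split between the part of the URL that goes to the server and the fragment that stays on the client, here is a minimal sketch using only Python's standard library. It uses the hashbang URL from the Twitter example in this answer; the variable names are mine, not anything from Scrapy.

```python
from urllib.parse import urldefrag

# A URL with a fragment identifier (everything after the '#').
url = "http://twitter.com/#!/also"

# urldefrag separates the requestable URL from the fragment.
base, fragment = urldefrag(url)

print(base)      # the part that is actually requested from the server
print(fragment)  # the part that only the client ever sees
```

Only `base` is what an HTTP client requests, so whatever the fragment selects has to be loaded by some other request that the page's JavaScript makes.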

What probably happens when you fetch the page in a browser is that the page includes some JavaScript that looks at the fragment identifier and loads some additional data via AJAX and updates the page. You'll need to look at what the browser does and see if you can emulate it--developer tools like Firebug or the Chrome or Safari inspector make this easy.

For example, if you navigate to http://twitter.com/also, you are redirected to http://twitter.com/#!/also. The actual URL loaded by the browser here is just http://twitter.com/, but that page then loads data (http://twitter.com/users/show_for_profile.json?screen_name=also) which is used to generate the page, and is, in this case, just JSON data you could parse yourself. You can see this happen using the Network Inspector in Chrome.
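As a rough sketch of emulating what the browser does, the helper below turns a hashbang URL into the JSON URL mentioned above. The function name `profile_json_url` is hypothetical, and the endpoint path is simply the one observed in the Network Inspector for this example; I'm assuming the fragment always has the form `!/<screen_name>`, and another site (or a later version of this one) will use a different pattern.

```python
from urllib.parse import urlsplit, urlencode


def profile_json_url(hashbang_url):
    """Map e.g. http://twitter.com/#!/also to the JSON data URL (hypothetical helper)."""
    parts = urlsplit(hashbang_url)
    fragment = parts.fragment

    # Assumption: the fragment looks like "!/<screen_name>".
    if not fragment.startswith("!/"):
        raise ValueError("not a hashbang URL: %r" % hashbang_url)

    screen_name = fragment[2:].strip("/")
    query = urlencode({"screen_name": screen_name})

    # Endpoint path taken from the example above; it is not a documented API.
    return "%s://%s/users/show_for_profile.json?%s" % (
        parts.scheme, parts.netloc, query)


print(profile_json_url("http://twitter.com/#!/also"))
```

In a spider you would then request that URL instead of the hashbang one (for example by yielding a `scrapy.Request` for it) and parse the response body with `json.loads`.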
