driver.page_source returns only meta name="ROBOTS" content="NOINDEX, NOFOLLOW" using Selenium

若如初见. Submitted on 2020-01-21 19:29:05

Question


I want to scrape a website and get its page content with this code:

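from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Connect to a remote Selenium server (hub) and request a Chrome session.
driver = webdriver.Remote("http://adress:4444/wd/hub", DesiredCapabilities.CHROME)
try:
    link = 'website_url'  # placeholder for the target URL
    driver.get(link)
    s = driver.page_source
    print(s.encode("utf-8"))
finally:
    # Always release the remote session, even if the page load fails.
    driver.quit()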

This is what I receive:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">

I also tried many other approaches (Luminati, the newipnow proxy, PhantomJS), but none of them work. Any suggestions on what else I can try to solve this?


Answer 1:


<meta name="ROBOTS" content="value">

This meta tag tells search engines which actions they are and are not allowed to take on a given page. It can be placed anywhere between the <head> and </head> tags.

Note: Because this <meta> tag has no site-wide effect, it can contain different values on different pages of the same website.

The valid values are:

  • Index (default value)
  • Noindex
  • None
  • Follow
  • Nofollow
  • Noarchive
  • Nosnippet

These values can be combined as well to form the desired valid meta robots tag.

Examples:

  • <meta name="robots" content="noindex" />
  • <meta name="robots" content="index,follow" />
  • <meta name="robots" content="index,follow,noarchive" />
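Since the values are combined as a comma-separated list, the content attribute can be split into individual directives. The following is only a minimal sketch: the function name parse_robots_content is hypothetical, and it assumes that directives are case-insensitive and that whitespace around commas is not significant.

```python
def parse_robots_content(content):
    """Split a robots meta content value into a set of lowercase directives.

    Assumption: directives are comma-separated and case-insensitive.
    """
    return {part.strip().lower() for part in content.split(",") if part.strip()}


directives = parse_robots_content("NOINDEX, NOFOLLOW")
print(sorted(directives))  # ['nofollow', 'noindex']
```

Returning a set makes membership checks such as "noindex" in directives straightforward, regardless of the order or casing used on the page.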

content="NOINDEX, NOFOLLOW"

The NOINDEX value tells search engines NOT to index the page, so the page should not show up in search results. The NOFOLLOW value tells search engines NOT to follow or discover the pages that are LINKED TO on this page.

Web developers often add the NOINDEX, NOFOLLOW meta robots tag to development websites so that search engines do not accidentally start sending traffic to a website that is still under construction.


Why are you seeing this?

The reason can be any of the following:

  • You are trying to execute your automated tests within a development environment.
  • The development team has accidentally added this tag to the live website.
  • The development team forgot to remove it from the website after going live.
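Whichever of these applies, it can help to confirm programmatically that the returned page source really carries this tag. The sketch below uses only Python's standard html.parser module; the class RobotsMetaFinder and the function find_robots_meta are hypothetical names introduced for illustration, and the input string stands in for the value of driver.page_source.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content values of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "robots":
            self.contents.append(attributes.get("content") or "")


def find_robots_meta(html):
    """Return the content value of each robots meta tag found in html."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.contents


# Stand-in for driver.page_source, taken from the question.
page = '<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'
print(find_robots_meta(page))  # ['NOINDEX, NOFOLLOW']
```

An empty list would mean the page source contains no robots meta tag at all; the check says nothing about why the tag is there.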

Reference

What is the meaning of the meta name "robots" tag


Outro

Using the robots meta tag



Source: https://stackoverflow.com/questions/57638195/driver-page-source-returns-only-meta-name-robots-content-noindex-nofollow-u
