google-crawlers

How to make dynamic links crawlable by Google

北城余情 posted on 2020-01-17 04:08:23
Question: I have a question/answer website where each question has its own link. My problem is: how do I feed these links to Google? Should I list them in "sitemap.xml" or in "robots.txt"? What is the standard solution to this problem? Thanks, Amit Aggarwal
Answer 1: Some advice: first, make sure your website is SEO friendly and crawlable by search engines. Second, make sure to publish your site map to Google. To do that, add your site to Google Webmaster Tools and submit your sitemap (XML, RSS, or Atom feed formats).
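
The standard mechanism for exposing dynamically generated question URLs to Google is an XML sitemap, either submitted in Search Console / Webmaster Tools or referenced from robots.txt with a "Sitemap:" line. A minimal sketch, assuming a hypothetical question URL scheme (example.com, the paths and the dates are placeholders, not from the original post):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per question page; regenerate the file whenever questions are added -->
      <url>
        <loc>https://example.com/questions/123/how-to-do-x</loc>
        <lastmod>2020-01-15</lastmod>
        <changefreq>weekly</changefreq>
      </url>
    </urlset>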

How to Crawl a PHP generated image

安稳与你 posted on 2020-01-07 08:29:35
Question: I have a website, textscloud.com. On this website I generate the images with the PHP GD library. Here is a link to a demo: On that page I let the user download the image on which the text will be printed. The download link points to download.php; this download.php file sets a header for generating the image with the PHP GD library and sending the file, like this: header("Content-type: image/png"); But Google doesn't crawl these images. Does anyone know a solution? I can't store these images on the server.
Answer 1: You don't mention how …
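
The answer excerpt is cut off, so the following is only a hedged suggestion, not the answer's actual approach: one common way to get dynamically generated images indexed is to serve them inline (Content-type: image/png, without an attachment Content-Disposition) and to list them in an image sitemap so the crawler can discover the generated URLs. A sketch with placeholder URLs:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
      <url>
        <loc>https://example.com/text-image/123</loc>
        <!-- Point the image entry at the PHP script that renders the PNG -->
        <image:image>
          <image:loc>https://example.com/download.php?id=123</image:loc>
        </image:image>
      </url>
    </urlset>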

How HTML5 page structure affects W3C validation and SEO [closed]

别等时光非礼了梦想. posted on 2020-01-06 14:55:57
Question (closed as needing more focus, no longer accepting answers): If we declare a page as HTML5, is it mandatory to follow the HTML5 page structure? Below are two examples: an ideal HTML5 page, and a page that does not follow the HTML5 structure. But when I validated these two pages with the W3C validator, both were successfully checked as HTML5 …
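
For context, the W3C validator only checks conformance to HTML5 syntax and content models; it does not require the newer sectioning elements, which is why both of the asker's examples pass. A minimal sketch of a page that validates as HTML5 without any of those elements (the ids are placeholders):

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="utf-8">
      <title>Valid HTML5 without sectioning elements</title>
    </head>
    <body>
      <!-- Plain divs are valid HTML5, but give crawlers fewer structural hints
           than <header>, <nav>, <main>, <article> and <footer> would -->
      <div id="header">Site title</div>
      <div id="content">Page content</div>
      <div id="footer">Footer</div>
    </body>
    </html>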

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=503 (google scholar ban?)

别等时光非礼了梦想. posted on 2020-01-02 13:56:13
Question: I am working on a crawler and have to extract data from 200-300 links on Google Scholar. I have a working parser that gets the data from each page (every results page lists 1-10 people profiles for my query; I extract the profile links, go to the next page, and repeat). While running my program I hit the error shown in the title: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=503, URL=https://ipv4.google.com/sorry/IndexRedirect?continue=https://scholar.google.pl/citations%3Fmauthors …
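
The /sorry/ URL in the error is Google's rate-limiting / abuse-detection page, so the practical mitigation is to crawl slowly and back off when a 503 appears. A minimal Jsoup sketch along those lines; the delays, retry count and user agent string are illustrative assumptions, not values from the original post:

    import org.jsoup.HttpStatusException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class ScholarFetcher {
        // Fetch one page, pausing between requests and backing off when Google returns 503.
        static Document fetchWithBackoff(String url) throws Exception {
            long delayMs = 5_000;                       // base pause between requests (assumption)
            for (int attempt = 0; attempt < 5; attempt++) {
                Thread.sleep(delayMs);
                try {
                    return Jsoup.connect(url)
                            .userAgent("Mozilla/5.0 (compatible; my-research-crawler)")
                            .timeout(15_000)
                            .get();
                } catch (HttpStatusException e) {
                    if (e.getStatusCode() != 503) throw e;  // only retry on rate limiting
                    delayMs *= 4;                           // exponential backoff before the next try
                }
            }
            throw new IllegalStateException("Still rate-limited after retries: " + url);
        }
    }

Even with backoff, Google Scholar has no official API and aggressive scraping can lead to longer blocks, so keeping the request rate low matters more than the retry logic itself.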

Why do search engine crawlers not run javascript? [closed]

血红的双手。 posted on 2019-12-30 08:09:03
Question (closed as off-topic, no longer accepting answers): I have been working on some advanced JavaScript applications that use a lot of AJAX requests to render the page. To make the applications crawlable (by Google), I have to follow https://developers.google.com/webmasters/ajax-crawling/?hl=fr . This tells us to do things like redesigning our links and creating HTML …
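
The page linked in the question describes Google's AJAX crawling scheme (since deprecated): "pretty" #! URLs are mapped to _escaped_fragment_ URLs, and for those the server returns a pre-rendered HTML snapshot of what the JavaScript would have built in the browser. The canonical mapping, shown here with a placeholder domain:

    URL the user sees:        https://example.com/index.html#!key=value
    URL the crawler requests: https://example.com/index.html?_escaped_fragment_=key=value

Google has since retired this scheme because Googlebot now executes JavaScript itself, but at the time of the question the snapshot approach was the documented way to make AJAX pages crawlable.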

How should a robots.txt file be properly written for subdomains?

China☆狼群 posted on 2019-12-25 05:21:54
Question: Can someone explain how I should write a robots.txt file if I want all crawlers to index the root and some specific subdomains?

    User-agent: *
    Allow: /
    Allow: /subdomain1/
    Allow: /subdomain2/

Is this right? And where should I put it: in the root (public_html) folder or in each subdomain's folder?
Answer 1: There is no way to specify rules for different subdomains within a single robots.txt file. A given robots.txt file will only control crawling of the subdomain it was requested from. If you want to …
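
Because a robots.txt file only applies to the host it is served from, the usual layout is one file per (sub)domain, each placed in that host's own document root. A sketch with placeholder hostnames (subdomain1.example.com and so on are assumptions, not the asker's real domains):

    # Served at https://example.com/robots.txt
    User-agent: *
    Disallow:

    # Served at https://subdomain1.example.com/robots.txt
    User-agent: *
    Disallow:

    # Served at https://private.example.com/robots.txt  (a subdomain you do not want crawled)
    User-agent: *
    Disallow: /

An empty Disallow line (or simply having no robots.txt at all) already allows everything, so the Allow directives from the question are not needed for the crawl-everything case.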

Verifying Googlebot in Rails

泪湿孤枕 posted on 2019-12-24 15:51:08
Question: I am looking to implement First Click Free in my Rails application. Google has information on how to verify whether it is Googlebot that is viewing your site here. I have been searching for an existing Rails solution for this, but I have been unable to find anything. So firstly, does anyone know of one? If not, could anyone point me in the right direction on how to implement the verification suggested on that page? Also, in that solution, it has to …
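
Google's documented verification is a two-step DNS check: reverse-resolve the requesting IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve that hostname and confirm it points back to the same IP. The question is about Rails, but as a language-neutral illustration the same double lookup is sketched below in Java (a Rails app would do the equivalent with Ruby's Resolv library); the class and method names are made up for the example:

    import java.net.InetAddress;

    public class GooglebotCheck {
        // True if the IP reverse-resolves to a googlebot.com / google.com host
        // and that host forward-resolves back to the same IP.
        static boolean isGooglebot(String ip) {
            try {
                String host = InetAddress.getByName(ip).getCanonicalHostName(); // reverse DNS
                if (!host.endsWith(".googlebot.com") && !host.endsWith(".google.com")) {
                    return false;
                }
                String forward = InetAddress.getByName(host).getHostAddress();  // forward DNS
                return forward.equals(ip);  // a fuller check would compare all resolved addresses
            } catch (Exception e) {
                return false;               // any lookup failure: treat as not verified
            }
        }
    }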

Does Google's crawler index asynchronously loaded elements?

自古美人都是妖i posted on 2019-12-23 13:39:46
Question: I've built a widget for websites which is loaded asynchronously after the page has loaded:

    <html>
    <head>...</head>
    <body>
      <div>...</div>
      <script type="text/javascript">
        (function(){
          var ns = document.createElement("script");
          ns.type = "text/javascript";
          ns.async = true;
          ns.src = "http://mydomain.com/myjavascript.js";
          var s = document.getElementsByTagName("script")[0];
          s.parentNode.insertBefore(ns, s);
        })();
      </script>
    </body>
    </html>

Is there any way to notify Google's crawler to index the …
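
Google's handling of script-injected content has historically been inconsistent, so a common safeguard is to ship a static fallback in the initial HTML that the script later replaces with the live widget; the crawler then has something to index even if the script never runs. A sketch (the element id and fallback URL are placeholders, not from the original post):

    <!-- Static fallback rendered by the server: visible to any crawler even if
         myjavascript.js is never executed; the script swaps in the live widget. -->
    <div id="my-widget">
      <a href="http://mydomain.com/widget-content">Widget content (static version)</a>
    </div>
    <noscript>
      <p>This widget needs JavaScript; a static version is available at the link above.</p>
    </noscript>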