Do Google's crawlers interpret JavaScript? What if I load a page through AJAX? [closed]

Submitted by 不羁岁月 on 2019-11-26 18:56:47
jldupont

Updated: From the answer to this question about "Ajax generated content, crawling and black listing", I found this document about the way Google crawls AJAX requests, which is part of a collection of documents about Making AJAX Applications Crawlable.

In short, it means you need to use <a href="#!data">...</a> rather than <a href="#data">...</a>, and then supply a real server-side response at the URL path/to/path?_escaped_fragment_=data.
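The rewriting rule can be sketched as a small helper (the function name is my own, not an official API; note that Google has since deprecated this AJAX crawling scheme):

```javascript
// Sketch (assumed helper, not an official API): compute the URL a crawler
// following Google's AJAX crawling scheme would request for a page whose
// client-side state lives after the "#!" hash-bang.
function escapedFragmentUrl(url) {
  const [base, fragment] = url.split("#!");
  if (fragment === undefined) return url; // no hash-bang: URL is served as-is
  const sep = base.includes("?") ? "&" : "?";
  return base + sep + "_escaped_fragment_=" + encodeURIComponent(fragment);
}

console.log(escapedFragmentUrl("http://example.com/page#!data"));
// → http://example.com/page?_escaped_fragment_=data
```

Your server then has to answer that `_escaped_fragment_` URL with real, fully rendered HTML for the corresponding application state.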

Also consider a <link/> tag to supply crawlers with a hint to SEO-friendly content. <link rel="canonical"/>, which this article explains a bit, is a good candidate.
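For instance (the URLs are made up for illustration, and assume the server can render the same content at a plain query-string URL):

```html
<!-- In the <head> of the AJAX view at /items#!page=2, pointing crawlers
     at a crawlable, server-rendered equivalent of the same page -->
<link rel="canonical" href="http://example.com/items?page=2"/>
```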

Note: I took the answer from: https://stackoverflow.com/questions/10006825/search-engine-misunderstanting/10006925#comment12792862_10006925 because it seems I can't delete mine here.

Despite the answers above, Googlebot apparently does interpret JavaScript to an extent, according to Matt Cutts:

"For a while, we were scanning within JavaScript, and we were looking for links. Google has gotten smarter about JavaScript and can execute some JavaScript. I wouldn't say that we execute all JavaScript, so there are some conditions in which we don't execute JavaScript. Certainly there are some common, well-known JavaScript things like Google Analytics, which you wouldn't even want to execute because you wouldn't want to try to generate phantom visits from Googlebot into your Google Analytics".

(Why answer an answered question? Mostly because I just saw it because of a duplicate question posted today, and didn't see this info here.)

Actually... Google does have a solution for crawling Ajax applications...

http://code.google.com/web/ajaxcrawling/docs/getting-started.html

What I do in this situation is always populate the page initially with content based on the default parameters of whatever the AJAX call does, and then use the AJAX JavaScript only to update the page.
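A minimal sketch of that approach (the element ID and endpoint are hypothetical): the server renders the default result set directly into the HTML, so crawlers and no-JS visitors see real content, and script only replaces it when it actually runs.

```html
<!-- Default content is rendered server-side, not built by script -->
<div id="results">
  <ul>
    <li>Default item 1 (rendered server-side)</li>
    <li>Default item 2 (rendered server-side)</li>
  </ul>
</div>
<script>
  // Hypothetical endpoint: only *updates* go through AJAX.
  fetch("/items?page=2")
    .then(function (res) { return res.text(); })
    .then(function (html) {
      document.getElementById("results").innerHTML = html;
    });
</script>
```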

As other answers say, Google's crawler (and, I believe, those of other search engines) does not interpret JavaScript -- and you should not try to differentiate by user-agent or the like, at the risk of having your site downgraded or blocked for presenting different content to users vs. robots. Rather, offer some (perhaps minimal) level of content to visitors who have JavaScript blocked for whatever reason (including the case where the reason is "being a robot";-). After all, that's the very reason the noscript tag exists: to make it very, very easy to offer such a "minimal level of content" (or more than minimal, if you so choose;-) to non-users of JavaScript!
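For example (a hypothetical script-built widget; the fallback text and link are placeholders):

```html
<div id="chart"></div>
<script>
  // Script users get the interactive, JavaScript-rendered version.
  document.getElementById("chart").textContent = "Interactive chart here";
</script>
<noscript>
  <!-- Robots and no-JS visitors still get indexable content -->
  <p>Plain-text summary of the chart data.
     <a href="/report.html">Full report</a></p>
</noscript>
```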

Web crawlers have a difficult time with AJAX and JavaScript that dynamically load content. This article has some ideas on how to help Google index your site: http://www.softwaredeveloper.com/features/google-ajax-play-nice-061907/

If you make your pages so that they work with OR without JavaScript (i.e. fall back to frames or standard GET/POST requests to the server if JavaScript fails, either automatically or via a "display as plain HTML" link), it will be much easier for search engines to crawl them.
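The classic pattern for that (the handler name and URL are hypothetical): give every link a real href that the server can answer, and let JavaScript intercept the click only when it is available.

```html
<!-- Without JavaScript (or for a crawler), the href does a normal GET;
     with JavaScript, loadPageViaAjax() handles it and cancels the click. -->
<a href="/products?page=2"
   onclick="loadPageViaAjax(2); return false;">Next page</a>
```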

It makes sense for them not to crawl "dynamic" content, because it is just that: dynamic.

My understanding is that in most situations, Google does not crawl the client-side-dynamic-content.

Now it looks like Googlebot is no longer limited to a simple Lynx-like browser.

Googlebot tries to grab the humanly visible, contrasting text to weight the importance of different sections of the page. So it renders the page with a layout engine, just as browsers like Firefox or Chrome do.

It might even have V8 JavaScript engine support: the bot may load the page, wait until the DOM is ready, perhaps even wait a few seconds for the page to settle into a stable view, and then extract the contrasting text.
