google-search

Why is Google indexing friendly URLs with a mix of hyphens and %20?

两盒软妹~` submitted on 2019-12-10 11:03:28
Question: I developed a blog from scratch, and things have gone great so far. I finally got around to writing my first post/article, and I've been waiting for Google to index this specific page to make sure there aren't any issues with it. Well, Google is currently indexing the same page 4 times. I have (with the help of users from Stack Overflow) a mod_rewrite rule in my .htaccess that rewrites all URLs coming from a specific file (article.php) into hyphenated form. My current article page stands as follows. example: www
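One way to keep Google from seeing several URLs for one article is to have the server generate exactly one slug per title and only ever link to that, with any other variant answered by a 301 redirect or a rel="canonical" pointing at it. The asker's stack is PHP plus mod_rewrite; the Python below is only a sketch of the slug logic, and `slugify`, `canonical_url` and the example.com base are hypothetical names. It decodes `%20` before slugging, so the spaced, encoded and hyphenated variants all collapse to one URL (note that non-ASCII letters are dropped by this simple rule).

```python
import re
from urllib.parse import unquote


def slugify(title: str) -> str:
    """Turn an article title into a hyphenated URL slug (hypothetical helper).

    Percent-escapes are decoded first, so "%20" becomes a space, then every
    run of characters outside a-z / 0-9 is collapsed into a single hyphen.
    """
    text = unquote(title).lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


def canonical_url(base: str, title: str) -> str:
    """Build the single URL every variant of an article should resolve to."""
    return f"{base.rstrip('/')}/{slugify(title)}"


if __name__ == "__main__":
    # All three variants map to the same canonical address.
    for variant in ["My First Post", "My%20First%20Post", "my-first-post"]:
        print(canonical_url("https://www.example.com/article", variant))
```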

Can the GAE development server keep full-text search indexes after a restart?

不羁岁月 submitted on 2019-12-10 03:12:40
Question: Is there any way of forcing the GAE dev server to keep full-text search indexes after a restart? I am finding that the index is lost whenever the dev server is restarted. I am already using a static datastore path when I launch the dev server (the --datastore_path option).

Answer 1: This functionality was added a few releases ago (in either 1.7.1 or 1.7.2, I think). If you're using an SDK from the last few months, it should be working. You can try explicitly setting the --search_indexes_path flag on
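As a small illustration of the answer's suggestion, the sketch below builds a dev server command line that passes both the question's `--datastore_path` and the answer's `--search_indexes_path`, each pointing at a fixed location. The file names under the data directory are arbitrary placeholders, and the `--flag=value` form is an assumption; check `dev_appserver.py --help` for the SDK version you actually run.

```python
import os
import shlex
from typing import List


def dev_appserver_command(app_dir: str, data_dir: str) -> List[str]:
    """Build a dev_appserver.py invocation that keeps state in data_dir.

    "datastore.db" and "search_indexes" are placeholder file names; any
    stable path outside a temp directory that gets wiped would do.
    """
    return [
        "dev_appserver.py",
        "--datastore_path=" + os.path.join(data_dir, "datastore.db"),
        "--search_indexes_path=" + os.path.join(data_dir, "search_indexes"),
        app_dir,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command to paste into a terminal.
    print(shlex.join(dev_appserver_command("myapp", "/tmp/gae-data")))
```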

Python: easy way to scrape Google and download the top N hits (entire .html documents) for a given search?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-09 15:14:28
Question: Is there an easy way to scrape Google and write the text (just the text) of the top N (say, 1000) .html (or whatever) documents for a given search? As an example, imagine searching for the phrase "big bad wolf" and downloading just the text from the top 1000 hits, i.e., actually downloading the text from those 1000 web pages (but just those pages, not the entire site). I'm assuming this would use the urllib2 library? I use Python 3.1, if that helps.

Answer 1: The official way to get results from
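The sketch below covers only the "just the text" half of the question: given one downloaded page, keep the visible text and drop `<script>` and `<style>` contents. The list of result URLs would still come from an official search API rather than from scraping the results page. It uses only the standard library; in Python 3 the role of urllib2 is played by `urllib.request`. `TextExtractor` and `html_to_text` are illustrative names, not an existing library API.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping script/style."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For a live page, something like `html_to_text(urllib.request.urlopen(url).read().decode("utf-8", "replace"))` would feed it; real pages may declare other encodings.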

Google Search API site limit

谁都会走 submitted on 2019-12-09 09:16:37
Question: According to the Google Custom Search API docs (http://code.google.com/apis/customsearch/docs/start.html#sites), there is a limit of up to 5000 sites that you can search. This is pretty lame. Is there any way around this so that I can search the entire web using Google's results? Also, if you include a bunch of URL patterns that match more than 5000 websites, how would the API pick and choose which sites to include and which to exclude?

Answer 1: This is for a custom search, not a normal
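To make the "this is for a custom search" point concrete: a request to the Custom Search JSON API always names a configured search engine through `cx`, and that engine's site/URL-pattern list is what the results are drawn from. The sketch below only builds a request URL; `KEY` and `CX` are placeholders, the endpoint and parameter names reflect the JSON API as I understand it rather than the older docs linked above, and the 1 to 10 range for `num` should be confirmed against the current API reference.

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def custom_search_url(api_key: str, engine_id: str, query: str,
                      start: int = 1, num: int = 10) -> str:
    """Build one page of a Custom Search JSON API request.

    engine_id (the `cx` parameter) selects a configured custom search
    engine; the sites it covers are set in that engine's configuration,
    not in the request.
    """
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = {"key": api_key, "cx": engine_id, "q": query,
              "start": start, "num": num}
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Second page of results (results 11-20) for a sample query.
    print(custom_search_url("KEY", "CX", "big bad wolf", start=11))
```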

Disable styling on Google Search with Selenium FirefoxDriver

怎甘沉沦 submitted on 2019-12-09 05:56:27
Question: The following code disables stylesheets and images on a page loaded with the Selenium Firefox WebDriver:

    from selenium import webdriver

    firefox_profile = webdriver.FirefoxProfile()
    firefox_profile.set_preference('permissions.default.stylesheet', 2)  # 2 = block
    firefox_profile.set_preference('permissions.default.image', 2)  # 2 = block

    driver = webdriver.Firefox(firefox_profile)
    driver.get('http://www.stackoverflow.com/')
    driver.close()

It works fine with stackoverflow.com, facebook.com, yahoo.com ... but interestingly

How does Google hide HTML source of search results?

青春壹個敷衍的年華 submitted on 2019-12-09 03:41:57
Question: When you try to view the source code of a Google search results page, you just see a bunch of JavaScript code instead of readable text. How does Google do that? I have searched the web but couldn't find a good explanation; the only thing I found was this: http://goo.gl/FIvD6, and it is not really helpful. I am not a web developer; I just got curious. A brief explanation would be nice. Thanks.

Answer 1: Google builds the DOM with the JavaScript you noted. It does this for a number of reasons:
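"View source" shows the bytes the server sent, not the DOM after scripts run. When the visible content is inserted by JavaScript, the source is mostly script and almost no plain text. The sketch below measures that split for a piece of HTML; the sample pages are made up for illustration, and `SourceProfile` / `script_vs_text` are hypothetical names.

```python
from html.parser import HTMLParser


class SourceProfile(HTMLParser):
    """Count characters inside <script> versus visible text in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.script_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chars += len(data)
        else:
            self.text_chars += len(data.strip())


def script_vs_text(html: str) -> tuple:
    """Return (script characters, visible text characters) for raw HTML."""
    profile = SourceProfile()
    profile.feed(html)
    profile.close()
    return profile.script_chars, profile.text_chars


if __name__ == "__main__":
    # Result text lives only inside the script, so the "source" has none.
    js_page = ('<div id="res"></div><script>'
               'document.getElementById("res").textContent = "Result 1";'
               '</script>')
    static_page = "<p>Result 1</p>"
    print(script_vs_text(js_page), script_vs_text(static_page))
```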

How does Google set the HTTP Referrer after a search result click?

做~自己de王妃 submitted on 2019-12-08 12:54:33
Question: For example, the first search result on this page leads to the older SO question, with the following HTTP request:

    GET /questions/4402502/how-does-google-set-the-http-referrer-when-someone-clicks-on-a-search-result-lin HTTP/1.1
    Host: stackoverflow.com
    Referer: https://www.google.ru

Note that:

- Only the domain is included in the Referer header, no query string.
- Google is open via HTTPS, while SO is open via plain HTTP; nevertheless, the Referer header is sent by the browser.
- There are no
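The header in the question keeps only the scheme and host of the Google results page and drops the path and query string. The sketch below reproduces that reduction for an arbitrary page URL so the difference is easy to see; it does not claim to describe how Google instructs the browser to do it, which the question does not show. `origin_only_referrer` is a hypothetical helper, and the trailing-slash-free output simply mirrors the header shown above.

```python
from urllib.parse import urlsplit


def origin_only_referrer(page_url: str) -> str:
    """Reduce a full URL to scheme://host[:port], dropping path and query.

    Any user:password@ prefix is stripped as well, since it is not part of
    the origin.
    """
    parts = urlsplit(page_url)
    host = parts.netloc.rpartition("@")[2]
    return f"{parts.scheme}://{host}"


if __name__ == "__main__":
    full = "https://www.google.ru/search?q=how+does+google+set+referer"
    print("full URL:", full)
    print("Referer: ", origin_only_referrer(full))
```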

Can I search Google with XMLHttpRequest()?

老子叫甜甜 submitted on 2019-12-08 06:24:06
Question: Can I search Google with a cross-origin XMLHttpRequest()?

    var xhr = new XMLHttpRequest();  // the constructor requires `new`
    xhr.open("GET", "https://www.google.com/?q=what+you+want+to+search", true);
    xhr.send();

Answer 1: Try it:

    curl -H "Origin: http://domain.com" -X OPTIONS --head https://www.google.com/

This currently gives you:

    HTTP/1.1 405 Method Not Allowed
    Content-Type: text/html; charset=UTF-8
    Content-Length: 962
    Date: Fri, 21 Jun 2013 17:58:45 GMT
    Server: GFE/2.0

So no, you can't, at least not with their public-facing website. There would be an
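The curl check above can be sketched in Python: send an OPTIONS request with an `Origin` header and look for an `Access-Control-Allow-Origin` header that matches (or is `*`). The response in the answer carries no such header, which is why the browser would refuse to hand the result to the page. The helper names are hypothetical, the credentials nuance of CORS is ignored, and only `allows_origin` is exercised offline; `preflight` needs network access.

```python
import urllib.error
import urllib.request


def allows_origin(headers, origin: str) -> bool:
    """True if response headers grant CORS read access to `origin`.

    Simplified: ignores the stricter rules that apply to credentialed
    requests, where "*" is not accepted.
    """
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


def preflight(url: str, origin: str) -> bool:
    """Send an OPTIONS request with an Origin header, like the curl command."""
    req = urllib.request.Request(url, method="OPTIONS",
                                 headers={"Origin": origin})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return allows_origin(resp.headers, origin)
    except urllib.error.HTTPError as err:
        # A 405 like the one in the answer still carries headers to inspect.
        return allows_origin(err.headers, origin)
```

A plain GET from XMLHttpRequest counts as a "simple" request and is not preflighted at all; the browser checks the same `Access-Control-Allow-Origin` header on the GET response instead, so the conclusion is the same.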

Using the Google sitelinks search bar

这一生的挚爱 submitted on 2019-12-08 02:44:58
Question: I am following this guide by Google to add a sitelinks search bar to my website. The structured data markup tool shows everything to be correct. But the search bar hasn't appeared in the search results for my website. Instead, Google has indexed the page www.example.com/search?q=%7Bsearch_term_string%7D.

I have the exact same code as explained in the example, except for the site URL, of course. What am I doing wrong? Or is this expected behaviour?

    <script type="application/ld+json">
    { "@context":
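`%7B` and `%7D` are the percent-encoded forms of `{` and `}`, so the page Google indexed is the literal `target` template from the markup, with `{search_term_string}` never filled in. The sketch below generates WebSite/SearchAction JSON-LD in the shape the sitelinks guide describes (as best I recall it; Google's guide is authoritative) and shows that encoding the template reproduces the indexed URL. The site URL is the question's example.com placeholder and the function names are hypothetical. It confirms what the indexed URL is, not why Google crawled it.

```python
import json
from urllib.parse import quote

# Placeholder site, as in the question; replace with your own origin.
SITE = "https://www.example.com/"


def searchbox_jsonld(site: str) -> str:
    """Serialize WebSite + SearchAction markup for a sitelinks search box."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": site,
        "potentialAction": {
            "@type": "SearchAction",
            # {search_term_string} is a template slot, not a real query.
            "target": site + "search?q={search_term_string}",
            "query-input": "required name=search_term_string",
        },
    }
    return json.dumps(data, indent=2)


def as_crawled(url_template: str) -> str:
    """Percent-encode the template, turning { and } into %7B and %7D."""
    return quote(url_template, safe=":/?=&")


if __name__ == "__main__":
    markup = searchbox_jsonld(SITE)
    print('<script type="application/ld+json">\n' + markup + "\n</script>")
    target = json.loads(markup)["potentialAction"]["target"]
    print(as_crawled(target))
```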

Will Google see rel=nofollow if it is added by jQuery?

*爱你&永不变心* submitted on 2019-12-08 02:36:06
Question: I'm adding a rel=nofollow attribute to links via jQuery after the page loads. Will Google see this attribute? I can't find anything in Google's official documentation.

Answer 1: Although Google processes JavaScript and can index a lot of dynamic content, there's a special behavior when inserting rel=nofollow dynamically. It was tested[1], and they came up with this result: The nofollow in the DOM did not work (the link was followed, and the page indexed). Why? Because the modification of the a href
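Given the test result quoted above, the safer option is to put `rel="nofollow"` into the HTML the server sends. The sketch below lists each link in raw HTML with whether that source already declares nofollow; a link that only gains the attribute from jQuery after load shows up as `False` here. This is not a model of Googlebot, which does render JavaScript; it only inspects what the server-sent markup says. The helper names are hypothetical.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Record (href, has_nofollow) for every <a href=...> in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel is a space-separated, case-insensitive token list; a bare
        # attribute with no value arrives as None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if href is not None:
            self.links.append((href, "nofollow" in rel_tokens))


def nofollow_in_source(html: str) -> list:
    collector = LinkRelCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```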