search-engine

High-level explanation of Lucene's Similarity class?

扶醉桌前 submitted on 2019-12-04 05:04:25
Do you know where I can find a high-level explanation of Lucene's Similarity class algorithm? I would like to understand it without having to decipher all the math and terminology involved in searching and indexing.

Lucene's built-in Similarity is a fairly standard "inverse document frequency" scoring algorithm. The Wikipedia article is brief, but covers the basics. The book Lucene in Action breaks down the Lucene formula in more detail; it doesn't mirror the current Lucene formula perfectly, but all of the main concepts are explained. Primarily, the score varies with the number of times that term occurs …
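For intuition, here is a minimal sketch in Python of a plain TF-IDF scorer over a toy corpus. It is not Lucene's actual Similarity formula (the classic formula also includes coordination, query normalization, boosts, and field-length norms), but it shows the two quantities the score mainly varies with: how often a term occurs in a document, and how rare the term is across the collection.

    import math
    from collections import Counter

    docs = [
        "lucene is a search library",
        "solr builds on lucene",
        "xapian is another search library",
    ]
    tokenized = [d.split() for d in docs]
    N = len(tokenized)

    def idf(term):
        # Count documents containing the term; rarer terms get higher weight.
        df = sum(1 for doc in tokenized if term in doc)
        return math.log(N / (1 + df)) + 1

    def score(query, doc):
        # Sum tf * idf over the query terms, divided by a simple length norm:
        # more occurrences of a rare term in a shorter document score higher.
        tf = Counter(doc)
        return sum(tf[t] * idf(t) for t in query.split()) / math.sqrt(len(doc))

    for doc in tokenized:
        print(round(score("search library", doc), 3), "-", " ".join(doc))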

Getting more search results per page via URL

送分小仙女 submitted on 2019-12-04 04:32:37
I've been writing a program that extracts data from web searches. To get more data, I'd ideally like to retrieve more results per query through a script (let's say 100 or so). My question is: is there a way to modify the URL for Google, Yahoo, or Bing (preference in that order) so that I can get more than 10 results per query? For Google, appending &num=99 used to work at one point but no longer does :( I saw a similar suggestion of appending &count=50, but that didn't work on any of the search engines either.

The reason num=99 doesn't work for Google is that the num parameter's actual value isn't used, …
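Purely to illustrate what "modifying the URL" means here, a small Python sketch that appends such a parameter. The parameter name num is the one mentioned in the thread for Google; whether any engine still honors it is exactly what is being debated, so treat this as illustrative only. The engines' official search APIs, which expose a documented page-size parameter, are the dependable route.

    from urllib.parse import urlencode

    def google_search_url(query, per_page=100):
        # Append the 'num' parameter discussed in the thread; the engine may
        # ignore or cap it, and scraping may violate its terms of service.
        return "https://www.google.com/search?" + urlencode(
            {"q": query, "num": per_page})

    print(google_search_url("inverted index", per_page=100))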

Solr vs Xapian: which one gives you the most meaningful results?

房东的猫 submitted on 2019-12-04 04:23:57
I am currently using Whoosh to develop a website, and I'll need to choose something more powerful once the site is in production. If any of you have used both of these engines, which one gave you the most meaningful results in the long run?

Rui Carneiro: Solr is the best option. It's well documented and the community is huge. Almost a year ago I benchmarked Xapian vs. Solr on a dataset of 8000+ emails:

    Solr:   index time 3 s,  index size 5.2 MB
    Xapian: index time 30 s, index size 154 MB

Another great read about benchmarks between Xapian and Solr is this document: Cross-instance Search System …

Regular expression to detect the search engine and search words

时光怂恿深爱的人放手 submitted on 2019-12-03 21:44:44
I need to detect search engines that refer visitors to my website. Since every search engine uses a different query-string parameter for the search terms (e.g. Google uses 'q=', Yahoo uses 'p='), I created a database of search engines with their URL regex patterns. As an example, for

    http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-GB:official&client=firefox-a

the regex I created for Google is

    (http:)(\\/)(\\/)(www)(\\.)(google)(\\.).*(\\/)(search).*(&q=|\\?q=).*

(I am a newbie with regex, but so far it works.) This detects that the URL belongs to Google. My problem is that I need to extract the …
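As an alternative to one hand-written regex per engine, the referrer URL can be parsed and the query parameter looked up by host. A small Python sketch of that approach, with a hypothetical host-to-parameter map covering the engines mentioned above:

    from urllib.parse import urlparse, parse_qs

    # Hypothetical map of search-engine hosts to their query parameter.
    ENGINES = {
        "www.google.com": "q",
        "search.yahoo.com": "p",
        "www.bing.com": "q",
    }

    def search_terms(referrer):
        # Return (engine host, search terms) or (None, None) for other sites.
        parsed = urlparse(referrer)
        param = ENGINES.get(parsed.netloc)
        if param is None:
            return None, None
        terms = parse_qs(parsed.query).get(param, [None])[0]
        return parsed.netloc, terms

    print(search_terms(
        "http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8"))
    # -> ('www.google.com', 'blabla')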

Do I need to use HTTP redirect code 302 or 307?

浪尽此生 submitted on 2019-12-03 19:28:18
Question: Suppose I have a page on my website that shows media releases for the current month: http://www.mysite.com/mediareleases.aspx. For reasons it would be mundane to go into*, this page MUST be given a query string with the current day of the month in order to produce the list: http://www.mysite.com/mediareleases.aspx?prevDays=18. As such, I need to redirect clients requesting http://www.mysite.com/mediareleases.aspx to http://www.mysite.com/mediareleases.aspx?prevDays=whateverDayOfTheMonthItIs. My …
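The site in the question is ASP.NET; the sketch below uses Python's standard library purely to illustrate the redirect itself (the port and handler names are arbitrary). It issues a temporary redirect to the same page with prevDays set from today's date; 302 Found is the common choice, while 307 Temporary Redirect additionally promises the client will repeat the request with the same method (relevant mainly for POST).

    from datetime import date
    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        # Redirect every request to the same page with prevDays set to the
        # current day of the month.
        target = "/mediareleases.aspx?prevDays=%d" % date.today().day
        start_response("302 Found", [("Location", target)])
        return [b""]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()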

Internationalization and Search Engine Optimization

℡╲_俬逩灬. submitted on 2019-12-03 18:35:57
Question: I'd like to internationalize my site so that it's accessible in many languages. The language setting will be detected from the request data automatically, and can be overridden in the user's settings / stored in the session. My question concerns how I should expose the various language versions of the same page in terms of the pages' URLs. Let's say we're just looking at the index page of http://www.example.com/, which defaults to English. Now if a French speaker loads the …

Is there a better way to find set intersection for Search engine code?

怎甘沉沦 submitted on 2019-12-03 17:34:01
I have been coding up a small search engine and need to find out if there is a faster way to compute set intersections. Currently I am using a sorted linked list, as described in most search-engine texts: for every word I keep a sorted list of documents, and then find the intersection among those lists. The performance profiling of the case is here. Any other ideas for a faster set intersection?

An efficient way to do it is by "zig-zag". Assume your terms are in a list T:

    lastDoc <- 0   // the first doc in the collection
    currTerm <- 0  // the first term in T
    while (lastDoc != infinity):
    …
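The pseudocode above is cut off by the excerpt; below is a self-contained Python sketch of the same zig-zag idea over sorted posting lists, using binary search (bisect) for the "advance to the first doc ID >= candidate" step. It is an illustration of the technique, not code from the original answer.

    from bisect import bisect_left

    def zigzag_intersect(postings):
        # postings: one sorted list of doc IDs per query term.
        # Take a candidate doc ID and advance every list (via binary search)
        # to its first doc ID >= candidate. If some list jumps past the
        # candidate, that larger doc ID becomes the new candidate; if every
        # list lands exactly on it, it is in the intersection.
        positions = [0] * len(postings)
        result = []
        candidate = postings[0][0] if postings and postings[0] else None
        while candidate is not None:
            matched = True
            for i, plist in enumerate(postings):
                positions[i] = bisect_left(plist, candidate, positions[i])
                if positions[i] == len(plist):
                    return result                    # one list is exhausted
                if plist[positions[i]] != candidate:
                    candidate = plist[positions[i]]  # jump ahead, restart round
                    matched = False
                    break
            if matched:
                result.append(candidate)
                positions[0] += 1
                if positions[0] == len(postings[0]):
                    return result
                candidate = postings[0][positions[0]]
        return result

    print(zigzag_intersect([[1, 3, 7, 9, 12], [3, 7, 8, 12], [2, 3, 12]]))
    # -> [3, 12]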

Set the Default Search Engine Provider of IE with IOpenServiceManager::InstallService

末鹿安然 submitted on 2019-12-03 16:56:16
I would like to set the default search-engine provider of IE with IOpenServiceManager::InstallService. Following the link http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_description_elements, I created SearchProviderInfo.xml like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      <ShortName>Web Search</ShortName>
      <Description>Use Example.com to search the Web.</Description>
      <Tags>example web</Tags>
      <Contact>admin@example.com</Contact>
      <Url type="application/atom+xml" template="http://example.com/?q= …

programmer-friendly search engine? [duplicate]

假装没事ソ submitted on 2019-12-03 16:39:08
Question: This question already has answers here (closed 8 years ago). Possible duplicate: Programmer-friendly search engine?

Google is unfriendly to searching for verbatim strings with characters like $ and #. Is there a search engine that supports searching for verbatim strings?

Answer 1: Try http://www.google.com/codesearch. Just remember that it uses a regexp language, so to search for $value, use \$value. For example: http://www.google.com/codesearch?hl=en&lr=&q=\%24value&sbtn=Search

Answer 2: this one would …
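The escaping advice in answer 1 is general: $ is a regular-expression metacharacter, so a verbatim search in any regexp-based tool needs it escaped. A tiny Python illustration, where the re module simply stands in for whatever regexp engine the search tool uses:

    import re

    literal = "$value"
    pattern = re.escape(literal)      # backslash-escapes the $ metacharacter
    print(pattern)                    # \$value
    print(bool(re.search(pattern, "echo $value;")))   # True: matched verbatim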

SOLR Permissions / Filtering Results depending on Access Rights

十年热恋 submitted on 2019-12-03 15:35:20
Question: For example, I have documents A, B, and C. User 1 must only be able to see documents A and B; user 2 must only be able to see document C. Is it possible to do this in Solr without filtering by metadata? If I use a metadata filter, every time the access rights change I have to reindex.

[Update 2012-02-14] Unfortunately, in the client's case, changes are frequent. The data is confidential and usually managed only by the owners, who are internal users. The specific case is that they need to be able to share …
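For contrast, this is roughly what the metadata-filter approach the question wants to avoid looks like in practice: each document carries an access-control field, and every query adds a filter query (fq) restricting results to what the current user may see. In the Python sketch below, the field name acl, the core name, and the base URL are assumptions; only Solr's standard /select parameters (q, fq, wt) are taken from Solr itself.

    import json
    from urllib.request import urlopen
    from urllib.parse import urlencode

    def search_as_user(base_url, query, user_id):
        # Restrict results with a filter query on a hypothetical "acl" field
        # that lists the users/groups allowed to see each document.
        params = urlencode({
            "q": query,
            "fq": "acl:%s" % user_id,
            "wt": "json",
        })
        with urlopen("%s/select?%s" % (base_url, params)) as resp:
            return json.load(resp)["response"]["docs"]

    # Example (assumes a local Solr core named "documents"):
    # docs = search_as_user("http://localhost:8983/solr/documents",
    #                       "quarterly report", "user1")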