search-engine | 易学教程

mg4j vs. apache lucene

Question: Can anyone provide a simple comparative analysis of these search engines? What advantages does either framework have? BTW, I've seen the following basic reasons for choosing mg4j in several academic papers: combining indices over the same collection; multi-index queries. Update: These slides (from mir2ed.org) contain a more recent overview of open source search engines, including Lucene and mg4j, benchmarking various aspects: memory & CPU, index size, search performance, search quality

Google sitelink search box [closed]

Question (closed as off-topic for Stack Overflow): We are implementing a search box within the Google search results for our site. We have our own search feature on the website and don't want to use Google Custom Search. We are following the instructions on the following page, but are finding it difficult to set up: Google developer site. I added the following JSON-LD in

AND multiple values of a filter in sphinx

Question: I have an attribute tag_id in my Sphinx index, and I want to fetch all records that have both tag_id 10 and tag_id 11. When I call $sphinxClient->setFilter('tag_id', array(10,11)), it fetches all records that have tag_id 10 or 11. Is it possible to AND the two values rather than OR them? Answer 1: $sphinxClient->setFilter('tag_id', array(10)); $sphinxClient->setFilter('tag_id', array(11)); Multiple calls to setFilter are ANDed :) Answer 2: Why would two different values return a result with AND? It's like WHERE id = 1 AND id = 2 WHICH
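The difference between one filter with two values and two separate filters can be illustrated without Sphinx. The sketch below is plain Python and does not call any Sphinx API. It assumes tag_id is a multi-valued attribute, so one record can carry several tag ids; the sample records and the `apply_filters` helper are hypothetical.

```python
# Plain-Python illustration only; nothing here talks to Sphinx.
# Hypothetical records: record id -> set of tag ids (multi-valued attribute).
docs = {
    1: {10},
    2: {11},
    3: {10, 11},
    4: {12},
}


def apply_filters(docs, filters):
    """Return ids of records that pass every filter (filters are ANDed).

    Each filter is a set of accepted values. A record passes a filter when
    at least one of its tag ids is in that set (values inside one filter
    are ORed).
    """
    return sorted(
        doc_id
        for doc_id, tags in docs.items()
        if all(tags & accepted for accepted in filters)
    )


# Analogous to a single setFilter call with array(10, 11).
any_of = apply_filters(docs, [{10, 11}])
# Analogous to two setFilter calls, one per value.
both = apply_filters(docs, [{10}, {11}])

print(any_of)  # [1, 2, 3]
print(both)    # [3]
```

Under that assumption the two-filter form is not a contradiction like WHERE id = 1 AND id = 2, because each filter only needs one of the record's values to match.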

Magento Search Engine Relevance Issues

Question: We currently have a Magento website with a large inventory, and we are having some issues with the relevance of on-site search results. The search type is currently set to 'combine like and fulltext', but the results aren't what we expected. For example, searching for 'Lee Child' (the author) brings up three Lee Child books, then three books whose author is 'Lauren Child', and then the rest of the Lee Child books. So essentially we want to give preference to the full text search and view those results BEFORE

robots.txt allow all except few sub-directories

Question: I want my site to be indexed by search engines, except for a few sub-directories. These are my robots.txt settings. robots.txt in the root directory: User-agent: * Allow: / Separate robots.txt in the sub-directory (to be excluded): User-agent: * Disallow: / Is this the correct way, or will the root directory rule override the sub-directory rule? Answer 1: No, this is wrong. You can’t have a robots.txt in a sub-directory. Your robots.txt must be placed in the document root of your host. If you want to
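A single root-level robots.txt with one Disallow line per excluded directory can be checked locally with Python's standard urllib.robotparser. This is a minimal sketch: the directory names /private/ and /tmp/ and the host example.com are placeholders, not taken from the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical root-level robots.txt: everything may be crawled except
# two example sub-directories (the names are placeholders).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "http://example.com" + path)


print(allowed("/index.html"))           # True
print(allowed("/private/report.html"))  # False
```

Paths that match no Disallow rule stay crawlable, so an explicit Allow: / line is not required in this sketch.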

Pass variable to Google Custom Search Engine

Question: Is it possible to pass a search variable into the Google Custom Search Engine that I have embedded on my website? I can get the search engine to work, but I can't pass it a term via POST (it's coming from a search button on other pages of the website). I tried to hack the code I found here: http://code.google.com/apis/ajax/playground/?exp=search#hello_world And this is what I have so far ($q is the term I am passing to it): <script type="text/javascript"> google.load('search', '1', {language

what is the best way to build inverted index?

Question: I'm building a small web search engine for searching about 1 million web pages, and I want to know the best way to build the inverted index: using a DBMS, or what …? I'd like to consider it from several angles, such as storage cost, performance, and speed of indexing and querying. I don't want to use any open source project for this; I want to build my own! Answer 1: Perhaps you might want to elaborate on why you do not wish to use F/OSS tools like Lucene or Sphinx. Answer 2: Most of the current closed-source database
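For readers unfamiliar with the data structure being asked about, an inverted index maps each term to the set of documents that contain it. The following is a minimal in-memory sketch, not a recommendation for a million-page corpus; the tokenizer, the sample pages, and the function names are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of pages that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample pages.
pages = {
    1: "Lucene is a search library",
    2: "Sphinx is a search server",
    3: "Building an inverted index",
}
index = build_index(pages)
print(search(index, "search library"))  # [1]
print(search(index, "search"))          # [1, 2]
```

A real engine would also have to address the points the question raises (storage cost, indexing speed, query speed), for example by keeping compressed posting lists on disk instead of Python sets in memory.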

Need help with SQL for ranking search results

Question: I am trying to build a tiny exercise search engine using MySQL. Each exercise can have an arbitrary number of search tags. Here is my data structure: TABLE exercises (ID, title); TABLE searchtags (ID, title); TABLE exerciseSearchtags (exerciseID -> exercises.ID, searchtagID -> searchtags.ID), where exerciseSearchtags is a many-to-many join table expressing the relationship between exercises and searchtags. The search engine accepts an unknown number of user-entered keywords. I would like to rank
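One common way to rank over a schema like the one above is to count how many of the entered keywords match each exercise's tags. The sketch below assumes that is the ranking wanted, since the excerpt is cut off before it says. It uses Python's built-in sqlite3 in place of MySQL so that it runs without a server, and the sample rows and the `rank_exercises` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the
# question, the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
INSERT INTO exercises VALUES (1, 'Push-up'), (2, 'Squat'), (3, 'Plank');
INSERT INTO searchtags VALUES
    (1, 'chest'), (2, 'legs'), (3, 'core'), (4, 'bodyweight');
INSERT INTO exerciseSearchtags VALUES
    (1, 1), (1, 3), (1, 4), (2, 2), (2, 4), (3, 3);
""")


def rank_exercises(keywords):
    """Return (title, matches) rows, best match first.

    One placeholder is generated per keyword, so any number of
    keywords can be passed safely as bound parameters.
    """
    if not keywords:
        return []
    placeholders = ", ".join("?" for _ in keywords)
    sql = f"""
        SELECT e.title, COUNT(*) AS matches
        FROM exercises e
        JOIN exerciseSearchtags est ON est.exerciseID = e.ID
        JOIN searchtags st ON st.ID = est.searchtagID
        WHERE st.title IN ({placeholders})
        GROUP BY e.ID, e.title
        ORDER BY matches DESC, e.title ASC
    """
    return conn.execute(sql, list(keywords)).fetchall()


print(rank_exercises(["core", "bodyweight"]))
# [('Push-up', 2), ('Plank', 1), ('Squat', 1)]
```

The SELECT statement itself uses only joins, IN, GROUP BY and ORDER BY, so it should carry over to MySQL with little or no change.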

Regular expression to detect the search engine and search words

Question: I need to detect the search engines that refer visitors to my website. Since every search engine uses a different query string parameter for searching (e.g. Google uses 'q=', Yahoo uses 'p='), I created a database of search engines with their URL regex patterns. As an example, for http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-GB:official&client=firefox-a the regex I created for Google is: (http:)(\\/)(\\/)(www)(\\.)(google)(\\.).*(\\/)(search).*(&q=|\\?q=).* (I am a newbie on regex, but
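As an alternative to one long regex per engine, the referrer can be split with a URL parser and the search term read from the engine's query parameter. This is a sketch of that different technique, not the asker's approach. The `ENGINES` table and `detect_search` function are hypothetical, and only the two parameters named in the question ('q' for Google, 'p' for Yahoo) are included.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical lookup table: host label -> query parameter holding the
# search words. Only the two engines named in the question are listed.
ENGINES = {
    "google": "q",
    "yahoo": "p",
}


def detect_search(referrer):
    """Return (engine, search_words), or None if no known engine matches."""
    parsed = urlparse(referrer)
    labels = (parsed.hostname or "").split(".")
    for engine, param in ENGINES.items():
        if engine in labels:
            values = parse_qs(parsed.query).get(param)
            if values:
                return engine, values[0]
    return None


url = (
    "http://www.google.com/search?q=blabla&ie=utf-8&oe=utf-8&aq=t"
    "&rls=com.ubuntu:en-GB:official&client=firefox-a"
)
print(detect_search(url))  # ('google', 'blabla')
```

Matching on whole host labels avoids treating an unrelated host that merely contains the string "google" as a search engine, and parse_qs handles both the ?q= and &q= positions as well as URL decoding.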

using elasticsearch to filter through tags with whitespace

Question: I am using tire (https://github.com/karmi/tire) with mongoid. Here is my model definition: class SomethingWithTag include Mongoid::Document include Mongoid::Timestamps field :tags_array, type: Array include Tire::Model::Search include Tire::Model::Callbacks mapping do indexes :tags_array, type: :array, index: :not_analyzed end end Say I have a document {tags_array: ["hello world"]}. Then the following queries work fine: SomethingWithTag.tire.search { filter :terms, :tags_array => ["hello"] }