search-engine

What is a good web search and web crawling engine for Java?

有些话、适合烂在心里 submitted on 2019-12-06 07:39:05
Question: I am working on an application into which I need to integrate a search engine, and it should also do crawling. Please suggest a good Java-based search engine. Thank you in advance.
Answer 1: Nutch (Lucene) is an open-source engine that should satisfy your needs.
Answer 2: In the past I worked with Terrier, a search engine written in Java: Terrier is a highly flexible, efficient, effective, and robust search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the

How to implement enterprise search

好久不见. submitted on 2019-12-06 07:31:38
Question: We are searching disparate data sources in our company: we have information in multiple databases that needs to be searchable from our intranet. Initial experiments with full-text search (FTS) proved disappointing, so we implemented a custom search engine that works very well for our purposes. However, we want to make sure we are doing "the right thing" and are not missing any great tools that would make our job easier. What we need: Column search: the ability to search by column; we flag which columns
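One common way to support "search by flagged column" is an inverted index keyed by (column, token), built only over the columns marked as searchable. The sketch below is a minimal illustration under that assumption; the function names, row layout, and whitespace tokenization are hypothetical and not taken from the question.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def build_index(rows: Dict[int, dict], searchable: Iterable[str]) -> Dict[Tuple[str, str], Set[int]]:
    """Map (column, lowercase token) -> ids of rows containing that token in that column.

    Only the flagged (searchable) columns are indexed; everything else is ignored.
    """
    searchable = list(searchable)  # materialize so it can be reused for every row
    index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for row_id, row in rows.items():
        for column in searchable:
            for token in str(row.get(column, "")).lower().split():
                index[(column, token)].add(row_id)
    return index

def search(index: Dict[Tuple[str, str], Set[int]], term: str, columns: Iterable[str]) -> Set[int]:
    """Return ids of rows whose given columns contain the term (union across columns)."""
    term = term.lower()
    hits: Set[int] = set()
    for column in columns:
        hits |= index.get((column, term), set())
    return hits

# Hypothetical example data: "secret" is not flagged, so it is never searchable.
ROWS = {
    1: {"name": "Acme Widgets", "notes": "blue widget", "secret": "widget"},
    2: {"name": "Beta Corp", "notes": "widgets supplier"},
    3: {"name": "Widget Co", "secret": "x"},
}
INDEX = build_index(ROWS, ["name", "notes"])
```

A real deployment would normally delegate this to a search library such as Lucene rather than a hand-rolled dictionary; the point here is only how column scoping falls out of the index key.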

Magento Search Engine Relevance Issues

核能气质少年 submitted on 2019-12-06 07:13:19
We have a Magento website with a large inventory, and we are having issues with the relevance of on-site search results. The search is currently set to 'combine like and fulltext', but the results aren't what we expected. For example, searching for 'Lee Child' (the author) returns three Lee Child books, then three books by 'Lauren Child', and then the rest of the Lee Child books. Essentially, we want to give preference to the full-text results and show them before the LIKE results. We also want to display in-stock products before out-of-stock products. We have
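The desired ordering can be expressed as a composite sort key. The sketch below is not Magento code: the `SearchHit` record and its fields are hypothetical, and it assumes stock status is the primary key, full-text vs. LIKE match the secondary key, and the full-text relevance score the tiebreaker (swap the tuple order if full-text matches should outrank stock status).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SearchHit:
    # Hypothetical result record; field names are illustrative, not Magento's schema.
    name: str
    in_stock: bool
    fulltext_match: bool  # True if found by the full-text query, False if only by LIKE
    score: float          # full-text relevance score (0.0 for LIKE-only hits)

def rank(hits: List[SearchHit]) -> List[SearchHit]:
    """In-stock first, then full-text matches, then by descending relevance score."""
    # False sorts before True, so negate the flags we want to come first.
    return sorted(hits, key=lambda h: (not h.in_stock, not h.fulltext_match, -h.score))

HITS = [
    SearchHit("Lauren Child book", True, False, 0.0),
    SearchHit("Lee Child A", True, True, 2.5),
    SearchHit("Lee Child B", False, True, 3.1),
    SearchHit("Lee Child C", True, True, 1.2),
]
```

Because Python's `sorted` is stable, hits that tie on every key keep their original order.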

Is it a good idea to use <a href="http://name.com" rel="noindex, nofollow">name</a> in this situation? [closed]

梦想与她 submitted on 2019-12-06 06:53:03
Question (closed as off-topic 7 years ago): I have a network of about 200 blogs (WordPress Multisite), and each one shows links to all the others in a right-hand sidebar (that is, 200+ links on every single page). The links are currently set to rel="nofollow", but I was wondering whether changing them to rel="noindex, nofollow"

Solr Suggester Lookup Class for Predictive Search

◇◆丶佛笑我妖孽 submitted on 2019-12-06 06:20:11
Question: I'm working on auto-suggestions with Solr 3.6, and I've been following the Solr Suggester component documentation (http://wiki.apache.org/solr/Suggester). However, I can't decide which Lookup class to use for the Suggester, and I haven't found any good documentation that would help me pick the best one. I have to choose among these four Lookup classes: JaspellLookup: tree-based representation based on Jaspell; TSTLookup: ternary tree based representation, capable of immediate data
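To make "ternary tree based representation" concrete, here is a minimal ternary search tree with prefix completion. It only illustrates the data structure that TSTLookup is named after; it is not Solr's implementation, and the class and method names are hypothetical.

```python
from typing import List, Optional

class _Node:
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch: str):
        self.ch = ch
        self.lo: Optional["_Node"] = None  # characters smaller than ch
        self.eq: Optional["_Node"] = None  # next character of words sharing this prefix
        self.hi: Optional["_Node"] = None  # characters larger than ch
        self.is_word = False

class TernarySearchTree:
    def __init__(self):
        self.root: Optional[_Node] = None

    def insert(self, word: str) -> None:
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, prefix: str) -> Optional[_Node]:
        """Return the node for the last character of prefix, or None."""
        node, i = self.root, 0
        while node is not None:
            ch = prefix[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(prefix):
                node, i = node.eq, i + 1
            else:
                return node
        return None

    def complete(self, prefix: str) -> List[str]:
        """Return every stored word that starts with prefix, in sorted order."""
        if not prefix:
            return []  # this sketch does not enumerate the whole tree
        node = self._find(prefix)
        if node is None:
            return []
        results = [prefix] if node.is_word else []
        self._collect(node.eq, prefix, results)
        return results

    def _collect(self, node, path, out):
        # In-order walk (lo, node, eq, hi) yields words in lexicographic order.
        if node is None:
            return
        self._collect(node.lo, path, out)
        word = path + node.ch
        if node.is_word:
            out.append(word)
        self._collect(node.eq, word, out)
        self._collect(node.hi, path, out)
```

A TST stores shared prefixes once, like a trie, but uses three child pointers per node instead of one per alphabet symbol, which keeps memory modest for large vocabularies.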

Ignore accents in Elasticsearch with Haystack

十年热恋 submitted on 2019-12-06 05:36:05
I am using Elasticsearch with Haystack to provide search. I want users to be able to search in languages other than English; currently I am trying Greek. How can I ignore accents when searching? For example, if I enter Ανδρέας (with accents), it returns the matching results, but when I enter Ανδρεας, it returns nothing. The search engine should return results containing "Ανδρέας" as well as "Ανδρεας" (the second one is not accented). Can someone point out how to resolve this? Please let me know if I need to post my Elasticsearch settings.
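The usual idea behind accent-insensitive search is to fold both the indexed text and the query into the same accent-free form. The Python sketch below only illustrates that folding with the standard `unicodedata` module; in Elasticsearch the equivalent would be an analyzer token filter applied at both index and query time (the ICU analysis plugin's `icu_folding` filter is one candidate worth checking). The `search` helper is a hypothetical stand-in for the engine.

```python
import unicodedata
from typing import List

def fold(text: str) -> str:
    """Lowercase, then strip combining marks such as the Greek tonos."""
    decomposed = unicodedata.normalize("NFD", text.lower())  # έ -> ε + U+0301
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)

def search(docs: List[str], query: str) -> List[str]:
    """Return docs containing the query as a whole word, comparing folded forms only."""
    q = fold(query)
    return [d for d in docs if q in fold(d).split()]

DOCS = ["Ανδρέας Παπανδρέου", "Ανδρεας", "Νίκος"]
```

With this, both the accented and the unaccented query return the first two documents; the key design point is that the same normalization runs on both sides of the comparison.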

How to replace robots.txt with .htaccess

a 夏天 submitted on 2019-12-06 04:37:24
I have a situation where I have to remove my robots.txt file, because I don't want any robot crawlers to get the links. I also want the pages to remain accessible to users, and I don't want them to be cached by search engines. I also cannot add any user authentication, for various reasons. So I am thinking about using mod_rewrite to stop search engine crawlers from crawling the pages while still allowing everyone else. The logic I am trying to implement is a condition that checks whether the incoming user agent is a search engine and, if so, redirects it to a 401. The only problem is I don't
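The decision logic described above (check the User-Agent, answer crawlers with 401, serve everyone else) can be sketched in Python. The crawler pattern below is a hypothetical, non-exhaustive list, and this is not Apache configuration; in mod_rewrite the same check would be a `RewriteCond` on `%{HTTP_USER_AGENT}` followed by a rule that returns the error status.

```python
import re
from typing import Optional

# Hypothetical, non-exhaustive list of crawler User-Agent substrings.
CRAWLER_UA = re.compile(r"googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot", re.IGNORECASE)

def status_for(user_agent: Optional[str]) -> int:
    """Return 401 for requests identifying as a known crawler, 200 otherwise."""
    if user_agent and CRAWLER_UA.search(user_agent):
        return 401
    return 200
```

Keep in mind that the User-Agent header is self-reported: this only keeps out crawlers that identify themselves honestly, and anything not on the list is treated as a normal visitor.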

Lucene scoring: in what context is queryNorm used?

六眼飞鱼酱① submitted on 2019-12-06 03:52:22
Question: I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula is:
$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \right)$$
I understand every component of this formula except queryNorm(q). As the official documentation explains, queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the
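In Lucene's classic TF-IDF similarity (DefaultSimilarity, later ClassicSimilarity), queryNorm(q) is documented as 1 / sqrt(sumOfSquaredWeights), where sumOfSquaredWeights = q.getBoost()² · Σ (idf(t) · t.getBoost())² over the query terms. It depends only on the query, so every document's score for that query is scaled by the same constant. The sketch below uses hypothetical tf, idf, norm, and coord values to show that the ranking is identical with and without it.

```python
import math
from typing import Dict, List

def query_norm(idfs: List[float], boosts: List[float], query_boost: float = 1.0) -> float:
    """queryNorm(q) = 1 / sqrt(q.boost^2 * sum((idf(t) * boost(t))^2))."""
    sum_sq = query_boost ** 2 * sum((idf * b) ** 2 for idf, b in zip(idfs, boosts))
    return 1.0 / math.sqrt(sum_sq)

def score(tfs, idfs, boosts, norms, coord, qnorm) -> float:
    """coord * queryNorm * sum(tf * idf^2 * boost * norm) over query terms."""
    return coord * qnorm * sum(tf * idf ** 2 * b * n for tf, idf, b, n in zip(tfs, idfs, boosts, norms))

def rank(docs: Dict[str, dict], idfs, boosts, qnorm) -> List[str]:
    scored = {doc_id: score(d["tf"], idfs, boosts, d["norm"], d["coord"], qnorm)
              for doc_id, d in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical two-term query and per-document statistics.
IDFS, BOOSTS = [1.5, 2.0], [1.0, 1.0]
DOCS = {
    "d1": {"tf": [1.0, 1.414], "norm": [0.5, 0.5], "coord": 1.0},
    "d2": {"tf": [2.0, 0.0], "norm": [0.5, 0.5], "coord": 0.5},
    "d3": {"tf": [0.0, 3.0], "norm": [0.3, 0.3], "coord": 0.5},
}
QN = query_norm(IDFS, BOOSTS)  # 1 / sqrt(1.5^2 + 2.0^2) = 1 / 2.5
```

Because queryNorm is a single positive factor per query, it can only change absolute score values (which is what makes scores from different queries roughly comparable), never the relative order of documents.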

How to create a basic semantic search in Python

喜夏-厌秋 submitted on 2019-12-06 03:06:01
Question: I want to write a basic semantic web crawler in Python. I know that semantic apps use RDF files, but what else? I have some Python RDF modules installed and have started learning how they work. Could you introduce me to the technologies and techniques used in a semantic application?
Answer 1: The next things you might want to learn are: embedding semantic data in HTML (RDFa, microformats, microdata; some stats: microformats and RDFa deployment across the Web via DAM.co.uk); querying RDF data -
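"Querying RDF data" boils down to matching triple patterns in which some positions are variables, then joining the matches. The stdlib-only sketch below imitates the SPARQL query `SELECT ?name WHERE { ex:alice foaf:knows ?f . ?f foaf:name ?name }` over a tiny in-memory graph; the data and helper names are hypothetical. In practice an RDF library such as rdflib (whose `Graph.query` accepts SPARQL) would do this for you.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Tiny RDF-style graph with abbreviated, hypothetical URIs.
GRAPH: Set[Triple] = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:name", "Carol"),
}

def match(graph: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard (like a SPARQL variable)."""
    for triple in graph:
        if ((s is None or triple[0] == s) and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple

def names_of_friends(graph: Set[Triple], person: str) -> List[str]:
    """Join two patterns: (person foaf:knows ?f) and (?f foaf:name ?name)."""
    return sorted(name
                  for _, _, friend in match(graph, s=person, p="foaf:knows")
                  for _, _, name in match(graph, s=friend, p="foaf:name"))
```

A real triple store indexes each position so that a pattern lookup does not scan every triple, but the match-then-join structure is the same.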

How can I find what search terms (if any) brought a user to my site?

拜拜、爱过 submitted on 2019-12-06 01:11:31
I want to create dynamic content based on this. I know the information is available somewhere, since web analytics engines use it to determine how people got to your site (referrer, search terms used, etc.), but I don't know how to get at it myself.
You can use the "Referer" header of the request the user sent to figure out what they searched for. Example from Google: http://www.google.no/search?q=stack%20overflow. So you must search that string (in ASP.NET it is available as Request.UrlReferrer) for "q=" and then URL-decode the contents. Also, you should take a look at this article that talks more
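Rather than searching the referrer string for "q=" by hand, a URL parser can extract and decode the parameter in one step. The Python sketch below uses the standard `urllib.parse` module; the function name is hypothetical, and it assumes the search engine puts the terms in a `q` parameter, as in the Google example. Be aware that many search engines no longer include the query in the referrer at all, so a missing value should be expected and handled.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

def search_terms_from_referrer(referrer: Optional[str], param: str = "q") -> Optional[str]:
    """Return the decoded search terms from a referrer URL, or None if absent."""
    if not referrer:
        return None
    # parse_qs percent-decodes values and turns '+' into spaces.
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None
```

For the example referrer above, this returns `"stack overflow"`.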