search-engine | 易学教程

How to perform a wildcard search in Lucene

I know that Lucene has extensive support for wildcard searches, and that you can search for things like Stackover* (which will return Stackoverflow). That said, my users aren't interested in learning a query syntax. Can Lucene perform this type of wildcard search using an out-of-the-box Analyzer, or should I append "*" to every search query?

Doing this with string manipulation is tricky to get right, especially since the QueryParser supports boosting, phrases, etc. You could use a QueryVisitor that rewrites TermQuery into PrefixQuery. public class PrefixRewriter : QueryVisitor { protected
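
The answer's point is to rewrite the parsed query tree instead of the raw string, so boosts and phrases survive. Below is a minimal Python sketch of that idea. The classes Term, Prefix, Phrase and Boolean are hypothetical stand-ins for Lucene's TermQuery, PrefixQuery, PhraseQuery and BooleanQuery, not the real Lucene API, and rewrite is an illustrative name.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical stand-ins for Lucene query classes (NOT the real Lucene API).
@dataclass
class Term:          # like TermQuery
    field: str
    text: str
    boost: float = 1.0

@dataclass
class Prefix:        # like PrefixQuery, i.e. text*
    field: str
    text: str
    boost: float = 1.0

@dataclass
class Phrase:        # like PhraseQuery
    field: str
    words: List[str]
    boost: float = 1.0

@dataclass
class Boolean:       # like BooleanQuery: (occur, subquery) clauses
    clauses: List[Tuple[str, "Query"]]

Query = Union[Term, Prefix, Phrase, Boolean]

def rewrite(q: Query) -> Query:
    """Turn every bare term into a prefix query after parsing."""
    if isinstance(q, Term):
        # Keep field and boost; only the query type changes.
        return Prefix(q.field, q.text, q.boost)
    if isinstance(q, Boolean):
        # Recurse so nested clauses are rewritten too.
        return Boolean([(occur, rewrite(sub)) for occur, sub in q.clauses])
    # Phrases and existing prefixes pass through untouched.
    return q

parsed = Boolean([
    ("MUST", Term("body", "stackover", boost=2.0)),
    ("SHOULD", Phrase("body", ["search", "engine"])),
])
print(rewrite(parsed))
```

A real implementation would do the same walk over Lucene's own query objects after QueryParser has run; the sketch only shows the shape of the recursion and why it keeps the boost on "stackover" and leaves the phrase alone.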

How to do a simple boolean query in Sunspot Solr

Question:
>>> marketing = User.search do |s|
>>> s.fulltext "Marketing"
>>> end
>>> marketing.total
1448
>>> sales = User.search do |s|
>>> s.fulltext "Sales"
>>> end
>>> sales.total
567
>>> marketing_and_sales = User.search do |s|
>>> s.fulltext "Marketing AND Sales"
>>> end
>>> marketing_and_sales.total
945
>>> marketing_or_sales = User.search do |s|
>>> s.fulltext "Marketing OR Sales"
>>> end
>>> marketing_or_sales.total
945
<Sunspot::Search:{:fq=>["type:User"], :q=>"Marketing AND Sales", :fl=>"*
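
The four totals alone show that AND and OR are not being applied as set operations: an intersection can never be larger than the smaller result set (567), and a union can never be smaller than the larger one (1448). The hypothetical helper below checks those bounds plus inclusion-exclusion, |A ∪ B| = |A| + |B| - |A ∩ B|. It only diagnoses the counts and does not touch Sunspot or Solr.

```python
def check_boolean_counts(a, b, and_count, or_count):
    """Return the set-size rules that the reported hit counts break.

    a, b:      hits for query A alone and query B alone
    and_count: hits reported for "A AND B" (should be |A ∩ B|)
    or_count:  hits reported for "A OR B" (should be |A ∪ B|)
    """
    problems = []
    # An intersection is never larger than the smaller set.
    if and_count > min(a, b):
        problems.append(f"AND={and_count} exceeds min(A, B)={min(a, b)}")
    # A union is never smaller than the larger set...
    if or_count < max(a, b):
        problems.append(f"OR={or_count} is below max(A, B)={max(a, b)}")
    # ...and never larger than both sets combined.
    if or_count > a + b:
        problems.append(f"OR={or_count} exceeds A + B={a + b}")
    # Inclusion-exclusion ties the two together: |A ∪ B| = |A| + |B| - |A ∩ B|.
    if or_count != a + b - and_count:
        problems.append(f"OR={or_count} != A + B - AND={a + b - and_count}")
    return problems

# Totals from the question: Marketing, Sales, AND, OR.
for problem in check_boolean_counts(1448, 567, 945, 945):
    print(problem)
```

With the numbers from the question, three of the four checks fail, so the words AND and OR are evidently not being interpreted as the boolean operators intended.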

What is a good Web search and web crawling engine for Java?

I am working on an application where I need to integrate a search engine. It should also do crawling. Please suggest a good Java-based search engine. Thank you in advance.
Ajay

Nutch (Lucene) is an open-source engine that should satisfy your needs. In the past I worked with Terrier, a search engine written in Java: Terrier is a highly flexible, efficient, effective, and robust search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities. Terrier provides an ideal platform for the rapid development

Is it a good idea to use <a href="http://name.com" rel="noindex, nofollow">name</a> in this situation? [closed]

Closed. This question is off-topic and is not currently accepting answers. Closed 7 years ago.

I have a network of about 200 blogs (WordPress Multisite), and each of them links to all the others in a sidebar on the right-hand side (basically 200+ links on the right-hand side of every single page). I have these links set to rel="nofollow" now, but I was wondering whether changing them to rel="noindex, nofollow" would be a good idea. Thank you for any input.

nofollow only means that a bot
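
The HTML specification defines rel as a set of space-separated, case-insensitive tokens, so a comma is not a separator: rel="noindex, nofollow" yields the tokens "noindex," and "nofollow". Separately, noindex is not a link type defined by HTML; it is normally a page-level robots meta tag or X-Robots-Tag directive. The sketch below, using only Python's standard html.parser, shows how a spec-following parser would tokenize the attribute; the class name RelTokens is illustrative.

```python
from html.parser import HTMLParser

class RelTokens(HTMLParser):
    """Collect (href, rel tokens) for each <a> element.

    rel is split on whitespace and lowercased, approximating the HTML rule
    that rel is a set of space-separated, ASCII case-insensitive tokens.
    """

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, set of rel tokens)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        rel = a.get("rel") or ""          # attribute may be absent or valueless
        tokens = {t.lower() for t in rel.split()}
        self.links.append((a.get("href"), tokens))

p = RelTokens()
p.feed('<a href="http://name.com" rel="noindex, nofollow">name</a>')
href, tokens = p.links[0]
print(href, sorted(tokens))
```

The result still contains "nofollow", so that part of the attribute keeps working, but "noindex," (with its comma) matches no known link type.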

Meaning of parameters in a Google query?

Question: Are there any resources on what the parameters in a Google query mean? Is there any analysis of how the Google search pages work internally? Examples would be http://www.google.com/#hl=en&source=hp&q=lol&aq=f&aqi=&aql=&oq=&fp=45675624562456 or http://www.google.com/url?sa=t&source=web&ct=res&cd=11&ved=KJSGHFKSDJF&url=sfdgagasdgasdgasgasg&rct=j&q=fghthwrteghedgf&ei=asdfasdfsa&usg=asdfasdfasf

Answer 1: q=searchstring is the search string; source=something is where the search originated (www.google.com webpage,
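
Note that the first example URL carries its parameters after #, in the fragment, which browsers do not send to the server; the second carries them in an ordinary query string. The sketch below uses urllib.parse from Python's standard library to pull parameters out of both places. The function name google_params is illustrative, and the sketch says nothing about what each parameter means.

```python
from urllib.parse import urlsplit, parse_qs

def google_params(url):
    """Return {name: [values]} from a URL's query string and its fragment."""
    parts = urlsplit(url)
    # keep_blank_values keeps empty parameters such as "aqi=".
    params = parse_qs(parts.query, keep_blank_values=True)
    # AJAX-style URLs (http://www.google.com/#hl=en&...) put the parameters
    # after '#'; the fragment is not sent to the server in the HTTP request.
    for key, values in parse_qs(parts.fragment, keep_blank_values=True).items():
        params.setdefault(key, []).extend(values)
    return params

ajax = google_params("http://www.google.com/#hl=en&source=hp&q=lol&aq=f&aqi=&aql=&oq=&fp=45675624562456")
click = google_params("http://www.google.com/url?sa=t&source=web&ct=res&cd=11&ved=KJSGHFKSDJF"
                      "&url=sfdgagasdgasdgasgasg&rct=j&q=fghthwrteghedgf&ei=asdfasdfsa&usg=asdfasdfasf")
print(ajax["q"], click["q"])
```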

How to configure the synonyms_path in Elasticsearch

Question: I'm pretty new to Elasticsearch and I want to use synonyms. I added these lines to the configuration file:

index :
    analysis :
        analyzer :
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path: synonyms.txt

Then I created an index, test: "mappings" : { "test" : { "properties" : { "text_1" : { "type" : "string", "analyzer" : "synonym" }, "text_2" : { "search_analyzer" : "standard", "index_analyzer" : "synonym", "type" : "string" }, "text
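
According to the Elasticsearch synonym token filter documentation, synonyms_path must be absolute or relative to the Elasticsearch config directory, and the file uses the Solr synonym format: comma-separated equivalents on one line, explicit mappings written as a, b => c, and # for comments. The sketch below is a simplified Python model of that format (single-word entries only; parse_synonyms and analyze are hypothetical names) showing what the analyzer in the question would emit. Because the configuration uses the whitespace tokenizer with no lowercase filter, matching is case-sensitive.

```python
def parse_synonyms(text, expand=True):
    """Parse simplified Solr-format rules into {token: [output tokens]}.
    Simplification: single-word entries only, no escaping."""
    rules = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # blank lines and comments
            continue
        if "=>" in line:
            # Explicit mapping: every left-hand word is replaced by the right side.
            lhs, rhs = line.split("=>", 1)
            targets = [w.strip() for w in rhs.split(",") if w.strip()]
            for w in lhs.split(","):
                if w.strip():
                    rules[w.strip()] = targets
        else:
            # Equivalent synonyms: with expand=True each word maps to the whole
            # group; with expand=False each word maps to the first word only.
            words = [w.strip() for w in line.split(",") if w.strip()]
            for w in words:
                rules[w] = words if expand else words[:1]
    return rules

def analyze(text, rules):
    """Whitespace tokenizer followed by the synonym filter, as in the question.
    The real filter stacks synonyms at the same token position; this flattens them."""
    out = []
    for token in text.split():          # no lowercasing: matching is case-sensitive
        out.extend(rules.get(token, [token]))
    return out

rules = parse_synonyms("# synonyms.txt\nquick, fast\nfoo => bar")
print(analyze("the quick fox", rules))
print(analyze("Quick foo", rules))
```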

Lucene scoring: in what context is queryNorm used?

I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula is:

$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \right)$$

I understand every component in this formula except queryNorm(q). As explained by the official documentation, queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different
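
Lucene's TFIDFSimilarity documentation gives queryNorm(q) = 1 / sqrt(sumOfSquaredWeights), with sumOfSquaredWeights = q.getBoost()^2 · Σ over t in q of (idf(t) · t.getBoost())^2, and the classic DefaultSimilarity computes idf(t) = 1 + ln(numDocs / (docFreq + 1)). The sketch below reproduces those two formulas in Python and checks the documented property that multiplying every document's score by the same queryNorm leaves the ranking unchanged. The index size, document frequencies and document scores are made-up values.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(idfs, term_boosts, query_boost=1.0):
    # sumOfSquaredWeights = q.getBoost()^2 * sum over t of (idf(t) * t.getBoost())^2
    sum_sq = query_boost ** 2 * sum((i * b) ** 2 for i, b in zip(idfs, term_boosts))
    # queryNorm(q) = 1 / sqrt(sumOfSquaredWeights)
    return 1.0 / math.sqrt(sum_sq)

def ranking(scores):
    # Document ids ordered from highest to lowest score.
    return sorted(scores, key=scores.get, reverse=True)

# A two-term query over a hypothetical 1000-document index.
norm = query_norm([idf(10, 1000), idf(100, 1000)], [1.0, 1.0])

# Hypothetical scores before queryNorm is applied.
raw = {"d1": 3.2, "d2": 5.1, "d3": 0.7}
normalized = {doc: s * norm for doc, s in raw.items()}

# queryNorm is the same positive factor for every document of this query,
# so the order does not change; only the absolute values do.
print(ranking(raw), ranking(normalized))
```

The factor only matters when you compare the absolute score of a document under one query with its score under another query.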

How to create a basic semantic search in Python

I want to write a basic semantic web crawler using Python. I know that semantic apps use RDF files, but what else? I have some Python RDF modules installed and have started learning how they work. Could you introduce me to the technologies and techniques used in a semantic application?

Lev Khomich: The next things you might want to learn are: embedding semantic data in HTML (RDFa, microformats, microdata; some stats: microformats and RDFa deployment across the Web via DAM.co.uk); querying RDF data (SPARQL); the most popular ontologies currently in use; and the list of available SPARQL endpoints. You can find
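
SPARQL's core operation is matching a set of triple patterns (a basic graph pattern) against a graph and joining the variable bindings. The pure-Python sketch below models only that core step, roughly what SELECT ?f ?name WHERE { ex:alice foaf:knows ?f . ?f foaf:name ?name } computes; the function names and the tiny graph are illustrative. For real RDF data, a library such as rdflib can parse RDF files and run SPARQL queries directly.

```python
def is_var(term):
    # SPARQL-style variables start with '?'
    return term.startswith("?")

def match_pattern(graph, pattern, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in graph:
        new = dict(binding)
        for p, value in zip(pattern, triple):
            if is_var(p):
                # Bind a fresh variable, or require the existing binding to agree.
                if new.setdefault(p, value) != value:
                    break
            elif p != value:
                break
        else:
            yield new

def select(graph, patterns, binding=None):
    """Join several triple patterns, like a SPARQL basic graph pattern."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for b in match_pattern(graph, patterns[0], binding):
        yield from select(graph, patterns[1:], b)

graph = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
}
# Roughly: SELECT ?f ?name WHERE { ex:alice foaf:knows ?f . ?f foaf:name ?name }
for row in select(graph, [("ex:alice", "foaf:knows", "?f"), ("?f", "foaf:name", "?name")]):
    print(row)
```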

How to block bad unidentified bots crawling my website?

How can I stop bad, unidentified bots from crawling my website? Some bad bots whose names are not present in cPanel for Apache are eating up my website's bandwidth. I tried robots.txt (batgap.com/robots.txt) and also blocked them with .htaccess, but bandwidth usage has not improved. I don't know these bots' IP addresses, so I can't block them by IP. The bots consume so much of the site's bandwidth that I need to increase the server's allowance.

I'm from Incapsula, and we deal with bad bots on a regular basis. We recently released bot-related research that provides
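
robots.txt is advisory and only well-behaved crawlers obey it, so it will not stop the bots described here. Since their IP addresses are unknown, one way to find them is to total bandwidth per client in the Apache access log. The sketch below assumes the Apache combined log format; the regular expression and the function name bandwidth_by_client are illustrative, and the heaviest (IP, user agent) pairs it reports are candidates for blocking in .htaccess or a firewall.

```python
import re
from collections import Counter

# Assumed Apache "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)

def bandwidth_by_client(lines):
    """Sum response bytes per (IP, user agent) so heavy unknown clients stand out."""
    totals = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m or m["bytes"] == "-":   # skip unparsable lines and empty bodies
            continue
        totals[(m["ip"], m["agent"] or "")] += int(m["bytes"])
    return totals

sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5000 "-" "BadBot/1.0"',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /a HTTP/1.1" 200 7000 "-" "BadBot/1.0"',
    '198.51.100.7 - - [10/Oct/2023:13:56:00 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]
for (ip, agent), total in bandwidth_by_client(sample).most_common(10):
    print(ip, agent, total)
```

On a real server, pass an open file handle for the access log instead of the sample list; the log's location varies by host and cPanel setup.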

AttributeError: 'ElasticSearch' object has no attribute 'bulk_index'

Question: When I run python manage.py rebuild_index, this error occurs:

self.conn.bulk_index(self.index_name, 'modelresult', prepped_docs, id_field=ID)
AttributeError: 'ElasticSearch' object has no attribute 'bulk_index'

I found https://github.com/toastdriven/pyelasticsearch/blob/master/pyelasticsearch.py#L424-469, but I don't know which version that pyelasticsearch.py is. That code does contain bulk_index, but my pyelasticsearch.py does not. Has anyone had the same experience? Thanks for your time.
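
The traceback shows Haystack's Elasticsearch backend calling bulk_index, a method the installed pyelasticsearch class does not have, which points to a version mismatch. The hypothetical helper below uses importlib.metadata (Python 3.8+) to report the installed distribution version and whether a class defines a given method, so you can compare it with the version your Haystack release was written against.

```python
import importlib
from importlib import metadata

def check_api(dist_name, module_name, class_name, method_name):
    """Return (installed version or None, whether class has method or None).

    The second value is None when the module cannot be imported at all.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # not installed under that distribution name
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return version, None
    cls = getattr(module, class_name, None)
    return version, cls is not None and hasattr(cls, method_name)

# For the question, the call would be:
# check_api("pyelasticsearch", "pyelasticsearch", "ElasticSearch", "bulk_index")
print(check_api("pyelasticsearch", "pyelasticsearch", "ElasticSearch", "bulk_index"))
```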