Solr

Solr open document after searching a keyword

Submitted by 旧巷老猫 on 2020-01-02 10:58:33
Question: I am trying to index some PDF documents and then build a search UI. This question is somewhat related to "Solr Index PDF documents and post them to a remote server". 1) Indexing PDF docs: I use the Tika jar to convert PDFs to text files and then use the curl command to index them. 2) Search UI: I'm using Solritas's browse feature and its built-in UI. Objective: when I search for a word, say "Lucene", in the list of indexed documents and get a result set for the given query, I want a link to be
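The two approaches in step 1 can be sketched as command fragments; the jar file name, core name `docs`, and `literal.id` value are illustrative assumptions, not from the question:

```shell
# Standalone conversion with the Tika app jar, then index the text separately
java -jar tika-app.jar --text report.pdf > report.txt

# Alternatively, Solr's ExtractingRequestHandler embeds Tika and can ingest
# the PDF in a single call (no separate conversion step needed)
curl "http://localhost:8983/solr/docs/update/extract?literal.id=report-1&commit=true" \
     -F "file=@report.pdf"
```

The `/update/extract` route saves the intermediate text files, at the cost of running extraction inside Solr.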

Converting the Solr response (SolrDocumentList) to JSON (or XML)

Submitted by ε祈祈猫儿з on 2020-01-02 10:05:41
Question: I am working on a search in which I am trying to convert the response from an HttpSolrServer to JSON format. The response comes back as a SolrDocumentList. The code I have right now is: SolrQuery solrQuery = new SolrQuery(query); solrQuery.setParam("wt", "json"); // doesn't affect the return format QueryResponse rsp = solrServer.query(solrQuery); SolrDocumentList docs = rsp.getResults(); return docs.toString(); When I print out the return value, it comes back as: {numFound=2,start=0,docs=[SolrDocument
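The wt=json parameter only controls the raw wire format; SolrJ still parses the response into objects (with SolrJ, setting a NoOpResponseParser on the server is the usual way to get Solr's raw JSON string back instead). If all that's needed is the documents as JSON, one route is to serialize them directly: SolrDocument implements Map&lt;String, Object&gt;, so a SolrDocumentList can be walked like a list of maps. The sketch below is a minimal, hand-rolled serializer for flat String/Number/Boolean values only; class and method names are mine, not SolrJ's:

```java
import java.util.List;
import java.util.Map;

class DocsToJson {
    // Minimal JSON serializer for flat result documents. SolrDocument
    // implements Map<String, Object>, so a SolrDocumentList can be passed
    // here as a List of maps. Only String/Number/Boolean values handled.
    static String toJson(List<? extends Map<String, Object>> docs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < docs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            int j = 0;
            for (Map.Entry<String, Object> e : docs.get(i).entrySet()) {
                if (j++ > 0) sb.append(',');
                sb.append('"').append(e.getKey()).append("\":");
                Object v = e.getValue();
                if (v instanceof Number || v instanceof Boolean) {
                    sb.append(v);               // bare JSON literal
                } else {
                    sb.append('"')
                      .append(String.valueOf(v).replace("\"", "\\\""))
                      .append('"');             // quoted, with quotes escaped
                }
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }
}
```

For production use, a real JSON library (Jackson, Gson) handles nested and multi-valued fields that this sketch does not.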

Custom full-text index stored in Cassandra

Submitted by ╄→尐↘猪︶ㄣ on 2020-01-02 09:38:29
Question: I've got a situation where I'm using Cassandra for the DB and I need full-text search capability. I'm aware of Apache Solr, Apache Cassandra, and DSE Search. However, I do not want to use costly, proprietary software (DSE Search). The reason I do not want to use Apache Solr is that I don't want to deal with HA, sharding, and redundancy for it. Cassandra is perfect for HA, sharding, and redundancy; I would like to store my full-text index in the existing Cassandra DB. So what I'm
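One way to keep the index inside Cassandra is a hand-rolled inverted index: a table partitioned by term, with one row per (term, document) pair. A sketch in CQL; all table and column names are illustrative:

```sql
-- Each term is a partition; each (term, doc_id) row points back to a document.
CREATE TABLE inverted_index (
    term      text,
    doc_id    uuid,
    positions list<int>,   -- token positions, only needed for phrase queries
    PRIMARY KEY (term, doc_id)
);

-- A single-term lookup is then a single-partition read:
-- SELECT doc_id FROM inverted_index WHERE term = 'lucene';
```

The catch is that everything Solr's analysis chain normally provides (tokenizing, lowercasing, stemming) has to be done client-side before writing and querying, and multi-term ranking must be assembled in application code.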

Solr index and search multilingual data

Submitted by a 夏天 on 2020-01-02 09:09:29
Question: In my Solr schema, during indexing Solr detects the language of the data being indexed and applies different indexing rules according to the detected language. All data is stored in language-specific fields, for example: English titles are stored in the title_en field; Spanish titles are stored in the title_es field. <field name="title_en" type="text_en" indexed="true" stored="true"/> <field name="title_es" type="text_es" indexed="true" stored="true"/> All searches are made against one catch-all
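The catch-all setup this question is describing is usually built with a copyField into one combined field; the `text_all` field name and its type below are assumptions, not from the question:

```xml
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_es" type="text_es" indexed="true" stored="true"/>

<!-- One catch-all field that every language-specific field is copied into;
     stored="false" since the originals hold the stored values -->
<field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title_*" dest="text_all"/>
```

The trade-off: the catch-all field has a single analyzer, so language-specific stemming is lost there; searching the per-language fields directly (e.g. via edismax `qf`) keeps it.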

Find all the web pages in a domain and its subdomains

Submitted by 浪尽此生 on 2020-01-02 08:04:09
Question: I am looking for a way to find all the web pages and subdomains in a domain. For example, in the uoregon.edu domain, I would like to find all the web pages in this domain and in all its subdomains (e.g., cs.uoregon.edu). I have been looking at Nutch, and I think it can do the job. But it seems that Nutch downloads entire web pages and indexes them for later search, whereas I want a crawler that only scans a web page for URLs that belong to the same domain. Furthermore, it seems that Nutch
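The filtering step such a crawler needs is simple on its own: extract links from a page and keep only those whose host is the domain or one of its subdomains. A self-contained sketch (regex-based link extraction, not a full HTML parser; class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SameDomainLinks {
    // Keep only links whose host is `domain` itself or a subdomain of it.
    static List<String> sameDomainLinks(String html, String domain) {
        List<String> out = new ArrayList<>();
        // group 1 = full URL, group 2 = host part
        Matcher m = Pattern.compile("href=\"(https?://([^/\"]+)[^\"]*)\"").matcher(html);
        while (m.find()) {
            String host = m.group(2);
            if (host.equals(domain) || host.endsWith("." + domain)) {
                out.add(m.group(1));
            }
        }
        return out;
    }
}
```

In Nutch itself, the equivalent effect comes from its URL-filter configuration (include patterns for the target domain), which restricts the crawl frontier without any custom code.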

Sunspot/Solr queries ending with logical operators AND/OR/NOT result in error

Submitted by 人走茶凉 on 2020-01-02 07:42:21
Question: I noticed that queries ending with logical operators like AND/OR/NOT (for example, 'this AND') result in an error. What would be the best way to handle this? Should I just trim out or escape all queries ending with one of those? Note that it also happens for queries starting with one of these words, and sometimes valid names end with such words, like "Oregon OR". Answer 1: I believe escaping any AND/OR/NOT instances in your query that aren't meant to be boolean logic would be your best bet: Article
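A sanitizer along the lines the answer suggests might escape AND/OR/NOT only where they dangle at the start or end of the query, which is where Solr's parser rejects them. This is a sketch with names of my choosing, not Sunspot's API:

```java
class QuerySanitizer {
    // Escape AND/OR/NOT where they appear as the first or last token of the
    // query, since a leading/trailing boolean operator makes Solr error out.
    // Escaped operators are treated as literal terms by the parser.
    static String sanitize(String query) {
        return query.replaceAll("^(AND|OR|NOT)\\b", "\\\\$1")
                    .replaceAll("\\b(AND|OR|NOT)$", "\\\\$1");
    }
}
```

So "Oregon OR" becomes "Oregon \OR" and searches for the literal word, while an operator in the middle of the query ("cats AND dogs") is left alone.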

Field listing in Solr with "fl" parameter for a field having a space in between

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-02 07:21:11
Question: I have a field in my Solr schema called "Post Date" (excluding the quotes). When I fire a query with the "fl" (field list) parameter in order to view only the Post Date of the search results, I get nothing in the docs responses because this field name contains a space. I tried using + and %20, but I still get no results. Please help. Answer 1: I would like to report that I have found a solution to this. I experimented and came up with a solution: putting \+ as the substitute for whitespace in the
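The workaround the answer describes, substituting \+ for each space in the field name before building the fl parameter, can be wrapped in a small helper (class and method names are mine):

```java
class FlParam {
    // Replace each space in a field name with "\+", the substitution the
    // answer reports working for fl on a field name containing a space.
    static String flValue(String fieldName) {
        return fieldName.replace(" ", "\\+");
    }
}
```

The request then uses fl=Post\+Date (URL-encoding the backslash as %5C where the client requires it). The more durable fix is avoiding spaces in field names altogether, e.g. post_date.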

Solr, block updating of existing document

Submitted by 别等时光非礼了梦想. on 2020-01-02 06:40:34
Question: When a document is sent to Solr and such a document already exists in the index (by its ID), the new one replaces the old one. But I don't want to automatically replace documents; I want Solr to just ignore the duplicate and proceed to the next. How can I configure Solr to do this? Of course I could query Solr to check whether it already has the document, but that's bad for me since I do bulk updates; it would complicate the process and increase the number of requests. So are there any ways to configure Solr to ignore duplicates? Answer 1: You can
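One configuration-level route, a sketch rather than a confirmed recipe, uses Solr's DocBasedVersionConstraintsProcessorFactory: send every document with the same constant version value, so a second copy of an existing id is never "newer" and gets silently dropped instead of replacing the original. The chain name and version field name below are illustrative:

```xml
<updateRequestProcessorChain name="ignore-duplicates">
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <!-- silently drop updates whose version is not newer than the indexed one -->
    <bool name="ignoreOldUpdates">true</bool>
    <str name="versionField">doc_version_l</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Each document in the bulk update would carry the same doc_version_l value (e.g. 1), and the update request selects the chain with update.chain=ignore-duplicates; no per-document existence query is needed.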

Can I protect short words from an n-gram filter in Solr?

Submitted by 霸气de小男生 on 2020-01-02 05:59:14
Question: I have seen this question about searching for short words in Solr. I am wondering if there is another possible solution to a similar problem. I am using the EdgeNGramFilter with a minGramSize of 3. I want to protect a specific set of shorter words (mainly two-letter acronyms) from being ignored, but I'd like to keep that minGramSize of 3 for everything else. EdgeNGramFilter doesn't support a protected-words list. Is there any filter or setting that makes this possible within a single field
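One pattern that gets close, sketched here with illustrative names, keeps minGramSize=3 on the main field and routes the whitelisted acronyms through a second field whose analyzer keeps only those words. It is a two-field workaround rather than the single-field setting the question asks for:

```xml
<!-- Main field type: edge n-grams, minGramSize stays at 3 -->
<fieldType name="text_edge" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Companion type: indexes ONLY the protected short words, untouched -->
<fieldType name="text_acronyms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="acronyms.txt"/>
  </analyzer>
</fieldType>
```

Queries then search both fields together (e.g. edismax with qf spanning them), so "UK" matches via the acronym field while longer terms still go through the n-gram field.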