Solr

Solr open document after searching a keyword

Submitted by 旧巷老猫 on 2020-01-02 10:58:33
Question: I am trying to index some PDF documents and then build a search UI. This question is somewhat related to "Solr Index PDF documents and post them to a remote server". 1) Indexing PDF docs: I use the Tika jar to convert PDFs to text files and then use the curl command to index them. 2) Search UI: I'm using Solritas's browse feature and its built-in UI. Objective: when I search for a word, say "Lucene", in the list of indexed documents and get a result set for the given query, I want a link to be
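The two approaches in step 1 can be sketched as command fragments; the jar file name, core name `docs`, and `literal.id` value are illustrative assumptions, not from the question:

```shell
# Standalone conversion with the Tika app jar, then index the text separately
java -jar tika-app.jar --text report.pdf > report.txt

# Alternatively, Solr's ExtractingRequestHandler embeds Tika and can ingest
# the PDF in a single call (no separate conversion step needed)
curl "http://localhost:8983/solr/docs/update/extract?literal.id=report-1&commit=true" \
     -F "file=@report.pdf"
```

The `/update/extract` route saves the intermediate text files, at the cost of running extraction inside Solr.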

Converting the Solr response (SolrDocumentList) to JSON (or XML)

Submitted by ε祈祈猫儿з on 2020-01-02 10:05:41
Question: I am working on a search in which I am trying to convert the response from an HttpSolrServer to JSON format. The response comes back as a SolrDocumentList. The code I have right now is: SolrQuery solrQuery = new SolrQuery(query); solrQuery.setParam("wt", "json"); // doesn't affect the return format QueryResponse rsp = solrServer.query(solrQuery); SolrDocumentList docs = rsp.getResults(); return docs.toString(); When I print out the return value, it comes back as: {numFound=2,start=0,docs=[SolrDocument
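The wt=json parameter only controls the raw wire format; SolrJ still parses the response into objects (with SolrJ, setting a NoOpResponseParser on the server is the usual way to get Solr's raw JSON string back instead). If all that's needed is the documents as JSON, one route is to serialize them directly: SolrDocument implements Map&lt;String, Object&gt;, so a SolrDocumentList can be walked like a list of maps. The sketch below is a minimal, hand-rolled serializer for flat String/Number/Boolean values only; class and method names are mine, not SolrJ's:

```java
import java.util.List;
import java.util.Map;

class DocsToJson {
    // Minimal JSON serializer for flat result documents. SolrDocument
    // implements Map<String, Object>, so a SolrDocumentList can be passed
    // here as a List of maps. Only String/Number/Boolean values handled.
    static String toJson(List<? extends Map<String, Object>> docs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < docs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            int j = 0;
            for (Map.Entry<String, Object> e : docs.get(i).entrySet()) {
                if (j++ > 0) sb.append(',');
                sb.append('"').append(e.getKey()).append("\":");
                Object v = e.getValue();
                if (v instanceof Number || v instanceof Boolean) {
                    sb.append(v);               // bare JSON literal
                } else {
                    sb.append('"')
                      .append(String.valueOf(v).replace("\"", "\\\""))
                      .append('"');             // quoted, with quotes escaped
                }
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }
}
```

For production use, a real JSON library (Jackson, Gson) handles nested and multi-valued fields that this sketch does not.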

Custom full-text index stored in Cassandra

Submitted by ╄→尐↘猪︶ㄣ on 2020-01-02 09:38:29
Question: I've got a situation where I'm using Cassandra for the DB and I need full-text search capability. I'm aware of Apache Solr, Apache Cassandra, and DSE Search. However, I do not want to use costly, proprietary software (DSE Search). The reason I do not want to use Apache Solr is that I don't want to deal with HA, sharding, and redundancy for it. Cassandra is perfect for HA, sharding, and redundancy; I would like to store my full-text index in the existing Cassandra DB. So what I'm
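One way to keep the index inside Cassandra is a hand-rolled inverted index: a table partitioned by term, with one row per (term, document) pair. A sketch in CQL; all table and column names are illustrative:

```sql
-- Each term is a partition; each (term, doc_id) row points back to a document.
CREATE TABLE inverted_index (
    term      text,
    doc_id    uuid,
    positions list<int>,   -- token positions, only needed for phrase queries
    PRIMARY KEY (term, doc_id)
);

-- A single-term lookup is then a single-partition read:
-- SELECT doc_id FROM inverted_index WHERE term = 'lucene';
```

The catch is that everything Solr's analysis chain normally provides (tokenizing, lowercasing, stemming) has to be done client-side before writing and querying, and multi-term ranking must be assembled in application code.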

Solr index and search multilingual data

Submitted by a 夏天 on 2020-01-02 09:09:29
Question: In my Solr schema, during indexing Solr detects the language of the data being indexed and applies different indexing rules according to the detected language. All data is stored in language-specific fields, for example: English titles are stored in the title_en field; Spanish titles are stored in the title_es field. <field name="title_en" type="text_en" indexed="true" stored="true"/> <field name="title_es" type="text_es" indexed="true" stored="true"/> All searches are made against one catch-all
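The catch-all setup this question is describing is usually built with a copyField into one combined field; the `text_all` field name and its type below are assumptions, not from the question:

```xml
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_es" type="text_es" indexed="true" stored="true"/>

<!-- One catch-all field that every language-specific field is copied into;
     stored="false" since the originals hold the stored values -->
<field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title_*" dest="text_all"/>
```

The trade-off: the catch-all field has a single analyzer, so language-specific stemming is lost there; searching the per-language fields directly (e.g. via edismax `qf`) keeps it.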

Find all the web pages in a domain and its subdomains

Submitted by 浪尽此生 on 2020-01-02 08:04:09
Question: I am looking for a way to find all the web pages and subdomains in a domain. For example, in the uoregon.edu domain, I would like to find all the web pages in this domain and in all its subdomains (e.g., cs.uoregon.edu). I have been looking at Nutch, and I think it can do the job. But it seems that Nutch downloads entire web pages and indexes them for later search, whereas I want a crawler that only scans a web page for URLs that belong to the same domain. Furthermore, it seems that Nutch
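The filtering step such a crawler needs is simple on its own: extract links from a page and keep only those whose host is the domain or one of its subdomains. A self-contained sketch (regex-based link extraction, not a full HTML parser; class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SameDomainLinks {
    // Keep only links whose host is `domain` itself or a subdomain of it.
    static List<String> sameDomainLinks(String html, String domain) {
        List<String> out = new ArrayList<>();
        // group 1 = full URL, group 2 = host part
        Matcher m = Pattern.compile("href=\"(https?://([^/\"]+)[^\"]*)\"").matcher(html);
        while (m.find()) {
            String host = m.group(2);
            if (host.equals(domain) || host.endsWith("." + domain)) {
                out.add(m.group(1));
            }
        }
        return out;
    }
}
```

In Nutch itself, the equivalent effect comes from its URL-filter configuration (include patterns for the target domain), which restricts the crawl frontier without any custom code.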

Sunspot/Solr queries ending with logical operators AND/OR/NOT result in error

Submitted by 人走茶凉 on 2020-01-02 07:42:21
Question: I noticed that queries ending with logical operators like AND/OR/NOT (for example, 'this AND') result in an error. What would be the best way to handle this? Should I just trim out or escape all queries ending with one of those? Note that it also happens for queries starting with one of these words, and sometimes valid names end with such words, like "Oregon OR". Answer 1: I believe escaping any AND/OR/NOT instances in your query that aren't meant to be boolean logic would be your best bet: Article
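A sanitizer along the lines the answer suggests might escape AND/OR/NOT only where they dangle at the start or end of the query, which is where Solr's parser rejects them. This is a sketch with names of my choosing, not Sunspot's API:

```java
class QuerySanitizer {
    // Escape AND/OR/NOT where they appear as the first or last token of the
    // query, since a leading/trailing boolean operator makes Solr error out.
    // Escaped operators are treated as literal terms by the parser.
    static String sanitize(String query) {
        return query.replaceAll("^(AND|OR|NOT)\\b", "\\\\$1")
                    .replaceAll("\\b(AND|OR|NOT)$", "\\\\$1");
    }
}
```

So "Oregon OR" becomes "Oregon \OR" and searches for the literal word, while an operator in the middle of the query ("cats AND dogs") is left alone.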

Field listing in Solr with "fl" parameter for a field having a space in between

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-02 07:21:11
Question: I have a field in my Solr schema called "Post Date" (excluding the quotes). When I fire a query with the "fl" (field list) parameter in order to view only the Post Date of the search results, I get nothing in the docs responses because this field name contains a space. I tried using + and %20, but I still get no results. Please help. Answer 1: I would like to report that I have found a solution to this. I experimented and came up with a solution: putting \+ as the substitute for whitespace in the
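The workaround the answer describes, substituting \+ for each space in the field name before building the fl parameter, can be wrapped in a small helper (class and method names are mine):

```java
class FlParam {
    // Replace each space in a field name with "\+", the substitution the
    // answer reports working for fl on a field name containing a space.
    static String flValue(String fieldName) {
        return fieldName.replace(" ", "\\+");
    }
}
```

The request then uses fl=Post\+Date (URL-encoding the backslash as %5C where the client requires it). The more durable fix is avoiding spaces in field names altogether, e.g. post_date.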

Solr, block updating of existing document

Submitted by 别等时光非礼了梦想. on 2020-01-02 06:40:34
Question: When a document is sent to Solr and such a document already exists in the index (by its ID), the new one replaces the old one. But I don't want to automatically replace documents; I want Solr to just ignore the duplicate and proceed to the next. How can I configure Solr to do this? Of course I could query Solr to check whether it already has the document, but that's bad for me since I do bulk updates; it would complicate the process and increase the number of requests. So are there any ways to configure Solr to ignore duplicates? Answer 1: You can
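One configuration-level route, a sketch rather than a confirmed recipe, uses Solr's DocBasedVersionConstraintsProcessorFactory: send every document with the same constant version value, so a second copy of an existing id is never "newer" and gets silently dropped instead of replacing the original. The chain name and version field name below are illustrative:

```xml
<updateRequestProcessorChain name="ignore-duplicates">
  <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
    <!-- silently drop updates whose version is not newer than the indexed one -->
    <bool name="ignoreOldUpdates">true</bool>
    <str name="versionField">doc_version_l</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Each document in the bulk update would carry the same doc_version_l value (e.g. 1), and the update request selects the chain with update.chain=ignore-duplicates; no per-document existence query is needed.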

Can I protect short words from an n-gram filter in Solr?

Submitted by 霸气de小男生 on 2020-01-02 05:59:14
Question: I have seen this question about searching for short words in Solr. I am wondering if there is another possible solution to a similar problem. I am using the EdgeNGramFilter with a minGramSize of 3. I want to protect a specific set of shorter words (mainly two-letter acronyms) from being ignored, but I'd like to keep that minGramSize of 3 for everything else. EdgeNGramFilter doesn't support a protected-words list. Is there any filter or setting that makes this possible within a single field
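One pattern that gets close, sketched here with illustrative names, keeps minGramSize=3 on the main field and routes the whitelisted acronyms through a second field whose analyzer keeps only those words. It is a two-field workaround rather than the single-field setting the question asks for:

```xml
<!-- Main field type: edge n-grams, minGramSize stays at 3 -->
<fieldType name="text_edge" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Companion type: indexes ONLY the protected short words, untouched -->
<fieldType name="text_acronyms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="acronyms.txt"/>
  </analyzer>
</fieldType>
```

Queries then search both fields together (e.g. edismax with qf spanning them), so "UK" matches via the acronym field while longer terms still go through the n-gram field.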