solr

How to connect Spark Streaming to standalone Solr on Windows?

Submitted by 早过忘川 on 2020-01-07 04:10:30
Question: I want to integrate Spark Streaming with standalone Solr. I am using Spark 1.6.1 and Solr 5.2 standalone on Windows, with no ZooKeeper configuration. I have found solutions where people connect to Solr from Spark by passing the ZooKeeper config. How can I connect my Spark program to standalone Solr? Answer 1: Please see if this example is helpful: http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd From the example, you will need to
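The foreachRDD/foreachPartition pattern from that guide can target a standalone Solr core over plain HTTP, with no ZooKeeper involved. A minimal sketch, assuming the host, port, and core name (`mycore`) below; adjust them to your setup:

```python
# Sketch: posting document batches from a Spark foreachPartition to a
# standalone Solr core over plain HTTP (no ZooKeeper, no CloudSolrClient).
import json
from urllib import request

def solr_update_url(host, port, core, commit=True):
    """Build the JSON update endpoint URL for a standalone Solr core."""
    url = "http://%s:%d/solr/%s/update" % (host, port, core)
    if commit:
        url += "?commit=true"
    return url

def post_docs(url, docs):
    """POST a batch of documents as JSON to the Solr update endpoint."""
    body = json.dumps(list(docs)).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    return request.urlopen(req)  # returns the HTTP response

# Inside Spark Streaming you would call this once per partition, e.g.:
# url = solr_update_url("localhost", 8983, "mycore")
# rdd.foreachPartition(lambda part: post_docs(url, part))
```

Creating the connection inside `foreachPartition` (rather than on the driver) is the point of the linked design pattern: the HTTP client is built on each executor instead of being shipped over the network.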

Using StandardTokenizerFactory with currency

Submitted by 。_饼干妹妹 on 2020-01-07 02:48:11
Question: The fieldType config described in this question works for me to detect currency (e.g. docs containing "$30"). However, we wish to use the StandardTokenizerFactory rather than the WhitespaceTokenizerFactory, and this config returns false positives with the StandardTokenizerFactory (e.g. docs containing the digits 30 without the $ symbol). What is the solution? Thanks. How do I find documents containing digits and dollar signs in Solr? Answer 1: Solved via a post to the Solr user group http://lucene
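Since the answer excerpt is cut off, here is one common workaround, offered as an assumption rather than the thread's confirmed fix: StandardTokenizer discards the $ during tokenization, so map it to a preserved placeholder with a char filter that runs before the tokenizer:

```xml
<!-- Sketch: field type and placeholder string are assumptions -->
<fieldType name="text_currency" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Replace "$" with a token-safe placeholder before StandardTokenizer strips it -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\$" replacement="dollarsymbol "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place, a search for "$30" becomes a search for the tokens `dollarsymbol 30`, and plain "30" no longer matches the currency pattern. Verify the token stream in the Analysis screen of the admin UI before committing to a placeholder.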

JSON returned by Solr

Submitted by 笑着哭i on 2020-01-07 02:36:10
Question: I'm using Solr to index my data. Through Solr's UI, in the Schema window, I added two fields: word, messageid. Then I made the following POST request: curl -X POST -H "Content-Type: application/json" 'http://localhost:8983/solr/messenger/update.json/docs' --data-binary '{"word":"hello","messageid":"23523}' I received the following JSON: { "responseHeader": { "status": 0, "QTime": 55 } } When I go to the Query window in the UI and execute a query without parameters, I get the
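Two details worth noting: the JSON in the curl command is missing the closing quote on the messageid value, and an update only becomes visible to queries after a commit. Building the body programmatically avoids the quoting problem entirely; a sketch, with the core name `messenger` taken from the question:

```python
# Sketch: serialize the document with json.dumps so quoting mistakes
# (like the unterminated "23523 above) cannot happen.
import json

def make_update_payload(word, messageid):
    """Serialize one document for the /update/json/docs handler."""
    return json.dumps({"word": word, "messageid": messageid})

payload = make_update_payload("hello", "23523")
# POST payload to
# http://localhost:8983/solr/messenger/update/json/docs?commit=true
# (commit=true makes the document visible to subsequent queries)
```

A status-0 responseHeader only means the update request was accepted; without a commit (explicit or via autoCommit with openSearcher) the Query window will still show nothing.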

How to optimize solr indexes

Submitted by 岁酱吖の on 2020-01-07 02:32:08
Question: When I open the solr/admin page I see the information below. It shows optimized: true, but I have not set optimize=true in any configuration file, so how are the indexes being optimized, and how can I set it to false?

Schema Information
Unique Key: UID_PK
Default Search Field: text
numDocs: 2881
maxDoc: 2881
numTerms: 41960
version: 1309429290159
optimized: true
current: true
hasDeletions: false
directory: org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@ C:\apache-solr
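A hedged reading of that page: "optimized: true" reports the index *state* (segments already merged down), not a configuration flag, so there is nothing to switch off in solrconfig.xml. An optimize only runs when a client explicitly requests one, for example through the update handler; the base URL below is an assumption:

```python
# Sketch: an optimize is triggered per-request, not per-config.
def optimize_url(base="http://localhost:8983/solr"):
    """URL that would trigger an explicit optimize (merge to one segment)."""
    return base + "/update?optimize=true"

# Simply never issuing this request means no forced optimize happens;
# ordinary background segment merging still runs per the merge policy.
```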

SolrClient python update document

Submitted by 天涯浪子 on 2020-01-07 02:29:33
Question: I'm currently trying to create a small Python program that uses SolrClient to index some files. I want to index the file content and then add some attributes to enrich each document. I used the post command-line tool to index the files. Then I use a Python program to enrich the documents, something like this: doc = solr.get('collection', id) doc['new_attribute'] = 'value' solr.index_json('collection',json.dumps([doc])) solr.commit(openSearcher=True) Problem is that I have the
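Since the excerpt is cut off, a guess at the usual fix: rather than get-modify-reindex, which re-sends the whole document, Solr supports atomic updates, where you send only the id plus the fields to change wrapped in `{"set": ...}`. A sketch of building such a payload (the field and id values are illustrative):

```python
# Sketch: build an atomic-update document -- only the id and the
# changed fields travel over the wire; Solr patches the stored doc.
import json

def atomic_update(doc_id, **fields):
    """Wrap each changed field in {"set": value} per Solr atomic-update syntax."""
    doc = {"id": doc_id}
    doc.update({name: {"set": value} for name, value in fields.items()})
    return doc

payload = json.dumps([atomic_update("42", new_attribute="value")])
# then, with the same SolrClient calls as in the question:
# solr.index_json('collection', payload)
# solr.commit(openSearcher=True)
```

Note that atomic updates require all fields in the schema to be stored (or docValues-backed), otherwise the non-sent fields are lost on update.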

Does CKAN have a limit size of data to upload?

Submitted by 二次信任 on 2020-01-07 02:26:06
Question: I have set up CKAN and it is running fine, but I have two questions. Both problems below happen only when uploading a file; if I add a new resource by URL, everything runs fine. 1) I can upload small files (around 4 kB) to a given dataset, but when trying bigger files (65 kB) I get Error 500: An Internal Server Error Occurred. So is there a size limit for uploading files? What can I do to be able to upload bigger files? 2) I get another error, for the small uploaded files, and that is: when
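If a size cap were the cause, CKAN exposes it as `ckan.max_resource_size` in the instance .ini file (value in megabytes, default 10 MB). That said, 65 kB is far below the default, so a 500 on files that small more likely points at file-storage setup, e.g. a missing or unwritable `ckan.storage_path`. A hedged sketch of the relevant options:

```ini
; production.ini sketch -- both values are illustrative
[app:main]
; upload cap in megabytes (default 10)
ckan.max_resource_size = 100
; uploads require a storage directory writable by the CKAN process
ckan.storage_path = /var/lib/ckan/default
```

Checking the CKAN/Apache error log for the traceback behind the 500 should confirm which of the two it is.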

Finding Solr documents that intersect with a defined Radius

Submitted by 回眸只為那壹抹淺笑 on 2020-01-07 00:36:15
Question: We are using Apache Solr 5.x, and we currently have a number of defined shapes: polygons, circles, etc. Each of these shapes corresponds to a document. What I want to know is: is it possible to provide a circle, that is, a (lat,lng) pair along with a radius for that circle, and then find all documents that have an intersection with that circle? I have tried a variety of options, most recently this one: solr_index_wkt:"IsWithin(CIRCLE((149.39999999999998 -34.92 d=0
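One likely issue with the attempt above: in Lucene spatial's predicate semantics, IsWithin matches only indexed shapes lying wholly inside the query shape; for "any overlap" the predicate is Intersects. A sketch building such a query string, assuming the circle syntax mirrored from the question (the exact shape syntax varies by Solr/Spatial4j version) and the field name `solr_index_wkt`:

```python
# Sketch: build an Intersects query against a WKT/RPT spatial field.
def circle_intersect_query(field, lng, lat, radius_deg):
    """Match documents whose indexed shape overlaps the given circle."""
    return '%s:"Intersects(CIRCLE((%s %s d=%s)))"' % (field, lng, lat, radius_deg)

q = circle_intersect_query("solr_index_wkt", 149.4, -34.92, 0.1)
# pass q as the main query or as an fq filter
```

Note that `d` here is in degrees for this legacy syntax; converting a radius in kilometers requires dividing by ~111.2 km/degree.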

crawling with Nutch 2.3, Cassandra 2.0, and solr 4.10.3 returns 0 results

Submitted by 守給你的承諾、 on 2020-01-06 23:43:25
Question: I mainly followed the guide on this page. I installed Nutch 2.3, Cassandra 2.0, and Solr 4.10.3, and setup went well. But when I executed the following command, no URLs were fetched.

./bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2

Below are my settings:
nutch-site.xml: http://ideone.com/H8MPcl
regex-urlfilter.txt: +^http://([a-z0-9]*\.)*nutch.apache.org/
hadoop.log: http://ideone.com/LnpAw4

I don't see any errors in the log file. I am really lost; any help would be appreciated.
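One thing worth checking, offered as an assumption since no answer is shown: the regex-urlfilter.txt line above only admits URLs under nutch.apache.org, so any other seed in urls/seed.txt is silently discarded before fetching, which produces exactly "0 urls fetched" with no error in the log. A sketch of a broader filter, assuming the seeds live under a hypothetical example.com:

```
# regex-urlfilter.txt sketch: admit your own seed domain instead of,
# or in addition to, nutch.apache.org
+^https?://([a-z0-9-]+\.)*example.com/
```

The filter lines are evaluated top to bottom and the first match wins, so the `+` line for your domain must appear before any catch-all `-.` reject rule.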

How to add data to the solr's schema

Submitted by 纵然是瞬间 on 2020-01-06 20:18:13
Question: I am trying to add new data to Solandra according to the Solr schema, but I can't find any example of this. My ultimate goal is to integrate Solandra with django-solr. My understanding of insert and update in Solr, based on the original Solr and django-solr, is that you send the new data over HTTP to the appropriate path, for example: http://localhost:8983/solandra/wikipedia/update/json However, when I access the URL, the browser keeps telling me HTTP ERROR: 404. Can you help me
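One likely culprit, offered as an assumption since the question is cut off: visiting the update URL in a browser issues a GET, while Solr-style update handlers expect a POST with a JSON body, so a browser error does not necessarily mean the endpoint is wrong. A stdlib-only sketch of the request that should be sent instead (the URL is the one from the question; the document fields are illustrative):

```python
# Sketch: update endpoints take POSTed JSON, not browser GETs.
import json
from urllib import request

def build_update_request(url, docs):
    """Build a POST request carrying documents as a JSON body."""
    return request.Request(url,
                           data=json.dumps(docs).encode("utf-8"),
                           headers={"Content-Type": "application/json"})

req = build_update_request(
    "http://localhost:8983/solandra/wikipedia/update/json",
    [{"id": "1", "title": "hello"}])
# request.urlopen(req) would send it; follow with a commit so the
# documents become searchable
```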
