solr | 易学教程

How do I index documents in SOLR?

Question: I'm running Solr 1.4 on Ubuntu 10.04 (installed via apt-get solr-tomcat) and it seems to be working fine. I'm having some difficulty finding any coherent information on how to index documents, though. I'm new to Solr, so bear with me! I have a folder (/mnt/folder) that is a mounted Windows share containing Word and PDF files that I would like indexed. What's the easiest way to get Solr to index the entire folder? The documentation for Solr is pretty poor; it's impossible to find any decent tutorials
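
A minimal sketch of one possible approach to the question above, assuming the Solr Cell extracting handler is enabled at `/update/extract` and the schema's unique key field is named `id`. The base URL and the extension list are illustrative assumptions, not details from the question. The function only builds the request URLs; it does not contact Solr.

```python
import os
from urllib.parse import urlencode

# Assumed values for illustration; adjust to the actual Tomcat/Solr setup.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/update/extract"
EXTENSIONS = (".doc", ".docx", ".pdf")


def build_extract_requests(folder, base_url=SOLR_EXTRACT_URL):
    """Return (file_path, url) pairs, one per Word/PDF file under folder."""
    requests = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            # Match extensions case-insensitively (e.g. ".PDF" on a Windows share).
            if name.lower().endswith(EXTENSIONS):
                path = os.path.join(root, name)
                # literal.id sets the assumed unique key field to the file path.
                query = urlencode({"literal.id": path})
                requests.append((path, base_url + "?" + query))
    return requests
```

Each file would then be sent as the body of an HTTP POST to its URL, followed by a commit, for example with `curl` or any HTTP client.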

Request handle solrconfig.xml Spellchecker

Question: I am trying to set up the spellchecker according to the Solr documentation, but when I test it I don't get any suggestions. My configuration follows: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">textSpell</str> <lst name="spellchecker"> <str name="classname">solr.IndexBasedSpellChecker</str> <str name="name">default</str> <str name="field">name</str> <str name="spellcheckIndexDir">./spellchecker</str> </lst> <str name=

Solr Initialization Source Code Analysis: Solr Initialization and Startup

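I have been using Solr in projects for over a year, but only at the usage level: relying on Solr's existing mechanisms, adjusting parameters, then monitoring and tuning. I had never studied Solr at the source-code level. A recent project, however, made me realize that I need to understand Solr's internals and extension mechanisms thoroughly to do it well. Today's topic is how Solr starts up and initializes. As you know, a Solr deployment has two parts: (1) Solr's configuration files; (2) Solr's programs, plugins, the Lucene-related JARs it depends on, and the logging JARs. Studying Solr can follow the same line of thought: loading the configuration files, initializing each core, initializing the request handlers in each core... To study Solr's startup, begin with the web.xml of the Solr WAR. Here is a fragment of Solr's web.xml: <web-app xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" version="2.5" metadata-complete="true" > <!-- Uncomment if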

Timing out a query in Solr

Question: I am sending queries to Solr through a custom-developed layer, and a few queries that I time out in my layer are still running in the Solr instance. Is there a parameter in Solr that can be used to time out a particular query? Answer 1: As stated in "Solr query continues after client disconnects?" and written in the Solr FAQ: Internally, Solr does nothing to time out any requests -- it lets both updates and queries take however long they need to take to be processed fully. But at the same spot in the FAQ is
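
One per-request option is Solr's common query parameter `timeAllowed`, which caps search time in milliseconds; when the limit is hit, Solr may return partial results. The sketch below only builds such a request URL. The base URL, the query string, and the helper name `build_select_url` are illustrative assumptions, not part of the original question or answer.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, time_allowed_ms):
    """Build a /select URL that asks Solr to cap search time (milliseconds)."""
    params = {
        "q": query,
        # timeAllowed is a standard Solr query parameter, in milliseconds.
        "timeAllowed": int(time_allowed_ms),
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical host and query, for illustration only.
url = build_select_url("http://localhost:8983/solr/", "title:solr", 2000)
```

Setting this limit slightly below the client layer's own timeout would keep the two roughly aligned, though that is a design suggestion rather than something stated in the thread.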

Full text search options for MongoDB setup

Question: We are planning to store millions of documents in MongoDB, and full text search is very much required. I read that Elasticsearch and Solr are the best available solutions for full text search. Is Elasticsearch mature enough to be used for MongoDB full text search? We will also be sharding the collections. Does Elasticsearch work with sharded collections? What are the advantages and disadvantages of using Elasticsearch or Solr? Is MongoDB capable of doing full text search? Answer 1: There are some search

Why does the TikaEntityProcessor not index the Text field in the following data-config file?

Question: <dataConfig> <dataSource name="test1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/ACL" user="root" password="" /> <dataSource name="test2" type="BinFileDataSource" /> <document> <entity name="files" dataSource="null" rootEntity="false" processor="FileListEntityProcessor" transformer="RegexTransformer" baseDir="/home/shah/ResearchTestData/TestScore3" fileName="\.(txt)|(pdf)|(docx)" onError="skip" recursive="true"> <field column="fileAbsolutePath" name="ID" /> <field column=

Adding child documents to existing Solr 6.4 collection documents creates duplicate documents

Question: This question is similar to "Solr doesn't overwrite - duplicated uniqueKey entries", but I am in a situation where I have a large body of existing documents that have already been added to the collection with no child documents, and I am using (standalone, not cloud) Solr 6.4 rather than 5.3.1. We recently enabled child documents so that we could store richer data. We use SolrJ to load data into and query Solr, but to isolate the issue we're seeing, I used the command line Solr post tool to

Indexing wikipedia with solr

Question: I've installed Solr 4.6.0 and followed the tutorial available on Solr's home page. Everything was fine until I needed to do the real job I have. I have to get fast access to Wikipedia content, and I was advised to use Solr. I was trying to follow the example at http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia, but I couldn't get the example. I am a newbie, and I don't know what data_config.xml means! <dataConfig> <dataSource type=

Configuring Solr to use UUID as a key

Question: I am trying to configure Solr 4 to work with UUID, and so far I have been unsuccessful. From reading the documentation I have seen two different ways to configure schema.xml to work with UUID (neither works). For both I need to write <fieldType name="uuid" class="solr.UUIDField" indexed="true" />. Option 1: add <field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/> and make sure to remove the line <uniqueKey>id</uniqueKey>. Option 2: add <field name="id" type=