solr | 易学教程 (Yixue Tutorial)

SOLR/LUCENE Experts, please help me design a simple keyword search from PDF index?

Question: I dabbled with Solr but couldn't figure out a way to tailor it to my requirement. What I have: a bunch of PDF files and a set of keywords. What I am trying to achieve: index the PDF files (Solr Cell, done); search for a keyword (works OK); tailor the output to return the names of the PDF files and an excerpt where the keyword occurred (no idea how to do this). I tried modifying the ResponseHandler, schema.xml, and solrconfig.xml to no avail. Lucene/Solr experts, do you think what I am trying to achieve is
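
A common way to get both file names and excerpts is Solr's highlighting component, which returns matching text fragments next to each document. The minimal sketch below only builds the request URL and reads the JSON response. The host, the core name `pdfs`, and the field names `content` (text extracted by Solr Cell) and `resourcename` (the original file name) are assumptions that depend on your schema and on how the PDFs were posted; the text field also has to be stored for snippets to be generated.

```python
from urllib.parse import urlencode

# Hypothetical core URL; adjust host and core name to your installation.
SOLR_SELECT = "http://localhost:8983/solr/pdfs/select"


def build_keyword_query(keyword, text_field="content", name_field="resourcename"):
    """Return a select URL asking Solr for file names plus highlighted excerpts.

    The field names are assumptions: text_field holds the extracted PDF text,
    name_field holds the file name.
    """
    params = {
        "q": f'{text_field}:"{keyword}"',
        "fl": f"id,{name_field}",  # return only the id and the file name
        "hl": "true",              # enable the highlighting component
        "hl.fl": text_field,       # build excerpts from the extracted text
        "hl.snippets": 1,          # one excerpt per document
        "hl.fragsize": 150,        # roughly 150 characters per excerpt
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def excerpts(response, text_field="content", name_field="resourcename"):
    """Pair each matching file name with its first highlighted snippet (or '')."""
    highlighting = response.get("highlighting", {})
    results = []
    for doc in response["response"]["docs"]:
        # Highlighting results are keyed by the document's unique id.
        snippets = highlighting.get(doc["id"], {}).get(text_field, [])
        results.append((doc.get(name_field), snippets[0] if snippets else ""))
    return results
```

Fetching the URL (with `urllib.request` or any HTTP client) and passing the decoded JSON to `excerpts` yields a list of `(file name, excerpt)` pairs, with the keyword wrapped in `<em>` tags by default.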

Non-English Language support via SolrNet

Question: I am using SolrNet to search Solr from a .NET application. Everything works fine when I search for English words. However, if I use Spanish words like español, I get no search results even though I have indexed them. When I debugged Solr, I found that the query was parsed as espaA+ol. Do I have to do some UTF-8 encoding, or does SolrNet support searching only ASCII characters? Answer 1: This is not a SolrNet issue; it is related to how Solr handles characters that are not in the first 127
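
The parsed form espaA+ol is consistent with the UTF-8 bytes of ñ being decoded somewhere along the way as a single-byte encoding rather than as UTF-8. The short Python snippet below (an illustration, not SolrNet code) shows how the same percent-encoded bytes decode under each assumption.

```python
from urllib.parse import quote, unquote

term = "español"

# Percent-encode the term as UTF-8, which is what the client should send.
encoded = quote(term, encoding="utf-8")
print(encoded)  # espa%C3%B1ol

# Decoding those bytes as ISO-8859-1 (Latin-1) produces mojibake:
# the two bytes of "ñ" become two separate characters.
wrong = unquote(encoded, encoding="latin-1")
print(wrong)  # espaÃ±ol

# Decoding as UTF-8 round-trips to the original word.
right = unquote(encoded, encoding="utf-8")
print(right == term)  # True
```

If the query reaching Solr looks like the second result, the client and the server (or servlet container) disagree about the URL encoding.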

Apache Tomcat server stops after several document commits in Solr 4.4

Question: We have successfully installed Solr 4.4 on Windows 7 with Tomcat 8.0 and Java JRE 7. We have committed some documents, but unfortunately, after several document commits, our Apache Tomcat server stops, and at that point we get the following error in the application: System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing

solr installation, cannot start examples

Question: First, I want to learn Solr, so I want to follow the quickstart tutorial at http://lucene.apache.org/solr/quickstart.html. On Ubuntu 14.04 64-bit, I installed Solr as follows: in /opt, I downloaded the latest version, 6.3.0, with wget, as root. Then I extracted the service installation script with tar xzf solr-6.3.0.tgz solr-6.3.0/bin/install_solr_service.sh --strip-components=2, also as root. I ran it with sudo ./install_solr_service.sh solr-6.3.0.tgz, which led to user/group solr,

Exception while integrating openNLP with Solr

Question: I am trying to integrate OpenNLP with Solr 6.1.0. I configured the schema and solrconfig files as described on the wiki page https://wiki.apache.org/solr/OpenNLP. Changes made in solrconfig.xml: <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar" /> <lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib" regex="opennlp-.*\.jar" /> Changes made in the schema file: <fieldType name="text_opennlp_nvf" class="solr.TextField"

Solr: Indexing a MySQL Timestamp or DateTime Field

Question: To index a date in Solr, the date must be in ISO format. Can we index a MySQL Timestamp or DateTime field without modifying the SQL SELECT statement? I have used <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/> <field name="CreatedDate" type="tdate" indexed="true" stored="true" /> CreatedDate is of type DateTime in MySQL. I am getting the following exception: 11:23:39,117 WARN [org.apache.solr.handler.dataimport.DateFormatTransformer]
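
Whichever layer performs the conversion (the SQL query, a DataImportHandler transformer, or the client that posts the documents), Solr expects dates such as 1995-12-31T23:59:59Z: ISO 8601, in UTC, with a trailing Z. The minimal client-side sketch below assumes the MySQL values arrive as plain DATETIME strings already stored in UTC; the function name is hypothetical.

```python
from datetime import datetime, timezone


def mysql_datetime_to_solr(value: str) -> str:
    """Convert a MySQL DATETIME string ('YYYY-MM-DD HH:MM:SS') to Solr's date format.

    Assumption: the stored value is already UTC. If it is in local time,
    convert it to UTC before formatting, because Solr dates are always UTC.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


print(mysql_datetime_to_solr("2013-07-25 11:23:39"))  # 2013-07-25T11:23:39Z
```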

A Brief Look at Index-Building Performance Optimization for Solr and ElasticSearch

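Because Solr and ElasticSearch are both built on Lucene, they are very similar, so most of their optimization strategies apply to both. As data volumes keep growing, how can we improve write performance when building a full index? I have briefly summarized optimization strategies in the following areas; questions and criticism are welcome. (I) Hardware: (1) more CPU helps concurrent writes; (2) more memory allows a larger write buffer; (3) disk I/O: use SSDs or other disks with faster read/write speeds; (4) network I/O: make sure the bandwidth between client and server is sufficient. (II) Server-side framework: (1) increase the number of shards; in theory, the more shards, the faster the writes; (2) set larger index flush thresholds via ramBufferSizeMB or maxBufferedDocs; (3) disable replicas while writing the index, because index replication greatly slows down writes; (4) monitor GC and tune the JVM parameters: if Full GC is frequent, increase the JVM heap; if Young GC is frequent, increase the share of the young generation; if you use the CMS garbage collector, you can disable the survivor spaces when necessary to avoid copying objects back and forth between the survivor spaces and Eden; (5) use stable, recent versions of the JDK and of the framework itself where possible; (6) on machines with large memory, try the G1 garbage collector. (III) Client side: (1) if your company has a big-data team, build the index on a distributed Hadoop or Spark cluster; (2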

Creating Incremental or Full Indexes in Solr in Real Time

1. To support incremental indexing, we need to change the contents of mysql-data-config.xml from the article above to: <dataConfig> <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/basic" user="root" password="123"/> <document> <entity name="article" transformer="HTMLStripTransformer" query="SELECT id, title, content FROM article" deltaImportQuery="SELECT id, title, content FROM article WHERE id='${dataimporter.delta.id}'" deltaQuery="SELECT id FROM article WHERE update_time > '${dataimporter.last_index_time}'"> <field column="id" name="id" /> <field column="title" name="title" /> <field column=
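
The two queries in this config split a delta import into two steps: deltaQuery collects the ids of rows changed since ${dataimporter.last_index_time}, and deltaImportQuery then fetches each of those rows by ${dataimporter.delta.id}. The sketch below imitates that flow in plain Python. An in-memory SQLite table stands in for the MySQL article table (an assumption made so the example runs on its own, with update_time stored as a sortable text timestamp); it does not reproduce HTMLStripTransformer and does not talk to Solr.

```python
import sqlite3

# SQLite stands in for the MySQL "article" table (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, content TEXT, update_time TEXT)"
)
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?, ?)",
    [
        (1, "old", "unchanged since the last import", "2019-01-01 08:00:00"),
        (2, "edited", "modified after the last import", "2019-01-03 09:30:00"),
        (3, "new", "added after the last import", "2019-01-04 10:15:00"),
    ],
)


def delta_rows(conn, last_index_time):
    """Mimic deltaQuery, then deltaImportQuery for each changed id."""
    # Step 1 (deltaQuery): ids of rows updated since the last import.
    changed_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM article WHERE update_time > ? ORDER BY id", (last_index_time,)
        )
    ]
    # Step 2 (deltaImportQuery): fetch the full row for each changed id.
    docs = []
    for article_id in changed_ids:
        id_, title, content = conn.execute(
            "SELECT id, title, content FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        docs.append({"id": str(id_), "title": title, "content": content})
    return docs


print([doc["id"] for doc in delta_rows(conn, "2019-01-02 00:00:00")])  # ['2', '3']
```

Parameterized queries are used here for safety; the DataImportHandler itself substitutes the ${...} variables into the SQL text.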

How to boost a search if the result of a function is less than a field?

Question: I have a form where someone who makes deliveries chooses their location on a map and then sets the radius they serve. When a user comes to my site, they can run a query, and I want to boost the sellers that serve the client's location. Basically, I need to compute an HSIN function in Solr between the seller's point and the client's point and boost if the result is less than the radius. Boost Function lets me boost by the result of the query (which is not the case), and Boost Query does
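
Solr provides geospatial function queries such as hsin and geodist for this kind of calculation, but the exact boost expression depends on the Solr version and the schema. The underlying logic is sketched below in plain Python: compute the haversine distance between seller and client, then apply a boost only when that distance is within the seller's radius. The keys lat, lon, and radius_km and the boost value of 10 are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def delivery_boost(seller, client, boost=10.0):
    """Return `boost` if the client lies within the seller's radius, else 1.0 (no change)."""
    distance = haversine_km(seller["lat"], seller["lon"], client["lat"], client["lon"])
    return boost if distance <= seller["radius_km"] else 1.0


seller = {"lat": 0.0, "lon": 0.0, "radius_km": 200.0}
client = {"lat": 1.0, "lon": 0.0}  # about 111 km north of the seller
print(delivery_boost(seller, client))  # 10.0
```

The key point is that the comparison uses a per-document value (the seller's radius), so in Solr it has to be expressed as a function of document fields rather than as a fixed distance filter.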

Haystack more_like_this returns all

Question: I am using Django, Haystack, and Solr for searching. I am able to search, and now I would like to find similar items using more_like_this. When I try to use the more_like_this functionality, I get back all of the objects of that model type instead of just the ones that closely match. Here is some code showing how I am using it: def resource_view(request, slug): resource = Resource.objects.get(slug=slug) versions = Version.objects.get_for_object(resource) related =