solr

When should we apply Hard commit and Soft commit in SOLR?

早过忘川 submitted on 2020-01-11 09:32:04
Question: I want to know when we should do a hard commit and when we should do a soft commit in Solr. Thanks. Answer 1: In the same vein as the question you just asked but deleted, this is explained thoroughly on the internet: use a soft commit when you want something to be made searchable as soon as possible, without waiting for it to be written to disk; use a hard commit when you want to make sure it is persisted to disk. From the link above: "Soft commits are about visibility, hard commits are about durability"
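A minimal SolrJ sketch of the two commit types, assuming a core at http://localhost:8983/solr/demo and placeholder field names; in practice most setups rely on autoCommit/autoSoftCommit in solrconfig.xml rather than explicit client commits.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitDemo {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/demo").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("title_s", "hello");
        client.add(doc);

        // Soft commit: visibility. The document becomes searchable right away,
        // but is not yet guaranteed to have been flushed to stable storage.
        client.commit(false /* waitFlush */, true /* waitSearcher */, true /* softCommit */);

        // Hard commit: durability. The transaction log is flushed and index
        // segments are written to disk.
        client.commit();

        client.close();
    }
}
```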

Solr - most frequently searched words

℡╲_俬逩灬. submitted on 2020-01-11 09:18:05
Question: I'm trying to organize a Solr search engine. I've already set up the misspelling system and the suggestions. However, I can't seem to find how to retrieve the top 10 most searched words/terms/keywords in Solr/Lucene. How can I get this? I want to display those on my homepage. Answer 1: Solr does not provide this kind of feature out of the box. There is the StatsComponent, which provides you with all kinds of statistics, but all of those are numeric only. Depending on how you access solr (directly or
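Since there is no built-in "top searched terms" report, one common workaround is to log every submitted query into a separate core and facet on it. A rough SolrJ sketch of that idea, where the "querylog" core and the "q_s" field are assumptions:

```java
import java.util.UUID;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class TopSearches {
    public static void main(String[] args) throws Exception {
        SolrClient log = new HttpSolrClient.Builder("http://localhost:8983/solr/querylog").build();

        // 1) Record each user query as it is submitted (call this from your search endpoint).
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("q_s", "some user query");
        log.add(doc);
        log.commit();

        // 2) Get the 10 most frequent logged queries via a facet on the query field.
        SolrQuery top = new SolrQuery("*:*");
        top.setRows(0);
        top.setFacet(true);
        top.addFacetField("q_s");
        top.setFacetLimit(10);
        top.setFacetMinCount(1);
        QueryResponse rsp = log.query(top);
        for (FacetField.Count c : rsp.getFacetField("q_s").getValues()) {
            System.out.println(c.getName() + " -> " + c.getCount());
        }
        log.close();
    }
}
```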

(4) Importing structured data with DIH

…衆ロ難τιáo~ submitted on 2020-01-11 07:38:39
(4) Importing structured data with DIH. Most applications today store their data in relational databases (such as Oracle, SQL Server or MySQL) or in XML files, and searching that data is a very common requirement. The DataImportHandler (DIH) provides a configurable way to import data into Solr, either all at once (full import) or incrementally (delta import). It can also be given a declarative, configurable task schedule, so that data is periodically refreshed from the relational database into the Solr server.
1. Environment: (1) Windows 7; (2) JDK 1.8; (3) Tomcat 8; (4) Solr 7.1.0; (5) MySQL 5.5; (6) IK analyzer (a version that supports Solr 7; lower IK versions do not support Solr 7).
2. Configuring the IK analyzer: the biggest difference between configuring the IK Chinese analyzer in Solr 7.1 and in lower Solr versions is which IK jar you reference. The usual IK jars (for example IKAnalyzer2012FF_u1.jar) cannot be used, because the traditional IK jars do not support a version as high as Solr 7.1, and you will be greeted by a runtime error page. The error below was a painful lesson. Installing and configuring the IK Chinese analyzer under Solr 7.1: (1) Download an IK build that supports Solr 7.1, i.e. the Solr 7.1-specific IK analyzer jar and its matching configuration files. (2) Copy the jar file into the WEB-INF\lib directory of the Solr web application.
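For completeness, a rough SolrJ sketch of kicking off the full and delta imports mentioned above. The core name "demo" is a placeholder, and it assumes the /dataimport path is registered as a DataImportHandler request handler in solrconfig.xml.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.client.solrj.response.SimpleSolrResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DihTrigger {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/demo").build();

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("command", "full-import");  // or "delta-import" for incremental updates
        params.set("clean", "true");           // clear the index before a full import
        params.set("commit", "true");          // commit when the import finishes

        // Send a plain GET to the DIH handler path on this core.
        GenericSolrRequest req = new GenericSolrRequest(SolrRequest.METHOD.GET, "/dataimport", params);
        SimpleSolrResponse rsp = req.process(client);
        System.out.println(rsp.getResponse());
        client.close();
    }
}
```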

Solr: how to highlight the whole search phrase only?

南楼画角 submitted on 2020-01-11 03:27:08
Question: I need to perform a phrase search. In the search results I'm getting the exact phrase matches, but looking at the highlighted parts I see that the phrase is tokenized, i.e. this is what I get when I search for the phrase "Day 1": <arr name="post"> <str><em>Day</em> <em>1</em> We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str> </arr> This is what I want to receive as a result: <arr name="post"> <str><em>Day 1</em> We have begun a new adventure! An early
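A hedged SolrJ sketch of one common approach: keep hl.usePhraseHighlighter on and switch to the FastVectorHighlighter, which wraps a matched phrase in a single tag; the highlighted field then needs termVectors, termPositions and termOffsets enabled in the schema. The "post" field comes from the question; the core URL is an assumption.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PhraseHighlight {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/posts").build();

        SolrQuery q = new SolrQuery("post:\"Day 1\"");
        q.setHighlight(true);
        q.addHighlightField("post");
        q.set("hl.usePhraseHighlighter", "true");
        // The FastVectorHighlighter emits one <em>Day 1</em> for a phrase match,
        // provided the field is indexed with termVectors/termPositions/termOffsets.
        q.set("hl.useFastVectorHighlighter", "true");

        QueryResponse rsp = client.query(q);
        System.out.println(rsp.getHighlighting());
        client.close();
    }
}
```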

PDFBox adding white spaces within words

我的梦境 submitted on 2020-01-10 23:37:40
Question: When I try to extract text from my PDF files, it seems to insert white spaces between several words randomly. I am using pdfbox-app-1.6.0.jar (the latest version) on the following sample file from the Downloads section of this page: http://www.sheffield.gov.uk/roads/children/parents/6-11/pedestrian-training I've tried with several other PDF files and it seems to do the same on several pages. I do the following: java -jar pdfbox-app-1.6.0.jar ExtractText -force -console ~/Desktop/ped training pdf.pdf on
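For reference, the same extraction through the PDFBox API rather than the CLI (class and method names as in PDFBox 2.x; the file path mirrors the question, and the tolerance value is just a starting point to experiment with):

```java
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ExtractDemo {
    public static void main(String[] args) throws Exception {
        try (PDDocument doc = PDDocument.load(new File("/home/me/Desktop/ped training pdf.pdf"))) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Raising the spacing tolerance (default 0.5) makes the stripper less eager
            // to insert a space between glyphs that sit close together, which can reduce
            // spurious spaces inside words.
            stripper.setSpacingTolerance(0.8f);
            System.out.println(stripper.getText(doc));
        }
    }
}
```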

How to configure stemming in Solr?

不问归期 submitted on 2020-01-10 19:35:32
Question: I add "American" to the Solr index. When I search for "America" there are no results. How should schema.xml be configured to get results? Current configuration: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory" /> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" /> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
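The snippet above is cut off by the aggregator, but the usual answer is to add a stemming filter (for example solr.PorterStemFilterFactory or solr.KStemFilterFactory) to both the index and query analyzers of the field type. Whether a particular stemmer actually maps "American" and "America" to the same token depends on the stemmer, so a quick Lucene-side check of what a candidate chain emits is a reasonable sanity check before editing schema.xml; a sketch, assuming Lucene's analysis-common module is on the classpath:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.en.PorterStemFilterFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemCheck {
    public static void main(String[] args) throws Exception {
        // Build the same chain you would declare in schema.xml:
        // whitespace tokenizer, lowercase filter, then a stemming filter.
        Analyzer analyzer = CustomAnalyzer.builder()
                .withTokenizer(WhitespaceTokenizerFactory.class)
                .addTokenFilter(LowerCaseFilterFactory.class)
                .addTokenFilter(PorterStemFilterFactory.class)
                .build();

        // Print the tokens the chain emits for the two words in question.
        try (TokenStream ts = analyzer.tokenStream("text", "American America")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
        }
    }
}
```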

Integrating Solr with Spring Boot to implement CRUD

放肆的年华 submitted on 2020-01-10 16:19:16
In the previous article I already covered Solr installation and configuration, DIH full import, and the IK analyzer configuration. Today I'll share how to integrate Solr with Spring Boot to implement create, update, delete and query. Like first, read later - it's become a habit, so give it a like!!!
Contents: 1. Importing the dependency; 2. Configuring solrHost; 3. Entity class mapping; 4. Create, update, delete; 5. Solr queries; 6. Practical application.
1. Importing the dependency. The usual dependency import, nothing more to say: <!-- solr --> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-data-solr</artifactId> <version>2.2.0.RELEASE</version> </dependency> Pick the version according to your own setup.
2. Configuring solrHost. There are two cases: a single core, or multiple cores. If you only have one core, configuration is very simple: just add the property to Spring Boot's main configuration file, application.properties. If you have multiple cores, you don't configure the host in the main configuration file at all; instead, on every Solr call you specify which core's solrClient to use, as shown in the sketch below, via HttpSolrClient solrClient = new HttpSolrClient(url)
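For the single-core case, the property is typically just spring.data.solr.host in application.properties. For the multi-core case described above, one option is to expose a SolrClient bean per core; a sketch, where the base URL and the core names "book_core" and "user_core" are placeholders:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SolrConfig {

    private static final String SOLR_BASE_URL = "http://localhost:8983/solr/";

    // One client per core; inject the one you need with @Qualifier("bookSolrClient") etc.
    @Bean(name = "bookSolrClient")
    public SolrClient bookSolrClient() {
        return new HttpSolrClient.Builder(SOLR_BASE_URL + "book_core").build();
    }

    @Bean(name = "userSolrClient")
    public SolrClient userSolrClient() {
        return new HttpSolrClient.Builder(SOLR_BASE_URL + "user_core").build();
    }
}
```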

How to manage “paging” with Solr?

别说谁变了你拦得住时间么 submitted on 2020-01-10 15:38:42
Question: I have a classifieds website... I have Solr doing the searching of the classifieds, and it returns ID numbers which I then put into an array. Then I use this array to find any classifieds in a MySQL db where the IDs match the IDs in the array returned by Solr. Now, because this array can be very, very big (100 thousand records or more), I would need to "page" the results so that maybe 100 are returned at a time, and then use those 100 IDs in MySQL to find the classifieds. So, is it
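A minimal SolrJ sketch of letting Solr do the paging with start/rows, instead of pulling every matching ID and paging in the application; the 100-per-page figure matches the question, while the core URL, query string and field names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class PagedIds {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/classifieds").build();

        int page = 0;        // zero-based page index
        int pageSize = 100;  // 100 results per page, as in the question

        SolrQuery q = new SolrQuery("boat"); // whatever the user searched for
        q.setStart(page * pageSize);
        q.setRows(pageSize);
        q.setFields("id");   // only the ids are needed for the MySQL lookup

        QueryResponse rsp = client.query(q);
        List<String> ids = new ArrayList<>();
        for (SolrDocument d : rsp.getResults()) {
            ids.add(String.valueOf(d.getFieldValue("id")));
        }
        // rsp.getResults().getNumFound() gives the total hit count for rendering page links;
        // use these (at most 100) ids in a WHERE id IN (...) query against MySQL.
        System.out.println(ids);
        client.close();
    }
}
```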

Integrating Solr with Spring Boot

江枫思渺然 submitted on 2020-01-10 06:54:15
The previous blog post briefly covered installing and configuring Solr on Windows; this one follows on from it and walks through integrating Solr with Spring Boot. The code has been uploaded to GitHub (link).
1. Create a new core and configure the schema. The previous post already covers this, so I won't repeat it here; follow sections 3.2 and 3.3 there to configure the schema. Original post: https://www.cnblogs.com/wdfordream/p/11352053.html
Run solr create -c "book_core", then configure the analyzer and define a field type that uses it: <fieldType name="ik_word" class="solr.TextField"> <analyzer type="index"> <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" conf="ik.conf"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true" conf="ik.conf"/>
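The schema snippet above is cut off by the aggregator. The step that usually follows it is mapping an entity class to the core; a rough Spring Data Solr sketch (annotation attributes as in recent Spring Data Solr versions), where the field names and types are placeholders that must match fields declared in the schema, such as one using the ik_word type defined above:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.solr.core.mapping.Indexed;
import org.springframework.data.solr.core.mapping.SolrDocument;

@SolrDocument(collection = "book_core")
public class Book {

    @Id
    @Indexed(name = "id", type = "string")
    private String id;

    // Analyzed with the IK-based field type declared in the schema.
    @Indexed(name = "name", type = "ik_word")
    private String name;

    @Indexed(name = "author", type = "string")
    private String author;

    // getters and setters omitted
}
```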