solr

Learning Solr step by step: what is Solr?

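微笑、不失礼 posted on 2019-12-17 22:17:21
Introduction: Solr is a standalone enterprise search application server that exposes a web-service-like API. Users can submit XML files in a defined format to the search server over HTTP to build the index, and can issue search requests via HTTP GET and receive the results as XML.
Features: Solr is a standalone enterprise search server with a REST-like API. You put documents into it (called "indexing") via XML, JSON, CSV, or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV, or binary results.
- Advanced full-text search capabilities
- Optimized for high-volume web traffic
- Standards-based open interfaces: XML, JSON, and HTTP
- Comprehensive HTML administration interfaces
- Server statistics exposed over JMX for monitoring
- Linearly scalable, with automatic index replication, failover, and recovery
- Near-real-time indexing
- Flexible and adaptable XML configuration
- Extensible plugin architecture
Solr uses the Lucene search library and extends it:
- A real data schema, with numeric types, dynamic fields, and unique keys
- Powerful extensions to the Lucene query language
- Faceted search and filtering
- Geospatial search with support for multiple points per document and geo polygons
- Advanced, configurable text analysis
- Highly configurable and user-extensible caching
- Performance optimizations
- External configuration via XML
- An AJAX-based administration interface
- Monitorable logging
- Fast near-real-time incremental indexing and index replication
- Highly scalable distributed search, with the index sharded across multiple hosts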
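The index-over-HTTP and query-over-HTTP-GET flow described above can be sketched in Python. This is only an illustration: the host, the core name `collection1`, and the field names `id` and `title` are assumptions, and the sketch builds the request payload and URL without sending anything.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumed host and core name; adjust to your own installation.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_add_xml(doc):
    """Build the XML <add> message that would be POSTed to the update handler."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, wt="json", rows=10):
    """Build the HTTP GET URL for a search request against /select."""
    params = urlencode({"q": query, "wt": wt, "rows": rows})
    return f"{SOLR_BASE}/select?{params}"


# Hypothetical document and query, for illustration only.
print(build_add_xml({"id": "1", "title": "hello solr"}))
print(build_select_url("title:hello"))
```

The XML string would be sent with an HTTP POST to `/update` (followed by a commit), and the second URL can be fetched with any HTTP client.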

SolrEntityProcessor is called only once for sub-entities

穿精又带淫゛_ posted on 2019-12-17 21:11:37
Question: I'm using Solr 4.2, and I am trying to call SolrEntityProcessor as a sub-entity. So far, only one call is made to Solr and a single document is indexed, while all others are ignored. This should be possible, but it doesn't seem to work... Any ideas? Code snippet: <document> <entity dataSource="psql" name="user" query="SELECT * FROM users";> <field column="id" name="user_id" /> <entity name="liked_items" processor="SolrEntityProcessor" url="http://localhost:8983/solr/items" query="user_liking

Solr: creating a core (part 2)

亡梦爱人 posted on 2019-12-17 20:23:08
Create a Solr core:
[julong@localhost bin]$ ./solr create -c julong
Copying configuration to new core instance directory:
/home/julong/solr-5.5.2/server/solr/julong
Creating new core 'julong' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=julong&instanceDir=julong
{ "responseHeader":{ "status":0, "QTime":3587}, "core":"julong"}
Delete a Solr core:
[julong@localhost bin]$ ./solr delete -c julong
Deleting core 'julong' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=julong&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
{"responseHeader":{ "status"
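The output above shows that `./solr create` and `./solr delete` end up calling the CoreAdmin HTTP endpoint. As a rough sketch, the same URLs can be assembled in Python; the helper names are hypothetical, and the host and port are taken from the output above.

```python
from urllib.parse import urlencode

# Host and port as printed in the command output above.
SOLR_URL = "http://localhost:8983/solr"


def create_core_url(name):
    """URL for the CoreAdmin CREATE action (hypothetical helper)."""
    params = {"action": "CREATE", "name": name, "instanceDir": name}
    return f"{SOLR_URL}/admin/cores?{urlencode(params)}"


def unload_core_url(name):
    """URL for the CoreAdmin UNLOAD action that also deletes index, data, and instance dirs."""
    params = {
        "action": "UNLOAD",
        "core": name,
        "deleteIndex": "true",
        "deleteDataDir": "true",
        "deleteInstanceDir": "true",
    }
    return f"{SOLR_URL}/admin/cores?{urlencode(params)}"


print(create_core_url("julong"))
print(unload_core_url("julong"))
```

Note that the script copies a configuration into the instance directory before issuing CREATE, so calling the CREATE URL on its own presumably only works once that configuration is in place.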

Solr join “not in” subselect

*爱你&永不变心* posted on 2019-12-17 19:54:32
Question: The Solr join documentation (Solr Join) says that:
/solr/collection1/select?fl=xxx,yyy&q={!join from=inner_id to=outer_id}zzz:vvv
is equivalent to:
SELECT xxx, yyy FROM collection1 WHERE outer_id IN (SELECT inner_id FROM collection1 WHERE zzz = "vvv")
How do I write the following in Solr (note the NOT):
SELECT xxx, yyy FROM collection1 WHERE outer_id NOT IN (SELECT inner_id FROM collection1 WHERE zzz = "vvv")
Let's consider the following example:
People Records:
1. name='a', id=1, teacherId=4
2.
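One commonly suggested way to express the NOT IN form is to match all documents and subtract the join through the nested `_query_` syntax. The sketch below only builds the request parameters; it is an untested assumption rather than an answer from the excerpt, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def not_in_join_params(from_field, to_field, inner_query, fl):
    """Build select parameters for 'all documents minus those matched by the join'.

    Assumes inner_query contains no double quotes; those would need escaping.
    """
    join = f"{{!join from={from_field} to={to_field}}}{inner_query}"
    return {"fl": fl, "q": f'*:* -_query_:"{join}"'}


params = not_in_join_params("inner_id", "outer_id", "zzz:vvv", "xxx,yyy")
print(params["q"])
print("/solr/collection1/select?" + urlencode(params))
```

The leading `*:*` matters because a purely negative clause has nothing to subtract from in many query contexts.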

Solr/Lucene fieldCache OutOfMemory error sorting on dynamic field

▼魔方 西西 posted on 2019-12-17 19:29:39
Question: We have a Solr core that has about 250 TrieIntFields (declared as dynamicField). There are about 14M docs in our Solr index, and many documents have a value in many of these fields. We need to sort on all of these 250 fields over a period of time. The issue we are facing is that the underlying Lucene fieldCache fills up very quickly. We have a 4 GB box and the index size is 18 GB. After a sort on 40 or 45 of these dynamic fields, the memory consumption is about 90% and we
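A back-of-the-envelope estimate helps explain these numbers. The sketch below assumes the field cache holds one 4-byte int per document for every field that has been sorted on, and ignores any per-field overhead; that is a simplification, not a measurement.

```python
def field_cache_bytes(num_docs, num_fields, bytes_per_value=4):
    """Rough field cache size: one fixed-width value per document per sorted field."""
    return num_docs * num_fields * bytes_per_value


NUM_DOCS = 14_000_000  # about 14M docs, as stated in the question

for fields in (1, 45, 250):
    size = field_cache_bytes(NUM_DOCS, fields)
    print(f"{fields:>3} fields: {size / 1e9:.2f} GB")
```

Under this assumption each sorted field costs about 56 MB, so 45 fields need roughly 2.5 GB, which is already a large share of a 4 GB box, and all 250 fields would need about 14 GB.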

solr suggester not returning any results

守給你的承諾、 posted on 2019-12-17 18:04:22
Question: I've followed the Solr wiki article for the suggester almost to the T: http://wiki.apache.org/solr/Suggester. I have the following XML in my solrconfig.xml: <searchComponent class="solr.SpellCheckComponent" name="suggest"> <lst name="spellchecker"> <str name="name">suggest</str> <str name="classname">org.apache.solr.spelling.suggest.Suggester</str> <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str> <str name="field">description</str> <float name="threshold">0.05<

Solr search for hashtag or mentions

我是研究僧i posted on 2019-12-17 17:48:12
Question: We are using Solr version 3.5 to search through tweets. I am using WordDelimiterFilterFactory with the following settings, to be able to search for @username or #hashtags: <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1" handleAsChar="@#"/> I saw the following patch, but this doesn't seem to be working as I expected. Am I missing something?

HTTP ERROR: 404 missing core name in path with solr

佐手、 posted on 2019-12-17 16:01:36
Question: I am new to Solr. After installing it on Ubuntu 8.10, I tried to index the exampledocs as per this link and got this error: HTTP ERROR: 404 missing core name in path. This is in Jetty. What should I do to solve this?
Answer 1: I've gotten the same error: HTTP ERROR: 404 missing core name in path. In my case I had forgotten to set the solr/home value in the WEB-INF/web.xml file: <env-entry> <env-entry-name>solr/home</env-entry-name> <env-entry-value>/put/your/solr/home/here</env-entry

How can I Schedule data imports in Solr

人走茶凉 posted on 2019-12-17 15:32:09
Question: The wiki page http://wiki.apache.org/solr/DataImportHandler explains how to index data using DataImportHandler, but the example uses a command to initiate the import operation. How can I schedule a job to do this on a regular basis?
Answer 1: On UNIX/Linux, cron jobs are your friends! On Windows, there is Task Scheduler. UPDATE: To do it from Java code, since this is a simple GET request, you can use the HTTP Client library. See this tutorial on using the GetMethod. If you need to
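Since the import is triggered by a plain GET request, a small script run from cron is one way to schedule it. The sketch below uses Python's standard library instead of the Java HTTP Client mentioned in the answer; the base URL, the script path in the cron line, and the function names are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def import_url(base="http://localhost:8983/solr", command="full-import"):
    """Build the DataImportHandler URL; the base URL is an assumed default."""
    return f"{base}/dataimport?{urlencode({'command': command})}"


def trigger_import(url, timeout=30):
    """Send the GET request that starts the import and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Example cron entry (hypothetical path), running the import every night at 02:00:
#   0 2 * * * /usr/bin/python3 /path/to/solr_import.py
# The scheduled script would then call:
#   trigger_import(import_url())
print(import_url())
```

Passing `command="delta-import"` would build the URL for an incremental import instead, provided the handler is configured for delta queries.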

Remove results below a certain score threshold in Solr/Lucene?

六月ゝ 毕业季﹏ posted on 2019-12-17 10:49:35
Question: Is there built-in functionality in Solr/Lucene to filter out results that fall below a certain score threshold? Say I provide a score threshold of 0.2; then all documents with a score below 0.2 should be removed from my results. My intuition is that this is possible by updating or customizing Solr or Lucene. Could you point me in the right direction on how to do this? Thanks in advance!
Answer 1: You could write your own Collector that would ignore collecting those documents that the scorer
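The idea of skipping documents below a threshold can be illustrated on the client side. This is not the custom Lucene Collector the answer refers to, only a simplified stand-in that applies the same rule to documents already returned with their scores (for example by requesting `fl=*,score`); the sample documents are made up.

```python
def filter_by_score(docs, min_score):
    """Keep only documents whose score is at least min_score."""
    return [doc for doc in docs if doc.get("score", 0.0) >= min_score]


# Hypothetical response documents, for illustration only.
docs = [
    {"id": "a", "score": 0.91},
    {"id": "b", "score": 0.20},
    {"id": "c", "score": 0.05},
]

kept = filter_by_score(docs, 0.2)
print([doc["id"] for doc in kept])
```

Filtering after the fact leaves the server-side hit count and paging unaware of the threshold, which is one reason a Collector-level solution may be preferable.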