solr

Update SOLR document without adding deleted documents

余生颓废 submitted on 2020-01-25 12:13:17
Question: I'm running a lot of SOLR document updates, which results in hundreds of thousands of deleted documents and a significant increase in disk usage (hundreds of GB). I'm able to remove all deleted documents by running an optimize (curl http://localhost:8983/solr/core_name/update?optimize=true), but this takes hours to run and requires a lot of RAM and disk space. Is there a better way to remove deleted documents from the SOLR index, or to update a document without creating a deleted one? Thanks for your help!
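At the Lucene level a normal update is a delete of the old document plus an add of the new one, so deleted documents cannot be avoided entirely; they only disappear when segments are merged. Two lighter options than a full optimize are a commit with expungeDeletes=true, which typically merges only segments whose share of deleted documents exceeds a threshold, and an optimize capped with maxSegments, which stops at N segments instead of one. The sketch below only builds those request URLs, assuming the host and core_name from the question; whether either is cheap enough for this index has to be measured.

```python
from urllib.parse import urlencode


def update_url(base_url: str, core: str, **params: str) -> str:
    """Build a Solr /update URL for the given core with query parameters."""
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


def expunge_deletes_url(base_url: str, core: str) -> str:
    # Commit that also asks Solr to merge away deleted documents,
    # without forcing the whole index down to a single segment.
    return update_url(base_url, core, commit="true", expungeDeletes="true")


def partial_optimize_url(base_url: str, core: str, max_segments: int) -> str:
    # Optimize that merges down to at most max_segments segments rather than one.
    return update_url(base_url, core, optimize="true", maxSegments=str(max_segments))


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # host and core name taken from the question
    print(expunge_deletes_url(base, "core_name"))
    print(partial_optimize_url(base, "core_name", 8))
```

Either URL can be passed to curl the same way as the optimize URL in the question.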

Deploying Solr 4.7.0 on Tomcat in a Windows environment

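怎甘沉沦 submitted on 2020-01-25 09:35:07
The main steps are as follows:
1. Download solr-4.7.0.tgz.
2. Extract solr-4.7.0.tgz.
3. Copy solr.war from the example/webapps directory into Tomcat's webapps directory.
4. Start the Tomcat server. It will report an error at this point; ignore it for now, since this start is only meant to unpack the WAR. Shut Tomcat down once startup completes.
5. Create a tomcat-solr folder (the name and location are arbitrary); I created it on drive D.
6. Go back to the solr-4.7.0 directory and copy all files and directories under example/solr into the new tomcat-solr directory (note: only one solr.xml is needed; when configuring multiple indexes, do not copy it more than once).
7. Copy all jar files under example/lib/ext/ into tomcat/webapps/solr/WEB-INF/lib. There are 5 of them, and they make up Solr's separate logging module.
8. Create a classes directory under tomcat/webapps/solr/WEB-INF/ and copy log4j.properties from example/resources into it; otherwise the logging module will not work properly.
9. In web.xml, find the commented-out tag that configures the environment variable, uncomment it, and change the environment variable to <env-entry>   <env-entry-name>solr/home</env-entry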
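The file-copying steps above (3 and 6 to 8) can be scripted. This is a minimal sketch assuming the directory names used in the post (solr-4.7.0, tomcat, tomcat-solr); the function names copy_war and finish_setup are hypothetical. The work is split in two because step 4, starting Tomcat once so it unpacks solr.war, has to happen in between.

```python
import shutil
from pathlib import Path


def copy_war(solr_dist: Path, tomcat: Path) -> None:
    """Step 3: copy example/webapps/solr.war into Tomcat's webapps directory."""
    shutil.copy2(solr_dist / "example" / "webapps" / "solr.war",
                 tomcat / "webapps" / "solr.war")


def finish_setup(solr_dist: Path, tomcat: Path, solr_home: Path) -> None:
    """Steps 6-8; run only after Tomcat has unpacked solr.war (step 4)."""
    example = solr_dist / "example"
    web_inf = tomcat / "webapps" / "solr" / "WEB-INF"

    # Step 6: copy everything under example/solr into the Solr home folder
    # (copytree creates the folder if step 5 was skipped).
    shutil.copytree(example / "solr", solr_home, dirs_exist_ok=True)

    # Step 7: copy the logging jars from example/lib/ext into WEB-INF/lib.
    lib = web_inf / "lib"
    lib.mkdir(parents=True, exist_ok=True)
    for jar in (example / "lib" / "ext").glob("*.jar"):
        shutil.copy2(jar, lib / jar.name)

    # Step 8: create WEB-INF/classes and copy log4j.properties into it.
    classes = web_inf / "classes"
    classes.mkdir(parents=True, exist_ok=True)
    shutil.copy2(example / "resources" / "log4j.properties",
                 classes / "log4j.properties")
```

Step 9 (editing web.xml) is left manual here, since the exact env-entry value depends on where the Solr home folder was created.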

Remove diacritics at index time into Solr

核能气质少年 submitted on 2020-01-25 05:55:17
Question: I am fine-tuning a Solr search. I'm using Solr 4.0. Normally I work with language analyzers and tokenizers for English, but this time I'm working with Portuguese and I'm facing an issue: it doesn't give the results I expect. For example, I'm searching for the word 'proteses', but what is indexed is 'próteses', which has diacritics. So it gives wrong results! What I need to do is remove all diacritics before indexing and searching, so it gives correct
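In Solr the usual way to get this behaviour is an accent-folding filter such as solr.ASCIIFoldingFilterFactory in both the index and query analyzers of the field, so that 'próteses' and 'proteses' produce the same token. The Python sketch below only illustrates the kind of folding such a filter performs, using Unicode decomposition; it is not Solr's implementation and only handles characters that decompose into a base letter plus combining marks.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining accent marks, e.g. 'próteses' -> 'proteses'."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and recompose whatever is left.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ("próteses", "ação", "Português"):
        print(word, "->", fold_diacritics(word))
```

The folding has to happen at query time as well as index time: folding only the indexed text would still leave an accented query such as 'próteses' unmatched.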

Indexing documents using Solr results in Expected mime type application/octet-stream but got text/html

∥☆過路亽.° submitted on 2020-01-24 18:03:37
Question: What I am trying to do is index documents using Solr. I have installed and started a Solr server on Windows, and I am trying to index using SolrJ. However, when I add the Solr document to the server as shown below, it results in an error: server.add(indexDoc); Error: Error from server at http://localhost:8983/solr: Expected mime type application/octet-stream but got text/html <body><h2>HTTP ERROR 404</h2> <p>Problem accessing /solr/update. Reason: <pre> Not Found</pre></p>
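The 404 for /solr/update suggests the request went to http://localhost:8983/solr without a core (or collection) name, so there was no update handler at that path; with SolrJ the usual fix is to point the client at http://localhost:8983/solr/<core>. As a language-neutral illustration, the sketch below builds (but does not send) a JSON update request against the core-specific /update path; the core name mycore and the document fields are hypothetical.

```python
import json
import urllib.request


def core_update_url(base_url: str, core: str) -> str:
    # The core/collection name must be part of the path: /solr/<core>/update,
    # not /solr/update (the path that returned 404 in the question).
    return f"{base_url.rstrip('/')}/{core}/update"


def build_add_request(base_url: str, core: str, docs: list) -> urllib.request.Request:
    """Build (but do not send) a POST that adds docs and commits."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        core_update_url(base_url, core) + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_add_request("http://localhost:8983/solr", "mycore",  # hypothetical core
                            [{"id": "doc1", "title": "example"}])
    print(req.full_url)
    # urllib.request.urlopen(req) would send it to a running Solr instance.
```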

Lucene Query on a DateField indexed by Solr

北城余情 submitted on 2020-01-24 11:01:49
Question: We are using a Solr index for various search applications. In most cases we use it just as you would with the admin interface, for example: +text:Mr +text:burns +publish_date:[2012-09-10T00:00:00Z TO 2012-10-10T00:00:00Z] This works fine. My problem is that in one app we run complex Lucene queries directly against the index (without using Solr), and in these queries I can't find how to search on a date field. In schema.xml: <field name="publish_date" type="date" indexed="true" stored="true"/> It
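One thing to check is how the field type named date is implemented. In the stock Solr 4.x example schema it maps to solr.TrieDateField, which indexes each date as a trie-encoded long of milliseconds since the epoch, so a plain term range on the ISO strings will not match; a numeric range query (in Lucene 4.x, NumericRangeQuery.newLongRange, with a precisionStep matching the field type's) is needed instead. Assuming that mapping holds for this schema, the sketch below converts the ISO bounds from the example query into the millisecond values such a query would take.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def solr_date_to_millis(value: str) -> int:
    """Convert a Solr date such as '2012-09-10T00:00:00Z' to epoch milliseconds."""
    # Solr dates are UTC and end in 'Z'; fractional seconds are not handled here.
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Integer division by a timedelta keeps the result exact (no float rounding).
    return (dt - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    lo = solr_date_to_millis("2012-09-10T00:00:00Z")
    hi = solr_date_to_millis("2012-10-10T00:00:00Z")
    print(lo, hi)
```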

Setting up a Solr environment

删除回忆录丶 submitted on 2020-01-24 05:46:27
Setting up the Solr search engine environment
Download Solr: http://archive.apache.org/dist/lucene/solr
Environment setup:
1. Copy solr-4.9.1\dist\solr-4.9.1.war into Tomcat's webapps directory and rename it to solr.war.
2. Copy all jar files under solr-4.9.1\example\lib\ext into Tomcat's lib directory.
3. Create a directory named solr_home on the local machine (D:/solr_home), then copy all files under solr-4.9.1\example\solr into solr_home.
4. Start Tomcat so that it unpacks solr.war, then shut Tomcat down and delete solr.war.
5. Edit the web.xml of the solr webapp (D:\apache-tomcat-7.0.84\webapps\solr\WEB-INF\web.xml):
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>D:/solr_home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
6
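The env-entry from step 5 can also be generated instead of typed by hand, so the path is always escaped correctly. This is a minimal sketch using Python's standard ElementTree; the function name solr_home_env_entry is hypothetical, and the output still has to be pasted into the solr webapp's WEB-INF/web.xml.

```python
import xml.etree.ElementTree as ET


def solr_home_env_entry(solr_home: str) -> str:
    """Return the <env-entry> that tells the Solr webapp where solr/home is."""
    entry = ET.Element("env-entry")
    ET.SubElement(entry, "env-entry-name").text = "solr/home"
    ET.SubElement(entry, "env-entry-value").text = solr_home
    ET.SubElement(entry, "env-entry-type").text = "java.lang.String"
    # ElementTree escapes special characters such as '&' in the path.
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(solr_home_env_entry("D:/solr_home"))
```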

Installing Solr 8.4.0 with Docker and configuring the IK tokenizer

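戏子无情 submitted on 2020-01-23 23:55:05
Installing Solr 8.4.0 with Docker
Pull the Solr image: docker pull solr
Create and run the Solr container: docker run --name solr -d -p 8983:8983 solr
1. run: run a container
2. -d: run it in the background
3. -p: map the container port to a host port
4. --name: the container name
5. solr: the image name
Note: skip the next two commands if the firewall is not enabled; if it is enabled, run them.
After the container is running, open the port in the firewall: firewall-cmd --zone=public --add-port=8983/tcp --permanent
Reload the firewall so the newly opened port takes effect: firewall-cmd --reload
If you are using a cloud server, also open the port in its security group.
Create a core: docker exec -it --user=solr solr bin/solr create_core -c Ik_core
Then open IP:8983 in a browser to check whether the admin page appears.
Configuring the IK Chinese tokenizer for Solr
The jars on Baidu Netdisk: https://pan.baidu.com/s/1ExTcCVfn_zltmGJDhxWhgQ (extraction code: zxxp)
It is recommended to extract them locally first, then upload the four jars to /usr/local/IK on the Linux host (you need to create the IK folder yourself): mkdir -p /usr/local/IK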
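Besides opening the admin page in a browser, the core created with bin/solr create_core can be checked through Solr's CoreAdmin API (action=STATUS). The sketch below assumes the core name Ik_core from the post and a hypothetical server IP; it builds the status URL and, given the parsed JSON response, reports whether the core is loaded. Treating an empty entry under status as "not loaded" is an assumption about the response shape, so verify it against your Solr version.

```python
from urllib.parse import urlencode


def core_status_url(host: str, core: str, port: int = 8983) -> str:
    """CoreAdmin STATUS request for a single core, with a JSON response."""
    query = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"http://{host}:{port}/solr/admin/cores?{query}"


def core_is_loaded(status_response: dict, core: str) -> bool:
    # Assumption: a loaded core has a non-empty entry under "status",
    # while an unknown core name maps to an empty object.
    return bool(status_response.get("status", {}).get(core))


if __name__ == "__main__":
    print(core_status_url("192.0.2.10", "Ik_core"))  # hypothetical server IP
```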

How to index data in a specific shard using solrj

陌路散爱 submitted on 2020-01-23 13:35:13
Question: I am using SolrJ as a client to index documents into SolrCloud (using Solr 4.5). I have a requirement to save documents based on tenant_id, so I am trying to use document routing, which is possible only if the collection is created with the numShards parameter (http://searchhub.org/2013/06/13/solr-cloud-document-routing/). I have two Solr instances in SolrCloud (example1/solr and example2/solr) and an external ZooKeeper running on port 2181. Both instances contain a collection called
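With the compositeId router (the default when a collection is created with numShards), a document can be routed by tenant by prefixing its id with the shard key and a '!', e.g. tenant1!doc42; ids sharing the same prefix are routed to the same shard. The sketch below only builds such ids and a Collections API CREATE URL; the collection name, tenant ids and shard count are hypothetical. With SolrJ, the routed id would simply be set as the document's id field before calling add.

```python
from urllib.parse import urlencode


def routed_id(tenant_id: str, doc_id: str) -> str:
    """compositeId-style id: the part before '!' is hashed to pick the shard."""
    if "!" in tenant_id:
        raise ValueError("tenant_id must not contain '!'")
    return f"{tenant_id}!{doc_id}"


def create_collection_url(base_url: str, name: str, num_shards: int) -> str:
    # Passing numShards at creation time selects the compositeId router by default.
    query = urlencode({"action": "CREATE", "name": name, "numShards": num_shards})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


if __name__ == "__main__":
    print(create_collection_url("http://localhost:8983/solr", "tenants", 2))
    print(routed_id("tenant42", "doc1"))
```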