solr

Could not connect to ZooKeeper using Solr in localhost

自古美人都是妖i submitted on 2021-01-28 11:38:21
Question: I'm using Solr 6 and I'm trying to populate it. Here's the main Scala object I put in place:

    object testChildDocToSolr {
      def main(args: Array[String]): Unit = {
        setProperty("hadoop.home.dir", "c:\\winutils\\")
        val sparkSession = SparkSession.builder()
          .appName("spark-solr-tester")
          .master("local")
          .config("spark.ui.enabled", "false")
          .config("spark.default.parallelism", "1")
          .getOrCreate()
        val sc = sparkSession.sparkContext
        val collectionName = "testChildDocument"
        val testDf = sparkSession.read
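A "Could not connect to ZooKeeper" error from spark-solr usually means the write was pointed at the Solr HTTP port (or at nothing at all) instead of ZooKeeper. A sketch of what the write side typically looks like, continuing the snippet above (the `zkhost` value is an assumption: Solr started in cloud mode with embedded ZooKeeper listens on the Solr port + 1000, e.g. 9983 for a Solr on 8983):

```scala
// Sketch only: requires the lucidworks/spark-solr connector on the classpath.
testDf.write
  .format("solr")
  .option("zkhost", "localhost:9983")      // ZooKeeper, not the Solr HTTP port
  .option("collection", collectionName)
  .mode("overwrite")
  .save()
```

If an external ZooKeeper ensemble is used, `zkhost` takes the usual comma-separated connect string instead.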

Autoscaling solr - Add pull replicas, not NRT replicas

*爱你&永不变心* submitted on 2021-01-28 11:22:04
Question: I have a specific requirement: I want the Solr autoscaling feature to create only PULL replicas whenever the cluster starts the recovery process after a node failure. However, SolrCloud autoscaling creates NRT-type replicas when a node goes down and is brought back up. I have gone through the examples in the policy specification list: https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html#policy-specification but I am not able to find an example
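One direction worth checking (a sketch only; verify the property names against your exact Solr version, since trigger-level replica-type control arrived in later 7.x releases, after the 7.4 guide linked above) is to configure the `nodeAdded` trigger to add PULL replicas rather than relying on the collection defaults:

```json
{
  "set-trigger": {
    "name": "node_added_trigger",
    "event": "nodeAdded",
    "waitFor": "5s",
    "preferredOperation": "ADDREPLICA",
    "replicaType": "PULL"
  }
}
```

This would be POSTed to the autoscaling API endpoint. If your Solr version predates `replicaType` on triggers, the fallback is to handle the node-added event yourself and issue ADDREPLICA with `type=PULL`.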

Multiword synonyms with Solr and Hibernate Search

女生的网名这么多〃 submitted on 2021-01-28 08:05:47
Question: I have a synonyms.txt file with the content below:

    car accessories, gadi marmat

I am indexing "car accessories" as a single token so that it will expand to "car accessories" and "gadi marmat". I want the whole synonym to match, so that a query for "gadi marmat" returns the record containing "car accessories". I am using a shingle filter factory to expand the query, so that a search for "gadi marmat" is expanded to "gadi", "gadi marmat" and "marmat", and since "gadi marmat" is queried as a
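A common way to get whole-phrase synonym matching without shingles (a sketch; the field type name and analyzer chain here are assumptions, not taken from the question) is to apply `SynonymGraphFilterFactory` on the query side only, which handles multi-word synonyms as a token graph:

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With edismax, `sow=false` (the default in recent versions) is also needed so the whole query string reaches the analyzer and the multi-word synonym can fire.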

solr DIH: RegExTransformer

拜拜、爱过 submitted on 2021-01-28 05:12:19
Question: Currently, I need to apply a transformation to the third column below:

    ACAC | 0 | 01
    ACAC | 0 | 0101
    ACAC | 0 | 0102
    ACAC | 0 | 010201

I need to transform "010201" into "01/02/01". So first I need to:

1. trim all trailing 0 characters;
2. split the value every 2 digits and add a "/" character.

The context of this transformation is a Solr data import handler transformer, which uses the Java regex library internally. Is there any way to achieve that? I've tried using this regex:
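The two steps can be sketched in plain Java regex, which is what DIH's RegexTransformer runs internally (a sketch; it assumes trailing zeros are removed in whole "00" pairs, which is consistent with the two-digit grouping in the sample data):

```java
public class CodeFormatter {

    // Step 1: drop trailing "00" pairs.
    // Step 2: insert "/" after every two digits except the last pair.
    static String format(String code) {
        String trimmed = code.replaceAll("(00)+$", "");
        return trimmed.replaceAll("(..)(?=..)", "$1/");
    }

    public static void main(String[] args) {
        System.out.println(format("010201")); // 01/02/01
        System.out.println(format("0102"));   // 01/02
        System.out.println(format("01"));     // 01
    }
}
```

The second expression uses a lookahead so the final pair keeps no trailing slash; the same `regex`/`replaceWith` pair can be carried over into the transformer configuration.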

Sort Facets by Index with non-ASCII values

混江龙づ霸主 submitted on 2021-01-28 03:26:31
Question: We have a field 'facet_tag' that contains tags describing a product. Since the tags are in German, they may contain non-ASCII characters (like umlauts). Here are some possible values:

    "Zelte"
    "Tunnelzelte"
    "Äxte"
    "Sägen"
    "Softshells"

Now if we query Solr for the facets with a query like:

    http://<solr_host>:<solr_port>/solr/select?q=*&facet=on&facet.field=facet_tag&facet.sort=index

the sorted result looks like this:

    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields"
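`facet.sort=index` sorts by the raw byte order of the indexed terms, so "Äxte" lands after "Zelte". One pragmatic option (a sketch, not the only fix) is to leave the Solr response alone and re-sort the returned facet values client-side with a locale-aware Collator:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// facet.sort=index returns byte order; re-sort the facet values
// client-side using German collation so umlauts interleave correctly.
public class FacetSort {
    public static void main(String[] args) {
        List<String> facets = Arrays.asList(
            "Zelte", "Tunnelzelte", "Äxte", "Sägen", "Softshells");
        facets.sort(Collator.getInstance(Locale.GERMAN));
        System.out.println(facets);
        // [Äxte, Sägen, Softshells, Tunnelzelte, Zelte]
    }
}
```

If the sorting must happen inside Solr, the alternative is a collated copy of the field, at the cost of extra schema work.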

Position Based Rank in SOLR

只愿长相守 submitted on 2021-01-28 01:57:33
Question: I need to sort Solr search results based on the position of the search term. For example, I have 4 documents:

1. demo of solr lucene
2. lucene focuses mainly on text indexing
3. explain lucene with example
4. lucene is an open source

When I search with the query text "lucene", I need the results in the following order:

2. lucene focuses mainly on text indexing
4. lucene is an open source
3. explain lucene with example
1. demo of solr lucene

i.e. documents where the search term appears in the first or second position should be boosted higher than the others
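Lucene's span queries express exactly this "match within the first N positions" notion. A sketch at the Lucene level (a fragment, not runnable without lucene-core on the classpath; the field name and boost value are assumptions):

```java
// Require the term anywhere, but add a boosted clause that only
// matches when "lucene" occurs within the first 2 positions.
SpanTermQuery term = new SpanTermQuery(new Term("text", "lucene"));
SpanFirstQuery first = new SpanFirstQuery(term, 2);
Query q = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("text", "lucene")), BooleanClause.Occur.MUST)
    .add(new BoostQuery(first, 10f), BooleanClause.Occur.SHOULD)
    .build();
```

Documents matching the SpanFirst clause score higher, giving the 2, 4, 3/1 ordering described above; ties among the remaining documents still fall back to the normal similarity.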

Solr TF vs All Terms match

你。 submitted on 2021-01-27 17:23:19
Question: I have observed that Solr/Lucene gives too much weight to matching all the query terms over the tf of a particular query term. For example, say our query is:

    text: ("red" "jacket" "red jacket")

Document A contains "jacket" 40 times. Document B contains "red jacket" 1 time (and, because of this, "red" 1 time and "jacket" 1 time as well). Document B gets a much higher score because it contains all three terms of the query, but only once each, whereas Document A gets a very low score even though it
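One way to make raw term frequency dominate instead of the default similarity (a sketch; the field name comes from the question, the exact weighting is an assumption) is to sort or boost by Solr's `termfreq()` function query, which returns the unscaled tf of a term in a document:

```
q=text:("red" "jacket" "red jacket")
  &sort=sum(termfreq(text,'red'),termfreq(text,'jacket')) desc, score desc
```

This keeps the original query for matching but ranks purely by combined term counts, so a document with 40 occurrences of "jacket" outranks one that merely covers all three clauses once.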

how to get a list of all tokens from Solr/Lucene index?

微笑、不失礼 submitted on 2021-01-27 10:21:59
Question: I am wondering, is there a way to get all tokens from a particular record in a Lucene/Solr index? Thank you.

Answer 1: You can use IndexReader.terms() to get an enumeration of all terms that occur in the inverted index. This method returns a TermEnum.

Source: https://stackoverflow.com/questions/4356037/how-to-get-a-list-of-all-tokens-from-solr-lucene-index
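Note that `IndexReader.terms()` is the Lucene 3.x API and was removed in Lucene 4. On the Solr side, the terms of a field can also be listed over HTTP via the TermsComponent, without writing any Lucene code (a sketch; the core name, field name, and port are placeholders):

```
http://localhost:8983/solr/<core>/terms?terms.fl=text&terms.limit=-1
```

This enumerates the indexed terms of the field across the whole index; per-document tokens additionally require stored term vectors (`termVectors="true"` on the field) and the TermVectorComponent.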

Jersey web service proxy

不羁岁月 submitted on 2021-01-27 06:58:48
Question: I am trying to implement a web service that proxies another service that I want to hide from external users of the API. Basically I want to play middleman so I can add functionality to the hidden API, which is Solr. I have the following code:

    @POST
    @Path("/update/{collection}")
    public Response update(@PathParam("collection") String collection,
                           @Context Request request) {
        // extract URL params
        // update URL to target internal web service
        // put body from incoming request to outgoing
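The forwarding step can be sketched with the standard JAX-RS Client API (a fragment, not a complete resource class; it needs a JAX-RS client implementation such as Jersey on the classpath, the internal Solr URL is an assumption, and the raw body is injected as a `String` parameter instead of `@Context Request`):

```java
@POST
@Path("/update/{collection}")
public Response update(@PathParam("collection") String collection, String body) {
    Client client = ClientBuilder.newClient();
    try {
        // Forward the incoming body to the hidden Solr endpoint
        // and relay its status and body back to the caller.
        Response internal = client
            .target("http://internal-solr:8983/solr")   // hidden host (assumed)
            .path(collection).path("update")
            .request(MediaType.APPLICATION_JSON)
            .post(Entity.json(body));
        return Response.status(internal.getStatus())
                       .entity(internal.readEntity(String.class))
                       .build();
    } finally {
        client.close();
    }
}
```

In production the `Client` should be created once and reused rather than per request, since it is heavyweight to construct.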

Main solutions for massive data and high concurrency

百般思念 submitted on 2021-01-27 02:06:31
1. Website application background

When developing a web application for a small user base, a simple setup of one application server + one database server + one file server can solve part of the problem, and you can also raise performance by throwing hardware at it, though cost has to be considered. Once the problem grows beyond what adding hardware can economically solve, other approaches are needed. The internet industry has by now produced many mature solutions, but not all of them are universally applicable: copying all of Taobao's technology will not necessarily get you to Taobao's current level, for obvious reasons.

2.1 Solutions for massive data:

- use caching;
- static page generation;
- database optimization;
- separating the hot (active) data in the database;
- batched reads and delayed writes;
- read/write splitting;
- using NoSQL, Hadoop, and similar technologies;
- distributed database deployment;
- separating application services from data services;
- using a search engine to search the data in the database;
- splitting up the business;

2.2 Solutions for high concurrency:

- separating the application from static resource files;
- page caching;
- clustering and distribution;
- reverse proxies;
- CDN;

3. Solutions for massive data

3.1 Use caching

Data access on most websites follows the "80/20 rule": 80% of the traffic concentrates on 20% of the data. For example, Baidu's hot search terms in a given period may cluster around a small set of popular keywords, and the topics widely followed on Sina Weibo at any given time are likewise a small number of events. In short, users only touch a small fraction of the total data, so when a site grows to a certain scale and database I/O becomes the performance bottleneck
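The caching idea behind the 80/20 observation can be sketched with a tiny in-process LRU cache (illustrative only; the class, capacity, and keys are made up): hot entries stay resident, and the least-recently-used entry is evicted when capacity is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order evicts the
// least-recently-used entry once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("hot1", "a");
        cache.put("hot2", "b");
        cache.get("hot1");      // touch hot1, so hot2 becomes the eldest
        cache.put("hot3", "c"); // evicts hot2
        System.out.println(cache.keySet()); // [hot1, hot3]
    }
}
```

Real deployments use a shared cache such as Redis or Memcached for the same reason, but the eviction principle is the one shown here.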