solr

Could not connect to ZooKeeper using Solr in localhost

自古美人都是妖i submitted on 2021-01-28 11:38:21
Question: I'm using Solr 6 and I'm trying to populate it. Here's the main Scala object I put in place:

    object testChildDocToSolr {
      def main(args: Array[String]): Unit = {
        setProperty("hadoop.home.dir", "c:\\winutils\\")
        val sparkSession = SparkSession.builder()
          .appName("spark-solr-tester")
          .master("local")
          .config("spark.ui.enabled", "false")
          .config("spark.default.parallelism", "1")
          .getOrCreate()
        val sc = sparkSession.sparkContext
        val collectionName = "testChildDocument"
        val testDf = sparkSession.read
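A "Could not connect to ZooKeeper" error from spark-solr usually means the write was pointed at the Solr HTTP port (or at nothing at all) instead of ZooKeeper. A sketch of what the write side typically looks like, continuing the snippet above (the `zkhost` value is an assumption: Solr started in cloud mode with embedded ZooKeeper listens on the Solr port + 1000, e.g. 9983 for a Solr on 8983):

```scala
// Sketch only: requires the lucidworks/spark-solr connector on the classpath.
testDf.write
  .format("solr")
  .option("zkhost", "localhost:9983")      // ZooKeeper, not the Solr HTTP port
  .option("collection", collectionName)
  .mode("overwrite")
  .save()
```

If an external ZooKeeper ensemble is used, `zkhost` takes the usual comma-separated connect string instead.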

Autoscaling solr - Add pull replicas, not NRT replicas

*爱你&永不变心* submitted on 2021-01-28 11:22:04
Question: I have a specific requirement: I want the Solr autoscaling feature to create only PULL replicas whenever the cluster starts the recovery process after a node failure. However, SolrCloud autoscaling creates NRT-type replicas when a node goes down and is brought back up. I have gone through the examples in the policy specification list: https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html#policy-specification but I am not able to find an example
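One direction worth checking (a sketch only; verify the property names against your exact Solr version, since trigger-level replica-type control arrived in later 7.x releases, after the 7.4 guide linked above) is to configure the `nodeAdded` trigger to add PULL replicas rather than relying on the collection defaults:

```json
{
  "set-trigger": {
    "name": "node_added_trigger",
    "event": "nodeAdded",
    "waitFor": "5s",
    "preferredOperation": "ADDREPLICA",
    "replicaType": "PULL"
  }
}
```

This would be POSTed to the autoscaling API endpoint. If your Solr version predates `replicaType` on triggers, the fallback is to handle the node-added event yourself and issue ADDREPLICA with `type=PULL`.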

Multiword synonyms with Solr and Hibernate Search

女生的网名这么多〃 submitted on 2021-01-28 08:05:47
Question: I have a synonyms.txt file with the content below:

    car accessories, gadi marmat

I am indexing "car accessories" as a single token so that it will expand to "car accessories" and "gadi marmat". I want the whole synonym to match, so that a query for "gadi marmat" returns the record containing "car accessories". I am using a shingle filter factory to expand the query, so that a search for "gadi marmat" is expanded to "gadi", "gadi marmat" and "marmat", and since "gadi marmat" is queried as a
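A common way to get whole-phrase synonym matching without shingles (a sketch; the field type name and analyzer chain here are assumptions, not taken from the question) is to apply `SynonymGraphFilterFactory` on the query side only, which handles multi-word synonyms as a token graph:

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With edismax, `sow=false` (the default in recent versions) is also needed so the whole query string reaches the analyzer and the multi-word synonym can fire.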

solr DIH: RegExTransformer

拜拜、爱过 submitted on 2021-01-28 05:12:19
Question: Currently, I need to apply a transformation to the third column below:

    ACAC | 0 | 01
    ACAC | 0 | 0101
    ACAC | 0 | 0102
    ACAC | 0 | 010201

I need to transform "010201" into "01/02/01". So first I need to:

1. trim all trailing 0 characters;
2. split the value every 2 digits and add a "/" character.

The context of this transformation is a Solr data import handler transformer, which uses the Java regex library internally. Is there any way to achieve that? I've tried using this regex:
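The two steps can be sketched in plain Java regex, which is what DIH's RegexTransformer runs internally (a sketch; it assumes trailing zeros are removed in whole "00" pairs, which is consistent with the two-digit grouping in the sample data):

```java
public class CodeFormatter {

    // Step 1: drop trailing "00" pairs.
    // Step 2: insert "/" after every two digits except the last pair.
    static String format(String code) {
        String trimmed = code.replaceAll("(00)+$", "");
        return trimmed.replaceAll("(..)(?=..)", "$1/");
    }

    public static void main(String[] args) {
        System.out.println(format("010201")); // 01/02/01
        System.out.println(format("0102"));   // 01/02
        System.out.println(format("01"));     // 01
    }
}
```

The second expression uses a lookahead so the final pair keeps no trailing slash; the same `regex`/`replaceWith` pair can be carried over into the transformer configuration.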

Sort Facets by Index with non-ASCII values

混江龙づ霸主 submitted on 2021-01-28 03:26:31
Question: We have a field 'facet_tag' that contains tags describing a product. Since the tags are in German, they may contain non-ASCII characters (like umlauts). Here are some possible values:

    "Zelte"
    "Tunnelzelte"
    "Äxte"
    "Sägen"
    "Softshells"

Now if we query Solr for the facets with a query like:

    http://<solr_host>:<solr_port>/solr/select?q=*&facet=on&facet.field=facet_tag&facet.sort=index

the sorted result looks like this:

    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields"
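`facet.sort=index` sorts by the raw byte order of the indexed terms, so "Äxte" lands after "Zelte". One pragmatic option (a sketch, not the only fix) is to leave the Solr response alone and re-sort the returned facet values client-side with a locale-aware Collator:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// facet.sort=index returns byte order; re-sort the facet values
// client-side using German collation so umlauts interleave correctly.
public class FacetSort {
    public static void main(String[] args) {
        List<String> facets = Arrays.asList(
            "Zelte", "Tunnelzelte", "Äxte", "Sägen", "Softshells");
        facets.sort(Collator.getInstance(Locale.GERMAN));
        System.out.println(facets);
        // [Äxte, Sägen, Softshells, Tunnelzelte, Zelte]
    }
}
```

If the sorting must happen inside Solr, the alternative is a collated copy of the field, at the cost of extra schema work.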

Position Based Rank in SOLR

只愿长相守 submitted on 2021-01-28 01:57:33
Question: I need to sort Solr search results based on the position of the search term. For example, I have 4 documents:

1. demo of solr lucene
2. lucene focuses mainly on text indexing
3. explain lucene with example
4. lucene is an open source

When I search with the query text "lucene", I need the results in the following order:

2. lucene focuses mainly on text indexing
4. lucene is an open source
3. explain lucene with example
1. demo of solr lucene

i.e. documents where the search term appears in the first or second position should be boosted higher than the others
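Lucene's span queries express exactly this "match within the first N positions" notion. A sketch at the Lucene level (a fragment, not runnable without lucene-core on the classpath; the field name and boost value are assumptions):

```java
// Require the term anywhere, but add a boosted clause that only
// matches when "lucene" occurs within the first 2 positions.
SpanTermQuery term = new SpanTermQuery(new Term("text", "lucene"));
SpanFirstQuery first = new SpanFirstQuery(term, 2);
Query q = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("text", "lucene")), BooleanClause.Occur.MUST)
    .add(new BoostQuery(first, 10f), BooleanClause.Occur.SHOULD)
    .build();
```

Documents matching the SpanFirst clause score higher, giving the 2, 4, 3/1 ordering described above; ties among the remaining documents still fall back to the normal similarity.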

Solr TF vs All Terms match

你。 submitted on 2021-01-27 17:23:19
Question: I have observed that Solr/Lucene gives too much weight to matching all the query terms over the tf of a particular query term. For example, say our query is:

    text: ("red" "jacket" "red jacket")

Document A contains "jacket" 40 times. Document B contains "red jacket" 1 time (and, because of this, "red" 1 time and "jacket" 1 time as well). Document B gets a much higher score because it contains all three terms of the query, but only once each, whereas Document A gets a very low score even though it
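One way to make raw term frequency dominate instead of the default similarity (a sketch; the field name comes from the question, the exact weighting is an assumption) is to sort or boost by Solr's `termfreq()` function query, which returns the unscaled tf of a term in a document:

```
q=text:("red" "jacket" "red jacket")
  &sort=sum(termfreq(text,'red'),termfreq(text,'jacket')) desc, score desc
```

This keeps the original query for matching but ranks purely by combined term counts, so a document with 40 occurrences of "jacket" outranks one that merely covers all three clauses once.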

how to get a list of all tokens from Solr/Lucene index?

微笑、不失礼 submitted on 2021-01-27 10:21:59
Question: I am wondering, is there a way to get all tokens from a particular record in a Lucene/Solr index? Thank you.

Answer 1: You can use IndexReader.terms() to get an enumeration of all terms that occur in the inverted index. This method returns a TermEnum.

Source: https://stackoverflow.com/questions/4356037/how-to-get-a-list-of-all-tokens-from-solr-lucene-index
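Note that `IndexReader.terms()` is the Lucene 3.x API and was removed in Lucene 4. On the Solr side, the terms of a field can also be listed over HTTP via the TermsComponent, without writing any Lucene code (a sketch; the core name, field name, and port are placeholders):

```
http://localhost:8983/solr/<core>/terms?terms.fl=text&terms.limit=-1
```

This enumerates the indexed terms of the field across the whole index; per-document tokens additionally require stored term vectors (`termVectors="true"` on the field) and the TermVectorComponent.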

Jersey web service proxy

不羁岁月 submitted on 2021-01-27 06:58:48
Question: I am trying to implement a web service that proxies another service that I want to hide from external users of the API. Basically I want to play middleman so I can add functionality to the hidden API, which is Solr. I have the following code:

    @POST
    @Path("/update/{collection}")
    public Response update(@PathParam("collection") String collection,
                           @Context Request request) {
        // extract URL params
        // update URL to target internal web service
        // put body from incoming request to outgoing
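The forwarding step can be sketched with the standard JAX-RS Client API (a fragment, not a complete resource class; it needs a JAX-RS client implementation such as Jersey on the classpath, the internal Solr URL is an assumption, and the raw body is injected as a `String` parameter instead of `@Context Request`):

```java
@POST
@Path("/update/{collection}")
public Response update(@PathParam("collection") String collection, String body) {
    Client client = ClientBuilder.newClient();
    try {
        // Forward the incoming body to the hidden Solr endpoint
        // and relay its status and body back to the caller.
        Response internal = client
            .target("http://internal-solr:8983/solr")   // hidden host (assumed)
            .path(collection).path("update")
            .request(MediaType.APPLICATION_JSON)
            .post(Entity.json(body));
        return Response.status(internal.getStatus())
                       .entity(internal.readEntity(String.class))
                       .build();
    } finally {
        client.close();
    }
}
```

In production the `Client` should be created once and reused rather than per request, since it is heavyweight to construct.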

Main solutions for massive data and high concurrency

百般思念 submitted on 2021-01-27 02:06:31
1. Website application background

When developing a web application for a small user base, a simple setup of one application server + one database server + one file server can solve part of the problem, and you can also raise performance by throwing hardware at it, though cost has to be considered. Once the problem grows beyond what adding hardware can economically solve, other approaches are needed. The internet industry has by now produced many mature solutions, but not all of them are universally applicable: copying all of Taobao's technology will not necessarily get you to Taobao's current level, for obvious reasons.

2.1 Solutions for massive data:

- use caching;
- static page generation;
- database optimization;
- separating the hot (active) data in the database;
- batched reads and delayed writes;
- read/write splitting;
- using NoSQL, Hadoop, and similar technologies;
- distributed database deployment;
- separating application services from data services;
- using a search engine to search the data in the database;
- splitting up the business;

2.2 Solutions for high concurrency:

- separating the application from static resource files;
- page caching;
- clustering and distribution;
- reverse proxies;
- CDN;

3. Solutions for massive data

3.1 Use caching

Data access on most websites follows the "80/20 rule": 80% of the traffic concentrates on 20% of the data. For example, Baidu's hot search terms in a given period may cluster around a small set of popular keywords, and the topics widely followed on Sina Weibo at any given time are likewise a small number of events. In short, users only touch a small fraction of the total data, so when a site grows to a certain scale and database I/O becomes the performance bottleneck
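The caching idea behind the 80/20 observation can be sketched with a tiny in-process LRU cache (illustrative only; the class, capacity, and keys are made up): hot entries stay resident, and the least-recently-used entry is evicted when capacity is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order evicts the
// least-recently-used entry once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("hot1", "a");
        cache.put("hot2", "b");
        cache.get("hot1");      // touch hot1, so hot2 becomes the eldest
        cache.put("hot3", "c"); // evicts hot2
        System.out.println(cache.keySet()); // [hot1, hot3]
    }
}
```

Real deployments use a shared cache such as Redis or Memcached for the same reason, but the eviction principle is the one shown here.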