lucene | 易学教程

Solr/Lucene Scorer

Question: We are currently working on a proof of concept for a client using Solr and have been able to configure all the features they want except the scoring. The problem is that they want scores that make results fall into buckets:

Bucket 1: exact match on category (score = 4)
Bucket 2: exact match on name (score = 3)
Bucket 3: partial match on category (score = 2)
Bucket 4: partial match on name (score = 1)

The first thing we did was develop a custom similarity class that would return the correct score ...
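To make the four buckets concrete, here is a minimal Python sketch of the mapping the client is asking for. It is not Solr code and not the custom similarity class mentioned in the question. The function name `bucket_score` is hypothetical, and the sketch assumes that "partial match" means a case-insensitive substring match and that a document gets the score of the highest bucket it qualifies for.

```python
def bucket_score(query, category, name):
    """Return the bucket score (4, 3, 2, 1) for a document, or 0 if nothing matches.

    Assumption: "partial match" is a case-insensitive substring match.
    Assumption: the highest-scoring bucket wins when several apply.
    """
    q = query.strip().lower()
    category = category.strip().lower()
    name = name.strip().lower()
    if not q:
        return 0
    if q == category:   # Bucket 1: exact match on category
        return 4
    if q == name:       # Bucket 2: exact match on name
        return 3
    if q in category:   # Bucket 3: partial match on category
        return 2
    if q in name:       # Bucket 4: partial match on name
        return 1
    return 0


if __name__ == "__main__":
    docs = [
        {"category": "Laptops", "name": "Gaming Laptops Guide"},
        {"category": "Accessories", "name": "Laptops"},
        {"category": "Laptops and Tablets", "name": "Sleeve"},
    ]
    for doc in docs:
        print(doc, bucket_score("laptops", doc["category"], doc["name"]))
```

Sorting results by this value in descending order would group them into the four buckets described in the question.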

What is the default list of stopwords used in Lucene's StopFilter?

Question: Lucene has a default StopFilter (http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html). Does anyone know which words are in the list?

Answer 1: The default stop word set in StandardAnalyzer and EnglishAnalyzer comes from StopAnalyzer.ENGLISH_STOP_WORDS_SET, and the words are: "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these" ...
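As a rough illustration of what a stop filter does with such a list, the Python sketch below drops quoted stop words from a token list. It is not Lucene's StopFilter. The names `QUOTED_STOP_WORDS` and `remove_stop_words` are hypothetical, the set contains only the words visible in the excerpt above (which is cut off, so it may be incomplete), and lowercasing tokens before the lookup is an assumption of this sketch.

```python
# Stop words exactly as quoted in the answer excerpt above. The excerpt is
# truncated after "these", so this set may not be the complete default list.
QUOTED_STOP_WORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these",
])


def remove_stop_words(tokens, stop_words=QUOTED_STOP_WORDS):
    """Return the tokens that are not stop words.

    Assumption: tokens are compared in lowercase; this is a choice made for
    the sketch, not a statement about how Lucene compares tokens.
    """
    return [token for token in tokens if token.lower() not in stop_words]


if __name__ == "__main__":
    tokens = "The quick fox is in the box".split()
    print(remove_stop_words(tokens))
```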

Is Solr 4.0 capable of using 'join' for multiple cores?

Question: I noticed that Solr 4.0 has introduced a 'join' feature for documents that have relationships. This is great; however, the examples given at http://wiki.apache.org/solr/Join are for a single core, where all documents are in a single index. Does anybody know if I can use 'join' across multiple cores?

Answer 1: This comment says it's possible by using:

{!join from=fromField to=toField fromIndex=fromCoreName}fromQuery

I tried it myself, and here's a more detailed example. Have two cores: brands {id,name} products ...
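The join syntax quoted in the answer can be assembled programmatically. The Python sketch below only builds the query string and a select URL; it does not contact a Solr server. The helper names, the base URL `http://localhost:8983/solr`, and the `brand_id` field on the products core are assumptions for illustration, since the excerpt is cut off before the products schema is shown.

```python
from urllib.parse import urlencode


def build_join_query(from_field, to_field, from_index, from_query):
    """Format a cross-core join using the syntax quoted in the answer:
    {!join from=fromField to=toField fromIndex=fromCoreName}fromQuery
    """
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, from_query)


def build_select_url(base_url, core, query):
    """Build a URL-encoded select request for the given core.

    Assumption: the core is reachable at <base_url>/<core>/select.
    """
    return "%s/%s/select?%s" % (base_url.rstrip("/"), core, urlencode({"q": query}))


if __name__ == "__main__":
    # Hypothetical: products carry a brand_id field that refers to brands.id.
    q = build_join_query("id", "brand_id", "brands", "name:Apple")
    print(q)
    print(build_select_url("http://localhost:8983/solr", "products", q))
```

The request is sent to the core whose documents should be returned (products in this hypothetical setup), while `fromIndex` names the core that the inner query runs against.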

How to control indexing of a field in Lucene 4.0

Question: Until Lucene version 3.9, we could specify whether or not to index a field by using Field.Index.NO, Field.Index.ANALYZED, etc. But in Lucene 4.0 there is no constructor available in which we can define this. How do we control indexing in this version? For example, if I want a field "name" to be stored in the index but not indexed, how can I do that in Lucene 4.0?

Answer 1: Constructors taking Field.Index arguments are available, but are deprecated in 4.0 and should not be used. Instead ...

How to wisely combine shingles and edgeNgram to provide flexible full text search?

Question: We have an OData-compliant API that delegates some of its full text search needs to an Elasticsearch cluster. Since OData expressions can get quite complex, we decided to simply translate them into their equivalent Lucene query syntax and feed it into a query_string query. We do support some text-related OData filter expressions, such as:

startswith(field,'bla')
endswith(field,'bla')
substringof('bla',field)
name eq 'bla'

The fields we're matching against can be analyzed, not_analyzed or ...
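For readers unfamiliar with the two techniques named in the title, the Python sketch below shows what edge n-grams and shingles look like for a small input. These are simplified stand-ins written for illustration, not Elasticsearch's token filters; the function names and default parameters are assumptions.

```python
def edge_ngrams(token, min_gram=1, max_gram=None):
    """Return the prefixes of a token from min_gram up to max_gram characters."""
    upper = len(token) if max_gram is None else min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def shingles(tokens, min_size=2, max_size=2, sep=" "):
    """Return word n-grams (shingles) of min_size..max_size adjacent tokens."""
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                result.append(sep.join(tokens[start:start + size]))
    return result


if __name__ == "__main__":
    # A prefix filter such as startswith(field,'bla') can be answered by an
    # exact term lookup once the prefixes of each token have been indexed.
    print(edge_ngrams("blanket"))
    print("bla" in edge_ngrams("blanket"))
    # Shingles keep adjacent words together, which helps phrase-like matching.
    print(shingles(["quick", "brown", "fox"]))
```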

ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage? [closed]

Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 7 years ago.

I'm currently looking at other search methods rather than having a huge SQL query. I saw elasticsearch recently and played with whoosh ...

Choosing a stand-alone full-text search server: Sphinx or SOLR? [closed]

Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 7 years ago.

I'm looking for a stand-alone full-text search server with the following properties: Must operate as a stand-alone server that can ...

Can't open lucene index (Java heap space)

Question: I want to grab some data from a Lucene index file, but I can't read it. I tried to use Luke, but it always crashes with java.lang.OutOfMemoryError: Java heap space. Note that -Xmx doesn't help: I tried -Xmx512, -Xmx1024 and even -Xmx2048. I also tried Solr, but got java.lang.OutOfMemoryError: Java heap space too. Any ideas how I can extract some data from Lucene? P.S. I use Lucene 2.3.0. My index file is 1.8 GB in size.

Answer 1: What size is the data you are trying to fetch? Maybe the result set is ...

Elasticsearch Study Notes, Part 3: Elasticsearch Core Concepts

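Course outline

1. The past and present of Lucene and Elasticsearch
2. Core concepts of Elasticsearch
3. Elasticsearch core concepts vs. database core concepts

1. The past and present of Lucene and Elasticsearch

Lucene is the most advanced and most powerful search library. Developing directly against Lucene is very complex: the API is complicated (even simple features take a lot of Java code), and you need a deep understanding of the underlying principles (the various index structures).

Elasticsearch is built on Lucene and hides that complexity, offering an easy-to-use RESTful API and a Java API (plus APIs for other languages). It is:

(1) a distributed document store
(2) a distributed search and analytics engine
(3) distributed, with support for petabyte-scale data

It works out of the box, with good default parameters that need no extra configuration, and it is fully open source.

There is a story about Elasticsearch: a programmer lost his job and accompanied his wife to London, England, where she was taking a cooking course. While unemployed, he wanted to write a recipe search engine for his wife. He found Lucene far too complex, so he developed an open-source project that wrapped Lucene: Compass ...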