
Solr search with MySQL database, any utility for data importing

核能气质少年 submitted on 2019-12-24 12:39:29
Question: We are looking at ways of improving the "search" functionality in our large business application, which currently uses SQL LIKE syntax. So we started evaluating the Solr server and were able to index a few of our database tables and search them. But I am a newbie and wanted to know: 1) We have a large number of tables in our application. Is there any utility that generates the Solr schema.xml from the database tables? 2) Our current search lists the database rows that meet the search criteria (this…
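For question 1, Solr's DataImportHandler (a contrib module) does not generate schema.xml from tables, but it can pull rows straight from MySQL via JDBC once the fields are declared. A minimal data-config sketch, with hypothetical table and column names:

```xml
<!-- db-data-config.xml: table/column names are illustrative only -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- one entity per table (or per JOIN query) to be indexed -->
    <entity name="product" query="SELECT id, name, description FROM product">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```

Each field referenced here still has to exist in schema.xml; the mapping is not derived automatically from the table definitions.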

Client side javascript equivalent to Lucene

喜你入骨 submitted on 2019-12-24 12:34:47
Question: I was wondering if there is a JavaScript equivalent to the Lucene API, designed to be used on the client side to index a relatively small data set. An example use case would be a static site (generated, for instance) with the ability to search content without server-side processing. Answer 1: I've found this: http://lunrjs.com/ It looks like what I'm searching for, but it doesn't seem to support fuzzy searches. Answer 2: Theoretically, you could use Search-index in conjunction with node-browserify or another…
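The core of what lunr.js and Search-index do client-side is an inverted index built in the browser. A minimal dependency-free sketch of the idea (lunr adds stemming, field boosts, and relevance scoring on top of this):

```javascript
// Build an inverted index: token -> Set of document ids.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((doc, id) => {
    for (const token of doc.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  });
  return index;
}

// AND semantics: a document must contain every query token.
function search(index, query) {
  const tokens = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (tokens.length === 0) return [];
  const sets = tokens.map(t => index.get(t) || new Set());
  return [...sets[0]].filter(id => sets.every(s => s.has(id)));
}

const index = buildIndex([
  "Lucene is a search library",
  "Static sites can search client side",
]);
```

Fuzzy matching would require comparing query tokens against index keys with an edit-distance function, which is exactly the part lunr.js lacked at the time of the question.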

eXist-db - basic Lucene query returns empty sequence

烈酒焚心 submitted on 2019-12-24 11:57:51
Question: In eXist-db 4.4 I am attempting to implement a basic Lucene query, but it returns an empty sequence. In /db/apps/deheresi/data I have a collection of TEI-XML documents that share the same structure, and I want to apply my query only to the text content found within the element tei:seg and its descendants. A typical sample would be: <TEI> <text> [...] <seg type="dep_event" subtype="event" xml:id="MS609-0001-1"> <pb n="1r"/> <lb break="n" n="1"/> <date type="deposition_date" when="1245…
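A common cause of an empty sequence here is that no Lucene index is configured on tei:seg (or the collection was not reindexed after adding one). A sketch, assuming an index declaration in collection.xconf under /db/system/config for this data collection, followed by a reindex; the search term "heresy" is a placeholder:

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: ft:query only matches on nodes the Lucene index was built over,
   so the index must be declared on tei:seg for this to return hits :)
for $seg in collection("/db/apps/deheresi/data")//tei:seg[ft:query(., "heresy")]
return $seg/@xml:id
```

The corresponding collection.xconf fragment would declare `<text qname="tei:seg"/>` inside a `<lucene>` element so the index covers tei:seg and its descendant text.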

Solr / RDBMS, where to store additional data

北慕城南 submitted on 2019-12-24 11:51:43
Question: What would be considered best practice when you need additional data about facet results? I.e. I need a friendly name / image / meta keywords / description / and more for product categories (when faceting on categories). Options: include it in the document (can lead to lots of duplication); introduce category as a new index in Solr (or fake it with a doctype=category field in Solr); use an RDBMS to look up the additional data using a SELECT ... WHERE IN (..category facet result ids..). Thanks, Remco. Answer 1: I would think…
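The third option, an RDBMS lookup keyed on the facet result ids, can be sketched as follows (table and column names hypothetical):

```sql
-- Fetch display metadata for the category ids returned by the Solr facet.
-- The id list comes from the facet counts, so this is one small query per page.
SELECT id, friendly_name, image_url, meta_keywords, description
FROM category
WHERE id IN (12, 34, 56);
```

Because facet result sets are typically small, this keeps the search index free of duplicated presentation data at the cost of one extra round trip.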

Neo4j: Native Java API (or equivalent Cypher query) in Spring Data Neo4j

感情迁移 submitted on 2019-12-24 11:35:19
Question: I was experimenting with the Neo4j embedded DB over the past few days for a demo and was thoroughly impressed with the native Java APIs for indexing and Lucene queries, and I even managed to do fuzzy search. I then decided to take this POC to production with Spring Data Neo4j 4.0, but ran into issues with Cypher queries and fuzzy search. My domain class "Team" looks like this: @NodeEntity public class Team { @GraphId Long nodeId; /** The team name. */ @Indexed(indexType = IndexType.FULLTEXT,indexName =…
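In the Neo4j versions contemporary with SDN 4, a legacy (manual) full-text index created via @Indexed could be queried from Cypher with a START clause, which accepts raw Lucene query syntax including the `~` fuzzy operator. A sketch, where the index name "team-search" and the query value are hypothetical (the real name comes from the truncated indexName above):

```
// Legacy index lookup with Lucene fuzzy syntax; "arsenl~0.7" tolerates
// edit-distance variation against indexed "name" values.
START t=node:`team-search`("name:arsenl~0.7")
RETURN t
```

Plain MATCH clauses do not consult legacy full-text indexes, which is one reason fuzzy search that worked through the native Java API appears to break when moving to Cypher-generated queries in SDN.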

German 'ue' -> 'u' conversion in Lucene

北战南征 submitted on 2019-12-24 11:18:05
Question: I have two questions regarding handling German umlauts in Lucene: I'm trying to find a way to convert German umlauts written as 'ue', 'ae', etc. to the folded forms 'u', 'a', and so on. This is done by GermanAnalyzer (and the German2StemFilter it uses), but unfortunately it also does stemming, which is very undesirable in my case. Is there any other filter that can do only the 'ue' -> 'u' conversion? Is there any filter that does the 'ü' -> 'ue' conversion (NOT 'ü' -> 'u' like ASCIIFoldingFilter does)? What I'm…
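One way to get only the character substitutions, with no stemming, is a mapping character filter in front of a plain tokenizer. A Solr field-type sketch (the field type name and mapping file are hypothetical; in raw Lucene the same effect comes from MappingCharFilter with a NormalizeCharMap):

```xml
<!-- schema.xml sketch: character mapping only, no stemming -->
<fieldType name="text_de_mapped" class="solr.TextField">
  <analyzer>
    <!-- mapping-german.txt would contain lines such as:
           "ü" => "ue"
           "ä" => "ae"
         or, for the other direction asked about, "ue" => "u" -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-german.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One caveat: a blind "ue" => "u" mapping is context-free and will also rewrite words where 'ue' is not an umlaut transcription (e.g. "Quelle" becomes "Qulle"), which is precisely the kind of case the stemmer's language-aware handling avoids.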

Create analyzer with Edge N Gram analyzer and char filter which replaces space with new line

ε祈祈猫儿з submitted on 2019-12-24 11:15:30
Question: I have the below type of text coming in: foo bar , hello world etc. I created an analyzer using the Edge NGram tokenizer, and using the analyze API it produces the tokens below. { "tokens": [ { "token": "f", "start_offset": 0, "end_offset": 1, "type": "word", "position": 1 }, { "token": "fo", "start_offset": 0, "end_offset": 2, "type": "word", "position": 2 }, { "token": "foo", "start_offset": 0, "end_offset": 3, "type": "word", "position": 3 }, { "token": "b", "start_offset": 4, "end_offset": 5, "type":…
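The combination asked for in the title, an edge n-gram tokenizer plus a char filter that replaces spaces with newlines, can be sketched as Elasticsearch index settings; all names and gram sizes below are illustrative:

```json
{
  "settings": {
    "analysis": {
      "char_filter": {
        "space_to_newline": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": "\n"
        }
      },
      "tokenizer": {
        "edge_tokens": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "char_filter": ["space_to_newline"],
          "tokenizer": "edge_tokens",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Restricting `token_chars` to letters and digits makes the tokenizer break on whitespace and punctuation anyway, so each word gets its own edge n-grams as in the token output shown above.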

Boosting an elasticsearch result based on a boolean field value

允我心安 submitted on 2019-12-24 10:55:47
Question: I'm getting a lot of "static" when searching for the correct way to boost a result when a Boolean field is TRUE; most results are about boolean searches. N.B. We're using the PHP Elastica library, but if you can only provide JSON that's fine, I can manage from that. I have an index with 5 fields where we have some built-in boosting going on, as you can see here: array( 'article_id' => array('type' => 'string', 'include_in_all' => FALSE), 'subject' => array('type' => 'string',…
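A common pattern for this is a bool query where the real search goes in `must` and a boosted `term` clause on the boolean field goes in `should`, so matching documents with the flag set score higher without excluding the rest. A JSON sketch (field names "subject" and "featured" and the boost value are illustrative):

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "subject": "search terms here" } }
      ],
      "should": [
        { "term": { "featured": { "value": true, "boost": 2.0 } } }
      ]
    }
  }
}
```

Because `should` clauses are optional when a `must` clause is present, documents with `featured: false` still match; they simply receive no boost.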

Rebuild of label index in neo4j

我的梦境 submitted on 2019-12-24 10:35:51
Question: My Neo4j instance suddenly stopped working; I think my drive ran out of space due to some unrelated log files. Anyway, now I cannot start Neo4j: it tries to start over and over again. If I check the consistency of the database, I get the following message (it does not work on either version 3.3.5 or 3.4.1): WARN : Label index was not properly shutdown and rebuild is required. Label index: neostore.labelscanstore.db WARN : Index was not properly shutdown and rebuild is required. Index[ IndexRule[id=1,…

[Lucene] A Basic Overview of Lucene with a Simple Example

我与影子孤独终老i submitted on 2019-12-24 09:27:42
I. Basic introduction to Lucene:
Basic information: Lucene is an open-source full-text search engine toolkit from the Apache Software Foundation. It is an architecture for a full-text search engine, providing a complete query engine and indexing engine, and part of a text analysis engine. Lucene's goal is to give software developers a simple, easy-to-use toolkit for conveniently implementing full-text search in a target system, or for building a complete full-text search engine on top of it.
File structure: a tree that expands from top to bottom, one-to-many:
Index: comparable to a database or a table.
Segment: comparable to a database or table shard.
Document: comparable to one row of data, e.g. the novel 吞噬星空.
Field: a document can be divided into multiple fields, comparable to columns, e.g. the novel's author, title, content...
Term: a field can in turn be divided into terms; the term is the smallest unit of indexing and search. Under standard analysis, the terms are individual words and Chinese characters.
Forward information: index -> segment -> document -> field -> term.
Inverted information: term -> document.
II. Lucene full-text search:
1. Data classification:
Structured data: databases; data with fixed length and format.
Semi-structured data: e.g. XML, HTML, etc.
Unstructured data: data with no fixed length or format, such as free text...
2. Search process: the Lucene retrieval process can be divided into two parts: one is the index-building process for structured, semi-structured, and unstructured data (the left side of the figure above), and the other is the index query process (the right side).
Indexing process: there is a set of files to be indexed; the files go through lexical analysis and linguistic processing to form a series of terms (Term).
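The forward/inverted distinction above can be sketched in a few lines; JavaScript is used purely for illustration, and real Lucene of course stores both structures in segment files rather than in-memory maps:

```javascript
// Forward information: document id -> the terms it contains.
// Inverted information: term -> the document ids that contain it.
const docs = ["devour the stars", "the stars shine"];

// Forward map: split each document into its terms.
const forward = docs.map(text => text.split(" "));

// Inverted map: built by walking the forward map once.
const inverted = new Map();
forward.forEach((terms, id) =>
  terms.forEach(t => {
    if (!inverted.has(t)) inverted.set(t, []);
    if (!inverted.get(t).includes(id)) inverted.get(t).push(id);
  })
);
```

The query side then only needs the inverted map: looking up a term yields the documents containing it directly, which is what makes full-text search fast compared with scanning every document.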