Neo4j Fast way to match fuzzy text property

Posted by 与世无争的帅哥 on 2019-12-03 07:48:24

Neo4j 3.5 (currently at beta03) has built-in FTS (Full-Text Search) capabilities.

EDIT : I have written a detailed blog post about FTS in Neo4j : https://graphaware.com/neo4j/2019/01/11/neo4j-full-text-search-deep-dive.html

You can then query your documents using the Lucene Classic Query Parser Syntax.

Create the index :

CALL db.index.fulltext.createNodeIndex('documents', ['Document'], ['title','text'])

Import some documents :

LOAD CSV WITH HEADERS FROM "file:///docs.csv" AS row
CREATE (n:Document) SET n = row

Query documents with title containing "heavy toll"

CALL db.index.fulltext.queryNodes('documents', 'title: "heavy toll"')
YIELD node, score
RETURN node.title, score

╒══════════════════════════════════════════════════════════════════════╤══════════════════╕
│"node.title"                                                          │"score"           │
╞══════════════════════════════════════════════════════════════════════╪══════════════════╡
│"Among Deaths in 2016, a Heavy Toll in Pop Music - The New York Times"│3.7325966358184814│
└──────────────────────────────────────────────────────────────────────┴──────────────────┘

Query for same title with a typo :

CALL db.index.fulltext.queryNodes('documents', 'title: \\"heavy~ tall~\\"')
YIELD node, score
RETURN node.title, score

Notice the escaping of the quotes (\"): the string passed to the underlying parser must contain the quotes so that it performs a phrase query instead of a boolean query.

Also, the tilde next to each term tells the parser to perform a fuzzy search using the Damerau-Levenshtein algorithm.

╒══════════════════════════════════════════════════════════════════════╤═════════════════════╕
│"node.title"                                                          │"score"              │
╞══════════════════════════════════════════════════════════════════════╪═════════════════════╡
│"Among Deaths in 2016, a Heavy Toll in Pop Music - The New York Times"│0.868073046207428    │
├──────────────────────────────────────────────────────────────────────┼─────────────────────┤
│"Prisons Run by C.E.O.s? Privatization Under Trump Could Carry a Heavy│0.4014900326728821   │
│ Price - The New York Times"                                          │                     │
├──────────────────────────────────────────────────────────────────────┼─────────────────────┤
│"‘All Talk,’ ‘No Action,’ Says Trump in Twitter Attack on Civil Rights│0.28181418776512146  │
│ Icon - The New York Times"                                           │                     │
├──────────────────────────────────────────────────────────────────────┼─────────────────────┤
│"Immigrants Head to Washington to Rally While Obama Is Still There - T│0.24634429812431335  │
│he New York Times"                                                    │                     │
└──────────────────────────────────────────────────────────────────────┴─────────────────────┘
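
To give a feel for what the tilde operator measures, here is a minimal Python sketch of the optimal string alignment distance, a common restricted variant of Damerau-Levenshtein. It is only an illustration, not Lucene's actual implementation (Lucene matches fuzzy terms with automata rather than a per-pair table), and the function name osa_distance is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts the insertions, deletions, substitutions and adjacent
    transpositions needed to turn `a` into `b`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ae" <-> "ea"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    for left, right in [("tall", "toll"), ("tall", "talk"), ("heavy", "head")]:
        print(left, right, osa_distance(left, right))
```

Under this measure "tall" is one edit away from both "toll" and "talk", and "heavy" is two edits away from "head", which is consistent with the lower-scored titles in the result table above containing "Talk" and "Head".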

Indexing, as noted in the answer by Christophe Willemsen, is definitely needed to speed up the search, but I would also like to point out another, older function that might be a better fit for your "fuzzy find":

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English.
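
As a rough illustration of how Soundex groups similar-sounding words, here is a minimal Python sketch of the classic American Soundex rules. It is a standalone example, not the function Neo4j or any plugin exposes; the names soundex and _CODES are hypothetical.

```python
# Consonant groups that share a Soundex digit.
_CODES = {}
for _digit, _letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
    "4": "l", "5": "mn", "6": "r",
}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word`."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")  # code of the previously seen letter
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels and y map to ""
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code counts again
    return result.ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    for name in ["Robert", "Rupert", "toll", "tall"]:
        print(name, soundex(name))
```

With these rules "Robert" and "Rupert" both encode to R163, and "toll" and "tall" both encode to T400, so comparing codes matches words that sound alike even when the spelling differs.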
