trigram

Optimizing a postgres similarity query (pg_trgm + gin index)

喜夏-厌秋 submitted on 2021-02-17 22:52:16
Question: I have defined the following index: CREATE INDEX users_search_idx ON auth_user USING gin( username gin_trgm_ops, first_name gin_trgm_ops, last_name gin_trgm_ops ); I am performing the following query: PREPARE user_search (TEXT, INT) AS SELECT username, email, first_name, last_name, ( -- would probably do per-field weightings here s_username + s_first_name + s_last_name ) rank FROM auth_user, similarity(username, $1) s_username, similarity(first_name, $1) s_first_name, similarity(last_name, $1

Poor performance when trigram similarity and full-text search are combined with Q in Django using Postgres

落爺英雄遲暮 submitted on 2021-01-04 07:50:31
Question: I'm creating a web application to search for people by properties such as education, experience, etc. I can't use full-text search for all the fields, because some have to be fuzzy matches (e.g., a search for biotech should pick up bio tech, biotech, and also bio-tech). My database has about 200 entries in the profile model, which is what appears in the search results. Other models, like education and experience, are connected to profile through a foreign key. Therefore, I decided to be
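The fuzzy requirement in the question (biotech should match bio tech, biotech, and bio-tech) can be illustrated with a small normalization sketch. This is not the Django or PostgreSQL approach the asker ended up with, which the excerpt does not show; the helper names `normalize` and `matches` are hypothetical, and the idea is only that all three spellings collapse to the same key once separators are removed.

```python
import re

def normalize(term):
    """Lowercase the term and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", term.lower())

def matches(query, candidate):
    """True when both strings collapse to the same normalized key."""
    return normalize(query) == normalize(candidate)

variants = ["bio tech", "biotech", "bio-tech", "biology"]
hits = [v for v in variants if matches("biotech", v)]
print(hits)
```

Exact matching on a normalized key is stricter than trigram similarity: it tolerates separators but not misspellings, so it would complement a trigram search rather than replace it.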

PostgreSQL full text search abbreviations

筅森魡賤 submitted on 2020-04-16 11:48:12
Question: I created a PostgreSQL full-text search using 'german'. How can I configure it so that when I search for "Bezirk", lines containing "Bez." also match (and vice versa)? Answer 1: @pozs is right. You need to use a synonym dictionary. 1 - In the directory $SHAREDIR/tsearch_data, create the file german.syn with the following contents: Bez Bezirk 2 - Execute the query: CREATE TEXT SEARCH DICTIONARY german_syn ( template = synonym, synonyms = german); CREATE TEXT SEARCH CONFIGURATION german_syn(COPY=
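To see why a synonym dictionary makes the match work in both directions, here is a minimal Python sketch of the idea. It is a simplification and not how PostgreSQL is implemented: the `SYNONYMS` dict is a hypothetical in-memory stand-in for the `german.syn` file, tokenizing is a plain whitespace split, and German stemming is ignored. The point is that both the query and the indexed text are rewritten to the same canonical token before comparison.

```python
# Hypothetical stand-in for the contents of german.syn ("Bez Bezirk").
SYNONYMS = {"bez": "bezirk"}

def normalize_token(token):
    """Lowercase, strip a trailing period, then map through the synonym table."""
    word = token.lower().rstrip(".")
    return SYNONYMS.get(word, word)

def tokens(text):
    """Canonical token set for a piece of text (whitespace tokenizer, an assumption)."""
    return {normalize_token(t) for t in text.split()}

def matches(query, line):
    """True when every canonical query token appears in the line."""
    return tokens(query) <= tokens(line)

print(matches("Bezirk", "Bez. Mitte"))
print(matches("Bez.", "Bezirk Mitte"))
```

Because the mapping is applied on both sides, searching for the abbreviation finds the full word and the reverse also holds.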

PostgreSQL full text search abbreviations

£可爱£侵袭症+ submitted on 2020-04-16 11:47:40
Question: I created a PostgreSQL full-text search using 'german'. How can I configure it so that when I search for "Bezirk", lines containing "Bez." also match (and vice versa)? Answer 1: @pozs is right. You need to use a synonym dictionary. 1 - In the directory $SHAREDIR/tsearch_data, create the file german.syn with the following contents: Bez Bezirk 2 - Execute the query: CREATE TEXT SEARCH DICTIONARY german_syn ( template = synonym, synonyms = german); CREATE TEXT SEARCH CONFIGURATION german_syn(COPY=

Improving performance with a Similarity Postgres fuzzy self join query

丶灬走出姿态 submitted on 2020-01-02 07:21:28
Question: I am trying to run a query that joins a table against itself and does fuzzy string comparison (using trigram comparisons) to find possible company name matches. My goal is to return records where the trigram similarity of one record's company name (ref_name field) matches another record's company name. Currently, I have my threshold set to 0.9, so it will only bring back matches that are very likely to contain a similar string. I know that self joins can result in many comparisons by

How can I get words after and before a specific token?

空扰寡人 submitted on 2019-12-25 03:53:38
Question: I am currently working on a project that simply creates basic corpus databases and tokenizes texts, but I am stuck on one problem. Assume we have the following: import os, re texts = [] for i in os.listdir(somedir): # Somedir contains text files which contain very large plain texts. with open(i, 'r') as f: texts.append(f.read()) Now I want to find the word before and after a token. myToken = 'blue' found = [] for i in texts: fnd = re.findall('[a-zA-Z0-9]+ %s [a-zA-Z0-9]+|\. %s [a-zA
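One way to get the neighbouring words without building a single large regex is to tokenize first and then look at adjacent positions. The sketch below is an alternative to the truncated `re.findall` pattern in the question, not a completion of it; the function name `neighbors` and the sample sentence are made up for illustration, and the sketch deliberately ignores sentence boundaries and is case-sensitive.

```python
import re

def neighbors(text, token):
    """Return (previous_word, next_word) for every exact occurrence of token.

    None marks a missing neighbour at the start or end of the text.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    pairs = []
    for i, word in enumerate(words):
        if word == token:
            before = words[i - 1] if i > 0 else None
            after = words[i + 1] if i + 1 < len(words) else None
            pairs.append((before, after))
    return pairs

# Hypothetical sample text.
sample = "The deep blue sea. Blue skies and blue water"
print(neighbors(sample, "blue"))
```

For very large files, the same loop can run over one file at a time instead of holding every text in a list.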

Improving performance with a Similarity Postgres fuzzy self join query

空扰寡人 submitted on 2019-12-05 23:26:48
I am trying to run a query that joins a table against itself and does fuzzy string comparison (using trigram comparisons) to find possible company name matches. My goal is to return records where the trigram similarity of one record's company name (ref_name field) matches another record's company name. Currently, I have my threshold set to 0.9, so it will only bring back matches that are very likely to contain a similar string. I know that self joins can result in many comparisons by nature, but I want to optimize my query as best I can. I don't need results instantaneously, but currently
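The cost of a fuzzy self join comes from comparing every row with every other row. The Python sketch below illustrates two standard ways to cut that down, under stated assumptions: it keeps each unordered pair only once, and it scores only pairs that share at least one trigram, since a pair with no shared trigram has similarity 0 and can never pass a positive threshold. The `trigrams` helper approximates pg_trgm's documented padding (two leading spaces, one trailing space per word) and is not a byte-for-byte reimplementation; the company names are invented.

```python
import re
from collections import defaultdict
from itertools import combinations

def trigrams(text):
    """Approximate pg_trgm: lowercase, split on non-alphanumerics, pad each word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similar_pairs(names, threshold=0.9):
    """Return (name_a, name_b, score) for pairs at or above the threshold."""
    grams = [trigrams(n) for n in names]
    index = defaultdict(list)            # trigram -> row ids in ascending order
    for i, g in enumerate(grams):
        for gram in g:
            index[gram].append(i)
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))   # yields (i, j) with i < j
    result = []
    for i, j in sorted(candidates):
        score = len(grams[i] & grams[j]) / len(grams[i] | grams[j])
        if score >= threshold:
            result.append((names[i], names[j], score))
    return result

# Hypothetical data.
names = ["Acme Corp", "ACME corp", "Acme Corporation", "Globex"]
print(similar_pairs(names))
```

In SQL, the corresponding ideas would be a join condition such as `a.id < b.id` and an index-assisted similarity operator, but whether they help in the asker's case depends on details the excerpt does not show.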

How to perform trigram operations in Google BigQuery?

谁说胖子不能爱 submitted on 2019-12-04 04:44:18
Question: I use the pg_trgm module in PostgreSQL to calculate the similarity between two strings using trigrams. In particular, I use similarity(text, text), which returns a number that indicates how similar the two arguments are (between 0 and 1). How can I perform the similarity function (or an equivalent) on Google BigQuery? Answer 1: Try below. At least as a blueprint for enhancing SELECT text1, text2, similarity FROM JS( // input table ( SELECT * FROM (SELECT 'mikhail' AS text1, 'mikhail' AS text2),

How to perform trigram operations in Google BigQuery?

旧城冷巷雨未停 submitted on 2019-12-02 01:47:22
I use the pg_trgm module in PostgreSQL to calculate the similarity between two strings using trigrams. In particular, I use similarity(text, text), which returns a number that indicates how similar the two arguments are (between 0 and 1). How can I perform the similarity function (or an equivalent) on Google BigQuery? Try below. At least as a blueprint for enhancing SELECT text1, text2, similarity FROM JS( // input table ( SELECT * FROM (SELECT 'mikhail' AS text1, 'mikhail' AS text2), (SELECT 'mikhail' AS text1, 'mike' AS text2), (SELECT 'mikhail' AS text1, 'michael' AS text2), (SELECT 'mikhail'
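For readers who want to see what a trigram similarity score measures before porting it to another engine, here is a minimal Python sketch. It is not the BigQuery answer above, whose JavaScript body is cut off in the excerpt, and it only approximates pg_trgm: words are lowercased, padded with two leading spaces and one trailing space, and the score is the number of shared trigrams divided by the number of distinct trigrams across both strings.

```python
import re

def trigrams(text):
    """Set of padded trigrams for each alphanumeric word in the text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    """Shared trigrams divided by distinct trigrams, in the range 0 to 1."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0          # assumption: two empty inputs score 0
    return len(ta & tb) / len(ta | tb)

for other in ("mikhail", "mike", "michael"):
    print(other, similarity("mikhail", other))
```

Identical strings score 1, strings with no trigram in common score 0, and everything else falls in between, which matches the range described in the question.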