I've just ventured into the seemingly simple but extremely complex world of searching. For an application, I am required to build a search mechanism for searching users by
We created a simple 'name' field type that allows mixing both 'key' (e.g., SOUNDEX) and 'pairwise' portions of the answers above.
Here's the overview. At index time, the field type's createFields method expands a name into several indexed fields.
Here's the core of its implementation:
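    List<IndexableField> createFields(SchemaField field, String name) {
        // Expand the name into its derived 'key' fields (phonetic, lowercased, ...).
        Collection<FieldSpec> nameFields = deriveFieldsForName(name);
        List<IndexableField> docFields = new ArrayList<>();
        for (FieldSpec fs : nameFields) {
            docFields.add(new Field(fs.getName(), fs.getStringValue(),
                    fs.getLuceneField()));
        }
        // Keep the full name as doc values so it is available for pairwise scoring.
        docFields.add(createDocValues(field.getName(), new Name(name)));
        return docFields;
    }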
The heart of this is deriveFieldsForName(name), in which you can include 'keys' from PhoneticFilters, LowerCaseFolding, etc.
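To make the idea of 'keys' concrete, here is a minimal sketch of what a key-deriving step could look like, assuming just two keys per name: a lowercased form and a classic American Soundex code. The class NameKeys, the sub-field names name_lower and name_soundex, and the Map return type are all made up for illustration; the real deriveFieldsForName returns FieldSpec objects and is not shown in this answer.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr or of the original code.
class NameKeys {

    // Classic American Soundex: first letter plus three digits, zero-padded.
    static String soundex(String token) {
        String codes = "01230120022455012623010202"; // codes for A..Z, '0' = not coded
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (char raw : token.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') continue;    // ignore anything but A..Z
            char code = codes.charAt(raw - 'A');
            if (out.length() == 0) {
                out.append(raw);                      // the first letter is kept as-is
            } else if (code != '0' && code != prev) {
                out.append(code);                     // skip uncoded letters and repeats
            }
            // H and W do not separate letters with the same code; vowels do.
            if (raw != 'H' && raw != 'W') prev = code;
            if (out.length() == 4) break;
        }
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }

    // Derives one space-separated key string per (assumed) sub-field.
    static Map<String, String> deriveFieldsForName(String name) {
        StringBuilder lower = new StringBuilder();
        StringBuilder phonetic = new StringBuilder();
        for (String token : name.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (lower.length() > 0) {
                lower.append(' ');
                phonetic.append(' ');
            }
            lower.append(token.toLowerCase(Locale.ROOT));
            phonetic.append(soundex(token));
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name_lower", lower.toString());
        fields.put("name_soundex", phonetic.toString());
        return fields;
    }
}
```

With this sketch, "John Doe" and "Jon Doe" get different name_lower values but the same name_soundex value, which is what lets a key lookup pull both in as candidates.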
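At query time, the field type's getFieldQuery method turns the query name into a query over those fields.
Here's the core of its implementation: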
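    public Query getFieldQuery(QParser parser, SchemaField field, String externalVal) {
        // Parse the raw query string into a structured name.
        Name name = parseNameString(externalVal, parser.getParams());
        QuerySpec querySpec = buildQuery(name);
        // Translate the abstract query spec into a Lucene query on this field.
        return querySpec.accept(new SolrQueryVisitor(field.getName()));
    }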
The heart of this is the buildQuery(name) method, which should produce a query that is aware of deriveFieldsForName(name) above, so that for a given query name it finds good candidate names.
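As a rough illustration of the QuerySpec/visitor split, the sketch below builds a small query description and renders it as Lucene-style query text. Everything in it is an assumption: the QueryVisitor interface, TermSpec, AnyOfSpec, QueryStringVisitor, and the sub-fields 'lower' and 'initial' are stand-ins invented here, and it takes a plain String rather than a Name. The real SolrQueryVisitor would return a Lucene Query rather than a string. The point is only that every sub-field the query touches must be one that the index-time derivation actually wrote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the QuerySpec and visitor types used above.
interface QuerySpec {
    <T> T accept(QueryVisitor<T> visitor);
}

interface QueryVisitor<T> {
    T visitTerm(String subField, String value, float boost);
    T visitAnyOf(List<QuerySpec> clauses);
}

final class TermSpec implements QuerySpec {
    final String subField;
    final String value;
    final float boost;

    TermSpec(String subField, String value, float boost) {
        this.subField = subField;
        this.value = value;
        this.boost = boost;
    }

    public <T> T accept(QueryVisitor<T> visitor) {
        return visitor.visitTerm(subField, value, boost);
    }
}

final class AnyOfSpec implements QuerySpec {
    final List<QuerySpec> clauses;

    AnyOfSpec(List<QuerySpec> clauses) {
        this.clauses = clauses;
    }

    public <T> T accept(QueryVisitor<T> visitor) {
        return visitor.visitAnyOf(clauses);
    }
}

// Renders a spec as query text; values are NOT escaped in this sketch.
final class QueryStringVisitor implements QueryVisitor<String> {
    private final String fieldName;

    QueryStringVisitor(String fieldName) {
        this.fieldName = fieldName;
    }

    public String visitTerm(String subField, String value, float boost) {
        return fieldName + "_" + subField + ":" + value + "^" + boost;
    }

    public String visitAnyOf(List<QuerySpec> clauses) {
        List<String> parts = new ArrayList<>();
        for (QuerySpec clause : clauses) {
            parts.add(clause.accept(this));
        }
        return "(" + String.join(" OR ", parts) + ")";
    }
}

final class NameQueries {
    // Exact lowercase tokens get a higher boost; a crude first-letter key widens recall.
    static QuerySpec buildQuery(String name) {
        List<QuerySpec> clauses = new ArrayList<>();
        for (String token : name.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            String lower = token.toLowerCase(Locale.ROOT);
            clauses.add(new TermSpec("lower", lower, 2.0f));
            clauses.add(new TermSpec("initial", lower.substring(0, 1), 0.5f));
        }
        return new AnyOfSpec(clauses);
    }
}
```

Calling NameQueries.buildQuery("John Doe").accept(new QueryStringVisitor("name")) yields (name_lower:john^2.0 OR name_initial:j^0.5 OR name_lower:doe^2.0 OR name_initial:d^0.5). Keeping the spec separate from the visitor means the same candidate-selection logic could be rendered for a different backend.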
Finally, the candidates are reranked with a pairwise match function. Here's what this looks like in your query...
&rq={!myRerank reRankQuery=$rrq} &rrq={!func}myMatch(fieldName, "John Doe")
The content of myMatch could be a pairwise Levenshtein or Jaro-Winkler implementation.
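For the Levenshtein option, the scoring part could be sketched as below. This is only the string comparison, under the assumption that the score is the edit distance normalized to the range 0 to 1 after lowercasing; the class name PairwiseMatch and this normalization are choices made here. Wiring it into Solr as a function query (a ValueSourceParser registered under the name myMatch) is not shown.

```java
import java.util.Locale;

// Hypothetical scoring helper; not the original myMatch implementation.
class PairwiseMatch {

    // Levenshtein edit distance using two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                              // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                              // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;                         // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 for identical names, 0.0 for nothing in common.
    static double myMatch(String indexedName, String queryName) {
        String a = indexedName.toLowerCase(Locale.ROOT);
        String b = queryName.toLowerCase(Locale.ROOT);
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;                               // two empty names are identical
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }
}
```

For example, PairwiseMatch.myMatch("John Doe", "Jon Doe") is 0.875: one edit over a longest length of eight. Because this runs only on the reranked top candidates, its quadratic cost per pair stays affordable.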
N.B. Our own full implementation uses proprietary code for deriveFieldsForName, buildQuery, and myMatch (see http://www.basistech.com/text-analytics/rosette/name-indexer/) to handle more kinds of variations than the ones mentioned above (e.g., missing spaces, cross-language).