Building Examine (lucene.net) index with comma separated list of IDs

Question

I have an Umbraco website that uses Examine search, which is based on Lucene.Net. I am trying to do pretty much exactly what is described in the following question:

Querying against a comma separated list of IDs with Examine and Lucene.Net?

The problem I have is when I am trying to create the index using the following code:

// Loop through articles
foreach (var a in articles)
{
    yield return new SimpleDataSet()
    {
        NodeDefinition = new Examine.IndexedNode()
        {
            NodeId = a.Id,
            Type = "Article"
        },
        RowData = new Dictionary<string, string>()
        {
            {"Name", a.Name},
            {"Url", a.NiceUrl},
            {"Category", "1234"},
            {"Category", "5678"}
        }
    };
}

I am receiving the following error:

An item with the same key has already been added.

Does anyone know how I can get around this issue?

Answer 1:

Here is a full example of doing this in plain Lucene.Net. As noted, Examine limits flexibility by taking its input as a Dictionary, but changing Examine to handle multiple values per field should be simple.

using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class Program
{
    public static void Main (string[] args)
    {
        Analyzer analyser = new StandardAnalyzer (Lucene.Net.Util.Version.LUCENE_CURRENT);
        Directory dir = new RAMDirectory ();

        using (IndexWriter iw = new IndexWriter (dir, analyser, IndexWriter.MaxFieldLength.UNLIMITED)) {
            // doc1 carries two values under the same field name "multival"
            Document doc1 = new Document ();
            doc1.Add (new Field ("title", "multivalued", Field.Store.YES, Field.Index.ANALYZED));
            doc1.Add (new Field ("multival", "val1", Field.Store.YES, Field.Index.ANALYZED));
            doc1.Add (new Field ("multival", "val2", Field.Store.YES, Field.Index.ANALYZED));
            iw.AddDocument (doc1);

            // doc2 carries a single value for "multival"
            Document doc2 = new Document ();
            doc2.Add (new Field ("title", "singlevalued", Field.Store.YES, Field.Index.ANALYZED));
            doc2.Add (new Field ("multival", "val1", Field.Store.YES, Field.Index.ANALYZED));
            iw.AddDocument (doc2);
        }

        using (Searcher searcher = new IndexSearcher (dir, true)) {
            var q1 = new TermQuery (new Term ("multival", "val1"));
            var q1result = searcher.Search (q1, 1000);
            // Will print "Found 2 documents"
            Console.WriteLine ("Found {0} documents", q1result.TotalHits);

            var q2 = new TermQuery (new Term ("multival", "val2"));
            var q2result = searcher.Search (q2, 1000);
            // Will print "Found 1 documents"
            Console.WriteLine ("Found {0} documents", q2result.TotalHits);
        }
    }
}

Answer 2:

The next version of Examine (v2) will support this properly. With any luck it might be out within a couple of months, but that really depends on how much time we get.

In the meantime, you could use the DocumentWriting event on your indexer, which gives you direct access to the Lucene Document so that you can index however you like. You could initially store a comma-separated list of IDs for your categories, then split the list during this event and add each ID to Lucene as an individual value.
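As a rough sketch of the splitting step described above, the helper below takes a Lucene Document and a comma-separated value and replaces the combined field with one field per ID. The class and method names (CategoryFields, SplitIntoFields) are hypothetical, and the Examine wiring shown in the comment assumes the event arguments expose the Document and the original field values; check that against your Examine version.

```csharp
using System;
using Lucene.Net.Documents;

public static class CategoryFields
{
    // Hypothetical wiring inside Examine (assumed event-args shape, not verified):
    //   indexer.DocumentWriting += (sender, e) =>
    //       CategoryFields.SplitIntoFields(e.Document, "Category", e.Fields["Category"]);

    // Replaces the single comma-separated field with one field per ID.
    public static void SplitIntoFields(Document doc, string fieldName, string commaSeparated)
    {
        // Drop the combined value; this does nothing if the field is absent.
        doc.RemoveFields(fieldName);

        var ids = (commaSeparated ?? string.Empty)
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var id in ids)
        {
            var trimmed = id.Trim();
            if (trimmed.Length == 0) continue;

            // Same field name each time: Lucene allows repeated field names.
            doc.Add(new Field(fieldName, trimmed, Field.Store.YES, Field.Index.ANALYZED));
        }
    }

    public static void Main()
    {
        var doc = new Document();
        doc.Add(new Field("Category", "1234, 5678", Field.Store.YES, Field.Index.ANALYZED));

        SplitIntoFields(doc, "Category", "1234, 5678");

        // Lists the stored values now held under "Category".
        foreach (var value in doc.GetValues("Category"))
        {
            Console.WriteLine(value);
        }
    }
}
```

Keeping the split in a plain static method means it can be exercised against a bare Document without an Examine index.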

Answer 3:

The error you are seeing is a restriction of .NET's Dictionary<TKey, TValue> class, as mentioned by @DavidH. The restriction is inherited by Examine's SimpleDataSet class, which, judging by its source, only accepts a Dictionary<string, string> as the way to add row data to a document.
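For illustration, here is a minimal sketch of why the initializer fails, using only the base class library. The class and method names are made up for this example. A collection initializer compiles to one Add call per entry, so the second "Category" entry throws an ArgumentException at run time.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeyDemo
{
    // Returns true when building the dictionary throws because of the repeated key.
    public static bool ThrowsOnDuplicateKey()
    {
        try
        {
            // Each entry becomes a call to Add; the second "Category" is rejected.
            var rowData = new Dictionary<string, string>
            {
                { "Category", "1234" },
                { "Category", "5678" }
            };

            // Reached only if no exception was thrown.
            return rowData.Count < 0;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Prints "True"
        Console.WriteLine(ThrowsOnDuplicateKey());
    }
}
```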

However, a Lucene Document does allow you to add multiple fields with the same name as mentioned on the linked question:

using Lucene.Net.Documents;

var document = new Document();
document.Add(CreateField("Id", a.Id.ToString())); // Id is numeric; Field values are strings
document.Add(CreateField("Name", a.Name));
document.Add(CreateField("Url", a.NiceUrl));
document.Add(CreateField("Category", "1234"));
document.Add(CreateField("Category", "5678"));

...

private Field CreateField(string fieldName, string fieldValue)
{
    return new Field(
        fieldName, 
        fieldValue, 
        Field.Store.YES, 
        Field.Index.ANALYZED);
}

Although not as convenient as Examine's API, using Lucene natively is a lot more flexible for these scenarios.

Answer 4:

Dictionary keys must be unique. This restriction comes from the .NET Dictionary<TKey, TValue> class, not from Lucene. One possible option is to pipe-delimit the values under a single "Category" dictionary key:

RowData = new Dictionary<string, string>()
{
    {"Name", a.Name},
    {"Url", a.NiceUrl},
    {"Category", "1234|5678"}
}

You could then use string.Split on the pipe character '|' to parse them back out.
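A small sketch of that round trip, using only the base class library, might look like the following. The CategoryField helper and its method names are hypothetical, and the approach assumes no ID ever contains the pipe character.

```csharp
using System;
using System.Collections.Generic;

public static class CategoryField
{
    private const char Separator = '|';

    // Packs several IDs into the single value the dictionary can hold.
    public static string Join(IEnumerable<string> ids)
    {
        return string.Join(Separator.ToString(), ids);
    }

    // Recovers the individual IDs, dropping empty entries left by stray separators.
    public static string[] Split(string packed)
    {
        return packed.Split(new[] { Separator }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        var rowData = new Dictionary<string, string>
        {
            { "Category", Join(new[] { "1234", "5678" }) }
        };

        // Prints "1234" and then "5678", one per line.
        foreach (var id in Split(rowData["Category"]))
        {
            Console.WriteLine(id);
        }
    }
}
```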

Source: https://stackoverflow.com/questions/16796975/building-examine-lucene-net-index-with-comma-separated-list-of-ids

Tags

ASP.NET

umbraco

lucene.net