lucene.net

Lucene / Lucene.NET - Document.SetBoost() values?

时间秒杀一切 submitted on 2019-11-30 03:56:52
Question: I know it takes a float, but what are some typical values for various levels of boosting within a result? For example: if I wanted to boost a document's weighting by 10%, should I set it to 1.1? For 20%, 1.2? What happens if I start setting boosts to values like 75.0 or 500.0? Answer 1: Please see the Lucene Similarity documentation for the formula. In principle, all other factors being equal, setting a document's boost to 1.1 will indeed give it a score that is 10%

Lucene - Wildcards in phrases

左心房为你撑大大i submitted on 2019-11-30 03:02:03
Question: I am currently attempting to use Lucene to search data populated in an index. I can match on exact phrases by enclosing them in quotes (i.e. "Processing Documents"), but cannot get Lucene to find that phrase with any sort of wildcard form such as "Processing Document*". The obvious difference is the wildcard at the end. I am currently using Luke to view and search the index (it drops the asterisk at the end of the phrase when parsing). Adding the quotes around the data seems to be the main

Can someone explain to me what this GetCardinality method is doing?

假装没事ソ submitted on 2019-11-30 01:00:40
I've been looking into faceted search with Lucene.NET. I've found a brilliant example here which explains a fair amount, apart from the fact that it completely overlooks the function which checks the cardinality of items in a bit array. Can anyone give me a rundown of what it is doing? The main things I don't understand are why the bitsSetArray is created as it is, what it is used for, and how all the if statements work in the for loop. This may be a big ask, but I have to understand how this works before I can even think of using it in my own code. Thanks public static int GetCardinality
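The excerpt cuts off before the method body, so the following is only a sketch of one common way such a cardinality function can work, not the code from the linked example. It assumes the lookup array holds, for every possible byte value, the number of 1-bits in that byte, so a 32-bit word can be counted with four table lookups. The names `BitCounting`, `BitsSetInByte`, and `BuildTable` are hypothetical, and `BitArray.CopyTo` is used here in place of whatever access the original uses.

```csharp
using System;
using System.Collections;

public static class BitCounting
{
    // Lookup table: BitsSetInByte[b] is the number of 1-bits in the byte value b.
    private static readonly byte[] BitsSetInByte = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[256];
        for (int i = 1; i < 256; i++)
        {
            // Bits in i = its lowest bit + bits in i with that bit shifted out.
            table[i] = (byte)((i & 1) + table[i >> 1]);
        }
        return table;
    }

    // Counts how many bits are set to true in the given BitArray.
    public static int GetCardinality(BitArray bits)
    {
        // Copy the bits into 32-bit words (one int per 32 bits, rounded up).
        var words = new int[(bits.Length + 31) / 32];
        bits.CopyTo(words, 0);

        // Clear any unused bits in the last word so they are never counted.
        int extraBits = bits.Length % 32;
        if (extraBits > 0)
        {
            words[words.Length - 1] &= unchecked((1 << extraBits) - 1);
        }

        int count = 0;
        foreach (int word in words)
        {
            uint w = unchecked((uint)word);
            // One table lookup per byte of the 32-bit word.
            count += BitsSetInByte[w & 0xFF]
                   + BitsSetInByte[(w >> 8) & 0xFF]
                   + BitsSetInByte[(w >> 16) & 0xFF]
                   + BitsSetInByte[(w >> 24) & 0xFF];
        }
        return count;
    }
}
```

In a faceted search, each facet value would typically have its own bit array of matching documents, and a call such as `BitCounting.GetCardinality(facetBits)` would give the hit count shown next to that facet. The table trades 256 bytes of memory for not having to test each bit individually.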

SQL Server 2008 Full Text Search (FTS) versus Lucene.NET

非 Y 不嫁゛ submitted on 2019-11-29 19:40:51
I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out with a lot of changes, I was wondering if anyone can give me pros/cons (or a link to an article). I built a medium-size knowledge base (maybe 2GB of indexed text) on top of SQL Server 2005's FTS in 2006, and have now moved it to 2008's iFTS. Both situations have worked well for me, but the move from 2005 to 2008 was actually an improvement for me. My situation was NOT like StackOverflow's in the sense that I was indexing data that was only refreshed nightly, however I was trying

Indexing .PDF, .XLS, .DOC, .PPT using Lucene.NET

青春壹個敷衍的年華 submitted on 2019-11-29 19:26:59
I've heard of Lucene.Net and I've heard of Apache Tika. The question is: how do I index these documents using C# vs Java? I think the issue is that there is no .Net equivalent of Tika which extracts relevant text from these document types. UPDATE - Feb 05 2011 Based on the given responses, it seems that there is not currently a native .Net equivalent of Tika. Two projects were mentioned that are each interesting in their own right: Xapian Project ( http://xapian.org/ ) - An alternative to Lucene written in unmanaged code. The project claims to support "swig" which allows for C# bindings.

How to sort/filter using the new Sitecore.Search API

独自空忆成欢 submitted on 2019-11-29 15:20:08
Question: I couldn't find any way to sort and filter using the new Sitecore.Search API. Lucene provides the following methods: Search(Query query, Filter filter) Search(Query query, Sort sort) Search(Query query, Filter filter, Sort sort) But I don't think the Sitecore.Search API exposes these features. Am I missing something? Can someone please explain how to perform Filter and Sort with the new Sitecore.Search API? Or do I need to use the wrapped Searcher.Search(Query, Sort) to achieve this? I am

Find all available values for a field in lucene .net

天大地大妈咪最大 submitted on 2019-11-29 15:16:05
If I have a field x that can contain a value of y, or z, etc., is there a way I can query so that I get back only the values that have actually been indexed? Example: x available settable values = test1, test2, test3, test4. Item 1: Field x = test1; Item 2: Field x = test2; Item 3: Field x = test4; Item 4: Field x = test1. Performing the required query would return a list of: test1, test2, test4. I've implemented this before as an extension method: public static class ReaderExtentions { public static IEnumerable<string> UniqueTermsFromField( this IndexReader reader, string field) { var termEnum = reader

In Lucene, why do my boosted and unboosted documents get the same score?

本小妞迷上赌 submitted on 2019-11-29 14:13:30
At index time I am boosting certain documents in this way: if (myCondition) { document.SetBoost(1.2f); } But at search time, documents with all the exact same qualities, some passing and some failing myCondition, all end up having the same score. And here is the search code: BooleanQuery booleanQuery = new BooleanQuery(); booleanQuery.Add(new TermQuery(new Term(FieldNames.HAS_PHOTO, "y")), BooleanClause.Occur.MUST); booleanQuery.Add(new TermQuery(new Term(FieldNames.AUTHOR_TYPE, AuthorTypes.BLOGGER)), BooleanClause.Occur.MUST_NOT); indexSearcher.Search(booleanQuery, 10); Can you tell me what I

Wildcard at the Beginning of a searchterm -Lucene

旧城冷巷雨未停 submitted on 2019-11-29 13:49:27
As far as I know, Lucene(.NET) doesn't support a wildcard at the beginning of a search term --> http://lucene.apache.org/java/2_0_0/queryparsersyntax.html "Note: You cannot use a * or ? symbol as the first character of a search." For example: *myword. Maybe that is because it's quite difficult to search "everything" before the search term. Despite that, we are looking for a way to use the wildcard at the beginning. Does anyone know if this is possible? One thought was a*searchterm, b*searchterm, ..., z*searchterm, but that seems a bit random to me. Thanks in advance. Your question is tagged with Lucene

How to query for terms IN a collection using Lucene.Net, similar to SQL's IN operator?

…衆ロ難τιáo~ submitted on 2019-11-29 11:32:47
We are trying to search for documents whose field value is in a collection of possible values, field:[value1, value2, value3, ..., valueN], which would return the element if it matches any of the input values, similar to SQL's IN() operator. This would be similar to a range query, but the elements do not necessarily describe a range. An example using the Lucene.Net API would be: var query = new QueryParser(version, "FieldName", analyzer).In("value1", "value2", "value3"); Is this possible in Lucene.Net? I4V field:value1 field:value2 .... should do the trick. By default all terms are OR
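The reply's suggestion (one field:value clause per value, with the parser's default OR operator joining them) could be sketched as a small string builder. This is only an illustration: `InQuery.Build` is a hypothetical helper, not part of Lucene.Net, and it assumes the values contain no whitespace or query-syntax characters that would need escaping.

```csharp
using System;
using System.Linq;

public static class InQuery
{
    // Hypothetical helper (not a Lucene.Net API): builds "field:v1 field:v2 ..."
    // so that a query parser whose default operator is OR matches any of the values.
    // Assumption: values contain no spaces or special query characters.
    public static string Build(string field, params string[] values)
    {
        if (string.IsNullOrEmpty(field))
        {
            throw new ArgumentException("A field name is required.", "field");
        }
        if (values == null || values.Length == 0)
        {
            throw new ArgumentException("At least one value is required.", "values");
        }

        // One "field:value" clause per value, separated by spaces.
        return string.Join(" ", values.Select(v => field + ":" + v).ToArray());
    }
}
```

For the question's example, `InQuery.Build("FieldName", "value1", "value2", "value3")` produces the text `FieldName:value1 FieldName:value2 FieldName:value3`, which could then be handed to the query parser. If the parser's default operator has been changed to AND, the clauses would need an explicit OR between them.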