lucene.net

Proper structuring of Lucene.Net usage in an ASP.NET MVC site

谁说我不能喝 submitted on 2019-11-28 15:45:42
Question: I'm building an ASP.NET MVC site where I plan to use Lucene.Net. I've envisioned a way to structure the usage of Lucene, but I'm not sure whether my planned architecture is sound and efficient. My plan: On the Application_Start event in Global.asax: I check for the existence of the index on the file system. If it doesn't exist, I create it and fill it with documents extracted from the database. When new content is submitted: I create an IndexWriter, fill up a document, write to the index, and

How to incorporate multiple fields in QueryParser?

末鹿安然 submitted on 2019-11-28 15:40:07
Dim qp1 As New QueryParser("filename", New StandardAnalyzer()) Dim qp2 As New QueryParser("filetext", New StandardAnalyzer()) . . I am using the 'Lucene.Net' library and have the following question. Instead of creating two separate QueryParser objects and using them to obtain two Hits objects, is it possible to perform a search on both fields using a single QueryParser object, so that I have only one Hits object which gives me the overall score of each Document? There are 3 ways to do this. The first way is to construct a query manually; this is what QueryParser does internally. This is the

Indexing .PDF, .XLS, .DOC, .PPT using Lucene.NET

痴心易碎 submitted on 2019-11-28 14:58:31
Question: I've heard of Lucene.Net and I've heard of Apache Tika. The question is: how do I index these documents using C# rather than Java? I think the issue is that there is no .Net equivalent of Tika, which extracts the relevant text from these document types. UPDATE - Feb 05 2011 Based on the given responses, it seems that there is currently no native .Net equivalent of Tika. Two projects were mentioned that are each interesting in their own right: Xapian Project (http://xapian.org/) - An alternative to

Find all available values for a field in lucene .net

血红的双手。 submitted on 2019-11-28 09:07:55
Question: If I have a field x that can contain a value of y, or z, etc., is there a way I can query so that I get back only the values that have actually been indexed? Example: x available settable values = test1, test2, test3, test4. Item 1 : Field x = test1 Item 2 : Field x = test2 Item 3 : Field x = test4 Item 4 : Field x = test1 Performing the required query would return a list of: test1, test2, test4 Answer 1: I've implemented this before as an extension method: public static class ReaderExtentions { public static
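The answer's extension method is cut off above and is not reconstructed here. As a language-neutral illustration only (plain Python, not Lucene.Net code), the sketch below shows why the question is answerable at all: an inverted index keeps each distinct term once per field, so the set of indexed values can be read from that term list without touching every document at query time. The helper name `build_term_index` and the dictionary layout are assumptions made for this example.

```python
from collections import defaultdict


def build_term_index(docs):
    """Map each field name to the sorted list of distinct values indexed for it.

    Hypothetical helper for illustration; it mimics a per-field term dictionary.
    """
    terms = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            terms[field].add(value)
    return {field: sorted(values) for field, values in terms.items()}


# The four items from the question; test3 is a settable value but is never indexed.
docs = [{"x": "test1"}, {"x": "test2"}, {"x": "test4"}, {"x": "test1"}]
term_index = build_term_index(docs)

print(term_index["x"])  # ['test1', 'test2', 'test4']
```

Note that test3 never appears and test1 appears once, which matches the list the question expects.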

Wildcard at the Beginning of a searchterm -Lucene

僤鯓⒐⒋嵵緔 submitted on 2019-11-28 07:21:25
Question: As far as I know, lucene(.net) doesn't support a wildcard at the beginning of a search term --> http://lucene.apache.org/java/2_0_0/queryparsersyntax.html "Note: You cannot use a * or ? symbol as the first character of a search." For example, *myword, maybe because it's quite difficult to search "everything" before the search term. Despite that, we are looking for a way to use a wildcard at the beginning. Does anyone know if this is possible? One thought was a searchterm, b searchterm, ....z

How to query for terms IN a collection using Lucene.Net, similar to SQL's IN operator?

纵然是瞬间 submitted on 2019-11-28 05:15:27
Question: We are trying to search for documents whose value for a particular field is in a collection of possible values, field:[value1, value2, value3, ..., valueN], which would return the document if it matches any of the input values, similar to SQL's IN() operator. This would be similar to a range query, but the elements do not necessarily describe a range. An example using the Lucene.Net API would be, var query = new QueryParser(version, "FieldName", analyzer).In("value1", "value2", "value3"); Is this

How do i implement tag searching? with lucene?

不问归期 submitted on 2019-11-28 04:35:28
I haven't used Lucene. The last time I asked (many months ago, maybe a year), people suggested Lucene. If I shouldn't use Lucene, what should I use? As an example, say there are items tagged like this: (1) apples carrots, (2) apples, (3) carrots, (4) apple banana. If a user searches for apples, I don't care whether there is any preference among 1, 2 and 4. However, something I have seen many forums do, and which I HATED, is that when a user searches for apple carrots, 2 and 3 rank high while 1 is hard to find even though it matches my search more closely. I would also like the ability to search for carrots -apples, which will only get me 3. I am not sure what
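To make the requested `carrots -apples` behaviour concrete, here is a minimal sketch in plain Python (not Lucene code). It assumes the four items are tagged (1) apples carrots, (2) apples, (3) carrots, (4) apple banana, a numbering inferred from the results the question describes. The function `search_tags` and its `required`/`prohibited` parameters are hypothetical names for this example. It only filters on exact tag matches; ranking and matching apple against apples are left out.

```python
def search_tags(items, required=(), prohibited=()):
    """Return ids of items that carry every required tag and no prohibited tag."""
    required, prohibited = set(required), set(prohibited)
    return [
        item_id
        for item_id, tags in sorted(items.items())
        if required <= tags and not (prohibited & tags)
    ]


# Assumed tagging of the four items from the question.
items = {
    1: {"apples", "carrots"},
    2: {"apples"},
    3: {"carrots"},
    4: {"apple", "banana"},
}

result = search_tags(items, required=["carrots"], prohibited=["apples"])
print(result)  # [3]
```

Item 1 also has carrots but is excluded by the prohibited tag, so only item 3 comes back, as the question asks.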

Optimizing Lucene performance

社会主义新天地 submitted on 2019-11-28 04:10:05
What are the various ways of optimizing Lucene performance? Should I use a caching API to store my Lucene search query so that I save the overhead of building the query again? Mitch Wheat: Have you looked at "Lucene Optimization Tip: Reuse Searcher", "Advanced Text Indexing with Lucene", and "Should an index be optimised after incremental indexes in Lucene?" Quick tips: Keep the size of the index small. Eliminate norms and term vectors when not needed. Set the Store flag for a field only if it is a must. Obvious, but an oft-repeated mistake: create only one instance of Searcher and reuse it. Keep the index on fast

Lucene.Net Search result to highlight search keywords

戏子无情 submitted on 2019-11-28 03:36:28
I use Lucene.Net to index some documents. I want to show the user a couple of lines explaining why each document is in the result set, just as Google shows the link followed by a few lines with the keywords highlighted. Any ideas? When you have a result, you can get the indexed text and pass it along with your query through a method similar to this: public string GeneratePreviewText(Query q, string text) { QueryScorer scorer = new QueryScorer(q); Formatter formatter = new SimpleHTMLFormatter(highlightStartTag, highlightEndTag); Highlighter

Leading wildcard character throws error in Lucene.NET

我是研究僧i submitted on 2019-11-28 01:25:49
If the search query contains a leading wildcard character (* or ?), the QueryParser's Parse function throws an error. Dim q As String = "*abc" Dim qp As New QueryParser("text", New StandardAnalyzer()) Dim query As Query = qp.Parse(q) Is there any way to solve this problem in Lucene.NET v2.0.0.4? Call the QueryParser.SetAllowLeadingWildcard method with true. The API page states that "this can produce very slow queries on big indexes", though. Maybe you have to use a WildcardQuery, but its documentation says: "In order to prevent extremely slow WildcardQueries, a Wildcard term should not start with one of the wildcards..
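The "very slow queries" warning is easier to see with a toy model. The sketch below is plain Python, not Lucene.NET code, and assumes only that the terms of a field are kept in sorted order. A trailing-wildcard (prefix) lookup can binary-search to the first candidate and read a contiguous run, whereas a pattern with a leading wildcard has no usable prefix and must be tested against every term. The function names are made up for this example.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase


def prefix_terms(sorted_terms, prefix):
    """Terms starting with prefix: binary search, then read a contiguous run."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches


def wildcard_terms(sorted_terms, pattern):
    """General wildcard match: every term in the list is tested against the pattern."""
    return [term for term in sorted_terms if fnmatchcase(term, pattern)]


terms = sorted(["abc", "abcd", "xabc", "xyz", "zzabc"])

print(prefix_terms(terms, "abc"))     # ['abc', 'abcd']
print(wildcard_terms(terms, "*abc"))  # ['abc', 'xabc', 'zzabc']
```

With five terms the difference is invisible. On an index with millions of distinct terms, the full scan in `wildcard_terms` is the cost the API note is warning about.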