pylucene

Writing a custom analyzer in PyLucene/inheritance using JCC?

﹥>﹥吖頭↗ submitted on 2021-02-07 07:19:23
Question: I want to write a custom analyzer in PyLucene. In Java Lucene, an analyzer class normally inherits from Lucene's Analyzer class, but PyLucene uses JCC, the Java to C++/Python compiler. How do you make a Python class inherit from a Java class using JCC, and in particular, how do you write a custom PyLucene analyzer? Thanks. Answer 1: Here's an example of an Analyzer that wraps the EdgeNGram filter. import lucene class EdgeNGramAnalyzer(lucene.PythonAnalyzer): ''' This is an …

Where is the best place to do initVM and attachCurrentThread when using pylucene in Django

一世执手 submitted on 2021-01-27 06:50:26
Question: I'm using PyLucene in a Django-based site and I was wondering if anyone knew the best place to start up the JVM and attach threads. I don't want to start a new JVM every time someone loads a page, but I was occasionally getting the cryptic "Cannot Import Name" error in Django when I attached threads at search time. Is it a mistake to attach the thread in views.py? Edit: I'm specifically looking for a way to instantiate a single JVM and leave it running so I can …
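One common way to approach the question above is to start the JVM lazily, at most once per process, and attach whichever thread is about to touch Lucene. The sketch below is only an illustration of that idea, not the accepted answer: `ensure_jvm_attached` and `_init_lock` are hypothetical names, and the helper assumes that `lucene.getVMEnv()` returns `None` before `lucene.initVM()` has been called and that calling `attachCurrentThread()` on an already attached thread is harmless. The `lucene` module is passed in as an argument so the helper can be imported without PyLucene installed.

```python
import threading

# Hypothetical module-level lock; serializes the one-time JVM start-up.
_init_lock = threading.Lock()


def ensure_jvm_attached(lucene_module):
    """Start the JVM at most once per process and attach the calling thread.

    `lucene_module` is the imported `lucene` package (an assumption of this
    sketch: it exposes getVMEnv() and initVM() as documented by JCC).
    """
    with _init_lock:
        # Assumed to be None until initVM() has run in this process.
        env = lucene_module.getVMEnv()
        if env is None:
            env = lucene_module.initVM()
    # Each request thread must be attached before it uses Lucene classes.
    env.attachCurrentThread()
    return env
```

In a Django view this would be called as `ensure_jvm_attached(lucene)` at the top of any function that searches the index, so the JVM is created on the first request and reused afterwards.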

How to get a list of all tokens from Lucene 8.6.1 index using PyLucene?

こ雲淡風輕ζ submitted on 2021-01-05 08:53:15
Question: I got some direction from this question. I first build the index as shown below. import lucene from org.apache.lucene.analysis.standard import StandardAnalyzer from org.apache.lucene.index import IndexWriterConfig, IndexWriter, DirectoryReader from org.apache.lucene.store import SimpleFSDirectory from java.nio.file import Paths from org.apache.lucene.document import Document, Field, TextField from org.apache.lucene.util import BytesRefIterator index_path = "./index" lucene.initVM() analyzer = …

How to get a list of all tokens from Lucene 8.6.1 index?

拈花ヽ惹草 submitted on 2021-01-04 06:37:50
Question: I have looked at "How to get a list of all tokens from Solr/Lucene index?", but Lucene 8.6.1 doesn't seem to offer IndexReader.terms(). Has it been moved or replaced? Is there an easier way than this answer? Answer 1: Some History. You asked: "I'm just wondering if IndexReader.terms() has moved or been replaced by an alternative." The Lucene v3 method IndexReader.terms() was moved to AtomicReader in Lucene v4. This was documented in the v4 alpha release notes. (Bear in mind that Lucene v4 was released …

Finding a single field's terms with Lucene (PyLucene)

风流意气都作罢 submitted on 2020-01-01 17:20:13
Question: I'm fairly new to Lucene's term vectors and want to make sure my term gathering is as efficient as it can be. I'm getting the unique terms and then retrieving the docFreq() of each term to perform faceting. I'm gathering all documents' terms from the index using: lindex = SimpleFSDirectory(File(indexdir)) ireader = IndexReader.open(lindex, True) terms = ireader.terms() #Returns TermEnum This works fine, but is there a way to only return terms for specific fields (across all documents …

Building PyLucene on Ubuntu 14.04 (Trusty Tahr)

青春壹個敷衍的年華 submitted on 2019-12-21 05:36:28
Question: Following the installation instructions, JCC built successfully. The dependencies installed were: ant, openjdk-7-jdk, python-setuptools, python-dev. Then, proceeding to make PyLucene, in the Makefile I chose the specs corresponding to Ubuntu 11. # Linux (Ubuntu 11.10 64-bit, Python 2.7.2, OpenJDK 1.7, setuptools 0.6.16) # Be sure to also set JDK['linux2'] in jcc's setup.py to the JAVA_HOME value # used below for ANT (and rebuild jcc after changing it). PREFIX_PYTHON=/usr ANT=JAVA_HOME=/usr/lib/jvm/java …

PyLucene 4.9.0 Ubuntu 14.04 Installation ImportError

扶醉桌前 submitted on 2019-12-13 07:25:09
Question: I've been trying to install PyLucene on my Mac for a little over a week, and have given up on that in favor of installing it on Ubuntu in a virtual machine. I thought the installation had gone well, so I fired up Python in the terminal, tried to import lucene, and received the following ImportError: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/parallels/anaconda/lib/python2.7/site-packages/lucene/__init__.py", line 7, in <module> …

DelimitedPayloadFilter in PyLucene?

我与影子孤独终老i submitted on 2019-12-11 07:56:55
Question: I am trying to implement a Python version of the Java code from http://searchhub.org/2010/04/18/refresh-getting-started-with-payloads/ using PyLucene. My analyzer raises a lucene.InvalidArgsError on the init call to the DelimitedTokenFilter. The class is below, and any help is greatly appreciated. The Java version compiled with the JAR files from the PyLucene 3.6 build works fine. import lucene class PayloadAnalyzer(lucene.PythonAnalyzer): encoder = None def __init__(self, encoder): lucene …