Is there a pure Python Lucene?

早过忘川 提交于 2019-12-03 02:13:06


The ruby folks have Ferret. Someone know of any similar initiative for Python? We're using PyLucene at current, but I'd like to investigate moving to pure Python searching.


Whoosh is a new project which is similar to lucene, but is pure python.


The only one pure-python (not involving even C extension) search solution I know of is Nucular. It's slow (much slower than PyLucene) and unstable yet.

We moved from PyLucene-based home baked search and indexing to Solr but YMMV.


I recently found pyndexter. It provides abstract interface to various different backend full-text search engines/indexers. And it ships with a default pure-python implementation.

These things can be disastrously slow though in Python.


For some applications pure Python is overrated. Take a look at Xapian.


lupy was a lucene port to pure python.The lupy people suggest that you use PyLucene. Sorry. Maybe you can use the Java sources in combination with Jython.


+1 to the Xapian and Pyndexter answers.

Ferret is actually written in C with Ruby bindings on top. A pure Ruby search engine would be even slower than a pure Python one. I would love to see "someone else" write a Cython/Pyrex layer for Python interface to Ferret, but won't do it myself because why bother when there are Python bindings for Xapian.


For non-pure Python, Sphinx Search with Python API works the fastest. From the benchmarks from multiple blogs, Sphinx Search is way faster than Lucene, uses way less memory and it is in C.

I am developing a multi-document search engine based on it, using python and web2py as framework.


After weeks of searching for this, I found a nice Python solution: repoze.catalog. It's not strictly Python-only because it uses ZODB for storage, but it seems a better dependency to me than something like SOLR.