Question
How can I get Flask-WhooshAlchemy to create the .seg files for an already existing database filled with records? By calling:
with app.app_context():
    whooshalchemy.whoosh_index(app, MappedClass)
I can get the .toc file, but the .seg files are only created once I insert a record directly via the Flask-WhooshAlchemy interface. As a result, none of the already existing records will ever be included in a Whoosh search.
Answer 1:
Here is a script that indexes an existing database. FWIW, Whoosh refers to that as "batch indexing".
This is a little rough, but it works:
#!/usr/bin/env python2
import os
import sys
# Assumes app.py exposes the Flask instance as `app`; adjust to your project layout.
from app import app
from models import YourModel as Model
# On Flask >= 1.0 the flask.ext namespace is gone; use
# `from flask_whooshalchemy import whoosh_index` instead.
from flask.ext.whooshalchemy import whoosh_index

# Reopen stdout unbuffered so progress output shows up immediately (Python 2 only).
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
atatime = 512  # rows to pull from the database per batch

with app.app_context():
    index = whoosh_index(app, Model)
    searchable = Model.__searchable__
    print 'counting rows...'
    total = int(Model.query.order_by(None).count())
    done = 0
    print 'total rows: {}'.format(total)
    writer = index.writer(limitmb=10000, procs=16, multisegment=True)
    for p in Model.query.yield_per(atatime):
        # Copy every searchable column of this row into the Whoosh document.
        record = dict([(s, p.__dict__[s]) for s in searchable])
        record.update({'id': unicode(p.id)})  # id is mandatory, or whoosh won't work
        writer.add_document(**record)
        done += 1
        if done % atatime == 0:
            print 'c {}/{} ({}%)'.format(done, total, round((float(done)/total)*100, 2)),
    print '{}/{} ({}%)'.format(done, total, round((float(done)/total)*100, 2))
    writer.commit()
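The record-building step in the loop above reads each searchable column out of p.__dict__, which only holds attributes SQLAlchemy has already loaded onto the instance. A minimal sketch of the same step for Python 3, using getattr() instead, might look like this. The helper name build_document and the Post class are hypothetical stand-ins for illustration, not part of Flask-WhooshAlchemy:

```python
def build_document(obj, searchable, id_attr="id"):
    """Map one model instance to keyword arguments for writer.add_document()."""
    # getattr() also resolves properties and not-yet-loaded columns,
    # which a direct obj.__dict__ lookup can miss.
    doc = {name: getattr(obj, name) for name in searchable}
    # The script above stores the primary key as text; str() is the
    # Python 3 counterpart of unicode().
    doc[id_attr] = str(getattr(obj, id_attr))
    return doc


class Post(object):
    """Hypothetical stand-in for a mapped class, for demonstration only."""
    __searchable__ = ["title", "body"]

    def __init__(self, id, title, body):
        self.id = id
        self.title = title
        self.body = body


doc = build_document(Post(7, "Hello", "First post"), Post.__searchable__)
print(doc)
```

The resulting dict can then be passed on as writer.add_document(**doc), exactly as in the loop above.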
You may want to play with the parameters:
atatime - the number of records to pull from the database at once
limitmb - "max" megabytes to use
procs - cores to use in parallel
I used this to index around 360,000 records on an 8-core AWS instance. It took about 4 minutes, most of which was waiting for the (single-threaded) commit().
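To avoid hard-coding procs=16 on a machine with fewer cores, the writer arguments can be derived from the host. The sketch below is one way to do that in Python 3. writer_options is a hypothetical helper, and splitting the memory budget rests on the assumption (taken from the Whoosh batch-indexing notes) that limitmb applies to each sub-writer rather than to the whole job, so treat the numbers as a starting point only:

```python
import multiprocessing


def writer_options(total_mb=1024, max_procs=None):
    """Return keyword arguments intended for index.writer(**options)."""
    # Use the cores the host reports, optionally capped by the caller.
    procs = multiprocessing.cpu_count()
    if max_procs is not None:
        procs = min(procs, max_procs)
    procs = max(1, procs)
    # Assumption: limitmb is a per-process budget, so divide the total.
    limitmb = max(1, total_mb // procs)
    return {"limitmb": limitmb, "procs": procs, "multisegment": True}


options = writer_options(total_mb=2048, max_procs=8)
print(options)
```

In the script above, this would replace the literal arguments: writer = index.writer(**options).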
Answer 2:
Flask-WhooshAlchemy no longer seems to be maintained.
You can also try my fork: https://github.com/Revolution1/Flask-WhooshAlchemyPlus
Simply install it:
pip install flask-whooshalchemyplus
and then index everything:
from flask_whooshalchemyplus import index_all
index_all(app)
I also added some new features and fixed a lot of bugs.
Thanks :)
Source: https://stackoverflow.com/questions/22872951/flask-whooshalchemy-with-existing-database