How to use SQLAlchemy to create a full text search index on SQLite and query it?

我是研究僧i 提交于 2021-02-16 13:40:26

问题


I am create a simple app which can performance basic operations. SQLite is used as database. I want to perform wildcard search but I know that it has poor performance. I want to try out full text search but I cannot full a example on how to do it. I confirmed that SQLite has full text search support. Here is my sample code.

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, unique=True, nullable=False)
    thumb = db.Column(db.Text, nullable=False, default="")

    role = db.relationship("Role", backref="person", cascade="delete")


class Role(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    person_id = db.Column(db.Integer, db.ForeignKey(Person.id, ondelete="CASCADE"), nullable=False)
    role = db.Column(db.Text, nullable=False)

How can I create a FTS index and query it using SQLAlchemy. For example, searching name in Person.


回答1:


FTS5 provides virtual tables that support full-text search. In other words you cannot create a full-text index on a column in an existing table. Instead you can create an FTS5 virtual table and copy relevant data from your original table over for indexing. In order to avoid storing the same data twice you can make it an external content table, though you will still have to make sure that the FTS5 table is kept in sync, either manually or with triggers.

You could create a generic custom DDL construct that'd handle creating a FTS5 virtual table that mirrors another table:

class CreateFtsTable(DDLElement):
    """Represents a CREATE VIRTUAL TABLE ... USING fts5 statement, for indexing
    a given table.

    """

    def __init__(self, table, version=5):
        self.table = table
        self.version = version


@compiles(CreateFtsTable)
def compile_create_fts_table(element, compiler, **kw):
    """
    """
    tbl = element.table
    version = element.version
    preparer = compiler.preparer
    sql_compiler = compiler.sql_compiler

    tbl_name = preparer.format_table(tbl)
    vtbl_name = preparer.quote(tbl.name + "_idx")

    text = "\nCREATE VIRTUAL TABLE "
    text += vtbl_name + " "
    text += "USING fts" + str(version) + "("

    separator = "\n"

    pk_column, = tbl.primary_key
    columns = [col for col in tbl.columns if col is not pk_column]

    for column in columns:
        text += separator
        separator = ", \n"
        text += "\t" + preparer.format_column(column)

        if not isinstance(column.type, String):
            text += " UNINDEXED"

    text += separator
    text += "\tcontent=" + sql_compiler.render_literal_value(
            tbl.name, String())

    text += separator
    text += "\tcontent_rowid=" + sql_compiler.render_literal_value(
            pk_column.name, String())

    text += "\n)\n\n"
    return text

The given implementation is a bit naive and indexes all text columns by default. The created virtual table is implicitly named by adding _idx after the original table name.

But that alone is not enough, if you want to automate keeping the tables in sync with triggers, and since you're adding an index for just one table, you could just opt to use text DDL constructs in your migration script:

def upgrade():
    ddl = [
        """
        CREATE VIRTUAL TABLE person_idx USING fts5(
            name,
            thumb UNINDEXED,
            content='person',
            content_rowid='id'
        )
        """,
        """
        CREATE TRIGGER person_ai AFTER INSERT ON person BEGIN
            INSERT INTO person_idx (rowid, name, thumb)
            VALUES (new.id, new.name, new.thumb);
        END
        """,
        """
        CREATE TRIGGER person_ad AFTER DELETE ON person BEGIN
            INSERT INTO person_idx (person_idx, rowid, name, thumb)
            VALUES ('delete', old.id, old.name, old.thumb);
        END
        """,
        """
        CREATE TRIGGER person_au AFTER UPDATE ON person BEGIN
            INSERT INTO person_idx (person_idx, rowid, name, thumb)
            VALUES ('delete', old.id, old.name, old.thumb);
            INSERT INTO person_idx (rowid, name, thumb)
            VALUES (new.id, new.name, new.thumb);
        END
        """
    ]

    for stmt in ddl:
        op.execute(sa.DDL(stmt))

If your person table contains existing data, remember to insert those to the created virtual table as well for indexing.

In order to actually use the created virtual table you could create a non-primary mapper for Person:

person_idx = db.Table('person_idx', db.metadata,
                      db.Column('rowid', db.Integer(), primary_key=True),
                      db.Column('name', db.Text()),
                      db.Column('thumb', db.Text()))

PersonIdx = db.mapper(
    Person, person_idx, non_primary=True,
    properties={
        'id': person_idx.c.rowid
    }
)

And to make a full-text query using for example MATCH:

db.session.query(PersonIdx).\
    filter(PersonIdx.c.name.op("MATCH")("john")).\
    all()

Note that the result is a list of Person objects. PersonIdx is just a Mapper.


As noted by Victor K. the use of non-primary mappers is deprecated and the new alternative is to use aliased(). The setup is mostly the same, but the rowid to id mapping needs to take place when creating person_idx Table using the key parameter of Column:

person_idx = db.Table('person_idx', db.metadata,
                      db.Column('rowid', db.Integer(), key='id', primary_key=True),
                      db.Column('name', db.Text()),
                      db.Column('thumb', db.Text()))

and instead of a new mapper create the alias:

PersonIdx = db.aliased(Person, person_idx, adapt_on_names=True)

The alias works more like the mapped class in that you do not access mapped attributes through .c, but directly:

db.session.query(PersonIdx).\
    filter(PersonIdx.name.op("MATCH")("john")).\
    all()


来源:https://stackoverflow.com/questions/49893974/how-to-use-sqlalchemy-to-create-a-full-text-search-index-on-sqlite-and-query-it

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!