Question
I am trying to tokenize my table fields with a query.
SELECT regexp_split_to_table(mytable.field_name, E'\\s+') from mytable limit 20;
This works when I execute it from the psql shell, but when I do:
from django.db import connection
cursor = connection.cursor()
cursor.execute("SELECT regexp_split_to_table(mytable.field_name, E'\\s+') FROM mytable LIMIT 20")
cursor.fetchall()
... it fails to return tokens. What am I doing wrong?
Answer 1:
The backslash is an escape character in Python string literals (it is Python itself, not Django, that interprets it), so it is processed inside the double quotes. One layer of E'\\s+' therefore gets stripped before the string arrives at the PostgreSQL server, which sees E'\s+'. Because \s is not a recognized escape sequence in an escape string, PostgreSQL drops the backslash and the pattern becomes 's+', which in turn makes regexp_split_to_table() split your strings at any run of the letter s instead of at whitespace, which is what the character class shorthand \s stands for in regular expressions.
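Both effects can be reproduced without a database. The sketch below is only an illustration: it uses Python's re module as a stand-in for PostgreSQL's regular expression engine (an assumption, since the two engines differ in other respects), and the sample string is made up.

```python
import re

# What Python actually hands to the database driver: "\\" in a normal
# (non-raw) string literal is a single backslash.
sent_to_server = "E'\\s+'"
print(sent_to_server)  # E'\s+'

sample = "less is more"  # made-up sample value

# Pattern the server ends up with once E'\s+' loses its backslash:
# one or more occurrences of the letter s.
print(re.split("s+", sample))    # ['le', ' i', ' more']

# Pattern that was intended: one or more whitespace characters.
print(re.split(r"\s+", sample))  # ['less', 'is', 'more']
```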
Double the backslashes in the Python string to get what you intended, E'\\\\s+':
"SELECT regexp_split_to_table(field_name, E'\\\\s+') FROM mytable LIMIT 20"
As an alternative, to avoid problems with the special meaning of the backslash \, you can use [[:space:]] to denote the same character class:
"SELECT regexp_split_to_table(field_name, '[[:space:]]+') FROM mytable LIMIT 20"
Details in the chapter "Pattern Matching" in the manual.
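The spellings of the query can be compared in plain Python. The raw string literal (r prefix) is not part of the answer above; it is shown here only as a standard Python feature that avoids doubling the backslashes by hand. The variable names are illustrative.

```python
# Doubled backslashes in a normal string literal, as in the answer.
doubled = "SELECT regexp_split_to_table(field_name, E'\\\\s+') FROM mytable LIMIT 20"

# The same text as a raw string literal, where Python leaves
# backslashes untouched.
raw = r"SELECT regexp_split_to_table(field_name, E'\\s+') FROM mytable LIMIT 20"

# POSIX character class: no backslashes, so there is nothing to escape.
posix = "SELECT regexp_split_to_table(field_name, '[[:space:]]+') FROM mytable LIMIT 20"

print(doubled == raw)  # True
print(doubled)         # the E'...' literal now carries two backslashes
print("\\" in posix)   # False
```

Any of these strings can be passed to cursor.execute() in place of the one used in the question.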
Answer 2:
Thanks to Django's F and Func expressions and its support for the PostgreSQL ArrayField, you can now call the related regexp_split_to_array() function through the ORM like this:
from django.db.models import F, Value, TextField
from django.contrib.postgres.fields import ArrayField
from django.db.models.expressions import Func

# Split some_field on commas into a text array, then keep the rows
# whose array contains the given value.
MyTable.objects.annotate(
    some_field_splitted=Func(
        F('some_field'),
        Value(","),
        function='regexp_split_to_array',
        output_field=ArrayField(TextField())
    )
).filter(some_field_splitted__contains=[HERE_SOME_VALUE])
Source: https://stackoverflow.com/questions/8907041/django-postgresql-regexp-split-to-table-not-working