Cassandra Full-Text Search

隐身守侯 提交于 2019-12-10 12:33:52

问题


Full-Text search in Cassandra;

I am fairly new to Cassandra, and wish to understand it more properly. I am attempting to perform a Full-Text search in Cassandra, but after some research I have found that there may not be a "simple" approach for this.. and I say maybe because the first page of Google hasn't said much of anything.

So I am trying to understand now instead, what is the best approach here.. This sort of lead me to take make up my own assumptions based on what I've learned so far about Cassandra, that is based on these two principals; a) design your tables based on your queries, rather than the data, and b) more-data is a good thing, as long as it is being used properly.

With that being said, I've come up with a couple of solutions I'd like to share, and also ask that if anyone has a better idea, please fill me on it before I commit to anything unreasonable/naive.

First Solution: Create a Column Family(CF), with two primary keys and an Index like so:

CREATE TABLE "FullTextSearch" (
"PartialText" text,
"TargetIdentifier" uuid,
"CompleteText" text,
"Type" int,
PRIMARY KEY ("PartialText","TargetIdentifier")
);
CREATE INDEX IX_FullTextSearch_Type "keyspace"."FullTextSearch" ("Type");

With the above table, I would need to insert rows for the text "Hello World" as follows:

BATCH APPLY;
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("H",000000000-0000-0000-0000-000000000,"Hello World",1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("He",000000000-0000-0000-0000-000000000,"Hello World",1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("Hel",000000000-0000-0000-0000-000000000,"Hello World",1);
.....
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("Hello Wor",000000000-0000-0000-0000-000000000,"Hello World",1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("Hello Worl",000000000-0000-0000-0000-000000000,"Hello World",1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("Hello World",000000000-0000-0000-0000-000000000,"Hello World",1);
.....
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("Wor",000000000-0000-0000-0000-000000000,"Hello World",1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("Worl",000000000-0000-0000-0000-000000000,"Hello World",1);
INSERT INTO "FullTextSearch" ("PartialText","TargetIdentifier","CompleteText","Type") VALUES ("World",000000000-0000-0000-0000-000000000,"Hello World",1);
END BATCH;

Basically, the above will satisfy the following wildcards/partialtext "%o W%", "Hello%", "Worl%"; However it will not satisfy partial words such as "%ell%" for "Hello", which I can feel alright about for now..... (OCD sorta kicks in here)

This approach sort of sucks for me because I would now have to delete/re-insert any time a save/name change occurs on the "TargetIdentifier";

The Second Solution, would be very similar only this time making use of wide-columns; where the table might look like:

CREATE TABLE "FullTextSearch" (
"TargetIdentifier" uuid,
"Type" int,
"CompleteText" text,
PRIMARY KEY("TargetIdentifier")
);

and now during a search something like:

SELECT * FROM "FullTextSearch" WHERE "He" = 1;

so that if the column exists, the respective rows are returned;

Third Solution: similar to the one above, only this time instead of using wide-columns we use a set column such as map for the partial texts, and perform a query like:

SELECT * FROM "FullTextSearch" WHERE "PartialTexts"['He'] = 1;

Anyways, I am all out of ideas, it is late, and I can only hope for a great response! Please, let me know what I should be doing here... am I even on the right path?


回答1:


AFAIK Datastax Enterprise Search is the (commercial) successor of Solandra.

Cassandra 2.0 supports so called "custom secondary indexes". Custom secondary indexes are Java code. Your own implementation has to implement the abstract class org.apache.cassandra.db.index.SecondaryIndex (See http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/create_index_r.html)

I'm not sure whether implementations exist for Elasticsearch or Solr.

I would not recommend to code all the weird full text search logic like stemming, multiple/exotic language support or even geo spatial stuff.

But SecondaryIndexwould be a good point to start integrating your favorite search engine.




回答2:


If your dataset is relative small you can simply use a inmemory instance of lucene, update the index at a set interval and you are ready to go.




回答3:


Check out SOLANDRA (former Lucandra)

But I think Solandra is not being actively developed any more, the author moved to Datastax and continued his work there.

So You can also take a look at Datastax Enterprise Search

There are some limitation also, look at DistributedSearch




回答4:


The very basic thing about cassandra is if you want to use where clause for filtration of records that column is either primary key or you have to assign index to it, so what i can see is you have given primary key to "TargetIdentifier" field and index to "Type" and using "CompleteText" in where clause so this may not work..

Assign secondary index to "CompleteTex" and check whether you are getting desired output or not.




回答5:


A couple other options you have: Stratio Lucene Plugin. This uses Lucene for implementing a native secondary index.

You also have SSTable Attached Secondary Index (SASI) available to use for free text searching.

Be forewarned that both of these strategies use locally distributed indexes such that queries will not be very performant since searches will end up being broadcast across the entire cluster. For SASI, you can avoid this if you can use a partition key as part of your query.




回答6:


Use elassandra which comes elasticsearch as a plugin in cassandra. An example can be found from here



来源:https://stackoverflow.com/questions/24858141/cassandra-full-text-search

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!