Efficient way to store a JSON string in a Cassandra column?

匆匆过客 提交于 2019-12-04 10:06:45

问题


Cassandra newbie question. I'm collecting some data from a social networking site using REST calls. So I end up with the data coming back in JSON format.

The JSON is only one of the columns in my table. I'm trying to figure out what the "best practice" is for storing the JSON string.

First I thought of using the map type, but the JSON contains a mix of strings, numerical types, etc. It doesn't seem like I can declare wildcard types for the map key/value. The JSON string can be quite large, probably over 10KB in size. I could potentially store it as a string, but it seems like that would be inefficient. I would assume this is a common task, so I'm sure there are some general guidelines for how to do this.

I know Cassandra has native support for JSON, but from what I understand, that's mostly used when the entire JSON map matches 1-1 with the database schema. That's not the case for me. The schema has a bunch of columns and the JSON string is just a sort of "payload". Is it better to store the JSON string as a blob or as text? BTW, the Cassandra version is 2.1.5.

Any hints appreciated. Thanks in advance.


回答1:


In the Cassandra Storage engine there's really not a big difference between a blob and a text, since Cassandra stores text as blobs essentially. And yes the "native" JSON support you speak of is only for when your data model matches your JSON model, and it's only in Cassandra 2.2+.

I would store it as a text type, and you shouldn't have to implement anything to compress your JSON data when sending the data (or handle uncompressing). Since Cassandra's Binary Protocol supports doing transport compression. Also make sure your table is storing the data compressed with the same compression algorithm (I suggest using LZ4 since it's the fastest algo implmeneted) to save on doing compression for each read request. Thus if you configure storing the data compressed and use transport compression, you don't even have to implement either yourself.

You didn't say which Client Driver you're using, but here's the documentation on how to setup Transport Compression for Datastax Java Client Driver.




回答2:


It depends on how to want to query your JSON. There are 3 possible strategies:

  1. Store as a string
  2. Store as a compressed blob
  3. Store as a blob

Option 1 has the advantage of being human readable when you query your data on command line with cqlsh or if you want to debug data directly live. The drawback is the size of this JSON column (10k)

Option 2 has the advantage to keep the JSON payload small because text elements have a pretty decent compression ration. Drawbacks are: a. you need to take care of compression/decompression client side and b. it's not human readable directly

Option 3 has drawbacks of option 1 (size) and 2 (not human readable)



来源:https://stackoverflow.com/questions/36342531/efficient-way-to-store-a-json-string-in-a-cassandra-column

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!