CqlStorage generates wrong Pig schema

最后都变了 - submitted on 2020-01-16 04:50:07

Question


I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a schema based on the Cassandra schema, but it seems to be wrong.

If I do:

data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;

I get this:

data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different.
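
To make the mismatch concrete, here is a minimal sketch in Python (not Pig) that models the dumped row as plain tuples. It only illustrates the shape of the data. The variable names are hypothetical, and the year is assumed to be an integer because the declared schema says int. It does not reproduce how Pig itself applies the declared schema.

```python
# Illustrative model only: plain Python tuples standing in for Pig tuples.
# The row is copied from the DUMP output above.
row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),  # assumed int, per the declared schema
)

# What DESCRIBE promises: five scalar fields with these names.
declared_fields = ["isbn", "bookauthor", "booktitle", "publisher", "yearofpublication"]

# What each field actually holds: a (name, value) pair, not a scalar.
for declared, field in zip(declared_fields, row):
    assert isinstance(field, tuple) and field[0] == declared

# Flattening every pair in order yields ten positions;
# the names sit at even indexes and the values at odd indexes.
flattened = [item for pair in row for item in pair]
values = flattened[1::2]

# Building a name -> value mapping avoids positional indexing entirely.
record = dict(row)
```

In this model, the declared schema describes something shaped like `values`, while the loader actually returns something shaped like `row`.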

This is really causing me problems trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

flattened = FOREACH data GENERATE
  FLATTEN(isbn),
  FLATTEN(booktitle),
  ...
values = FOREACH flattened GENERATE
  $1 AS ISBN,
  $3 AS BookTitle,
  ...

As soon as I try to access field $5, Pig complains about the index being out of bounds. (Curiously, flattened thinks it has the same schema as the original data.)

Somehow, CqlStorage seems to be generating the wrong schema, and that schema persists to projections of the original collection. Is there any way to work around this?

(I'm using Cassandra 1.2.8 and Pig 0.11.1)


Answer 1:


This was resolved for me (the symptom was a ClassCastException: BinSedesTuple cannot be cast to String) by applying the fix in https://issues.apache.org/jira/browse/CASSANDRA-5867.

As Alex Lui mentioned in my ticket:

git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
cd cassandra
git checkout cassandra-1.2
patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
ant


Source: https://stackoverflow.com/questions/18391552/cqlstorage-generates-wrong-pig-schema
