Question
I'm writing code that fetches a dataset using an internal library and the %pyspark interpreter. However, I am unable to pass the dataset to the %python interpreter. I tried it with string variables and that works fine, but with the dataset, using the following code to put it into the Zeppelin context:
z.put("input_data", input_data)
it throws the following error:
AttributeError: 'DataFrame' object has no attribute '_get_object_id'
Can you please tell me how I can do this? Thanks in advance.
Answer 1:
You can put the result into the ResourcePool by printing it as a %table.
%python
print('%table a\tb\n408+\t+408\n0001\t++99\n40817810300001453030\t0000040817810300001453030')
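The %table payload is just tab-separated text: a header row followed by data rows, joined with newlines. So you can build it programmatically from whatever rows you have. A minimal sketch (the helper name to_zeppelin_table is my own, not a Zeppelin API):

```python
def to_zeppelin_table(header, rows):
    """Serialize a header and data rows into Zeppelin's %table format:
    tab-separated columns, newline-separated rows."""
    lines = ["\t".join(header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)

# Printing this string in a %python paragraph renders it as a table
# and makes the result available in the ResourcePool:
print(to_zeppelin_table(["a", "b"], [["408+", "+408"], ["0001", "++99"]]))
```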
Then retrieve it from the other interpreter like this:
%spark.pyspark
# Look up the %python paragraph's table result in the shared ResourcePool
ic = z.getInterpreterContext()
pool = ic.getResourcePool()
paragraphId = "20180828-093109_1491500809"  # id of the %python paragraph above
t = pool.get(ic.getNoteId(), paragraphId, "zeppelin.paragraph.result.table").get().toString()
print(t)
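The string that comes back is the same tab-separated text, so turning it back into usable rows is just a matter of splitting on newlines and tabs. A minimal parsing sketch (parse_zeppelin_table is a hypothetical helper, and whether the stored string still carries a leading "%table" directive is an assumption I tolerate rather than rely on):

```python
def parse_zeppelin_table(raw):
    """Split a %table-style result string into a header list and data rows."""
    raw = raw.strip()
    # Tolerate an optional leading "%table" directive
    if raw.startswith("%table"):
        raw = raw[len("%table"):].lstrip()
    lines = raw.split("\n")
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows

# e.g. after t = pool.get(...).get().toString():
header, rows = parse_zeppelin_table("a\tb\n408+\t+408\n0001\t++99")
# From here you could rebuild a Spark DataFrame if needed, e.g.:
# df = spark.createDataFrame(rows, header)
```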
This approach can transfer roughly 50-100 megabytes of raw data.
That said, I recommend following @zjffdu's advice and sticking to just one of these interpreters.
Source: https://stackoverflow.com/questions/51915908/how-can-i-pass-datasets-between-pyspark-interpreter-and-python-interpreters-in