Create a dataframe from a list in pyspark.sql

Submitted by 北慕城南 on 2019-12-04 10:26:18

Your list contains numpy.float64 values, and I think createDataFrame doesn't accept that type. When you hard-code the values, on the other hand, the list contains plain Python floats.
Here is a question with an answer that goes over how to convert from NumPy's data types to Python's native ones.
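
To illustrate the difference, here is a minimal sketch in plain Python, so it runs without NumPy or Spark installed. `Float64Like` is a hypothetical stand-in for np.float64 (which also subclasses Python's float), and `to_native` is just an illustrative helper name, not part of any library. The assumption is that Spark's type inference looks at the exact type of each value, which is why a float subclass is rejected while a plain float is accepted.

```python
class Float64Like(float):
    """Hypothetical stand-in for numpy.float64, which also subclasses float."""


def to_native(values):
    """Return a new list in which every element is a plain Python float."""
    return [float(v) for v in values]


raw = [Float64Like(2.8), Float64Like(3.9), Float64Like(1.2)]
native = to_native(raw)

# The values compare equal, but only the converted list holds exact floats.
print([type(v) is float for v in raw])     # [False, False, False]
print([type(v) is float for v in native])  # [True, True, True]
```

With real NumPy data, float(x) on each element should have the same effect.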

I had the same problem. My solution uses float() to convert the type:

1. At the beginning, the element type is np.float64

my_rdd.collect()   
output ==>  [2.8,3.9,1.2]   

2. Convert the type to Python float, wrapping each value in a one-element tuple so it becomes a row

my_convert=my_rdd.map(lambda x: (float(x),)).collect()  
output ==> [(2.8,),(3.9,),(1.2,)]  

3. No error is raised this time

sqlContext.createDataFrame(my_convert).show()

4. For your sample, I suggest:

li = example_data.map(lambda x: get_labeled_prediction(w,x)).map(lambda y:(float(y[0]),float(y[1]))).collect()
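
The second map in step 4 can also be written as a named function, which is easier to check on its own. This is only a sketch: `to_float_row` and the sample pairs are made up for illustration, and the column names in the trailing comment are assumptions rather than anything from the original question.

```python
def to_float_row(pair):
    """Convert a (label, prediction) pair to a tuple of plain Python floats."""
    label, prediction = pair
    return (float(label), float(prediction))


# Hypothetical sample standing in for the output of get_labeled_prediction.
sample_pairs = [(1, 0.75), (0, 0.25)]
rows = [to_float_row(p) for p in sample_pairs]
print(rows)  # [(1.0, 0.75), (0.0, 0.25)]

# With a live Spark context the same function could be passed to map, e.g.
#   li = (example_data.map(lambda x: get_labeled_prediction(w, x))
#                     .map(to_float_row).collect())
#   sqlContext.createDataFrame(li, ["label", "prediction"]).show()
```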