PySpark can't convert float to FloatType :-/

Anonymous (unverified), submitted 2019-12-03 01:41:02

Question:

I have a PySpark RDD:

proba_classe_0.take(2)
[0.38030685472943737, 0.34728188900913715]

I want to transform it into a DataFrame:

from pyspark.sql.types import StructType, StructField, FloatType

fields = [StructField('probabilite', FloatType())]
schema = StructType(fields)
df_proba_classe_1 = spark.createDataFrame(proba_classe_1, schema=schema)
df_proba_classe_1.count()

I get a strange error:

TypeError: StructType can not accept object 0.6196931452705625 in type <class 'float'> 

Answer 1:

A StructType schema expects each record to be a row (a tuple or list), not a bare float, which is exactly what the error says. For a single column you can skip the StructType and pass FloatType() directly as the schema. In the example below the RDD is built from strings, so it is mapped to float first:

rdd = sc \
    .parallelize(['0.38030685472943737', '0.34728188900913715']) \
    .map(lambda x: float(x))

df = spark \
    .createDataFrame(rdd, FloatType()) \
    .toDF("id")

df.show()
+----------+
|        id|
+----------+
|0.38030684|
| 0.3472819|
+----------+

df.printSchema()
root
 |-- id: float (nullable = true)

