Question
My Spark RDD looks something like this:
totalDistance=flightsParsed.map(lambda x:x.distance)
totalDistance.take(5)
[1979.0, 640.0, 1947.0, 1590.0, 874.0]
But when I run reduce on it, I get the error shown below:
totalDistance=flightsParsed.map(lambda x:x.distance).reduce(lambda y,z:y+z)
ValueError: could not convert string to float:
Please help.
Answer 1:
Did you try:
totalDistance=flightsParsed.map(lambda x: int(x.distance or 0))
or
totalDistance=flightsParsed.map(lambda x: float(x.distance or 0))
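You may have missing or inconsistent data inside flightsParsed. take(5) only evaluates the first few records, whereas reduce has to process every record, so a bad value further along in the data only surfaces at that point.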
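If the bad values are not just empty or None (for example "NA"), a small conversion helper may be more robust than `x.distance or 0`. The sketch below is only an illustration: `to_float` and the `sample` list are hypothetical names that do not appear in the question, and it assumes that treating an unparseable distance as 0.0 is acceptable for a sum. If the string-to-float conversion actually happens inside the function that builds flightsParsed, the same helper would need to be applied there instead.

```python
def to_float(value, default=0.0):
    """Return value as a float, or default if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers '' and other non-numeric strings
        return default

# Hypothetical sample standing in for the distance field of each record
sample = ["1979.0", "640.0", "", None, "874.0"]
total = sum(to_float(v) for v in sample)
print(total)

# With the RDD from the question, the helper would be used like this (not run here):
# totalDistance = flightsParsed.map(lambda x: to_float(x.distance)).reduce(lambda y, z: y + z)
```

Silently replacing bad values with 0.0 hides how many records were affected, so it may be worth counting or filtering them separately before trusting the total.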
Source: https://stackoverflow.com/questions/47559522/valueerror-could-not-convert-string-to-float-in-pyspark