TypeError: object of type 'map' has no len() Python3

Question

I'm trying to implement the KMeans algorithm using PySpark. It gives me the above error on the last line of the while loop. The code works fine outside the loop, but after I created the loop it gave me this error. How do I fix this?

#  Find K Means of Loudacre device status locations
#
# Input data: file(s) with device status data (delimited by '|')
# including latitude (13th field) and longitude (14th field) of device locations
# (lat,lon of 0,0 indicates unknown location)
# NOTE: Copy to pyspark using %paste

# for a point p and an array of points, return the index in the array of the point closest to p
def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    # for each point in the array, calculate the distance to the test point, then return
    # the index of the array point with the smallest distance
    for i in range(len(points)):
        dist = distanceSquared(p,points[i])
        if dist < closest:
            closest = dist
            bestIndex = i
    return bestIndex

# The squared distances between two points
def distanceSquared(p1,p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

# The sum of two points
def addPoints(p1,p2):
    return [p1[0] + p2[0], p1[1] + p2[1]]

# The files with device status data
filename = "/loudacre/devicestatus_etl/*"

# K is the number of means (center points of clusters) to find
K = 5

# ConvergeDist -- the threshold "distance" between iterations at which we decide we are done
convergeDist=.1

# Parse device status records into [latitude,longitude]
rdd2=rdd1.map(lambda line:(float((line.split(",")[3])),float((line.split(",")[4]))))
# Filter out records where lat/long is unavailable -- ie: 0/0 points
# TODO
filterd=rdd2.filter(lambda x:x!=(0,0))
# start with K randomly selected points from the dataset
# TODO
sample=filterd.takeSample(False,K,42)
# loop until the total distance between one iteration's points and the next is less than the convergence distance specified
tempDist =float("+inf")
while tempDist > convergeDist:
    # for each point, find the index of the closest kpoint.  map to (index, (point,1))
    # TODO
    indexed =filterd.map(lambda (x1,x2):(closestPoint((x1,x2),sample),((x1,x2),1)))

    # For each key (k-point index), reduce by adding the coordinates and number of points

    reduced=indexed.reduceByKey(lambda x,y: ((x[0][0]+y[0][0],x[0][1]+y[0][1]),x[1]+y[1]))
    # For each key (k-point index), find a new point by calculating the average of each closest point
    # TODO
    newCenters=reduced.mapValues(lambda x1: [x1[0][0]/x1[1], x1[0][1]/x1[1]]).sortByKey()
    # calculate the total of the distance between the current points and new points
    newSample=newCenters.collect() #new centers as a list
    samples=zip(newSample,sample) #sample=> old centers
    samples1=sc.parallelize(samples)
    totalDistance=samples1.map(lambda x:distanceSquared(x[0][1],x[1]))
    # Copy the new points to the kPoints array for the next iteration
    tempDist=totalDistance.sum()
    sample=map(lambda x:x[1],samples) #new sample for next iteration as list
sample

Answer 1:

You are getting this error because you are calling len() on a map object. In Python 3, map returns a lazy iterator rather than a list, and iterators do not support len(). For example:

>>> x = [[1, 'a'], [2, 'b'], [3, 'c']]

# `map` returns an object of type `map`, not a list
>>> map(lambda a: a[0], x)
<map object at 0x101b75ba8>

# calling `len` on it raises an error
>>> len(map(lambda a: a[0], x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()

To find the length, convert the map object to a list (or tuple) and then call len() on it. For example:

>>> len(list(map(lambda a: a[0], x)))
3

Or, better, create the list directly with a list comprehension (without using map):

>>> my_list = [a[0] for a in x]

# since it is a `list`, you can take its length
>>> len(my_list)
3
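
Applied to the loop in the question, the likely trigger is the last line, where sample is reassigned to the result of map(...). On the next iteration closestPoint receives that map object and calls len(points) on it. The following is a minimal pure-Python sketch of the fix (no Spark needed), not a definitive rewrite. The center values are hypothetical, and the shape of new_centers, a list of (index, [lat, lon]) pairs, is assumed from what collect() on newCenters would return. It also assumes the intent is to carry the new centers forward, so it selects new[1] rather than the old center that x[1] picks in the question. zip is lazy in Python 3 as well, so it is wrapped in list() to allow more than one pass.

```python
# The two helpers are copied from the question.
def distanceSquared(p1, p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    # len() and indexing are used here, so `points` must be a sequence (list/tuple)
    for i in range(len(points)):
        dist = distanceSquared(p, points[i])
        if dist < closest:
            closest = dist
            bestIndex = i
    return bestIndex

# Hypothetical values standing in for `sample` (old centers) and `newSample`
old_centers = [(0.0, 0.0), (10.0, 10.0)]
new_centers = [(0, [1.0, 1.0]), (1, [9.0, 9.0])]

# zip() returns a one-shot iterator in Python 3; list() makes it reusable
pairs = list(zip(new_centers, old_centers))

# Total squared movement of the centers in this iteration
total_distance = sum(distanceSquared(new[1], old) for new, old in pairs)

# List comprehension instead of map(): the result supports len() and indexing
next_centers = [new[1] for new, old in pairs]

# closestPoint now works on the next iteration
nearest = closestPoint((8.0, 8.5), next_centers)
print(total_distance, nearest)
```

In the question's code the equivalent change would be to wrap the zip(...) and map(...) calls in list(), or to replace the map(...) with a list comprehension as above.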

Source: https://stackoverflow.com/questions/41903852/typeerror-object-of-type-map-has-no-len-python3

Tags

python

python-3.x

apache-spark

pyspark

k-means