Question
I have a dataframe of params and apply a function to each row. This function is essentially a couple of SQL queries plus simple calculations on the results.
I am trying to leverage Dask's multiprocessing while keeping the structure and, roughly, the interface. The example below works and indeed gives a significant speedup:
def get_metrics(row):
    record = {'areaName': row['name'],
              'areaType': row.area_type,
              'borough': row.Borough,
              'fullDate': row['start'],
              'yearMonth': row['start'],
              }
    Q = Qsi.format(unittypes=At,
                   start_date=row['start'],
                   end_date=row['end'],
                   freq='Q',
                   area_ids=row['descendent_ids'])
    sales = _get_DF(Q)
    record['salesInventory'] = len(sales)
    record['medianAskingPrice'] = sales.price.median()
    R.append(record)
R = []
x = ddf.map_partitions(lambda part: part.apply(get_metrics, axis=1), meta={'result': None})
x.compute()
result2 = pd.DataFrame(R)
However, when I try to use the .apply method instead (see below), it throws 'DataFrame' object has no attribute 'name'...
R = list()
y = ddf.apply(get_metrics, axis=1, meta={'result': None})
Yet, ddf.head() shows that the name column is present in the dataframe.
Answer 1:
If the output of your get_metrics function is a Series, you should pass meta as a tuple, meta=('column name', 'dtype'), i.e. the output column's name and its dtype.
This worked for me.
Source: https://stackoverflow.com/questions/46737134/dask-apply-attributeerror-dataframe-object-has-no-attribute-name