Question
Calling .shape on my Dask DataFrame gives me the following error.
AttributeError: 'DataFrame' object has no attribute 'shape'
How should I get the shape instead?
Answer 1:
You can get the number of columns directly
len(df.columns) # this is fast
You can also call len on the dataframe itself, though beware that this will trigger a computation.
len(df) # this requires a full scan of the data
Dask.dataframe doesn't know how many records are in your data without first reading through all of it.
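Answer 2:
To get the shape, you can try this:
dask_dataframe.describe().compute()
The "count" row of the result gives the number of non-null values in each numeric column, which equals the number of rows only when that column has no missing values.
len(dask_dataframe.columns)
This gives the number of columns in the dataframe.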
Answer 3:
On Dask versions that provide shape, you can do the following:
a = df.shape
a[0].compute(),a[1]
This will show the shape just as pandas does.
Answer 4:
I know this is quite an old question, but I had the same issue and found an unconventional solution that I want to record here.
I assume your data was originally stored in a CSV-like file. In my case, I just count the lines of that file (minus one for the header line). Inspired by another answer, this is the solution I'm using:
import dask.dataframe as dd
from itertools import takewhile, repeat

def rawincount(filename):
    # Count newline bytes by reading the file in 1 MiB chunks.
    with open(filename, 'rb') as f:
        bufgen = takewhile(lambda x: x, (f.raw.read(1024*1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)

filename = 'myHugeDataframe.csv'
df = dd.read_csv(filename)
# Subtract one for the header line. This assumes the file ends with a
# newline and that no field contains an embedded newline.
df_shape = (rawincount(filename) - 1, len(df.columns))
print(f"Shape: {df_shape}")
Hope this could help someone else as well.
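One caveat with counting newline bytes: a file whose last line has no trailing newline is undercounted by one. The sketch below is a variant that accounts for this, using only the standard library. The names count_lines and make_demo_csv and the demo file contents are hypothetical, and the approach still assumes that no CSV field contains an embedded newline.

```python
import os
import tempfile

def count_lines(path, chunk_size=1024 * 1024):
    """Count lines in a file, including a final line with no trailing newline."""
    newlines = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A non-empty file that does not end with a newline has one more line.
    if last_byte and last_byte != b"\n":
        newlines += 1
    return newlines

def make_demo_csv(text):
    """Write text to a temporary CSV file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        f.write(text)
    return path

with_newline = make_demo_csv("a,b\n1,2\n3,4\n")
without_newline = make_demo_csv("a,b\n1,2\n3,4")

# Subtract one for the header line in both cases.
rows_a = count_lines(with_newline) - 1
rows_b = count_lines(without_newline) - 1
print(rows_a, rows_b)

os.remove(with_newline)
os.remove(without_newline)
```

Both demo files report the same number of data rows, whether or not the last line ends with a newline.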
Source: https://stackoverflow.com/questions/50355598/how-should-i-get-the-shape-of-a-dask-dataframe