Calling .shape on a Dask DataFrame gives me the following error:
AttributeError: 'DataFrame' object has no attribute 'shape'
How should I get the shape instead?
You can get the number of columns directly:
len(df.columns) # this is fast
You can also call len on the dataframe itself, though beware that this will trigger a computation.
len(df) # this requires a full scan of the data
Dask.dataframe doesn't know how many records are in your data without first reading through all of it.
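To get the shape, we can try it this way:
dask_dataframe.describe().compute()
The "count" row of the result gives the number of non-null values in each numeric column, which matches the number of rows as long as the column has no missing values.
len(dask_dataframe.columns)
This gives the number of columns in the dataframe.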
In Dask versions where .shape is available, you can do the following:
a = df.shape
a[0].compute(),a[1]
This will show the shape just as pandas does.
I know this is quite an old question, but I had the same issue and found a workaround that I want to record here.
If your data was originally saved in a CSV-like file, you can do what I did: count the lines of that file (minus one for the header line). Inspired by this answer here, this is the solution I'm using:
import dask.dataframe as dd
from itertools import takewhile, repeat
def rawincount(filename):
    # Count newline bytes by reading the file in 1 MiB chunks.
    with open(filename, 'rb') as f:
        bufgen = takewhile(lambda x: x, (f.raw.read(1024*1024) for _ in repeat(None)))
        return sum(buf.count(b'\n') for buf in bufgen)
filename = 'myHugeDataframe.csv'
df = dd.read_csv(filename)
df_shape = (rawincount(filename) - 1, len(df.columns))
print(f"Shape: {df_shape}")
Hope this could help someone else as well.
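One caveat with counting raw newlines: a quoted field that contains a line break, or a file without a trailing newline, can throw the count off. A slower but more tolerant sketch uses the standard library's csv module to count parsed records instead. The helper name count_csv_records and the UTF-8 encoding are assumptions for illustration, not part of the answer above.

```python
import csv

def count_csv_records(filename, has_header=True):
    """Count parsed CSV records (hypothetical helper; assumes UTF-8 text)."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(filename, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f))
    # Drop the header record if there is one.
    return max(n - 1, 0) if has_header else n
```

This parses every record, so it is noticeably slower than rawincount on large files; it is only worth it when the file may contain embedded newlines.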
Source: https://stackoverflow.com/questions/50355598/how-should-i-get-the-shape-of-a-dask-dataframe