I see the parameter npartitions in many functions, but I don't understand what it is good for or how it is used.
http://dask.pydata.org/en/latest/dataframe-api.html
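The npartitions property is the number of Pandas dataframes that compose a single Dask dataframe. This affects performance in two main ways.
If you have too few partitions, you may not be able to use all of your cores; a Dask dataframe with a single partition can keep only one core busy at a time. If you have too many partitions, the scheduler spends a noticeable amount of time on overhead, because every task takes up a few hundred microseconds in the scheduler.
Generally you want a few times more partitions than you have cores.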
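To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in plain Python. The helper names, the 200 microsecond figure, the multiplier of 3, and the "one task per partition per operation" model are all assumptions made for illustration; they are not part of Dask, and real task graphs can be larger or smaller.

```python
import os

# Assumption: "a few hundred microseconds" per task is taken as 200 us here.
SCHEDULER_OVERHEAD_PER_TASK_S = 200e-6


def estimated_overhead_seconds(npartitions, ops_per_partition,
                               per_task_s=SCHEDULER_OVERHEAD_PER_TASK_S):
    """Rough scheduler overhead, assuming one task per partition per operation."""
    return npartitions * ops_per_partition * per_task_s


def suggested_npartitions(n_cores=None, multiplier=3):
    """Rule of thumb from the answer: a few times more partitions than cores."""
    if n_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        n_cores = os.cpu_count() or 1
    return n_cores * multiplier


if __name__ == "__main__":
    # 1,000 partitions with 10 operations each is about 10,000 tasks.
    print(estimated_overhead_seconds(1000, 10))
    print(suggested_npartitions())
```

Under these assumptions the overhead grows linearly with the number of partitions, which is why going far beyond a small multiple of your core count tends to cost more than it gains.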
You can control the number of partitions either at data ingestion time, using parameters like blocksize= in read_csv(...), or afterwards with the .repartition(...) method.