I am analysing a dataset having 200 rows and 1200 columns; this dataset is stored in a .CSV file. In order to process it, I read this file using R's read.csv().
Wide data sets are typically slower to read into memory than long data sets (i.e. the transposed one). This affects many programs that read data, such as R, Python, Excel, etc., though this description is more pertinent to R:

1. R has to allocate memory for every cell, even if it is NA. This means that every column has at least as many cells as the number of rows in the csv file, whereas in a long dataset you can potentially drop the NA values and save some space.

2. R has to guess the data type of each column unless you tell it explicitly, and that guessing overhead grows with the number of columns.

Since your dataset doesn't appear to contain any NA values, my hunch is that you're seeing the speed improvement because of the second point. You can test this theory by passing colClasses = rep('numeric', 20) to read.csv or fread for the 20 column data set, or rep('numeric', 120) for the 120 column one, which should decrease the overhead of guessing data types.
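A minimal sketch of that test, assuming the files are called something like wide_20col.csv and wide_120col.csv (placeholder names) and that every column really is numeric:

    library(data.table)   # only needed for fread(); read.csv() is base R

    # Baseline: let the reader guess the type of each of the 20 columns
    system.time(read.csv("wide_20col.csv"))

    # Declare every column numeric up front, so no type guessing is needed
    system.time(read.csv("wide_20col.csv", colClasses = rep("numeric", 20)))

    # Same idea with fread(), using 120 classes for the wider file
    system.time(fread("wide_120col.csv", colClasses = rep("numeric", 120)))

If the calls with colClasses are noticeably faster than the ones without, the type-guessing overhead is the likely culprit.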