Reading big data with fixed width

后端 未结 3 1317
[愿得一人]
[愿得一人] 2020-12-02 23:21

How can I read big data formated with fixed width? I read this question and tried some tips, but all answers are for delimited data (as .csv), and that\'s not my case. The d

3条回答
  •  独厮守ぢ
    2020-12-03 00:16

    The LaF package is pretty good at reading fixed width files very fast. I use it dayly to load in files of +/- 100Mio records with 30 columns (not that much character columns as you have - mainly numeric data and some factors). And it is pretty fast. So this is what I would do.

    library(LaF)
    library(ffbase)
    my.data.laf <- laf_open_fwf('TS_MATRICULA_RS.txt', 
                      column_widths=c(5, 13, 14, 3, 3, 5, 4, 6, 6, 6, 1, 1, 1, 4, 3, 2, 9, 3, 2, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 11, 9, 2, 3, 9, 3, 2, 9, 9, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1), stringsAsFactors=FALSE, comment.char='', 
                      column_types=c('integer', 'integer', 'integer', 'integer', 'integer', 'integer', 'integer', 'integer', 'integer', 'integer', 'categorical', 'categorical', 'categorical',
                                   'integer', 'integer', 'categorical', 'integer', 'integer', 'categorical', 'integer', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical',
                                   'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical',
                                   'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'integer',
                                   'integer', 'integer', 'integer', 'integer', 'integer', 'integer', 'integer', 'categorical', 'integer', 'integer', 'categorical', 'categorical', 'categorical',
                                   'categorical', 'integer', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical', 'categorical'))
    my.data <- laf_to_ffdf(my.data.laf, nrows=1000000)
    my.data.in.ram <- as.data.frame(my.data)
    

    PS. I started using the LaF package because I was annoyed by the slowness of read.fwf and because the PL/SQL PostgreSQL code which I was working with initially to parse the data was becoming a hassle to maintain.

提交回复
热议问题