Reading a huge JSON file in R, issues

Backend · Unresolved · 3 answers · 1231 views

北荒 · 2020-12-30 11:17

I am trying to read a very huge JSON file in R. I am using the RJSON library with the command json_data <- fromJSON(paste(readLines("myfile.json"), collapse="")), but I am running into issues.
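In full, the snippet is roughly this (it reads the whole file into a single string, with the usual collapse = "", and parses it in one pass):

    # rjson parses one in-memory string, so the entire file is held in memory at once.
    library(rjson)

    json_data <- fromJSON(paste(readLines("myfile.json"), collapse = ""))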

3 Answers
  •  长发绾君心 · 2020-12-30 11:49

    Well, just sharing my experience of reading JSON files. Trying to read 52.8 MB, 19.7 MB, 1.3 GB, 93.9 MB, and 158.5 MB JSON files took me about 30 minutes and ended with the R session being automatically restarted. After that I tried to apply parallel computing and wanted to watch the progress, but that failed as well.

    https://github.com/hadley/plyr/issues/265

    Then I tried adding the parameter pagesize = 10000, and it worked and was far more efficient than before. We only need to read the files once and can then save them in RData/Rda/Rds format via saveRDS() for later sessions (see the sketch after the console log below).

    > suppressPackageStartupMessages(library('BBmisc'))
    > suppressAll(library('jsonlite'))
    > suppressAll(library('plyr'))
    > suppressAll(library('dplyr'))
    > suppressAll(library('stringr'))
    > suppressAll(library('doParallel'))
    > 
    > registerDoParallel(cores=16)
    > 
    > ## https://www.kaggle.com/c/yelp-recsys-2013/forums/t/4465/reading-json-files-with-r-how-to
    > ## https://class.coursera.org/dsscapstone-005/forum/thread?thread_id=12
    > fnames <- c('business','checkin','review','tip','user')
    > jfile <- paste0(getwd(),'/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_',fnames,'.json')
    > dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.parallel=TRUE)
    > dat
    list()
    > jfile
    [1] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json"
    [2] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_checkin.json" 
    [3] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json"  
    [4] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_tip.json"     
    [5] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_user.json"    
    > dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.progress='=')
    opening file input connection.
     Imported 61184 records. Simplifying into dataframe...
    closing file input connection.
    opening file input connection.
     Imported 45166 records. Simplifying into dataframe...
    closing file input connection.
    opening file input connection.
     Found 470000 records...
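
    To avoid re-parsing the JSON on every run, the parsed data frames can be cached as .rds files, roughly like this (the cache folder and file names are just my own convention, not part of the dataset):

    suppressPackageStartupMessages(library('jsonlite'))

    fnames <- c('business','checkin','review','tip','user')
    jfile  <- paste0(getwd(), '/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_', fnames, '.json')
    rds    <- paste0(getwd(), '/cache/', fnames, '.rds')
    dir.create('cache', showWarnings = FALSE)

    for (i in seq_along(jfile)) {
      if (!file.exists(rds[i])) {
        # Stream the JSON in chunks of 10000 records instead of loading it all at once.
        dat <- stream_in(file(jfile[i]), pagesize = 10000)
        saveRDS(dat, rds[i])   # parse once, reuse later
      }
    }

    # In later sessions, readRDS() is much faster than re-parsing the JSON:
    business <- readRDS(rds[1])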
    
