Extracting to a data frame from a JSON generated multi-level list with occasional missing elements

Submitted by 主宰稳场 on 2019-12-23 06:04:30

Question


I'm pulling soccer data through an API - the resulting JSON is returned as a list; dput example below:

list(list(id = 10332894L, league_id = 8L, season_id = 12962L, 
aggregate_id = NULL, venue_id = 201L, localteam_id = 51L, 
visitorteam_id = 27L, weather_report = list(code = "drizzle", 
    temperature = list(temp = 53.92, unit = "fahrenheit"), 
    clouds = "90%", humidity = "87%", wind = list(speed = "12.75 m/s", 
        degree = 200L)), attendance = 25098L, leg = "1/1", 
deleted = FALSE, referee = list(data = list(id = 15267L, 
    common_name = "L. Probert", fullname = "Lee Probert", 
    firstname = "Lee", lastname = "Probert"))), list(id = 10332895L, 
league_id = 8L, season_id = 12962L, aggregate_id = NULL, 
venue_id = 340L, localteam_id = 251L, visitorteam_id = 78L, 
weather_report = list(code = "drizzle", temperature = list(
    temp = 50.07, unit = "fahrenheit"), clouds = "90%", humidity = "93%", 
    wind = list(speed = "6.93 m/s", degree = 160L)), attendance = 22973L, 
leg = "1/1", deleted = FALSE, referee = list(data = list(
    id = 15273L, common_name = "M. Oliver", fullname = "Michael Oliver", 
    firstname = "Michael", lastname = "Oliver"))))

I'm extracting with a for loop at the moment. The reprex shows two top-level list items, but the full data has hundreds. The main drawback of the loop is that occasional missing values cause it to stop. I'd like to move this to purrr but am struggling to extract second-level nested items using at_depth or modify_depth. There are also lists nested inside nested lists, which adds considerably to the complexity.

The end state should be a tidy data frame. From this sample the data frame will have only 2 rows but many columns, each representing one item, no matter where that item is nested in the list. If an item is missing, its value should be NA.

Ideally, even if it is inelegant, the solution would produce one data frame per level or nested item, which can then be bound together later.

thanks.


Answer 1:


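Step 1: Replace every NULL with NA using the recursive helper below (taken from a community wiki answer). The list from the question is assumed to be stored in `l`.

# Recursively apply fn to every non-list element of x, preserving the nesting.
simple_rapply <- function(x, fn) {
  if (is.list(x)) {
    lapply(x, simple_rapply, fn)
  } else {
    fn(x)
  }
}

# `l` is the dput output from the question assigned to a variable.
non.null.l <- simple_rapply(l, function(x) if (is.null(x)) NA else x)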
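
The reason Step 1 matters can be seen on a toy record: unlist() silently drops NULL elements, so a field that is NULL in every record would never become a column. The following is a small base-R sketch; the record `rec` and its values are made up for illustration and are not part of the original data.

```r
# Hypothetical one-level record with a single NULL field.
rec <- list(id = 1L, aggregate_id = NULL, leg = "1/1")

# unlist() drops the NULL element, so `aggregate_id` disappears.
before <- names(unlist(rec))
print(before)   # "id" "leg"

# After substituting NA for NULL, the field survives flattening.
rec_na <- lapply(rec, function(x) if (is.null(x)) NA else x)
after <- names(unlist(rec_na))
print(after)    # "id" "aggregate_id" "leg"
```

The recursive helper in Step 1 does the same substitution at every depth of the nested list.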

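Step 2: Flatten each record with unlist() and bind the results row-wise. Nested names are joined with dots (for example referee.data.id), and because unlist() coerces mixed values to a common type, every column comes back as character.

library(purrr)
library(dplyr)  # provides bind_rows()
# unlist() turns each record into a named vector; bind_rows() makes it one row
map_df(map(non.null.l, unlist), bind_rows)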
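
For readers who want to see the mechanics without purrr or dplyr, here is a base-R sketch of the same flatten-and-bind idea, followed by a type conversion step that the answer above does not include. The `records` list, the helper name `null_to_na`, and all values are hypothetical stand-ins for the API data, and this is only one possible way to do it.

```r
# Hypothetical input mimicking the API shape: the second record lacks
# `attendance`, and `aggregate_id` is NULL in both.
records <- list(
  list(id = 1L, aggregate_id = NULL, attendance = 25098L,
       referee = list(data = list(id = 15267L, fullname = "Lee Probert"))),
  list(id = 2L, aggregate_id = NULL,
       referee = list(data = list(id = 15273L, fullname = "Michael Oliver")))
)

# Same recursive NULL -> NA replacement as in Step 1 (name is illustrative).
null_to_na <- function(x) {
  if (is.list(x)) lapply(x, null_to_na) else if (is.null(x)) NA else x
}

# Flatten each record to a named character vector.
flat <- lapply(null_to_na(records), unlist)

# Collect every column name seen in any record, in first-seen order.
cols <- unique(unlist(lapply(flat, names)))

# Index each record by the full column set; names it lacks yield NA.
rows <- lapply(flat, function(v) setNames(v[cols], cols))

df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# unlist() coerced everything to character; let R re-infer each column type.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Indexing by the union of names is what fills in NA for items that are absent from a record entirely, as opposed to present but NULL.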


Source: https://stackoverflow.com/questions/54010404/extracting-to-a-data-frame-from-a-json-generated-multi-level-list-with-occasiona
