How to open an .xlsb file in R?

自闭症患者 2020-12-14 20:13

I'm trying to open an .xlsb file in R and keep getting similar errors.

Any recommendations on how to solve this issue without having to download the data and save

4 Answers
  • 2020-12-14 20:33

    Use the RODBC package:

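    library(RODBC)
    wb <- "D:\\Data\\Masked Data.xlsb"   # full path to the workbook
    con2 <- odbcConnectExcel2007(wb)     # Windows only; needs the Excel ODBC driver
    data <- sqlFetch(con2, "Sheet1$")    # name of the sheet to read
    nrow(data)
    odbcClose(con2)                      # release the connection when done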
    
  • 2020-12-14 20:39

    If you get the following error in R trying to connect to .xlsb:

    [RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
    

    then you have probably not installed AccessDatabaseEngine_X64.exe from Microsoft. I had this problem today, and after installing that file I got no more error messages.
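Since the installer named above is the 64-bit build, it may be worth confirming that the R session is 64-bit too. This is a minimal sketch, assuming the ODBC driver's architecture has to match that of the R process; the variable name `r_bits` is only illustrative.

```r
# Pointer size in bytes is 8 on a 64-bit build of R and 4 on a 32-bit build
r_bits <- 8L * .Machine$sizeof.pointer
message("This R session is ", r_bits, "-bit")
```

If this reports a 32-bit session, the X64 engine would presumably not be visible to it, and the IM002 error could persist even after installation.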

    0 讨论(0)
  • 2020-12-14 20:50

    One way could be to use ODBC:

    require(RODBC)
    if (any(grepl("*.xlsb", odbcDataSources(), fixed = TRUE))) {
      download.file(url = "http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NTcwMjI1fENoaWxkSUQ9MjcxMjIxfFR5cGU9MQ==&t=1", 
                    destfile = file.path(tempdir(), "test.xlsb"), 
                    mode = "wb")
      conn <- odbcConnectExcel2007( file.path(tempdir(), "test.xlsb")) 
      df <- sqlFetch(conn, sub("'(.*)\\$'", "\\1", sqlTables(conn)$TABLE_NAME)[4]) # read 4th sheet in the table name list
      head(df, 10)
      #                                             F1          F2         F3       F4        F5 F6
      # 1                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
      # 2                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
      # 3                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
      # 4                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
      # 5  Baker Hughes Gulf of Mexico Oil / Gas Split        <NA>       <NA>     <NA>      <NA> NA
      # 6                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
      # 7                                         <NA> US Offshore Total\nGoM Gas\nGoM Oil \nGoM NA
      # 8                                       1/7/00         127        123      116         7 NA
      # 9                                      1/14/00         125        121      116         5 NA
      # 10                                     1/21/00         125        121      116         5 NA
      close(conn) 
    }
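Both RODBC answers open a connection and read one sheet. That pattern can be wrapped so the connection is released even if the read fails. This is only a sketch: `read_xlsb_sheet` is a hypothetical helper name, not part of RODBC, and like `odbcConnectExcel2007()` it is expected to work on Windows only, with the Excel ODBC driver installed.

```r
# Hypothetical helper (not part of RODBC): read one sheet from an .xlsb
# workbook over ODBC and always close the connection afterwards.
read_xlsb_sheet <- function(path, sheet) {
  conn <- RODBC::odbcConnectExcel2007(path)
  # On failure the connect call returns -1 instead of an "RODBC" object
  if (!inherits(conn, "RODBC")) {
    stop("Could not open an ODBC connection to ", path)
  }
  # Close the connection when the function exits, even on error
  on.exit(RODBC::odbcClose(conn), add = TRUE)
  RODBC::sqlFetch(conn, sheet)
}

# Example call, using the path and sheet name from the first answer:
# df <- read_xlsb_sheet("D:\\Data\\Masked Data.xlsb", "Sheet1$")
```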
    
  • 2020-12-14 20:54

    The readxlsb package can read Excel binary (.xlsb) files into R. Here is some information taken from the package vignette:

    read_xlsb(path, sheet, range, col_names, col_types, na, trim_ws, skip, ...)

    sheet:

    Either a name or the index of the sheet to read. The index of the first sheet is 1. If the sheet name is embedded in the range argument, or implied because range is a named range, this argument is ignored.

    range:

    range can be specified as

    • A named range. Named ranges are not case sensitive
    • In Sheet!A1 notation
    • In Sheet!R1C1 notation
    • As a cellranger::cell_limits object

    col_names:

    • TRUE: The first row is used for column names. Empty cells result in a column name of the form ‘column.i’
    • FALSE: Column names will be ‘column.i’
    • Character vector: vector containing column names.

    col_types:

    Can be implied from the spreadsheet or specified in advance. When specifying types, the options are

    • “logical” (or “boolean”), “numeric” (or “double”), “integer”, “date” and “string” (or “character”)
    • Use “skip” (or “ignore”) to skip a column

    na:

    A character string that is interpreted as NA. This does not affect the implied data type for a column.

    trim_ws:

    Should leading and trailing whitespace be trimmed from character strings?

    skip:

    The number of rows to skip before reading data.

    library(readxlsb)
    
    res = read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"), 
                    range = "PORTFOLIO", 
                    debug = TRUE)
    
    ls(res$env)
    #> [1] "content"      "named_ranges" "sheets"       "stream"
    
    res$env$named_ranges
    #>             name                     range sheet_idx first_column first_row
    #> 1   INFO_RELEASE          FirstSheet!$A$11         0            1        11
    #> 2        OUTLOOK 'My SecondTab'!$A$1:$C$13         1            1         1
    #> 3      PORTFOLIO      FirstSheet!$A$3:$C$9         0            1         3
    #> 4 SAVED_DATETIME          FirstSheet!$C$13         0            3        13
    #> 5          TITLE           FirstSheet!$A$1         0            1         1
    #>   last_column last_row
    #> 1           1       11
    #> 2           3       13
    #> 3           3        9
    #> 4           3       13
    #> 5           1        1
    

    Created on 2020-07-07 by the reprex package (v0.3.0)
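For ordinary use without `debug = TRUE`, the range argument described above can be given either as a named range or in Sheet!A1 notation. The sketch below assumes the readxlsb package is installed and reuses the bundled TestBook.xlsb; the cell range `FirstSheet!A3:C9` is taken from the named_ranges output above, and treating it as equivalent to `PORTFOLIO` is an assumption.

```r
# Cell range that the PORTFOLIO named range points to, per the debug output
portfolio_range <- "FirstSheet!A3:C9"

if (requireNamespace("readxlsb", quietly = TRUE)) {
  path <- system.file("extdata", "TestBook.xlsb", package = "readxlsb")

  # Read by named range (names are not case sensitive)
  by_name <- readxlsb::read_xlsb(path = path, range = "PORTFOLIO")

  # Read the same cells using Sheet!A1 notation
  by_cells <- readxlsb::read_xlsb(path = path, range = portfolio_range)

  print(by_name)
  print(by_cells)
}
```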
