How to open an .xlsb file in R?

自闭症患者

I'm trying to open an .xlsb file in R and keep getting similar errors.

Any recommendations on how to solve this issue without having to download the data and save

4 Answers

清酒与你

2020-12-14 20:33

Use the RODBC package:

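```
library(RODBC)
wb <- "D:\\Data\\Masked Data.xlsb" # path to the .xlsb file
con2 <- odbcConnectExcel2007(wb)   # needs the Microsoft Excel ODBC driver (Windows)
data <- sqlFetch(con2, "Sheet1$")  # name of the sheet to read
nrow(data)
odbcClose(con2)                    # release the connection when done
```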


离开以前

2020-12-14 20:39
If you get the following error in R trying to connect to .xlsb:
```
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
```
then you are probably missing the Microsoft Access Database Engine, which you can install with AccessDatabaseEngine_X64.exe from Microsoft. I had this problem today, and after installing it I got no more error messages.
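
Before connecting, it can help to check whether any installed ODBC driver advertises .xlsb support at all. The following is a minimal sketch, not part of the original answer: `has_xlsb_driver` is a hypothetical helper, and `example_sources` is made-up data shaped like the output of `odbcDataSources()`.

```r
# Hypothetical helper: TRUE when any ODBC driver description mentions ".xlsb".
# `sources` is expected to look like the result of RODBC::odbcDataSources(),
# i.e. a named character vector of driver descriptions.
has_xlsb_driver <- function(sources) {
  any(grepl("*.xlsb", sources, fixed = TRUE))
}

# Made-up driver listings, for illustration only.
example_sources <- c(
  "Excel Files"        = "Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)",
  "MS Access Database" = "Microsoft Access Driver (*.mdb, *.accdb)"
)
old_sources <- c("Excel Files" = "Microsoft Excel Driver (*.xls)")

has_xlsb_driver(example_sources)
has_xlsb_driver(old_sources)

# On a real machine with RODBC installed, the check would be:
# has_xlsb_driver(RODBC::odbcDataSources())
```

If the check comes back FALSE on your machine, the IM002 error above is the likely result of calling `odbcConnectExcel2007()`.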
自闭症患者

2020-12-14 20:50

One way could be to use ODBC:

```
require(RODBC)
if (any(grepl("*.xlsb", odbcDataSources(), fixed = TRUE))) {
  download.file(url = "http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NTcwMjI1fENoaWxkSUQ9MjcxMjIxfFR5cGU9MQ==&t=1",
                destfile = file.path(tempdir(), "test.xlsb"),
                mode = "wb")
  conn <- odbcConnectExcel2007(file.path(tempdir(), "test.xlsb"))
  df <- sqlFetch(conn, sub("'(.*)\\$'", "\\1", sqlTables(conn)$TABLE_NAME)[4]) # read 4th sheet in the table name list
  head(df, 10)
  #                                             F1          F2         F3       F4        F5 F6
  # 1                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 2                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 3                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 4                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 5  Baker Hughes Gulf of Mexico Oil / Gas Split        <NA>       <NA>     <NA>      <NA> NA
  # 6                                         <NA>        <NA>       <NA>     <NA>      <NA> NA
  # 7                                         <NA> US Offshore Total\nGoM Gas\nGoM Oil \nGoM NA
  # 8                                       1/7/00         127        123      116         7 NA
  # 9                                      1/14/00         125        121      116         5 NA
  # 10                                     1/21/00         125        121      116         5 NA
  close(conn)
}
```
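
The `sub()` call in this answer strips the quotes and trailing `$` that the Excel ODBC driver adds to sheet names in `sqlTables()`. As a rough illustration of that cleanup in isolation, here is a sketch with a hypothetical helper `clean_sheet_names` and made-up table names. It assumes the driver reports sheets as `Sheet1$` or, when the name contains spaces, as `'My Sheet$'`. Unlike the pattern in the answer, it also handles the unquoted form.

```r
# Hypothetical helper: turn ODBC table names such as "Sheet1$" or
# "'My SecondTab$'" into plain sheet names. Names without a trailing "$"
# (for example named ranges) are returned unchanged.
clean_sheet_names <- function(table_names) {
  sub("^'?(.*)\\$'?$", "\\1", table_names)
}

# Made-up table names, for illustration only.
raw_names <- c("Sheet1$", "'My SecondTab$'", "PORTFOLIO")
clean_sheet_names(raw_names)
```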


北荒

2020-12-14 20:54
The readxlsb package can read Excel binary (.xlsb) files into R. Here is some information taken from the package vignettes:

read_xlsb(path, sheet, range, col_names, col_types, na, trim_ws, skip, ...)

sheet:

Either a name or the index of the sheet to read. The index of the first sheet is 1. If the sheet name is embedded in the range argument, or implied because range is a named range, this argument is ignored.

range:

The range can be specified as
- A named range. Named ranges are not case sensitive
- In Sheet!A1 notation
- In Sheet!R1C1 notation
- As a cellranger::cell_limits object

col_names:

- TRUE: The first row is used for column names. Empty cells result in a column name of the form ‘column.i’
- FALSE: Column names will be ‘column.i’
- Character vector: vector containing column names.

col_types:

Can be implied from the spreadsheet or specified in advance. When specifying types, the options are
- "logical" (or "boolean"), "numeric" (or "double"), "integer", "date" and "string" (or "character")
- Use "skip" (or "ignore") to skip a column

na:

A character string that is interpreted as NA. This does not affect the implied data type for a column.

trim_ws:

Should leading and trailing whitespace be trimmed from character strings?

skip:

The number of rows to skip before reading data.
```
library(readxlsb)

res = read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"), 
                range = "PORTFOLIO", 
                debug = TRUE)

ls(res$env)
#> [1] "content"      "named_ranges" "sheets"       "stream"

res$env$named_ranges
#>             name                     range sheet_idx first_column first_row
#> 1   INFO_RELEASE          FirstSheet!$A$11         0            1        11
#> 2        OUTLOOK 'My SecondTab'!$A$1:$C$13         1            1         1
#> 3      PORTFOLIO      FirstSheet!$A$3:$C$9         0            1         3
#> 4 SAVED_DATETIME          FirstSheet!$C$13         0            3        13
#> 5          TITLE           FirstSheet!$A$1         0            1         1
#>   last_column last_row
#> 1           1       11
#> 2           3       13
#> 3           3        9
#> 4           3       13
#> 5           1        1
```
Created on 2020-07-07 by the reprex package (v0.3.0)
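
For ordinary use without `debug = TRUE`, the `range` argument described above can be given either as a named range or in Sheet!A1 notation. The sketch below assumes the readxlsb package is installed and reuses the bundled TestBook.xlsb; the address `FirstSheet!A3:C9` is taken from the `named_ranges` listing above, where it is the range behind PORTFOLIO. The variable names are illustrative.

```r
# Runs only when readxlsb is available; otherwise the block is skipped.
if (requireNamespace("readxlsb", quietly = TRUE)) {
  path <- system.file("extdata", "TestBook.xlsb", package = "readxlsb")

  # Read by named range (not case sensitive, per the vignette notes above).
  by_name <- readxlsb::read_xlsb(path = path, range = "PORTFOLIO")

  # Read the same cells by explicit address in Sheet!A1 notation.
  by_addr <- readxlsb::read_xlsb(path = path, range = "FirstSheet!A3:C9")

  print(by_name)
  print(dim(by_addr))
}
```

Both calls should return a data frame covering the same cells, so comparing `dim(by_name)` with `dim(by_addr)` is a quick sanity check.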