I\'m trying to open an .xlsb file in R and keep getting similar errors.
Any recommendations on how to solve this issue without having to download the data and save
Use the RODBC
package:
library(RODBC)
wb <- "D:\\Data\\Masked Data.xlsb" # Give the file name
con2 <- odbcConnectExcel2007(wb)
data <- sqlFetch(con2, "Sheet1$") # Provide name of sheet
nrow(data)
If you get the following error in R trying to connect to .xlsb
:
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
then, you are probably are missing to install the AccessDatabaseEngine_X64.exe
from Microsoft. I had this problem today, and after installing this file I've got no more error messages.
One way could be to use ODBC:
require(RODBC)
if (any(grepl("*.xlsb", odbcDataSources(), fixed = TRUE))) {
download.file(url = "http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NTcwMjI1fENoaWxkSUQ9MjcxMjIxfFR5cGU9MQ==&t=1",
destfile = file.path(tempdir(), "test.xlsb"),
mode = "wb")
conn <- odbcConnectExcel2007( file.path(tempdir(), "test.xlsb"))
df <- sqlFetch(conn, sub("'(.*)\\$'", "\\1", sqlTables(conn)$TABLE_NAME)[4]) # read 4th sheet in the table name list
head(df, 10)
# F1 F2 F3 F4 F5 F6
# 1 <NA> <NA> <NA> <NA> <NA> NA
# 2 <NA> <NA> <NA> <NA> <NA> NA
# 3 <NA> <NA> <NA> <NA> <NA> NA
# 4 <NA> <NA> <NA> <NA> <NA> NA
# 5 Baker Hughes Gulf of Mexico Oil / Gas Split <NA> <NA> <NA> <NA> NA
# 6 <NA> <NA> <NA> <NA> <NA> NA
# 7 <NA> US Offshore Total\nGoM Gas\nGoM Oil \nGoM NA
# 8 1/7/00 127 123 116 7 NA
# 9 1/14/00 125 121 116 5 NA
# 10 1/21/00 125 121 116 5 NA
close(conn)
}
readxlsb package can read Excel binary (.xlsb) files into R. Here are some info taken from the package vignettes:
read_xlsb(path, sheet, range, col_names, col_types, na, trim_ws, skip, ...)
sheet:
Either a name, or the index of the sheet to read. Index of the first sheet is 1. If the sheet name is embedded in the range argument, or implied if range is a named range, then this argument is ignored
range:
range can be specified as
- A named range. Named ranges are not case sensitive
- In Sheet!A1 notation
- In Sheet!R1C1 notation
- As a cellranger::cell_limits object
col_names
- TRUE: The first row is used for column names. Empty cells result in a column name of the form ‘column.i’
- FALSE: Column names will be ‘column.i’
- Character vector: vector containing column names.
col_types
Can be implied from the spreadsheet or specified in advanced. When specifying types, options are
- “logical” (or “boolean”), “numeric” (or “double”), “integer”, “date” and “string” (or “character”)
- Use “skip” (or “ignore”) to skip a column
na
A character string that is interpret as NA. This does not effect the implied data type for a column.
trim_ws
Should leading and trailing whitespaces be trimmed from character strings?
skip
The number of rows to skip before reading data.
library(readxlsb)
res = read_xlsb(path = system.file("extdata", "TestBook.xlsb", package = "readxlsb"),
range = "PORTFOLIO",
debug = TRUE)
ls(res$env)
#> [1] "content" "named_ranges" "sheets" "stream"
res$env$named_ranges
#> name range sheet_idx first_column first_row
#> 1 INFO_RELEASE FirstSheet!$A$11 0 1 11
#> 2 OUTLOOK 'My SecondTab'!$A$1:$C$13 1 1 1
#> 3 PORTFOLIO FirstSheet!$A$3:$C$9 0 1 3
#> 4 SAVED_DATETIME FirstSheet!$C$13 0 3 13
#> 5 TITLE FirstSheet!$A$1 0 1 1
#> last_column last_row
#> 1 1 11
#> 2 3 13
#> 3 3 9
#> 4 3 13
#> 5 1 1
Created on 2020-07-07 by the reprex package (v0.3.0)