readr

Remove attributes from data read in readr::read_csv

Posted by 南楼画角 on 2021-01-27 15:42:07
Question: readr::read_csv adds attributes that don't get updated when the data is edited. For example:

    library('tidyverse')
    df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")
    # Remove columns with only one distinct entry
    no_info <- df %>% sapply(n_distinct)
    no_info <- names(no_info[no_info==1])
    df2 <- df %>% select(-no_info)

Inspecting the structure, we see that column B is still present in the attributes of df2:

    > str(df)
    Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 3 variables:
    $ A:
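
A minimal sketch of one way to discard the stale metadata, based on the attributes visible in the str() output above: read_csv() stores its column specification in a "spec" attribute and adds the spec_tbl_df class, both of which can be stripped by hand.

    library(readr)
    df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")
    # Drop the column specification readr stores on the tibble
    attr(df, "spec") <- NULL
    # Drop the extra class that marks the tibble as a readr result
    class(df) <- setdiff(class(df), "spec_tbl_df")
    str(df)  # now a plain tbl_df with no lingering spec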

Dealing with Byte Order Mark (BOM) in R [duplicate]

Posted by 早过忘川 on 2021-01-27 07:42:28
Question: This question already has answers here: Read a UTF-8 text file with BOM (2 answers). Closed 4 years ago.

Sometimes a Byte Order Mark (BOM) is present at the beginning of a .CSV file. The symbol is not visible when you open the file in Notepad or Excel; however, when you read the file in R using various methods, you will see different symbols in the name of the first column. Here is a sample csv file with a BOM at the beginning:

    ID,title,clean_title,clean_title_id
    1,0 - 0,,0
    2,"""0 - 1
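
A minimal sketch of two common workarounds (the file name is a placeholder): base R accepts "UTF-8-BOM" as a file encoding, which strips the mark before parsing, and recent readr versions skip a UTF-8 BOM on their own.

    # Base R: the "UTF-8-BOM" encoding strips the mark before parsing
    df <- read.csv("sample.csv", fileEncoding = "UTF-8-BOM")

    # readr: recent versions detect and drop a UTF-8 BOM automatically
    library(readr)
    df <- read_csv("sample.csv")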

timezone error reading csv

Posted by 房东的猫 on 2020-05-15 10:40:07
Question: After googling for a couple of hours I have not found a solution to this problem. Basically, when I run read_csv("some_file.csv") from the readr package I get the following error, and the csv is not read:

    Error: Unknown TZ UTC

The only way I can read the CSV is this way:

    read_csv("some_file.csv", locale = locale(tz = "Australia/Sydney"))

Sydney being my timezone. But I'd rather fix the error than work around it if possible. Does anybody know how to fix the UTC error permanently? E.g. Startup
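
A sketch of how the workaround might be made permanent, assuming the error stems from the TZ environment variable being unset or unusable at startup: export a valid zone name from ~/.Rprofile (the zone below is the asker's; substitute your own).

    # In ~/.Rprofile, so every session starts with a usable time zone
    Sys.setenv(TZ = "Australia/Sydney")

    # read_csv() then no longer needs an explicit locale
    library(readr)
    df <- read_csv("some_file.csv")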

how to skip reading certain columns in readr [duplicate]

Posted by 微笑、不失礼 on 2020-05-14 14:39:47
Question: This question already has answers here: Only read selected columns (4 answers). Closed 2 years ago.

I have a simple csv file called "test.csv" with the following content:

    colA,colB,colC
    1,"x",12
    2,"y",34
    3,"z",56

Let's say I want to skip reading colA and just read in colB and colC. I want a general way to do this because I have lots of files to read in, and sometimes colA is called something else altogether, but colB and colC are always the same. According to the read_csv documentation, one
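
A hedged sketch using readr's own column specification: cols_only() keeps just the columns you name, whatever the others are called, which fits the "colA may be named anything" constraint.

    library(readr)
    # Keep only the named columns; any other column is ignored entirely
    df <- read_csv("test.csv",
                   col_types = cols_only(colB = col_character(),
                                         colC = col_integer()))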

Read CSV in R and filter columns by name

Posted by 折月煮酒 on 2020-01-24 07:10:12
Question: Let's say I have a CSV with dozens or hundreds of columns and I want to pull in just 2 or 3 of them. I know about the colClasses solution as described here, but the code gets very unreadable. I want something like usecols from pandas' read_csv. Loading everything and selecting afterwards is not a solution (the file is super big; it doesn't fit in memory).

Answer 1: I would use the data.table package and, with fread(), specify the columns to keep or drop via the select or drop arguments. From ?fread
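
A minimal sketch of the fread() approach the answer describes (file and column names are placeholders):

    library(data.table)
    # Only the named columns are parsed; the rest are never loaded
    dt <- fread("big_file.csv", select = c("colB", "colC"))
    # Or, equivalently, name the columns to leave out
    dt <- fread("big_file.csv", drop = "colA")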

write_csv read_csv with scientific notation after 1000th row

Posted by 你离开我真会死。 on 2020-01-02 06:11:59
Question: Writing a data frame with a mix of small integer entries (values less than 1000) and "large" ones (values of 1000 or more) to a csv file with write_csv() mixes scientific and non-scientific entries. If the first 1000 rows are small values but a large value appears thereafter, read_csv() seems to get confused by this mix and outputs NA for the scientific notations:

    test_write_read <- function(small_value, n_fills, position, large_value) {
      tib <- tibble(a = rep(small_value, n_fills))
      tib$a[position]
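
A hedged sketch of the usual remedies when reading such a file back (the file name is a placeholder): first-edition readr guesses column types from the first 1000 rows (guess_max = 1000 by default), so either pin the type explicitly or let the guesser see more rows.

    library(readr)
    # Declare the column type so type guessing never sees a mixed sample
    df <- read_csv("test.csv", col_types = cols(a = col_double()))
    # Or let readr inspect more rows before committing to a type
    df <- read_csv("test.csv", guess_max = 100000)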

How to use “cols()” and “col_double” with respect to comma as decimal mark

Posted by 萝らか妹 on 2019-12-30 07:18:08
Question: I would like to parse my columns to the right type with the readr package while reading. The difficulty: the fields are separated by semicolons (;), while the comma (,) is used as the decimal mark.

    library(readr)
    # Test data:
    T <- "Date;Time;Var1;Var2
    01.01.2011;11:11;2,4;5,6
    02.01.2011;12:11;2,5;5,5
    03.01.2011;13:11;2,6;5,4
    04:01.2011;14:11;2,7;5,3"
    read_delim(T, ";")
    # A tibble: 4 × 4
    #   Date       Time      Var1  Var2
    #   <chr>      <time>   <dbl> <dbl>
    # 1 01.01.2011 11:11:00    24    56
    # 2 02.01.2011 12:11:00    25    55
    # 3 03
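
A minimal sketch of the locale-based fix, using a shortened version of the question's test string: readr's locale() takes a decimal_mark argument, so the comma values parse as 2.4 and 5.6 rather than 24 and 56.

    library(readr)
    T <- "Date;Time;Var1;Var2\n01.01.2011;11:11;2,4;5,6\n02.01.2011;12:11;2,5;5,5"
    # Tell the parser the comma is a decimal mark, not a grouping mark
    read_delim(T, delim = ";", locale = locale(decimal_mark = ","))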