readr

Remove attributes from data read in readr::read_csv

Posted by 南楼画角 on 2021-01-27 15:42:07
Question: readr::read_csv adds attributes that don't get updated when the data is edited. For example:

    library('tidyverse')
    df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")
    # Remove columns with only one distinct entry
    no_info <- df %>% sapply(n_distinct)
    no_info <- names(no_info[no_info==1])
    df2 <- df %>% select(-no_info)

Inspecting the structure, we see that column B is still present in the attributes of df2:

    > str(df)
    Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 3 variables:
    $ A:
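
A minimal sketch of one way to discard the stale metadata, based on the attributes visible in the str() output above: read_csv() stores its column specification in a "spec" attribute and adds the spec_tbl_df class, both of which can be stripped by hand.

    library(readr)
    df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")
    # Drop the column specification readr stores on the tibble
    attr(df, "spec") <- NULL
    # Drop the extra class that marks the tibble as a readr result
    class(df) <- setdiff(class(df), "spec_tbl_df")
    str(df)  # now a plain tbl_df with no lingering spec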

Dealing with Byte Order Mark (BOM) in R [duplicate]

Posted by 早过忘川 on 2021-01-27 07:42:28
Question: This question already has answers here: Read a UTF-8 text file with BOM (2 answers). Closed 4 years ago.

Sometimes a Byte Order Mark (BOM) is present at the beginning of a .CSV file. The symbol is not visible when you open the file in Notepad or Excel; however, when you read the file in R using various methods, you will see different symbols in the name of the first column. Here is a sample csv file with a BOM at the beginning:

    ID,title,clean_title,clean_title_id
    1,0 - 0,,0
    2,"""0 - 1
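
A minimal sketch of two common workarounds (the file name is a placeholder): base R accepts "UTF-8-BOM" as a file encoding, which strips the mark before parsing, and recent readr versions skip a UTF-8 BOM on their own.

    # Base R: the "UTF-8-BOM" encoding strips the mark before parsing
    df <- read.csv("sample.csv", fileEncoding = "UTF-8-BOM")

    # readr: recent versions detect and drop a UTF-8 BOM automatically
    library(readr)
    df <- read_csv("sample.csv")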

timezone error reading csv

Posted by 房东的猫 on 2020-05-15 10:40:07
Question: After googling for a couple of hours I have not found a solution to this problem. Basically, when I run read_csv("some_file.csv") from the readr package I get the following error, and the csv is not read:

    Error: Unknown TZ UTC

The only way I can read the CSV is this way:

    read_csv("some_file.csv", locale = locale(tz = "Australia/Sydney"))

Sydney being my timezone. But I'd rather fix the error than work around it if possible. Does anybody know how to fix the UTC error permanently? E.g. Startup
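
A sketch of how the workaround might be made permanent, assuming the error stems from the TZ environment variable being unset or unusable at startup: export a valid zone name from ~/.Rprofile (the zone below is the asker's; substitute your own).

    # In ~/.Rprofile, so every session starts with a usable time zone
    Sys.setenv(TZ = "Australia/Sydney")

    # read_csv() then no longer needs an explicit locale
    library(readr)
    df <- read_csv("some_file.csv")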

how to skip reading certain columns in readr [duplicate]

Posted by 微笑、不失礼 on 2020-05-14 14:39:47
Question: This question already has answers here: Only read selected columns (4 answers). Closed 2 years ago.

I have a simple csv file called "test.csv" with the following content:

    colA,colB,colC
    1,"x",12
    2,"y",34
    3,"z",56

Let's say I want to skip reading colA and just read in colB and colC. I want a general way to do this because I have lots of files to read in, and sometimes colA is called something else altogether, but colB and colC are always the same. According to the read_csv documentation, one
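
A hedged sketch using readr's own column specification: cols_only() keeps just the columns you name, whatever the others are called, which fits the "colA may be named anything" constraint.

    library(readr)
    # Keep only the named columns; any other column is ignored entirely
    df <- read_csv("test.csv",
                   col_types = cols_only(colB = col_character(),
                                         colC = col_integer()))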

Read CSV in R and filter columns by name

Posted by 折月煮酒 on 2020-01-24 07:10:12
Question: Let's say I have a CSV with dozens or hundreds of columns and I want to pull in just 2 or 3 of them. I know about the colClasses solution as described here, but the code gets very unreadable. I want something like usecols from pandas' read_csv. Loading everything and selecting afterwards is not a solution (the file is super big; it doesn't fit in memory).

Answer 1: I would use the data.table package and, with fread(), specify the columns to keep or drop via the select or drop arguments. From ?fread
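
A minimal sketch of the fread() approach the answer describes (file and column names are placeholders):

    library(data.table)
    # Only the named columns are parsed; the rest are never loaded
    dt <- fread("big_file.csv", select = c("colB", "colC"))
    # Or, equivalently, name the columns to leave out
    dt <- fread("big_file.csv", drop = "colA")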

write_csv read_csv with scientific notation after 1000th row

Posted by 你离开我真会死。 on 2020-01-02 06:11:59
Question: Writing a data frame with a mix of small integer entries (values less than 1000) and "large" ones (values of 1000 or more) to a csv file with write_csv() mixes scientific and non-scientific entries. If the first 1000 rows are small values but a large value appears thereafter, read_csv() seems to get confused by this mix and outputs NA for the scientific notations:

    test_write_read <- function(small_value, n_fills, position, large_value) {
      tib <- tibble(a = rep(small_value, n_fills))
      tib$a[position]
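
A hedged sketch of the usual remedies when reading such a file back (the file name is a placeholder): first-edition readr guesses column types from the first 1000 rows (guess_max = 1000 by default), so either pin the type explicitly or let the guesser see more rows.

    library(readr)
    # Declare the column type so type guessing never sees a mixed sample
    df <- read_csv("test.csv", col_types = cols(a = col_double()))
    # Or let readr inspect more rows before committing to a type
    df <- read_csv("test.csv", guess_max = 100000)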

How to use “cols()” and “col_double” with respect to comma as decimal mark

Posted by 萝らか妹 on 2019-12-30 07:18:08
Question: I would like to parse my columns to the right type with the readr package while reading. The difficulty: the fields are separated by semicolons (;), while the comma (,) is used as the decimal mark.

    library(readr)
    # Test data:
    T <- "Date;Time;Var1;Var2
    01.01.2011;11:11;2,4;5,6
    02.01.2011;12:11;2,5;5,5
    03.01.2011;13:11;2,6;5,4
    04:01.2011;14:11;2,7;5,3"
    read_delim(T, ";")
    # A tibble: 4 × 4
    #   Date       Time      Var1  Var2
    #   <chr>      <time>   <dbl> <dbl>
    # 1 01.01.2011 11:11:00    24    56
    # 2 02.01.2011 12:11:00    25    55
    # 3 03
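
A minimal sketch of the locale-based fix, using a shortened version of the question's test string: readr's locale() takes a decimal_mark argument, so the comma values parse as 2.4 and 5.6 rather than 24 and 56.

    library(readr)
    T <- "Date;Time;Var1;Var2\n01.01.2011;11:11;2,4;5,6\n02.01.2011;12:11;2,5;5,5"
    # Tell the parser the comma is a decimal mark, not a grouping mark
    read_delim(T, delim = ";", locale = locale(decimal_mark = ","))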