r

Restructure data in R: reshape, dcast, melt…nothing seems to work for this dataframe

二次信任 submitted on 2021-02-16 18:11:08
Question: Here is an example of what the first few rows of my imported dataframe look like (in the full dataset the subject variable has five levels/factors in total; the other two are Algebra II and Geometry).

    SID     firstName  lastName  subject      sumScaleScore  sumPerformanceLevel
    604881  JIM        Ro        Mathematics  912            2
    604881  JIM        Ro        ELA          964            4
    594181  JERRY      Chi       ELA          997            1
    594181  JERRY      Chi       Mathematics  918            3
    564711  KILE       Gamma     ELA          933            5
    564711  KILE       Gamma     Algebra I    1043           7

I want to restructure it from the
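The target layout is cut off above, so this is only a sketch under the assumption that the goal is one row per student with a score column and a performance-level column for each subject. It uses base R's reshape() so no extra packages are needed; the data frame df simply re-creates the six sample rows. A subject a student has no record for (for example, KILE has no Mathematics row) becomes NA.

```r
# Re-create the six sample rows shown in the question
df <- data.frame(
  SID = c(604881, 604881, 594181, 594181, 564711, 564711),
  firstName = c("JIM", "JIM", "JERRY", "JERRY", "KILE", "KILE"),
  lastName = c("Ro", "Ro", "Chi", "Chi", "Gamma", "Gamma"),
  subject = c("Mathematics", "ELA", "ELA", "Mathematics", "ELA", "Algebra I"),
  sumScaleScore = c(912, 964, 997, 918, 933, 1043),
  sumPerformanceLevel = c(2, 4, 1, 3, 5, 7)
)

# Long -> wide: one row per student (SID + name), and one pair of
# score/level columns per subject, named e.g. "sumScaleScore.ELA"
wide <- reshape(
  df,
  idvar = c("SID", "firstName", "lastName"),
  timevar = "subject",
  v.names = c("sumScaleScore", "sumPerformanceLevel"),
  direction = "wide"
)
rownames(wide) <- NULL
print(wide)
```

If tidyr is available, `tidyr::pivot_wider(df, names_from = subject, values_from = c(sumScaleScore, sumPerformanceLevel))` is the tidyverse counterpart; it names the new columns with an underscore instead of a dot.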

Identifying missing values

北战南征 submitted on 2021-02-16 18:01:13
a. NA: missing value
b. NaN: "not a number", representing an impossible value
c. Inf, -Inf: positive infinity and negative infinity, respectively

Return values of is.na(), is.nan() and is.infinite() (is.infinite() flags infinite values):

    X           is.na(X)  is.nan(X)  is.infinite(X)
    X <- NA     TRUE      FALSE      FALSE
    X <- 0/0    TRUE      TRUE       FALSE
    X <- 1/0    FALSE     FALSE      TRUE

Example:

    > y <- c(1,2,3,NA)
    > is.na(y)
    [1] FALSE FALSE FALSE TRUE
    > data(sleep, package='VIM')      # load the data
    > sleep[complete.cases(sleep),]   # list rows with no missing values
         BodyWgt  BrainWgt  NonD  Dream  Sleep  Span   Gest  Pred  Exp  Danger
    2      1.000      6.60   6.3    2.0    8.3   4.5   42.0     3    1       3
    5   2547.000   4603.00   2.1    1.8    3.9  69.0  624.0     3    5       4
    6     10.550    179.50   9.1    0.7    9.8  27.0  180.0     4    4       4
    # too long; excess output removed
    > sleep[!complete.cases(sleep),]  # list rows with one or more missing values
         BodyWgt
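The table above can be checked directly in base R. The sleep data needs the VIM package, so for complete.cases() this sketch uses a small hypothetical data frame instead.

```r
# The three kinds of special values side by side
x <- c(NA, 0/0, 1/0, -1/0, 5)
is.na(x)        # TRUE TRUE FALSE FALSE FALSE (NaN also counts as missing)
is.nan(x)       # FALSE TRUE FALSE FALSE FALSE (only the 0/0 result)
is.infinite(x)  # FALSE FALSE TRUE TRUE FALSE (Inf and -Inf)

# Row-wise completeness on a small hypothetical data frame
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
complete_rows <- df[complete.cases(df), ]     # only row 1 has no NA
incomplete_rows <- df[!complete.cases(df), ]  # rows 2 and 3
colSums(is.na(df))                            # missing values per column
```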

Filtering Column by Multiple values [duplicate]

╄→гoц情女王★ submitted on 2021-02-16 15:41:45
Question: This question already has answers here (closed as a duplicate): Filter multiple values on a string column in dplyr (4 answers). I would like to filter rows based on multiple values in one column. For example, one data.frame has S&P 500 tickers; I have to pick 20 of them and their associated closing prices. How can I do that?

Answer 1: If I understand your question correctly, you can do it with dplyr:

    library(dplyr)
    target <- c("Ticker1", "Ticker2", "Ticker3")
    filter(df, Ticker %in% target)

The answer
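The same %in% test also works without dplyr, as plain logical indexing in base R. The tickers and closing prices below are made-up values for illustration; only the column name Ticker comes from the answer.

```r
# Hypothetical price table with made-up closing prices
prices <- data.frame(
  Ticker = c("AAPL", "MSFT", "XOM", "AAPL", "GE"),
  Close = c(130.5, 240.1, 52.3, 131.2, 11.8)
)
target <- c("AAPL", "XOM")

# %in% is TRUE for each row whose Ticker equals any of the target values
picked <- prices[prices$Ticker %in% target, ]
print(picked)
```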

Convert percentage columns with % into numeric in R

為{幸葍}努か submitted on 2021-02-16 15:29:12
Question: I have a small dataset as follows:

       id  price  month_pct  year_pct
    0   1   1.85     -2.63%    -5.13%
    1   2   2.42      0.00%     0.83%
    2   3   1.81      0.00%    -0.55%
    3   4   4.37     -2.89%    -5.62%
    4   5   1.86      0.00%    -7.92%
    5   6   1.78     -1.11%   -15.24%

I would like to convert month_pct and year_pct (which are of factor type) to numeric and then multiply by 100. How could I do that in R? Thanks.

Expected result:

       id  price  month_pct  year_pct
    0   1   1.85      -2.63     -5.13
    1   2   2.42       0.00      0.83
    2   3   1.81       0.00     -0.55
    3   4   4.37      -2.89     -5.62
    4   5   1.86       0.00     -7.92
    5   6   1.78      -1.11    -15.24

Code
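A minimal base R sketch, assuming the two columns really are factors as described. The expected result only drops the % sign, which is numerically the same as reading "-2.63%" as the fraction -0.0263 and multiplying by 100, so no explicit multiplication is needed; divide by 100 instead if fractions are wanted.

```r
# Sample rows from the question, with the percentage columns stored as factors
df <- data.frame(
  id = 1:6,
  price = c(1.85, 2.42, 1.81, 4.37, 1.86, 1.78),
  month_pct = factor(c("-2.63%", "0.00%", "0.00%", "-2.89%", "0.00%", "-1.11%")),
  year_pct = factor(c("-5.13%", "0.83%", "-0.55%", "-5.62%", "-7.92%", "-15.24%"))
)

pct_cols <- c("month_pct", "year_pct")
# as.character() first: as.numeric() on a factor returns the level codes.
# Then remove the literal "%" and parse what is left as a number.
df[pct_cols] <- lapply(df[pct_cols], function(x) {
  as.numeric(sub("%", "", as.character(x), fixed = TRUE))
})
str(df)
```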

Find the minima/valley points and get the index where each valley starts and ends in R

浪子不回头ぞ submitted on 2021-02-16 15:29:06
Question: I am fairly new to statistics and R. I need to find the peaks and valleys, and the indices where each peak/valley starts and ends. For the maxima/peaks I found the findPeaks function, which covers the peak requirement, but I cannot find any package for finding valley points that suits my requirement. The following is the R function for finding the peaks:

    function (x, nups = 1, ndowns = nups, zero = "0", peakpat = NULL, minpeakheight = -Inf, minpeakdistance = 1,
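A common trick is to run a peak finder on the negated series, since the peaks of -x are the valleys of x. The signature shown looks like pracma's findpeaks(); if that is the function in use, findpeaks(-x) should report valleys with their begin and end indices (with the heights negated). For a package-free alternative, the sketch below uses a hypothetical helper, find_valleys(), that finds strict local minima and walks outward to where the descent starts and the following ascent ends. It assumes no NA values and no ties; flat stretches are not handled.

```r
# Hypothetical helper: strict local minima plus the index where the fall
# into each valley starts and the index where the following rise ends.
# Assumes no NA values; a zero difference (tie) stops the walk, and a flat
# valley bottom is not detected.
find_valleys <- function(x) {
  n <- length(x)
  empty <- data.frame(value = numeric(0), index = integer(0),
                      start = integer(0), end = integer(0))
  if (n < 3) return(empty)

  d <- sign(diff(x))  # d[i] = sign(x[i + 1] - x[i])
  # a valley bottom at i has a fall before it and a rise after it
  mins <- which(d[-length(d)] < 0 & d[-1] > 0) + 1L
  if (length(mins) == 0) return(empty)

  starts <- vapply(mins, function(i) {
    j <- i - 1L
    while (j > 1L && d[j - 1L] < 0) j <- j - 1L  # walk back while still falling
    j
  }, integer(1))

  ends <- vapply(mins, function(i) {
    k <- i
    while (k < n - 1L && d[k + 1L] > 0) k <- k + 1L  # walk forward while rising
    k + 1L
  }, integer(1))

  data.frame(value = x[mins], index = mins, start = starts, end = ends)
}

x <- c(5, 3, 1, 4, 6, 2, 7)
find_valleys(x)
```

For this x the two valleys are at indices 3 and 6; the first runs from index 1 to 5 and the second from 5 to 7, so neighbouring valleys can share a boundary at the peak between them.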

Pivot Wider in R

…衆ロ難τιáo~ submitted on 2021-02-16 15:28:49
Question: I have a dataframe like this:

    rest_id  task_name  quarter  nc
    123      labeling   1        TRUE
    123      labeling   2        FALSE
    123      labeling   3        FALSE
    123      labeling   4        FALSE
    123      cooking    1        TRUE
    123      cooking    2        FALSE
    123      cooking    3        TRUE
    123      cooking    4        FALSE
    123      cleaning   1        TRUE
    123      cleaning   2        FALSE
    123      cleaning   3        TRUE
    123      cleaning   4        FALSE

I want to pivot it to look like this:

    rest_id  quarter  labeling  cooking  cleaning
    123      1        TRUE      TRUE     TRUE
    123      2        FALSE     FALSE    FALSE
    123      3        FALSE     TRUE     TRUE
    123      4        FALSE     FALSE    FALSE

I've tried this: X <-
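The asker's own attempt is cut off, so this is a separate sketch rather than a fix of it. It reproduces the desired table with base R's reshape(), treating (rest_id, quarter) as the row key and task_name as the column to spread. The tidyr call in the final comment is the usual tidyverse equivalent and needs that package installed.

```r
# Sample data from the question
df <- data.frame(
  rest_id = 123,
  task_name = rep(c("labeling", "cooking", "cleaning"), each = 4),
  quarter = rep(1:4, times = 3),
  nc = c(TRUE, FALSE, FALSE, FALSE,
         TRUE, FALSE, TRUE, FALSE,
         TRUE, FALSE, TRUE, FALSE)
)

# Base R: one row per (rest_id, quarter), one nc column per task_name value
wide <- reshape(df, idvar = c("rest_id", "quarter"), timevar = "task_name",
                v.names = "nc", direction = "wide")
names(wide) <- sub("^nc\\.", "", names(wide))  # "nc.cooking" -> "cooking"
rownames(wide) <- NULL
print(wide)

# tidyr equivalent (requires the tidyr package):
# tidyr::pivot_wider(df, names_from = task_name, values_from = nc)
```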

Pivot Wider in R

十年热恋 submitted on 2021-02-16 15:28:05
Question: I have a dataframe like this:

    rest_id  task_name  quarter  nc
    123      labeling   1        TRUE
    123      labeling   2        FALSE
    123      labeling   3        FALSE
    123      labeling   4        FALSE
    123      cooking    1        TRUE
    123      cooking    2        FALSE
    123      cooking    3        TRUE
    123      cooking    4        FALSE
    123      cleaning   1        TRUE
    123      cleaning   2        FALSE
    123      cleaning   3        TRUE
    123      cleaning   4        FALSE

I want to pivot it to look like this:

    rest_id  quarter  labeling  cooking  cleaning
    123      1        TRUE      TRUE     TRUE
    123      2        FALSE     FALSE    FALSE
    123      3        FALSE     TRUE     TRUE
    123      4        FALSE     FALSE    FALSE

I've tried this: X <-

R: Select rows with non-NA values in at least one of the four columns

无人久伴 submitted on 2021-02-16 15:27:31
Question: I have this code that works fine:

    CompleteCoxObs <- temp[is.na(temp[,8]) == FALSE | is.na(temp[,9]) == FALSE | is.na(temp[,10]) == FALSE,]

What is a better and more efficient way to achieve the same result?

Answer 1: You can try this to check all the columns:

    CompleteCox.df <- temp.df[rowSums(is.na(temp.df)) != ncol(temp.df),]

In your case:

    CompleteCox.df <- temp.df[rowSums(is.na(temp.df[, c(8,9,10)])) != 3,]

Answer 2: You can try one of the following:

    temp[!is.na(rowSums(temp[,8:10])),]

or temp[
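Answer 1's `rowSums(is.na(...)) != 3` test matches the original OR chain. Answer 2's first suggestion answers a different question: rowSums() returns NA as soon as any value in the row is NA, so `!is.na(rowSums(...))` keeps only rows where all three columns are present. The sketch below contrasts the two on a small hypothetical data frame, where x, y and z stand in for columns 8:10.

```r
# Small hypothetical data frame; x, y, z stand in for columns 8:10
temp <- data.frame(
  id = 1:4,
  x = c(1, NA, NA, 4),
  y = c(NA, NA, 2, 5),
  z = c(NA, NA, NA, 6)
)
cols <- c("x", "y", "z")

# At least one non-NA value among cols (same rows as the OR chain)
at_least_one <- temp[rowSums(!is.na(temp[cols])) > 0, ]

# For contrast: rows where every one of cols is non-NA
all_present <- temp[!is.na(rowSums(temp[cols])), ]

at_least_one$id  # 1 3 4
all_present$id   # 4
```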

Adding Multiple “sliders” to the same Graph

你。 submitted on 2021-02-16 15:25:31
Question: I am using the R programming language. Using the "plotly" library, I was able to make the following interactive graph:

    library(dplyr)
    library(ggplot2)
    library(shiny)
    library(plotly)
    library(htmltools)

    # generate data
    set.seed(123)
    var = rnorm(731, 100, 25)
    date = seq(as.Date("2014/1/1"), as.Date("2016/1/1"), by = "day")
    data = data.frame(var, date)

    vals <- 90:100
    combine <- vector('list', length(vals))
    count <- 0
    for (i in vals) {
      data$var_i = i
      data$new_var_i = ifelse(data$var >i,1,0

Handling imbalanced data in multi-class classification problem

删除回忆录丶 submitted on 2021-02-16 15:24:08
Question: I have a multi-class classification problem and the data is heavily skewed. My target variable (y) has 3 classes, and their shares of the data are as follows:

- 0 = 3%
- 1 = 90%
- 2 = 7%

I am looking for packages in R that can do multi-class oversampling, undersampling, or both. If this is not doable in R, where else can I handle this problem?

PS: I tried the ROSE package in R, but it only works for binary classification problems.

Answer 1: Well, there is the caret package, which offers a wide range of ML algorithms
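caret also ships upSample() and downSample() helpers that, as far as I know, accept a factor outcome with more than two levels; check the caret documentation before relying on that. For a package-free option, the sketch below implements plain random oversampling with a hypothetical helper, random_oversample(): every class keeps all its rows and is topped up with duplicates drawn with replacement until it matches the largest class. This is not SMOTE or ROSE-style synthetic sampling. Resampling should be applied to the training split only, so duplicated rows do not leak into the test set.

```r
# Hypothetical helper: random oversampling for any number of classes.
# Each class keeps all of its rows and gets extra rows drawn with
# replacement until it is as large as the biggest class.
random_oversample <- function(df, target) {
  groups <- split(df, df[[target]], drop = TRUE)  # skip empty factor levels
  n_max <- max(vapply(groups, nrow, integer(1)))
  topped_up <- lapply(groups, function(g) {
    extra <- sample(nrow(g), n_max - nrow(g), replace = TRUE)
    g[c(seq_len(nrow(g)), extra), , drop = FALSE]
  })
  out <- do.call(rbind, topped_up)
  rownames(out) <- NULL
  out
}

# Toy data with the class shares from the question (3% / 90% / 7%)
set.seed(42)
df <- data.frame(
  x = rnorm(100),
  y = factor(rep(c("0", "1", "2"), times = c(3, 90, 7)))
)
balanced <- random_oversample(df, "y")
table(df$y)        # 3 90 7
table(balanced$y)  # 90 90 90
```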