Question
My lab has separate groups for parents and children in the study. Right now, all of the collected data sits in one data frame. Some questions were asked of the children and some of the parents. We have named them SCAREDC (scared child) and SCAREDP (scared parent), respectively. Naturally, SCAREDC will have NAs for the parents and SCAREDP will have NAs for the children in the data frame.
Currently, my data frame looks like this:
head(child_parent_total)
familySID time SCAREDC1 SCAREDC2 SCAREDC3 SCAREDC4 SCAREDC5 SCAREDC6 SCAREDC7 SCAREDC8 SCAREDC9 SCAREDC10
1 1 Post NA NA NA NA NA NA NA NA NA NA
2 1 Pre 0 0 0 0 0 0 0 2 0 1
3 10 Post NA NA NA NA NA NA NA NA NA NA
4 10 Pre 0 0 1 1 0 0 0 1 0 0
5 101 Post 0 0 1 0 0 0 0 0 0 1
6 101 Pre 1 1 1 0 0 0 0 0 0 1
SCAREDC11 SCAREDC12 SCAREDC13 SCAREDC14 SCAREDC15 SCAREDC16 SCAREDC17 SCAREDC18 SCAREDC19 SCAREDC20
1 NA NA NA NA NA NA NA NA NA NA
2 0 0 0 0 0 0 1 0 0 0
3 NA NA NA NA NA NA NA NA NA NA
4 0 0 0 0 0 0 0 1 0 0
5 0 0 0 0 0 0 0 0 0 0
6 1 0 0 0 0 0 2 1 0 0
SCAREDC21 SCAREDC22 SCAREDC23 SCAREDC24 SCAREDC25 SCAREDC26 SCAREDC27 SCAREDC28 SCAREDC29 SCAREDC30
1 NA NA NA NA NA NA NA NA NA NA
2 1 0 0 0 1 1 0 0 1 0
3 NA NA NA NA NA NA NA NA NA NA
4 0 0 0 0 0 1 0 0 1 0
5 0 0 0 0 0 1 0 0 1 0
6 0 0 0 0 0 0 0 0 1 0
SCAREDC31 SCAREDC32 SCAREDC33 SCAREDC34 SCAREDC35 SCAREDC36 SCAREDC37 SCAREDC38 SCAREDC39 SCAREDC40
1 NA NA NA NA NA NA NA NA NA NA
2 0 2 0 0 1 0 1 0 1 1
3 NA NA NA NA NA NA NA NA NA NA
4 0 1 0 0 1 0 0 0 1 0
5 0 1 0 0 0 0 1 0 1 0
6 1 0 0 0 0 1 0 0 2 1
SCAREDC41 CDIC1 CDIC2 CDIC3 CDIC4 CDIC5 CDIC6 CDIC7 CDIC8 CDIC9 CDIC10 CDIC11 CDIC12 CDIC13 CDIC14 CDIC15
1 NA 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
3 NA 1 1 1 1 1 0 1 1 1 2 2 1 1 1 1
4 1 0 0 0 1 0 1 0 1 0 0 0 0 1 1 1
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 1 1 1 1 0 0 0 0 0 1 1 1 0 1
CDIC16 CDIC17 CDIC18 CDIC19 CDIC20 CDIC21 CDIC22 CDIC23 CDIC24 CDIC25 CDIC26 CDIC27 SCAREDC_T
1 0 1 0 0 0 0 0 0 0 0 0 0 NA
2 0 1 0 0 0 0 0 0 0 0 0 0 15.99
3 1 1 0 1 1 1 1 0 1 1 1 1 NA
4 1 1 0 0 0 1 1 0 1 0 0 0 9.84
5 0 2 0 0 0 2 0 0 0 0 1 0 6.97
6 1 0 0 0 0 0 1 0 1 0 0 0 13.94
scared_pd_score scared_pd_res scared_gad_score scared_gad_res scared_sad_score scared_sad_res
1 NA <NA> NA <NA> NA <NA>
2 0 no 3 no 4 no
3 NA <NA> NA <NA> NA <NA>
4 1 no 1 no 3 no
5 0 no 1 no 1 no
6 2 no 0 no 2 no
scared_socad_score scared_socad_res scared_ssa_score scared_ssa_res CDIC_T cdic_negmood cdic_interp
1 NA <NA> NA <NA> 1.89 1 0
2 8 yes 1 no 1.89 1 0
3 NA <NA> NA <NA> 25.92 7 4
4 5 no 0 no 11.07 3 0
5 5 no 0 no 5.13 0 1
6 5 no 5 yes 11.07 2 2
cdic_ineffect cdic_anhedonia cdic_selfesteem SCAREDP1 SCAREDP2 SCAREDP3 SCAREDP4 SCAREDP5 SCAREDP6
1 0 1 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0
3 12 7 5 0 0 0 0 1 0
4 8 5 1 0 0 0 0 0 0
5 0 4 0 0 0 1 0 0 0
6 12 3 1 0 0 0 0 0 0
SCAREDP7 SCAREDP8 SCAREDP9 SCAREDP10 SCAREDP11 SCAREDP12 SCAREDP13 SCAREDP14 SCAREDP15 SCAREDP16
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 0 0 0
4 0 1 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0
6 0 1 0 0 0 0 0 0 0 0
SCAREDP17 SCAREDP18 SCAREDP19 SCAREDP20 SCAREDP21 SCAREDP22 SCAREDP23 SCAREDP24 SCAREDP25 SCAREDP26
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 1
3 0 0 0 0 1 0 0 0 0 0
4 0 0 0 0 1 0 0 0 0 0
5 0 1 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
SCAREDP27 SCAREDP28 SCAREDP29 SCAREDP30 SCAREDP31 SCAREDP32 SCAREDP33 SCAREDP34 SCAREDP35 SCAREDP36
1 0 0 0 0 0 1 0 0 1 0
2 0 0 0 0 0 1 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 1 0
5 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
SCAREDP37 SCAREDP38 SCAREDP39 SCAREDP40 SCAREDP41 CDIP1 CDIP2 CDIP3 CDIP4 CDIP5 CDIP6 CDIP7 CDIP8 CDIP9
1 0 0 0 0 1 0 0 0 0 0 0 1 0 0
2 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 0 0 0 0 1 1 1 1 1 1 1 1 0 0
4 0 0 0 0 1 1 1 1 1 0 1 0 0 2
5 0 0 0 0 0 1 1 0 0 0 1 1 0 1
6 0 0 0 0 0 0 0 0 0 0 0 0 0 1
CDIP10 CDIP11 CDIP12 CDIP13 CDIP14 CDIP15 CDIP16 CDIP17 SCAREDP_T CDIP_T
1 0 0 0 1 2 0 0 0 2.87 4.08
2 0 0 0 0 2 0 0 0 2.87 2.04
3 0 0 1 1 2 0 2 1 4.10 13.94
4 0 1 0 1 1 0 1 1 4.10 12.07
5 0 0 1 2 1 0 1 1 2.05 11.05
6 0 0 0 1 1 0 1 1 0.82 4.93
I'm trying to find the mean and standard deviation for SCAREDC and SCAREDP separately.
I have toyed around with this so far:
SCAREDC <- rowMeans(dplyr::select(child_parent_total, SCAREDC1:SCAREDC41), na.rm=TRUE)
This didn't yield what I needed it to.
Then I thought about just using the summary, but again it didn't give me what I needed.
So then I thought of using na.omit:
new_child_parent_total <- child_parent_total %>%
  unlist %>%
  na.omit %>%
  as.data.frame
Of course, this would take out all of the data, since I have NAs throughout the data frame.
What am I missing here? Is this a matter of aggregating my data into certain groups? Is there a way to do it using the dplyr functions as I tried earlier? (I should note that this has worked in the past, when we had the children and parents organized in separate data frames. My problem here is that I can't figure out how to find the means and standard deviations when they are in the same data frame.)
I know describeBy (from the psych package) could produce these descriptives, but again I'm not sure how to do that.
After discussing with my colleagues, it seems that there are multiple ways to do this: describeBy, na.rm/na.omit, aggregate, etc.
The approach I'm currently trying is based on the work of a postdoc in my lab. He started by organizing the data by familySID, then computed the totals of SCAREDC and SCAREDP (SCAREDC_T and SCAREDP_T, respectively), and then used describeBy on those totals to get the descriptives. He then did this separately for each time period and later for the subscales, but that is above and beyond what I need to do.
My rework currently looks like this:
load(file="SCARED_Practice.rda")
child_parent_total$SCAREDC_T #adds new column for Scared Child Totals
child_parent_total$SCAREDP_T #adds new column for Scared Parent Totals
child_parent_total$familySID <- as.numeric(child_parent_total$familySID) #looks for familySID, describes as numeric
child_parent_total <- child_parent_total[order(child_parent_total$familySID),] #orders child_parent_total by familySID
So, I think the angle I want to take is to order these, then aggregate and find totals, and then find the descriptives using describeby in order to avoid NAs. I'm stuck though. Where am I going wrong?
Answer 1:
Try this:
library(dplyr)

child_parent_total %>%
  # Select only the item columns, i.e. SCAREDC1, SCAREDC2, ... and SCAREDP1, SCAREDP2, ...
  # A plain starts_with('SCARED') is case-insensitive by default, so it would also
  # match SCAREDC_T, SCAREDP_T and the scared_*_score / scared_*_res columns
  select(matches('^SCARED[CP]\\d+$', ignore.case = FALSE)) %>%
  # Get the data in long format, remove NA values
  tidyr::pivot_longer(cols = everything(), values_drop_na = TRUE) %>%
  # Create a group for parent and child
  group_by(grp = c('Parent', 'Child')[grepl('SCAREDC\\d+', name) + 1]) %>%
  # Take mean and standard deviation for each group
  summarise(mean = mean(value), sd = sd(value))
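The pipeline above pools every individual item response. If you would rather get the mean and standard deviation of each respondent's total score, as described in the question, a base R sketch along the following lines may help. The toy data frame and the helper name item_total are made up for illustration; only the column naming pattern (SCAREDC1, SCAREDP1, ...) is taken from the question.

```r
# Toy data with hypothetical values: rows 1 and 3 have no child items,
# row 2 has no parent items.
toy <- data.frame(
  familySID = c(1, 1, 10, 10),
  time      = c("Post", "Pre", "Post", "Pre"),
  SCAREDC1  = c(NA, 0, NA, 1),
  SCAREDC2  = c(NA, 2, NA, 1),
  SCAREDP1  = c(0, NA, 1, 0),
  SCAREDP2  = c(1, NA, 1, 2),
  SCAREDC_T = c(NA, 15.99, NA, 9.84)  # existing score column; must not be counted as an item
)

# Hypothetical helper: row total over the item columns <prefix><number>.
# Rows where every item is NA get NA, because rowSums(..., na.rm = TRUE)
# alone would report 0 for them.
item_total <- function(df, prefix) {
  is_item <- grepl(paste0("^", prefix, "[0-9]+$"), names(df))
  items <- df[, is_item, drop = FALSE]
  total <- unname(rowSums(items, na.rm = TRUE))
  total[rowSums(!is.na(items)) == 0] <- NA
  total
}

child_total  <- item_total(toy, "SCAREDC")
parent_total <- item_total(toy, "SCAREDP")

# Mean and standard deviation per group, skipping respondents with no items.
result <- data.frame(
  group = c("Child", "Parent"),
  mean  = c(mean(child_total, na.rm = TRUE), mean(parent_total, na.rm = TRUE)),
  sd    = c(sd(child_total, na.rm = TRUE), sd(parent_total, na.rm = TRUE))
)
print(result)
```

On the real data you would pass child_parent_total instead of toy. Note that a respondent who answered only some of the items still gets a total from the answered ones, which may or may not be what your scoring rules call for.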
Source: https://stackoverflow.com/questions/61333985/how-to-find-the-mean-and-standard-deviation-of-rows-in-dataframes-with-some-havi