Question
I have a dataset of 240 cases, and I want to insert a blank row after each existing row. That would leave me with 480 rows, half of them filled and the other half empty (which I then want to fill with some data myself).
Example of data
id groep_MNC zkhs fbeh pgebdat p_age pgesl
1 3 1 1 1 1955-12-01 42.50000 1
2 5 1 1 1 1943-04-09 55.16667 1
3 7 1 1 1 1958-04-10 40.25000 1
4 10 1 1 1 1958-04-17 40.25000 1
5 12 1 1 2 1947-11-01 50.66667 1
6 14 1 1 2 1952-02-02 46.41667 1
Ideally, 'id' should be copied, thus looking like this:
id groep_MNC zkhs fbeh pgebdat p_age pgesl
1 3 1 1 1 1955-12-01 42.50000 1
2 3 NA NA NA NA NA NA
3 5 1 1 1 1943-04-09 55.16667 1
4 5 NA NA NA NA NA NA
5 7 1 1 1 1958-04-10 40.25000 1
6 7 NA NA NA NA NA NA
7 10 1 1 1 1958-04-17 40.25000 1
8 10 NA NA NA NA NA NA
9 12 1 1 2 1947-11-01 50.66667 1
10 12 NA NA NA NA NA NA
11 14 1 1 2 1952-02-02 46.41667 1
12 14 NA NA NA NA NA NA
I've tried copying all the rows with this code:
mydf_long <- mydf[rep(1:nrow(mydf), each = 2),]
But as you can see, that is not even close to what I want to end up with.
Edit: Thanks for the edits and comments. I need to transform my original data into a format that is suitable for multilevel analyses. However, the data is still quite messy, so other approaches that initially worked on a small subset of my data didn't work on the full set. For more background, see my other questions:
Reshape/gather function to create dataset ready for multilevel analysis
Tidy up and reshape messy dataset (reshape/gather/unite function)?
R - replace values by row given some statement in if loop with another value in same df
Since I have relatively few partner variables, I now want to create blank rows and fill them in with the partner data.
Answer 1:
We can duplicate each row and then set the rows with even row numbers to NA. Note that this sets id to NA on the blank rows as well.
dt2 <- dt[rep(1:nrow(dt), each = 2), ]
dt2[1:nrow(dt2) %% 2 == 0, ] <- NA
head(dt2)
id groep_MNC zkhs fbeh pgebdat p_age pgesl
1 3 1 1 1 1955-12-01 42.50000 1
1.1 NA NA NA NA <NA> NA NA
2 5 1 1 1 1943-04-09 55.16667 1
2.1 NA NA NA NA <NA> NA NA
3 7 1 1 1 1958-04-10 40.25000 1
3.1 NA NA NA NA <NA> NA NA
DATA
dt <- read.table(text = " id groep_MNC zkhs fbeh pgebdat p_age pgesl
1 3 1 1 1 1955-12-01 42.50000 1
2 5 1 1 1 1943-04-09 55.16667 1
3 7 1 1 1 1958-04-10 40.25000 1
4 10 1 1 1 1958-04-17 40.25000 1
5 12 1 1 2 1947-11-01 50.66667 1
6 14 1 1 2 1952-02-02 46.41667 1",
header = TRUE, stringsAsFactors = FALSE)
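To keep id on the blank rows, as the question asks, the NA assignment can be restricted to the other columns. The following is a minimal sketch of that variant, not part of the original answer; it uses a shortened stand-in for dt (three rows, four columns) so that it runs on its own, and the name even_rows is introduced here only for illustration.

```r
# Shortened stand-in for the question's data (assumption: three rows, four columns)
dt <- data.frame(
  id        = c(3L, 5L, 7L),
  groep_MNC = c(1L, 1L, 1L),
  pgebdat   = c("1955-12-01", "1943-04-09", "1958-04-10"),
  p_age     = c(42.5, 55.16667, 40.25),
  stringsAsFactors = FALSE
)

# Duplicate every row, so each original row is followed by a copy of itself
dt2 <- dt[rep(seq_len(nrow(dt)), each = 2), ]

# Blank every column except id on the even (duplicated) rows
even_rows <- seq_len(nrow(dt2)) %% 2 == 0
dt2[even_rows, names(dt2) != "id"] <- NA

# Replace the 1, 1.1, 2, 2.1, ... row names with plain 1..n
rownames(dt2) <- NULL
```

Because the blank rows start out as copies, id is already in place and only the remaining columns need to be overwritten.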
Answer 2:
Try this:
library(dplyr)
# For each id, append one row that carries the id and NA in every other column
df %>%
  group_by(id) %>%
  do(rbind(., c(.$id, rep(NA, NCOL(df) - 1)))) %>%
  ungroup() %>%
  data.frame()
Output:
id groep_MNC zkhs fbeh pgebdat p_age pgesl
1 3 1 1 1 1955-12-01 42.50000 1
2 3 NA NA NA <NA> NA NA
3 5 1 1 1 1943-04-09 55.16667 1
4 5 NA NA NA <NA> NA NA
5 7 1 1 1 1958-04-10 40.25000 1
6 7 NA NA NA <NA> NA NA
7 10 1 1 1 1958-04-17 40.25000 1
8 10 NA NA NA <NA> NA NA
9 12 1 1 2 1947-11-01 50.66667 1
10 12 NA NA NA <NA> NA NA
11 14 1 1 2 1952-02-02 46.41667 1
12 14 NA NA NA <NA> NA NA
Sample data:
require(data.table)
df <- fread("id groep_MNC zkhs fbeh pgebdat p_age pgesl
3 1 1 1 1955-12-01 42.50000 1
5 1 1 1 1943-04-09 55.16667 1
7 1 1 1 1958-04-10 40.25000 1
10 1 1 1 1958-04-17 40.25000 1
12 1 1 2 1947-11-01 50.66667 1
14 1 1 2 1952-02-02 46.41667 1")
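Since do() is superseded in recent dplyr releases, the same interleaving can also be sketched with bind_rows() and arrange(). This is an illustrative alternative rather than the answerer's method; the helper columns .row and .blank and the shortened three-row df are assumptions made so the example is self-contained.

```r
library(dplyr)

# Shortened stand-in for the sample data (assumption: three rows, three columns)
df <- data.frame(
  id    = c(3L, 5L, 7L),
  zkhs  = c(1L, 1L, 1L),
  p_age = c(42.5, 55.16667, 40.25)
)

# Original rows, tagged with their position and marked as not blank
filled <- df %>% mutate(.row = row_number(), .blank = FALSE)

# Blank rows: keep only id (plus the helpers); bind_rows() fills the rest with NA
blank <- df %>% transmute(id, .row = row_number(), .blank = TRUE)

# Stack both sets, then sort so each blank row follows its original row
out <- bind_rows(filled, blank) %>%
  arrange(.row, .blank) %>%
  select(-.row, -.blank)
```

Sorting on the original row position rather than on id keeps the input order even when the ids are not sorted.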
Answer 3:
Another option using dplyr:
library(dplyr)
# Split into one data frame per id, append an NA row to each, and bind them back together
df %>%
  split(df$id) %>%
  Map(rbind, ., NA) %>%
  do.call(rbind, .) %>%
  mutate(id = rep(df$id, each = 2))
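Or you can use map_dfr from purrr:
library(purrr)
# map_dfr() iterates over the elements of its input, which for a data frame are its columns,
# so split by id first to get one data frame per id, then append an NA row to each piece
df %>%
  split(df$id) %>%
  map_dfr(~ rbind(.x, NA)) %>%
  mutate(id = rep(df$id, each = 2))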
Result:
# A tibble: 12 x 7
id groep_MNC zkhs fbeh pgebdat p_age pgesl
<int> <int> <int> <int> <chr> <dbl> <int>
1 3 1 1 1 1955-12-01 42.50000 1
2 3 NA NA NA <NA> NA NA
3 5 1 1 1 1943-04-09 55.16667 1
4 5 NA NA NA <NA> NA NA
5 7 1 1 1 1958-04-10 40.25000 1
6 7 NA NA NA <NA> NA NA
7 10 1 1 1 1958-04-17 40.25000 1
8 10 NA NA NA <NA> NA NA
9 12 1 1 2 1947-11-01 50.66667 1
10 12 NA NA NA <NA> NA NA
11 14 1 1 2 1952-02-02 46.41667 1
12 14 NA NA NA <NA> NA NA
Source: https://stackoverflow.com/questions/46262655/add-blank-rows-in-between-existing-rows