Question
Given a data frame df as follows:
chrom position strand value label
chr1 17432 - 0 romeo
chr1 17433 - 0 romeo
chr1 17434 - 0 romeo
chr1 17435 - 0 romeo
chr1 17409 - 1 juliet
chr1 17410 - 1 juliet
chr1 17411 - 1 juliet
Within each group of rows that share the same label, I would like to number the rows starting from 1 and put those numbers in a new column. (I don't just want to count the rows; the goal is to number them.) The output should look something like this:
chrom position strand value label number
chr1 17432 - 0 romeo 1
chr1 17433 - 0 romeo 2
chr1 17434 - 0 romeo 3
chr1 17435 - 0 romeo 4
chr1 17409 - 1 juliet 1
chr1 17410 - 1 juliet 2
chr1 17411 - 1 juliet 3
Is there a function or package that does the job?
Answer 1:
dat <- read.table(header = TRUE, text = "chrom position strand value label
chr1 17432 - 0 romeo
chr1 17433 - 0 romeo
chr1 17434 - 0 romeo
chr1 17435 - 0 romeo
chr1 17409 - 1 juliet
chr1 17410 - 1 juliet
chr1 17411 - 1 juliet")
#install.packages('dplyr')
library(dplyr)
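# %.% was the chaining operator in early dplyr releases; it has since been
# removed, so use %>% instead on current versions.
dat %.%
  group_by(label) %.%
  mutate(number = 1:n())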
Source: local data frame [7 x 6]
Groups: label
chrom position strand value label number
1 chr1 17432 - 0 romeo 1
2 chr1 17433 - 0 romeo 2
3 chr1 17434 - 0 romeo 3
4 chr1 17435 - 0 romeo 4
5 chr1 17409 - 1 juliet 1
6 chr1 17410 - 1 juliet 2
7 chr1 17411 - 1 juliet 3
I am sure there are many other ways to do this in R. data.table is one (see the example below). The print() call is needed to show the result because := modifies dt by reference and returns it invisibly.
require(data.table)
dt <- data.table(dat)
print(dt[, number := 1:.N, by = label])
chrom position strand value label number
1: chr1 17432 - 0 romeo 1
2: chr1 17433 - 0 romeo 2
3: chr1 17434 - 0 romeo 3
4: chr1 17435 - 0 romeo 4
5: chr1 17409 - 1 juliet 1
6: chr1 17410 - 1 juliet 2
7: chr1 17411 - 1 juliet 3
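For completeness, the same numbering can probably be done without any extra package. The following is only a sketch in base R, using ave() to apply seq_along() within each label group; it reuses the dat object from above and is not part of the original answer.

```r
# Rebuild the example data so the snippet is self-contained.
dat <- read.table(header = TRUE, text = "chrom position strand value label
chr1 17432 - 0 romeo
chr1 17433 - 0 romeo
chr1 17434 - 0 romeo
chr1 17435 - 0 romeo
chr1 17409 - 1 juliet
chr1 17410 - 1 juliet
chr1 17411 - 1 juliet")

# ave() splits the first argument by the grouping variable, applies FUN
# to each piece, and puts the results back in the original row order.
dat$number <- ave(seq_along(dat$label), dat$label, FUN = seq_along)

print(dat)
```

Because ave() restores the original row order, this should also work when rows with the same label are not adjacent.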
Answer 2:
Executing Vincent's solution resulted in an error for me:
could not find function "%.%"
However, replacing %.% with %>% did the trick for me:
library(dplyr)
dat %>%
  group_by(label) %>%
  mutate(number = 1:n())
Note: I'm using dplyr version 0.7.1.
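As a possible variation (not from the original answers), dplyr also provides row_number(), which numbers rows within each group when called without arguments inside mutate(). It avoids the 1:n() idiom, which would yield c(1, 0) on zero-row input. The sketch below assumes a reasonably recent dplyr and adds ungroup() so the result is no longer grouped; the name res is just illustrative.

```r
library(dplyr)

# Same example data as in the question.
dat <- read.table(header = TRUE, text = "chrom position strand value label
chr1 17432 - 0 romeo
chr1 17433 - 0 romeo
chr1 17434 - 0 romeo
chr1 17435 - 0 romeo
chr1 17409 - 1 juliet
chr1 17410 - 1 juliet
chr1 17411 - 1 juliet")

res <- dat %>%
  group_by(label) %>%
  mutate(number = row_number()) %>%  # 1, 2, 3, ... within each label
  ungroup()                          # drop the grouping afterwards

print(res)
```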
Source: https://stackoverflow.com/questions/21663752/r-assign-incremental-numbers-to-rows-containing-a-same-label