I have a file containing a certain number of lines. Each line looks like this:
TF_list_to_test10004/Nus_k0.345_t0.1_
A simple regular expression used with gsub():
x <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
gsub(".*:", "", x)
"PKMYT1"
See ?regex or ?gsub for more help.
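As a minimal sketch of how this pattern behaves when a line contains more than one colon: `.*` is greedy, so `.*:` removes everything up to the last colon, while `^[^:]*:` stops at the first one. The string `y` below is a made-up input for illustration, not taken from the question.

```r
x <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
y <- "a:b:c"  # hypothetical input with two colons

# Greedy ".*:" matches through the LAST colon
gene        <- sub(".*:", "", x)       # "PKMYT1"
after_last  <- sub(".*:", "", y)       # "c"

# "^[^:]*:" matches only through the FIRST colon
after_first <- sub("^[^:]*:", "", y)   # "b:c"
```

With a single colon per line, as in the question, both patterns give the same result.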
You can use awk like this:
awk -F: '{print $2}' /your/file
One simple variation on the top answer by @Sacha Epskamp that I had missed: the same gsub() call can also keep everything before the ":" (instead of removing it), just by changing the pattern:
foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
# 1st, as in that answer, to remove everything up to and including ":":
gsub(".*:", "", foo)
# 2nd, to keep only what comes before ":" (removes the ":" and everything after it):
gsub(":.*", "", foo)
Basically it is the same thing; just move the ":" to the other side of ".*" in the pattern argument. Hope it helps.
Below are 2 solutions, equivalent when each line contains a single colon:
The first uses perl's -a autosplit feature to split each line into fields using : (set with -F:), populate the @F fields array, and print the 2nd field $F[1] (counted starting from field 0):
perl -F: -lane 'print $F[1]' file
The second uses a regular-expression substitution s/// to replace, from ^ the beginning of the line, .*: any characters ending with a colon, with nothing:
perl -pe 's/^.*://' file
There are certainly more than 2 ways in R. Here's another.
unlist(lapply(strsplit(foo, ':', fixed = TRUE), '[', 2))
If the string has a constant length, I imagine substr would be faster than this or the regex methods.
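A rough sketch of the substr idea, under the assumption that every line shares the same prefix length. The second element of `foo` is a hypothetical line added for illustration. The speed claim is not measured here. The last two lines show a variant for prefixes of varying length that locates the first colon with regexpr.

```r
foo <- c("TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1",
         "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:GENE2")  # 2nd line is hypothetical

# Constant-length case: find the colon position once, on the first line only
prefix_len <- as.integer(regexpr(":", foo[1], fixed = TRUE))
fixed_len  <- substr(foo, prefix_len + 1, nchar(foo))

# Variable-length case: find the first colon on every line
colon_pos  <- as.integer(regexpr(":", foo, fixed = TRUE))
var_len    <- substr(foo, colon_pos + 1, nchar(foo))
```

Both `fixed_len` and `var_len` hold the text after the colon; the first form only gives correct results if the constant-length assumption really holds for the whole file.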
I was working on a similar issue. John's and Josh O'Brien's advice did the trick. I started with this tibble:
library(dplyr)
my_tibble <- tibble(Col1=c("ABC:Content","BCDE:MoreContent","FG:Content:with:colons"))
It looks like:
| Col1
1 | ABC:Content
2 | BCDE:MoreContent
3 | FG:Content:with:colons
I needed to create this tibble:
  | Col1                   | Col2 | Col3
1 | ABC:Content            | ABC  | Content
2 | BCDE:MoreContent       | BCDE | MoreContent
3 | FG:Content:with:colons | FG   | Content:with:colons
I did so with this code (R version 3.4.2):
my_tibble2 <- mutate(my_tibble
,Col2 = unlist(lapply(strsplit(Col1, ':',fixed = TRUE), '[', 1))
,Col3 = gsub("^[^:]*:", "", Col1))
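For comparison, here is a sketch that derives both columns with sub() alone, splitting at the first colon only. It is an alternative, not the code used above, and it assumes a plain data.frame named `df` (a hypothetical name) rather than a tibble so that it runs in base R without dplyr.

```r
df <- data.frame(
  Col1 = c("ABC:Content", "BCDE:MoreContent", "FG:Content:with:colons"),
  stringsAsFactors = FALSE
)

# Everything before the first colon: drop the first ":" and all that follows
df$Col2 <- sub(":.*", "", df$Col1)

# Everything after the first colon: drop the leading run of non-colons plus one ":"
df$Col3 <- sub("^[^:]*:", "", df$Col1)
```

The same two sub() expressions could be placed inside mutate() in place of the strsplit/lapply call.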