Question
I have a data frame resembling the extract below:
Observation  Identifier   Value
Obs001       ABC_2001     54
Obs002       ABC_2002     -2
Obs003                    1
Obs004                    1
Obs005       Def_2001/05
I would like to transform this data frame into one where the portion of each string from the "_" sign onwards is removed, as illustrated below:
Observation  Identifier_NoTime  Value
Obs001       ABC                54
Obs002       ABC                -2
Obs003                          1
Obs004                          1
Obs005       Def
I tried experimenting with strsplit, gsub and sub as discussed here, but could not get those commands to work. I have to account for the fact that:
- Column has missing values and I want to leave them where they are
- The "_" character appears at different positions within the values
- I also want to leave the rest of the data frame the way it is
Answer 1:
You could try the sub command below to remove the _ symbol and all the non-space characters that follow it.
sub("_\\S*", "", string)
Explanation:
- _ matches a literal _ symbol.
- \S* matches zero or more non-space characters (written as "\\S*" inside an R string).
OR
This would remove the _ symbol and every character after it:
sub("_.*", "", string)
Explanation:
- _ matches a literal _ symbol.
- .* matches any character zero or more times.
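As a minimal sketch of how either pattern could be applied to the data frame from the question: the example below rebuilds the sample data under the assumption that the missing entries are stored as NA, and the name df is only illustrative. Because sub is vectorised and returns NA for NA input, the missing values stay where they are and the other columns are untouched.

```r
# Rebuild the sample data; missing entries are assumed to be NA.
df <- data.frame(
  Observation = c("Obs001", "Obs002", "Obs003", "Obs004", "Obs005"),
  Identifier  = c("ABC_2001", "ABC_2002", NA, NA, "Def_2001/05"),
  Value       = c(54, -2, 1, 1, NA),
  stringsAsFactors = FALSE
)

# Strip the first "_" and everything after it; NA entries remain NA.
df$Identifier_NoTime <- sub("_.*", "", df$Identifier)

# Drop the original column and restore the column order from the question.
df <- df[, c("Observation", "Identifier_NoTime", "Value")]
print(df)

# The two patterns differ only when a space follows the "_" part:
with_space <- "ABC_2001 extra"
nonspace_only <- sub("_\\S*", "", with_space)  # stops at the first space
everything    <- sub("_.*", "", with_space)    # removes to the end of the string
```

For the identifiers shown in the question both patterns give the same result, since none of them contains a space after the "_".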
Source: https://stackoverflow.com/questions/26611922/remove-everything-after-a-string-in-a-data-frame-column-with-missing-values