R - Add leading zero to string, with no fixed string format

Posted by 点点圈 on 2020-04-07 03:44:29

Question


I have a column as below.

9453, 55489, 4588, 18893, 4457, 2339, 45489HQ, 7833HQ

I would like to add a leading zero if the number has fewer than 5 digits. However, some numbers have "HQ" at the end and some don't. (I did check other posts, but they don't deal with a suffix like "HQ".)

So the final desired output should be:

09453, 55489, 04588, 18893, 04457, 02339, 45489HQ, 07833HQ

Any idea how to do this? Thank you so much for reading my post!


Answer 1:


A one-liner using regular expressions:

my_strings <- c("9453", "55489", "4588", "18893",
                "4457", "2339", "45489HQ", "7833HQ")

gsub("^([0-9]{1,4})(HQ|$)", "0\\1\\2", my_strings)

[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "07833HQ"

Explanation:

^ start of string
[0-9]{1,4} one to four digits in a row
(HQ|$) the string "HQ" or the end of the string

Parentheses represent capture groups in order. So 0\\1\\2 means 0 followed by the first capture group [0-9]{1,4} and the second capture group HQ|$.

If the string starts with five digits, the regex does not match, so the string is left unchanged.
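Note that the pattern above inserts exactly one zero, so a shorter input such as "123" would become "0123" rather than "00123". A minimal base-R sketch that pads the leading digits to a chosen width, whatever follows them; the helper name pad_digits and its width argument are hypothetical and not part of the original answer:

```r
# Hypothetical helper: zero-pad the leading run of digits to `width`,
# keeping any trailing text (e.g. "HQ") untouched.
pad_digits <- function(s, width = 5) {
  digits <- sub("^([0-9]*).*$", "\\1", s)      # leading digits ("" if none)
  rest   <- substring(s, nchar(digits) + 1)    # everything after the digits
  zeros  <- strrep("0", pmax(width - nchar(digits), 0))
  # Strings without leading digits are returned as they are.
  ifelse(nchar(digits) > 0, paste0(zeros, digits, rest), s)
}

pad_digits(c("9453", "45489HQ", "7833HQ", "123", "1HQR"))
```

Because the digits are never converted to a number, this sketch works purely on the text, so inputs with more than five digits pass through unchanged.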




Answer 2:


I was going to use the sprintf approach, but found that the stringr package provides a very easy solution.

library(stringr)
x <- c("9453", "55489", "4588", "18893", "4457", "2339", "45489HQ", "7833HQ")
x
[1] "9453"    "55489"   "4588"    "18893"   "4457"    "2339"    "45489HQ" "7833HQ"

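If padding the whole string to a width of 5 is enough, a single stringr::str_pad() call does it. Note that "7833HQ" is already six characters wide, so it is left unchanged: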

stringr::str_pad(x, 5, side="left", pad="0")
[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "7833HQ" 

If the number must be padded to five digits even when the whole string is already five or more characters wide (as with "7833HQ"), the number and the text need to be separated with a regex first. The following combines regex matching with the very helpful sprintf() function:

sprintf("%05.0f%s",                                  # zero-pad the number (%05.0f), then append the text (%s)
        as.numeric(gsub("^(\\d+).*", "\\1", x)),     # get the number
        gsub("[[:digit:]]+([a-zA-Z]*)$", "\\1", x))  # get just the text at the end
[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "07833HQ"



Answer 3:


Another attempt, which will also work in cases like "123" or "1HQR":

x <- c("18893","4457","45489HQ","7833HQ","123", "1HQR")
regmatches(x, regexpr("^\\d+", x)) <- sprintf("%05d", as.numeric(sub("\\D+$","",x)))
x
#[1] "18893"    "04457"    "45489HQ"  "07833HQ"  "00123"    "00001HQR"

This finds the run of digits at the start of each string (^\\d+) and replaces it with a zero-padded version produced by sprintf(). The number passed to sprintf() is obtained by stripping any trailing non-digit characters (\\D+$) from the string.
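A variant of the same idea, sketched under the assumption that some elements might have no leading digits at all (the "HQ" element below is made-up sample data, not from the question). Here the replacement values are built from the matched pieces themselves, so there is exactly one value per match and non-matching elements are left alone:

```r
# Hypothetical sample data; "HQ" has no leading digits.
x <- c("4457", "7833HQ", "123", "1HQR", "HQ")

# Locate the leading run of digits in each element (-1 where there is none).
m <- regexpr("^[0-9]+", x)

# regmatches(x, m) returns only the matched substrings, so the vector
# produced by sprintf() has one entry per matched element.
regmatches(x, m) <- sprintf("%05d", as.integer(regmatches(x, m)))
x
```

Using as.integer() keeps the values as integers for the %05d format; this assumes the digit runs are short enough to fit in R's integer range.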




Answer 4:


We can use only sprintf() and gsub() by splitting each string into its parts and then putting them back together.

Using @thelatemail's data:

x <- c("18893", "4457", "45489HQ", "7833HQ", "123", "1HQR")

sprintf("%05d%s", as.numeric(gsub("[^0-9]+", "", x)), gsub("[0-9]+", "", x))
# [1] "18893"    "04457"    "45489HQ"  "07833HQ"  "00123"    "00001HQR"


Source: https://stackoverflow.com/questions/48432173/r-add-leading-zero-to-string-with-no-fixed-string-format
