Strings in R

Formatting numbers

formatC()

sprintf()

format()

prettyNum()

All four functions take numeric input (including arrays) and return a character vector or array.

> pow <- 1:3
> (powers_of_e <- exp(pow))
[1]  2.718282  7.389056 20.085537
> formatC(powers_of_e)
[1] "2.718" "7.389" "20.09"
> formatC(powers_of_e,digits=3)
[1] "2.72" "7.39" "20.1"
> formatC(powers_of_e,digits=3,width=10)
[1] "      2.72" "      7.39" "      20.1"
> formatC(powers_of_e,digits=3,format="e")
[1] "2.718e+00" "7.389e+00" "2.009e+01"
> formatC(powers_of_e,digits=3,flag="+")
[1] "+2.72" "+7.39" "+20.1"
> sprintf("%s %d=%f","Euler's constant to the power",pow,powers_of_e)
[1] "Euler's constant to the power 1=2.718282" 
[2] "Euler's constant to the power 2=7.389056" 
[3] "Euler's constant to the power 3=20.085537"
> sprintf("To three decimal places,e^%d=%.3f",pow,powers_of_e)
[1] "To three decimal places,e^1=2.718"  "To three decimal places,e^2=7.389" 
[3] "To three decimal places,e^3=20.086"
> sprintf("In scientific notation,e^%d=%e",pow,powers_of_e)
[1] "In scientific notation,e^1=2.718282e+00"
[2] "In scientific notation,e^2=7.389056e+00"
[3] "In scientific notation,e^3=2.008554e+01"
> format(powers_of_e)
[1] " 2.718282" " 7.389056" "20.085537"
> format(powers_of_e,digits=3)
[1] " 2.72" " 7.39" "20.09"
> format(powers_of_e,digits=3,trim=TRUE)
[1] "2.72"  "7.39"  "20.09"
> format(powers_of_e,digits=3,scientific=TRUE)
[1] "2.72e+00" "7.39e+00" "2.01e+01"
> prettyNum(
+ c(1e10,1e-20),
+ big.mark=",",
+ small.mark=" ",
+ preserve.width="individual",
+ scientific=FALSE
+ )
[1] "10,000,000,000"            "0.00000 00000 00000 00001"
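As a quick check of the statement that numeric input comes back as character output, here is a minimal sketch. The values in x are made up for illustration and do not come from the examples above.

```r
# Made-up input values for illustration.
x <- c(3.14159, 42, 1234.5)

as_formatc <- formatC(x, digits = 2, format = "f")  # fixed notation, 2 decimals
as_sprintf <- sprintf("%.2f", x)                    # same idea, C-style template
as_format  <- format(x, nsmall = 2)                 # at least 2 decimals

is.numeric(x)             # the input is numeric
is.character(as_formatc)  # each result is a character vector
is.character(as_sprintf)
is.character(as_format)
```

Because the results are character vectors, arithmetic on them no longer works; format numbers only at the point where they are displayed.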

Just as you regularly work with numbers and logical values, you will sometimes need to work with text.

Creating strings

1. Creation

> c(
+ "You should use double quotes most of the time",
+ 'Single quotes are better for including" inside the string'
+ )
[1] "You should use double quotes most of the time"             
[2] "Single quotes are better for including\" inside the string"

> cat("red","yellow","lorry")
red yellow lorry

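2. Combining

The paste function combines strings. Each vector passed to it is recycled to the length of the longest one, and the strings are then joined together, separated by a space by default. You can change the separator with the sep argument, or use the related paste0 function to drop the separator entirely. Once the strings have been combined, the collapse argument can reduce the result to a single string containing all the elements.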

> paste(c("red","yellow"),"lorry")
[1] "red lorry"    "yellow lorry"
> paste(c("red","yellow"),"lorry",sep="-")
[1] "red-lorry"    "yellow-lorry"
> paste(c("red","yellow"),"lorry",collapse=",")
[1] "red lorry,yellow lorry"
> paste0(c("red","yellow"),"lorry")
[1] "redlorry"    "yellowlorry"
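The sep and collapse arguments can also be used together. A minimal sketch, with a colour vector made up for illustration:

```r
# Made-up example vector, not from the original text.
colours <- c("red", "yellow", "green")

# The single string "big" is recycled to match the three colours.
recycled <- paste("big", colours)
recycled

# sep joins the pieces within each element; collapse joins the elements.
one_string <- paste(colours, "lorry", sep = "-", collapse = ", ")
one_string
```

Here sep works inside each element and collapse works between elements, so the final result is a single string.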

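3. Formatting for printing

The toString function is a variation of paste that is useful for printing vectors. It separates the elements with a comma and a space, and can limit how much is printed.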

> x <- (1:15)^2
> toString(x)
[1] "1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225"
> toString(x,width=40)
[1] "1, 4, 9, 16, 25, 36, 49, 64, 81, 100...."

4. Splitting

> strsplit(woodchuck," ",fixed=TRUE)    #fixed=TRUE means the split argument is a fixed (literal) string rather than a regular expression

Note that strsplit returns a list, not a character vector or a matrix.
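The woodchuck vector is never defined in this post. The sketch below assumes it holds a five-line tongue twister whose opening characters agree with the substring output shown later; the exact wording is an assumption. It then shows that the result of strsplit is a list with one element per input string.

```r
# Assumed definition: the post never shows how woodchuck is built.
woodchuck <- c(
  "How much wood would a woodchuck chuck",
  "If a woodchuck could chuck wood?",
  "He would chuck, he would, as much as he could",
  "And chuck as much wood as a woodchuck would",
  "If a woodchuck could chuck wood."
)

pieces <- strsplit(woodchuck, " ", fixed = TRUE)
class(pieces)     # a list, one element per input string
lengths(pieces)   # number of pieces found in each string
pieces[[3]]       # pieces of the third string, commas still attached
```

Use double brackets, as in pieces[[3]], to pull the character vector for one input string out of the list.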

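In our example, the commas at the end of some words are annoying. It is better to split on an optional comma followed by a space, which is easy to express with a regular expression. ? means "the preceding character is optional".

> strsplit(woodchuck,",? ")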
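To make the difference between a literal split and a regular-expression split concrete, here is a small sketch on a single sentence. The sentence is assumed for illustration.

```r
# Sample sentence assumed for illustration.
sentence <- "He would chuck, he would, as much as he could"

# Literal split on a space: the commas stay attached to the words.
literal_words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
literal_words

# ",? " is a regular expression: an optional comma followed by a space.
words <- strsplit(sentence, ",? ")[[1]]
words
```

The regular-expression version must not be combined with fixed=TRUE, because that would make strsplit look for the literal three characters comma, question mark, space.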

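Printing strings

Normally, strings printed to the console are wrapped in double quotes. Passing them to the noquote function removes the quotes, which sometimes makes the text more readable.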

> x <- c(
+ "I","saw","a","saw","that","could","out",
+ "saw","any","other","saw","I","ever","saw"
+ )
> y <- noquote(x)
> x
 [1] "I"     "saw"   "a"     "saw"   "that"  "could" "out"   "saw"   "any"  
[10] "other" "saw"   "I"     "ever"  "saw"  
> y
 [1] I     saw   a     saw   that  could out   saw   any   other saw   I    
[13] ever  saw

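Some special characters can be included in strings. For example, \t inserts a tab character. The example below uses cat rather than print, because print shows the escaped form of the string and displays the tab as a backslash followed by a "t". The cat argument fill = TRUE moves the cursor to the next line once the output is finished. Moving the cursor to the next line is done by printing the newline character \n, and this is the same on all platforms. In R, do not use \r or \r\n to print a newline, because \r moves the cursor back to the start of the current line, so later output overwrites what you have written.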

> cat("foo\tbar",fill=TRUE)
foo     bar
> cat("foo\nbar")
foo
bar> cat("foo\nbar",fill=TRUE)
foo
bar
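The difference between print and cat described above can be checked directly. A minimal sketch:

```r
s <- "foo\tbar"

print(s)             # shows the escaped form, in quotes, with \t visible
cat(s, fill = TRUE)  # writes the actual tab character

# The escape sequence \t is stored as one character, not two.
n_chars <- nchar(s)
n_chars
```

The string holds seven characters: three letters, one tab, and three more letters.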

To print a backslash, type two backslashes in a row so that it is not mistaken for the start of a special character.

> cat("foo\\bar",fill=TRUE)
foo\bar

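To use a double quote inside a double-quoted string, escape it with a backslash. Likewise, a single quote inside a single-quoted string must be escaped. By contrast, a single quote inside a double-quoted string, or a double quote inside a single-quoted string, needs no escaping.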

> cat("foo\"bar",fill=TRUE)
foo"bar
> cat('foo\'bar',fill=TRUE)
foo'bar
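The escaped and unescaped spellings produce exactly the same strings, as this short sketch confirms:

```r
escaped_double <- "foo\"bar"  # double quote escaped inside double quotes
plain_double   <- 'foo"bar'   # no escape needed inside single quotes
same_double <- identical(escaped_double, plain_double)
same_double

escaped_single <- 'foo\'bar'  # single quote escaped inside single quotes
plain_single   <- "foo'bar"   # no escape needed inside double quotes
same_single <- identical(escaped_single, plain_single)
same_single
```

Choosing the quote style that avoids escaping usually gives the more readable source code.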

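Printing the alarm character \a makes the computer beep, although the alarm function does the same thing and is more readable. This is handy when you want a program to notify you at the end of a long-running analysis (provided you are not in an open-plan office).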

> cat("\a")
> alarm()
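One way to use this is to wrap a long computation so that the beep comes when it finishes. In the sketch below, notify_when_done is a hypothetical helper, not a base R function, and Sys.sleep stands in for a real analysis.

```r
# notify_when_done is a hypothetical helper, not part of base R.
notify_when_done <- function(task) {
  result <- task()  # run the long computation
  alarm()           # beep, where the console supports it
  result
}

# Sys.sleep stands in for a long-running analysis.
total <- notify_when_done(function() {
  Sys.sleep(1)
  sum(1:100)
})
total
```

Whether a sound is actually heard depends on the console and the system settings.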

Extracting substrings

substring()

substr()

> substring(woodchuck,1:6,10)
[1] "How much w" "f a woodc"  " would c"   " chuck "    " woodc"    
[6] "uch w"     
> substr(woodchuck,1:6,10)
[1] "How much w" "f a woodc"  " would c"   " chuck "    " woodc"
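The two outputs above differ in length: substring returns as many results as its longest argument, while substr always returns one result per input string. A minimal sketch with made-up strings:

```r
# Made-up example strings.
words <- c("woodchuck", "chuck")

# substring: three start positions, so words is recycled to length 3.
from_substring <- substring(words, 1:3, 5)
from_substring

# substr: the result always has length(words) elements,
# so the third start position is not used.
from_substr <- substr(words, 1:3, 5)
from_substr
```

When the start and stop positions are single numbers, the two functions give the same result.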

Changing case

toupper()

tolower()

> toupper("I'm Shouting")
[1] "I'M SHOUTING"
> tolower("I'm Whispering")
[1] "i'm whispering"

File paths

R has a working directory, which is the default place where files are read and written. You can see its location with getwd and change it with setwd:

> getwd()
[1] "F:/CS&AC&ME/Data Analysis/Learning R/R-3.5.2/bin/x64"

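Note that the directory parts of each path are separated by forward slashes, even on Windows. For portability, you can always use forward slashes for paths in R; the file-handling functions translate them to backslashes where the operating system needs it. You can also write Windows paths with double backslashes, but forward slashes are preferred.

You can use file.path to build a file path from individual directory names. It automatically inserts forward slashes between them.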
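The two spellings of a Windows path can be compared with a short sketch. The path itself is hypothetical.

```r
# Hypothetical Windows-style path written with doubled backslashes.
win_path <- "C:\\Users\\me\\data.csv"
cat(win_path, fill = TRUE)  # each \\ in the source is one backslash in the string

# chartr swaps every backslash for a forward slash.
portable_path <- chartr("\\", "/", win_path)
portable_path
```

The forward-slash form avoids the doubled backslashes and reads the same on every platform.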

> file.path("c:","Program Files","R","R-devel")
[1] "c:/Program Files/R/R-devel"
> R.home()    #the R installation directory
[1] "F:/CS&AC&ME/DATAAN~1/LEARNI~1/R-35~1.2"

basename returns only the file name, without the directory in front of it. Conversely, dirname returns only the directory that contains the file.

> file_name <-  "F:/CS&AC&ME/Data Analysis/Learning R/R-3.5.2/bin/x64/RGui.exe"
> basename(file_name)
[1] "RGui.exe"
> dirname(file_name)
[1] "F:/CS&AC&ME/Data Analysis/Learning R/R-3.5.2/bin/x64"
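file.path, basename, and dirname fit together: paths built with file.path can be taken apart and reassembled. A minimal sketch, using hypothetical directory and file names:

```r
# Hypothetical directory and file names.
dirs  <- file.path("project", "data")
files <- file.path(dirs, c("a.csv", "b.csv"))  # file.path is vectorised
files

# basename and dirname split each path into its two parts.
basename(files)
dirname(files)

# Joining the two parts again reproduces the original paths.
round_trip <- file.path(dirname(files), basename(files))
identical(round_trip, files)
```

None of these functions touch the file system, so the paths do not need to exist.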

Source: CSDN

Author: 乐苏

Link: https://blog.csdn.net/qq_42303300/article/details/103564898
