Read R function output as columns

Posted by ╄→尐↘猪︶ㄣ on 2019-12-12 06:49:41

Question


I'm trying to come up with a way to solve this question I asked yesterday:

rpy2 fails to import 'rgl' R package

My goal is to check if certain packages are installed inside R from within python.

Following the recommendation by Dirk Eddelbuettel given in a comment on his answer, I'm using the installed.packages() function from R to list all the available packages.

This is what I've got so far:

from rpy2.rinterface import RRuntimeError
from rpy2.robjects.packages import importr
utils = importr('utils')

def importr_tryhard(packname, contriburl):
    # packname and contriburl are not used yet; so far this only
    # lists the packages that R reports as installed.
    try:
        rpack = utils.installed_packages()
    except RRuntimeError:
        rpack = []
    return rpack

packname = 'rgl'  # the package from the linked question
contriburl = 'http://cran.stat.ucla.edu/'
rpack = importr_tryhard(packname, contriburl)
print(rpack)

Which returns a quite large output of the form:

           Package      LibPath                         Version   
ks         "ks"         "/usr/local/lib/R/site-library" "1.8.13"  
misc3d     "misc3d"     "/usr/local/lib/R/site-library" "0.8-4"   
mvtnorm    "mvtnorm"    "/usr/local/lib/R/site-library" "0.9-9996"
rgl        "rgl"        "/usr/local/lib/R/site-library" "0.93.986"
base       "base"       "/usr/lib/R/library"            "3.0.1"   
boot       "boot"       "/usr/lib/R/library"            "1.3-9"   
class      "class"      "/usr/lib/R/library"            "7.3-9"   
cluster    "cluster"    "/usr/lib/R/library"            "1.14.4"  
codetools  "codetools"  "/usr/lib/R/library"            "0.2-8"   
compiler   "compiler"   "/usr/lib/R/library"            "3.0.1"   
datasets   "datasets"   "/usr/lib/R/library"            "3.0.1"   
foreign    "foreign"    "/usr/lib/R/library"            "0.8-49"  
graphics   "graphics"   "/usr/lib/R/library"            "3.0.1"   
grDevices  "grDevices"  "/usr/lib/R/library"            "3.0.1"   
grid       "grid"       "/usr/lib/R/library"            "3.0.1"   
KernSmooth "KernSmooth" "/usr/lib/R/library"            "2.23-10" 
lattice    "lattice"    "/usr/lib/R/library"            "0.20-23" 
MASS       "MASS"       "/usr/lib/R/library"            "7.3-29"  
Matrix     "Matrix"     "/usr/lib/R/library"            "1.0-14"  
methods    "methods"    "/usr/lib/R/library"            "3.0.1"   
mgcv       "mgcv"       "/usr/lib/R/library"            "1.7-26"  
nlme       "nlme"       "/usr/lib/R/library"            "3.1-111" 
nnet       "nnet"       "/usr/lib/R/library"            "7.3-7"   
parallel   "parallel"   "/usr/lib/R/library"            "3.0.1"   
rpart      "rpart"      "/usr/lib/R/library"            "4.1-3"   
spatial    "spatial"    "/usr/lib/R/library"            "7.3-6"   
splines    "splines"    "/usr/lib/R/library"            "3.0.1"   
stats      "stats"      "/usr/lib/R/library"            "3.0.1"   
stats4     "stats4"     "/usr/lib/R/library"            "3.0.1"   
survival   "survival"   "/usr/lib/R/library"            "2.37-4"  
tcltk      "tcltk"      "/usr/lib/R/library"            "3.0.1"   
tools      "tools"      "/usr/lib/R/library"            "3.0.1"   
utils      "utils"      "/usr/lib/R/library"            "3.0.1"   
           Priority     
ks         NA           
misc3d     NA           
mvtnorm    NA           
rgl        NA           
base       "base"       
boot       "recommended"
class      "recommended"
cluster    "recommended"
...

I need to extract just the names of the packages installed, so either the first or the second columns would be enough for me.

I've tried using np.loadtxt(), np.genfromtxt() and with open(rpack) as csvfile:, but none of them gave back a list/array in which the columns or the rows were correctly separated (in fact, each one failed with a different error).

How could I read this output in column form, or more to the point, extract the names of the installed packages in a list/array?


Answer 1:


rpack in your case is an rpy2.robjects.vectors.Matrix object, so you can use its .rx() method, which mirrors R's 1-based [ indexing, to extract the first column:

mylist = list(rpack.rx(True, 1))

Give it a try.
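Once the names are in a Python collection, checking for specific packages is a plain membership test. The following is a minimal sketch, not a definitive implementation: the function names installed_package_names and missing_packages are hypothetical, 'ggplot2' is just an illustrative package name, and the rpy2 import is done lazily so the helper can be tried without R available.

```python
def installed_package_names():
    """Return the names of installed R packages as a set (needs rpy2 and R)."""
    # Imported lazily so missing_packages() below works without rpy2.
    from rpy2.robjects.packages import importr
    utils = importr('utils')
    rpack = utils.installed_packages()
    # Column 1 of the matrix is "Package"; R indexing is 1-based.
    return set(rpack.rx(True, 1))


def missing_packages(required, installed):
    """Return the entries of `required` that are not in `installed`, in order."""
    return [name for name in required if name not in installed]


# A hand-written set stands in for installed_package_names() here.
installed = {'ks', 'misc3d', 'mvtnorm', 'rgl', 'base'}
print(missing_packages(['rgl', 'ks', 'ggplot2'], installed))  # ['ggplot2']
```

With R available, installed = installed_package_names() would replace the hand-written set.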




Answer 2:


I've not used rpy2 before, but it looks like rpack is some kind of rpy2 object, and that might have an option to just grab the first column.

You could happily parse it like a text file though; when you call print on an object, it prints the object's string representation.

Try doing something like this:

s = str(rpack)
# Skip the header line, and skip blank lines so split()[0] cannot fail.
packages = [line.split()[0] for line in s.split("\n")[1:] if line.strip()]

Try both str() and repr() to get the string representation, though; not every class implements both, and some implement them differently.

This doesn't feel like the cleanest way to do it though, and you'll have to make sure you parse the data correctly. Try printing dir(rpack) and seeing if there are any attributes in there which sound like they'll contain what you want.
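As a rough sketch of what parsing the data correctly might involve: the printed matrix wraps into several column blocks (the Priority block in the question repeats every row name), so a parser has to skip the indented header lines and drop repeated names. The function name package_names_from_printout is hypothetical, and the sample text is a shortened copy of the output in the question; it assumes the row names are the package names, as they are in that output.

```python
def package_names_from_printout(text):
    """Extract unique row names from the printed form of an R matrix."""
    names = []
    seen = set()
    for line in text.splitlines():
        # Header lines are indented; blank lines carry nothing.
        if not line.strip() or line[0].isspace():
            continue
        name = line.split()[0]  # the row name is the first token
        if name not in seen:    # later column blocks repeat the row names
            seen.add(name)
            names.append(name)
    return names


sample = """\
           Package      LibPath                         Version
ks         "ks"         "/usr/local/lib/R/site-library" "1.8.13"
rgl        "rgl"        "/usr/local/lib/R/site-library" "0.93.986"
base       "base"       "/usr/lib/R/library"            "3.0.1"
           Priority
ks         NA
rgl        NA
base       "base"
"""

print(package_names_from_printout(sample))  # ['ks', 'rgl', 'base']
```

With the real object this would be called as package_names_from_printout(str(rpack)); indexing the matrix directly remains less fragile than parsing text.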

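A little digging through the installed.packages documentation and a quick look at an R tutorial suggest that, in R itself, you can select the column by name:

print(rpack[, "Package"])

Note that this is R syntax, not Python, with rpack holding the result of installed.packages(); from rpy2, the equivalent should be rpack.rx(True, "Package").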


Source: https://stackoverflow.com/questions/28697549/read-r-function-output-as-columns
