Code to import data from a Stack Overflow query into R

长情又很酷 2020-12-01 04:08

When I try to answer a question in Stack Overflow about R, a good part of my time is spent trying to rebuild the data given as example (unless the question author has been n

4 answers
  • 2020-12-01 04:54

    You can also ask the questioner to use the dput() function, which dumps any data structure in a form that can be copy-pasted straight into R, e.g.:

    > zz
      a  b   c
    1 1 11 foo
    2 2 12 bar
    3 3 13 baz
    4 4 14 bar
    5 5 15 foo
    
    > dput(zz)
    structure(list(a = 1:5, b = 11:15, c = structure(c(3L, 1L, 2L, 
    1L, 3L), .Label = c("bar", "baz", "foo"), class = "factor")), .Names = c("a", 
    "b", "c"), class = "data.frame", row.names = c(NA, -5L))
    
    > xx <- structure(list(a = 1:5, b = 11:15, c = structure(c(3L, 1L, 2L, 
    + 1L, 3L), .Label = c("bar", "baz", "foo"), class = "factor")), .Names = c("a", 
    + "b", "c"), class = "data.frame", row.names = c(NA, -5L))
    > xx
      a  b   c
    1 1 11 foo
    2 2 12 bar
    3 3 13 baz
    4 4 14 bar
    5 5 15 foo
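
    The round trip above can be checked in a script without copying anything by hand. This is a minimal sketch: capture.output() stands in for the questioner copying the printed dput() text, and eval(parse(text = ...)) stands in for the answerer pasting it. Column c is built as an explicit factor to match the post, because data.frame() no longer converts strings to factors by default in R 4.0 and later.

    ```r
    # The example data frame from the post, with c as an explicit factor
    zz <- data.frame(a = 1:5, b = 11:15,
                     c = factor(c("foo", "bar", "baz", "bar", "foo")))

    # Questioner's side: the text that dput() prints to the console
    dumped <- paste(capture.output(dput(zz)), collapse = "\n")

    # Answerer's side: evaluate the pasted text to rebuild the object
    xx <- eval(parse(text = dumped))

    identical(xx, zz)  # TRUE: types, factor levels and row names all survive
    ```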
    
  • 2020-12-01 04:56

    Recent versions of R offer an option with even fewer keystrokes than the textConnection() route for entering columnar data into read.table() and friends. Faced with this:

    zz
      a  b   c
    1 1 11 foo
    2 2 12 bar
    3 3 13 baz
    4 4 14 bar
    5 5 15 foo
    

    One can simply insert <- read.table(text=" after the zz, delete the carriage return, then insert ", header=TRUE) after the last foo and press [enter]:

    zz<- read.table(text="  a  b   c
    1 1 11 foo
    2 2 12 bar
    3 3 13 baz
    4 4 14 bar
    5 5 15 foo", header=TRUE)
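
    Note that the pasted block still contains the row numbers R printed on the left, yet the call above works. A small check using only base R: because the header line has one field fewer than the data lines, read.table() uses the first column as row names instead of reading it as a fourth variable.

    ```r
    txt <- "  a  b   c
    1 1 11 foo
    2 2 12 bar
    3 3 13 baz
    4 4 14 bar
    5 5 15 foo"

    zz <- read.table(text = txt, header = TRUE)

    names(zz)     # "a" "b" "c": the printed row numbers did not become a column
    rownames(zz)  # "1" "2" "3" "4" "5": they became row names instead
    ```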
    

    One can also use scan() to efficiently enter long sequences of pure numbers or pure character entries. Faced with 67 75 44 25 99 37 6 96 77 21 31 41 5 52 13 46 14 70 100 18, one can simply type zz <- scan() and press [enter]. Then paste the selected numbers and press [enter] again, perhaps a second time, so that an empty line ends the input; the console should respond "Read 20 items".

    > zz <- scan()
    1: 67  75  44  25  99  37   6  96  77  21  31  41   5  52  13  46  14  70 100  18
    21: 
    Read 20 items
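
    In a script, where there is no console prompt to paste into, scan() can read the same numbers from a string through its text argument. This sketch passes quiet = TRUE only to suppress the "Read 20 items" message.

    ```r
    nums <- scan(text = "67 75 44 25 99 37 6 96 77 21 31 41 5 52 13 46 14 70 100 18",
                 quiet = TRUE)

    length(nums)  # 20
    class(nums)   # "numeric": scan() reads doubles by default
    ```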
    

    The same works for character data. After pasting into the console, editing out extraneous line feeds, adding quotes, and pressing [enter]:

    > countries <- scan(what="character")
    1:     'republic of congo'
    2:     'republic of the congo'
    3:     'congo, republic of the'
    4:     'congo, republic'
    5: 'democratic republic of the congo'
    6: 'congo, democratic republic of the'
    7: 'dem rep of the congo'
    8: 
    Read 7 items
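
    The character case can be scripted the same way. With what = "character", scan() honors the single quotes, so each quoted name (commas and spaces included) becomes one element. As above, text and quiet are used here only so the sketch runs without interactive pasting.

    ```r
    countries <- scan(text = "'republic of congo'
    'republic of the congo'
    'congo, republic of the'
    'congo, republic'
    'democratic republic of the congo'
    'congo, democratic republic of the'
    'dem rep of the congo'", what = "character", quiet = TRUE)

    length(countries)  # 7
    countries[3]       # "congo, republic of the"
    ```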
    
  • 2020-12-01 05:06

    I just want to add this because I now use it regularly and find it quite useful. There is a package, overflow (install instructions below), that has a function to read copied data frames. Say I begin with an SO post that contains the following data, but no dput output:

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    

    Now, if I copy that data directly and then run the following:

    library(overflow)
    soread()
    # data.frame “mydf” created in your workspace
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    # 1          5.1         3.5          1.4         0.2  setosa
    # 2          4.9         3.0          1.4         0.2  setosa
    # 3          4.7         3.2          1.3         0.2  setosa
    # 4          4.6         3.1          1.5         0.2  setosa
    # 5          5.0         3.6          1.4         0.2  setosa
    # 6          5.4         3.9          1.7         0.4  setosa
    

    I now have a data frame named mydf in my global environment, identical to the one I copied, so I don't have to wait for the OP to post a dput of their data frame. I can change the name of the data frame with the out argument, which defaults to mydf. The package also has a few other useful functions for working with SO posts, such as sopkgs(), which installs a package temporarily so you can help with a question about a package you have not previously installed.
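
    For readers who cannot install overflow, here is a rough base-R approximation of the same workflow. read_so_block() is a hypothetical helper, not the package's implementation: it takes the copied text as a string instead of reading the clipboard, parses it with read.table(), and assigns the result under the name given by out (defaulting to mydf, mirroring the default described above). In R 4.0 and later, Species comes back as character rather than factor.

    ```r
    # Hypothetical helper, NOT part of overflow: parse a printed data frame
    # and assign it under the name `out` (in the global environment by default)
    read_so_block <- function(txt, out = "mydf", envir = globalenv()) {
      df <- read.table(text = txt, header = TRUE)
      assign(out, df, envir = envir)
      invisible(df)
    }

    read_so_block("  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa")

    str(mydf)  # 3 rows, 5 columns; the printed row numbers became row names
    ```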

    If you leave library(overflow) in your .Rprofile, then soread() makes pretty quick work of importing data from SO posts.

    overflow is available from GitHub and can be installed with:

    library(devtools)
    install_github("sebastian-c/overflow")
    
  • 2020-12-01 05:11

    Maybe textConnection() is what you want here:

    R> zz <- read.table(textConnection("a  b   c
    1 11 foo
    2 12 bar
    3 13 baz
    4 14 bar
    5 15 foo"), header=TRUE)
    R> zz
      a  b   c
    1 1 11 foo
    2 2 12 bar
    3 3 13 baz
    4 4 14 bar
    5 5 15 foo
    R> 
    

    It allows you to treat the text as a "connection" from which to read. You can also read directly from the clipboard after copying, but clipboard access depends more on the operating system and is hence less portable.
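
    One detail worth knowing: textConnection() returns a connection that is already open, and read.table() only closes connections it opened itself, so a connection created inline as above stays open until it is garbage-collected (R may then warn about closing an unused connection). A sketch that closes it explicitly and checks that the result matches the read.table(text = ...) shortcut:

    ```r
    txt <- "a  b   c
    1 11 foo
    2 12 bar
    3 13 baz
    4 14 bar
    5 15 foo"

    con <- textConnection(txt)            # created already open
    zz <- read.table(con, header = TRUE)
    close(con)                            # read.table() did not open it, so we close it

    # read.table(text = ...) creates and closes its own connection internally
    zz2 <- read.table(text = txt, header = TRUE)
    identical(zz, zz2)
    ```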
