Sample Datasets in Pandas

Anonymous (unverified), submitted 2019-12-03 02:46:02

Question:

When using R it's handy to load "practice" datasets using

data(iris) 

or

data(mtcars) 

Is there something similar for Pandas? I know I can load the data using any other method; I'm just curious whether there's anything built in.

Answer 1:

The rpy2 module is made for this:

from rpy2.robjects import r, pandas2ri
pandas2ri.activate()

r['iris'].head()

yields

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa

Up to pandas 0.19 you could use pandas' own rpy interface:

import pandas.rpy.common as rcom
iris = rcom.load_data('iris')
print(iris.head())

yields

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa

rpy2 also provides a way to convert R objects into Python objects:

import pandas as pd
import rpy2.robjects as ro
import rpy2.robjects.conversion as conversion
from rpy2.robjects import pandas2ri
pandas2ri.activate()

R = ro.r

df = conversion.ri2py(R['mtcars'])
print(df.head())

yields

    mpg  cyl  disp   hp  drat     wt   qsec  vs  am  gear  carb
0  21.0    6   160  110  3.90  2.620  16.46   0   1     4     4
1  21.0    6   160  110  3.90  2.875  17.02   0   1     4     4
2  22.8    4   108   93  3.85  2.320  18.61   1   1     4     1
3  21.4    6   258  110  3.08  3.215  19.44   1   0     3     1
4  18.7    8   360  175  3.15  3.440  17.02   0   0     3     2


Answer 2:

An alternative solution is to use the brilliant plotting package seaborn. Importing only the API avoids modifying matplotlib.rcParams, which a regular seaborn import would otherwise do by default.

import seaborn.apionly as sns
iris = sns.load_dataset('iris')
print(iris.head())

yields the same data as the R solution, with lower-case, underscore-separated column names and a zero-based index:

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Edit: If you do not want to import seaborn at all but still want to access its sample data sets, you can use @andrewwowens's approach and read the seaborn sample data directly:

iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv') 
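If several of these files are needed, the URL pattern above can be wrapped in a small helper. This is only a sketch: the names `SEABORN_DATA_URL`, `sample_data_url` and `load_sample` are hypothetical, and it assumes that other data sets in the same repository follow the same `<name>.csv` naming as `iris.csv`.

```python
import pandas as pd

# URL pattern copied from the iris example above; that other files follow
# the same "<name>.csv" layout is an assumption.
SEABORN_DATA_URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/{name}.csv"


def sample_data_url(name):
    """Build the raw GitHub URL for a sample data set (hypothetical helper)."""
    return SEABORN_DATA_URL.format(name=name)


def load_sample(name, **read_csv_kwargs):
    """Download the CSV and return a DataFrame; needs network access."""
    return pd.read_csv(sample_data_url(name), **read_csv_kwargs)
```

Calling `load_sample('iris')` should then be equivalent to the `pd.read_csv` line above.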

Note that sns.load_dataset() modifies the column type of sample data sets that contain categorical columns, so the result of reading the same file directly from the URL might differ.

Edit: The iris and tips sample data sets are also available in pandas' GitHub repo.
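To illustrate the difference in column types, here is a minimal sketch that reads a CSV and then converts a text column to pandas' categorical dtype by hand. The three inline rows are taken from the iris output above and stand in for the downloaded file; using the `species` column is an assumption made purely for illustration, since which columns sns.load_dataset() actually converts for a given data set is not shown here.

```python
import io

import pandas as pd

# Inline stand-in for a downloaded CSV (rows from the iris output above).
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
"""

df = pd.read_csv(io.StringIO(csv_text))
dtype_before = df["species"].dtype  # plain text column as parsed by read_csv

# Convert the text column to a categorical column explicitly.
df["species"] = df["species"].astype("category")
dtype_after = df["species"].dtype

print(dtype_before, "->", dtype_after)
```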



Answer 3:

Any publicly available .csv file can be loaded into pandas very quickly using its URL. Here is an example using one of the UCLA .csv datasets.

import pandas as pd

file_name = "http://www.ats.ucla.edu/stat/data/binary.csv"

df = pd.read_csv(file_name)

df.head()

The output is the first five rows of the .csv file you just loaded from the given URL.

>>> df.head()
   admit  gre   gpa  prestige
0      0  380  3.61         3
1      1  660  3.67         3
2      1  800  4.00         1
3      1  640  3.19         4
4      0  520  2.93         4
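pd.read_csv also accepts file-like objects, so the same call can be tried without network access. The sketch below feeds the five rows shown above through io.StringIO as a stand-in for the remote file; the variable name `csv_text` is only illustrative.

```python
import io

import pandas as pd

# The five rows from the output above, used as an offline stand-in
# for the remote binary.csv file.
csv_text = """admit,gre,gpa,prestige
0,380,3.61,3
1,660,3.67,3
1,800,4.00,1
1,640,3.19,4
0,520,2.93,4
"""

# read_csv treats the StringIO buffer just like a file or URL.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```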

