When using R it's handy to load "practice" datasets using

    data(iris)

or

    data(mtcars)

Is there something similar for Pandas? I know I can load data using other methods; I'm just curious whether there's anything built in.
The rpy2 module is made for this:
    from rpy2.robjects import r, pandas2ri

    pandas2ri.activate()
    r['iris'].head()

yields

       Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
    1           5.1          3.5           1.4          0.2   setosa
    2           4.9          3.0           1.4          0.2   setosa
    3           4.7          3.2           1.3          0.2   setosa
    4           4.6          3.1           1.5          0.2   setosa
    5           5.0          3.6           1.4          0.2   setosa

Up to pandas 0.19 you could use pandas' own rpy interface:
    import pandas.rpy.common as rcom

    iris = rcom.load_data('iris')
    print(iris.head())

yields

       Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
    1           5.1          3.5           1.4          0.2   setosa
    2           4.9          3.0           1.4          0.2   setosa
    3           4.7          3.2           1.3          0.2   setosa
    4           4.6          3.1           1.5          0.2   setosa
    5           5.0          3.6           1.4          0.2   setosa

rpy2 also provides a way to convert R objects into Python objects:
    import pandas as pd
    import rpy2.robjects as ro
    import rpy2.robjects.conversion as conversion
    from rpy2.robjects import pandas2ri

    pandas2ri.activate()
    R = ro.r
    df = conversion.ri2py(R['mtcars'])
    print(df.head())

yields

        mpg  cyl  disp   hp  drat     wt   qsec  vs  am  gear  carb
    0  21.0    6   160  110  3.90  2.620  16.46   0   1     4     4
    1  21.0    6   160  110  3.90  2.875  17.02   0   1     4     4
    2  22.8    4   108   93  3.85  2.320  18.61   1   1     4     1
    3  21.4    6   258  110  3.08  3.215  19.44   1   0     3     1
    4  18.7    8   360  175  3.15  3.440  17.02   0   0     3     2

`conversion.ri2py` is the rpy2 2.x name; rpy2 3.0 renamed it `rpy2py`.

An alternative solution is to use the brilliant plotting package seaborn and its `load_dataset()` function. Older seaborn versions changed `matplotlib.rcParams` on import, which importing only `seaborn.apionly` avoided; since seaborn 0.8 a plain import no longer changes the style, and `seaborn.apionly` has since been removed:

    import seaborn as sns

    iris = sns.load_dataset('iris')
    print(iris.head())

yields the same data as the R solution, with snake_case column names and a 0-based index:

       sepal_length  sepal_width  petal_length  petal_width  species
    0           5.1          3.5           1.4          0.2   setosa
    1           4.9          3.0           1.4          0.2   setosa
    2           4.7          3.2           1.3          0.2   setosa
    3           4.6          3.1           1.5          0.2   setosa
    4           5.0          3.6           1.4          0.2   setosa

Edit: If you do not want to import seaborn at all but still want to access its sample datasets, you can use @andrewwowens's approach and read the seaborn sample data directly:
    import pandas as pd

    iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

Note that `sns.load_dataset()` converts the categorical columns of some sample datasets to pandas' `category` dtype, so reading the CSV directly from the URL may not give exactly the same result.
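Because `sns.load_dataset()` changes the dtype of categorical columns, a frame read straight from the raw CSV can be brought closer to it by converting those columns yourself. The sketch below assumes a few hypothetical rows in the same column layout as seaborn's `tips.csv` (read from a string instead of the URL so it runs offline), and the category orders are an assumption, not copied from seaborn's source.

```python
import io

import pandas as pd

# Hypothetical rows in the same column layout as seaborn's tips.csv;
# in practice this text would come from the raw GitHub URL.
CSV_TEXT = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
8.77,2.00,Male,No,Sat,Dinner,2
"""

# Assumed category orders; check seaborn's load_dataset for the real ones.
CATEGORY_ORDERS = {
    "sex": ["Male", "Female"],
    "smoker": ["Yes", "No"],
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "time": ["Lunch", "Dinner"],
}


def to_categoricals(df: pd.DataFrame, orders: dict) -> pd.DataFrame:
    """Return a copy of df with the listed columns as category dtype in a fixed category order."""
    out = df.copy()
    for column, categories in orders.items():
        out[column] = pd.Categorical(out[column], categories=categories)
    return out


raw = pd.read_csv(io.StringIO(CSV_TEXT))
tips = to_categoricals(raw, CATEGORY_ORDERS)
print(raw["day"].dtype)   # plain string column, as read from the CSV
print(tips["day"].dtype)  # category
```

Fixing the category list up front also means that groupings and plots list the days in the chosen order rather than alphabetically.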
Edit: The iris and tips sample datasets are also available in the pandas GitHub repository.
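The rpy2 and seaborn versions of iris shown above hold the same values but differ in presentation: R-style names such as `Sepal.Length` with a 1-based index, versus snake_case names with a 0-based index. A minimal sketch of converting the former into the latter, using the first two rows printed earlier (the helper name is hypothetical):

```python
import pandas as pd

# First two iris rows as printed by the rpy2 example: R-style column
# names and a 1-based row index.
r_style = pd.DataFrame(
    {
        "Sepal.Length": [5.1, 4.9],
        "Sepal.Width": [3.5, 3.0],
        "Petal.Length": [1.4, 1.4],
        "Petal.Width": [0.2, 0.2],
        "Species": ["setosa", "setosa"],
    },
    index=[1, 2],
)


def r_to_seaborn_style(df: pd.DataFrame) -> pd.DataFrame:
    """Rename 'Sepal.Length' -> 'sepal_length' etc. and switch to a 0-based index."""
    renamed = df.rename(columns=lambda name: name.replace(".", "_").lower())
    return renamed.reset_index(drop=True)


iris = r_to_seaborn_style(r_style)
print(iris)
```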
Any publicly available .csv file can be loaded into pandas directly from its URL. Here is an example using one of the UCLA .csv datasets:
    import pandas as pd

    file_name = "http://www.ats.ucla.edu/stat/data/binary.csv"
    df = pd.read_csv(file_name)
    df.head()

`df.head()` shows the first five rows of the file just loaded from the URL:

    >>> df.head()
       admit  gre   gpa  prestige
    0      0  380  3.61         3
    1      1  660  3.67         3
    2      1  800  4.00         1
    3      1  640  3.19         4
    4      0  520  2.93         4
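To get closer to R's `data(iris)`, loading a practice CSV by URL can be wrapped in a small helper that keeps a local copy, so the file is only downloaded once. This is a sketch under assumptions: the `data()` helper, the `PRACTICE_DATASETS` registry, and the `~/.practice_data` cache directory are all hypothetical names, not part of pandas.

```python
import urllib.request
from pathlib import Path

import pandas as pd

# Hypothetical registry: short names mapped to public CSV URLs.
PRACTICE_DATASETS = {
    "iris": "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv",
}


def data(name, registry=PRACTICE_DATASETS, cache_dir="~/.practice_data"):
    """Return dataset `name` as a DataFrame, downloading it only the first time."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    local_file = cache / f"{name}.csv"
    if not local_file.exists():
        # Download to a temporary name first so an interrupted download
        # never leaves a half-written file that looks like a valid cache entry.
        partial = local_file.with_suffix(".part")
        urllib.request.urlretrieve(registry[name], partial)
        partial.replace(local_file)
    return pd.read_csv(local_file)
```

Usage: `iris = data("iris")`. Later calls read the cached file instead of hitting the network; delete the cache directory to force a fresh download.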