问题
Im trying to read CSV file thats on github with Python using pandas> i have looked all over the web, and I tried some solution that I found on this website, but they do not work. What am I doing wrong?
I have tried this:
import pandas as pd
url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)
print(df.head(5))
回答1:
You should provide URL to raw content. Try using this:
import pandas as pd
url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))
Output:
alpha-2 ... intermediate-region-code
name ...
Afghanistan AF ... NaN
Åland Islands AX ... NaN
Albania AL ... NaN
Algeria DZ ... NaN
American Samoa AS ... NaN
回答2:
Add ?raw=true at the end of the GitHub URL to get the raw file link.
In your case,
import pandas as pd
url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)
print(df.head(5))
Note: This works only with GitHub links and not with GitLab or Bitbucket links.
回答3:
I recommend to either use pandas as you tried to and others here have explained, or depending on the application, the python csv-handler CommaSeperatedPython, which is a minimalistic wrapper for the native csv-library.
The library returns the contents of a file as a 2-Dimensional String-Array. It's is in its very early stage though, so if you want to do large scale data-analysis, I would suggest Pandas.
回答4:
You can copy/paste the url and change 2 things:
- Remove "blob"
- Replace github.com by raw.githubusercontent.com
For instance this link:
https://github.com/mwaskom/seaborn-data/blob/master/iris.csv
Works this way:
import pandas as pd
pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
来源:https://stackoverflow.com/questions/55240330/how-to-read-csv-file-from-github-using-pandas