Unpacking a SQL select into a pandas DataFrame

Submitted by 这一生的挚爱 on 2021-02-05 12:55:47

Question


Suppose I have a select roughly like this:

select instrument, price, date from my_prices;

How can I unpack the prices returned into a single dataframe with a series for each instrument and indexed on date?

To be clear: I'm looking for:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: ...
Data columns (total 2 columns):
inst_1    ...
inst_2    ...
dtypes: float64(1), object(1) 

I'm NOT looking for:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: ...
Data columns (total 2 columns):
instrument    ...
price         ...
dtypes: float64(1), object(1)

...which is easy ;-)


Answer 1:


Update: recent pandas have the following functions: read_sql_table and read_sql_query.

First create a db engine (a connection can also work here):

import pandas as pd
from sqlalchemy import create_engine
# see sqlalchemy docs for how to write this url for your database type:
engine = create_engine('mysql://scott:tiger@localhost/foo')

See sqlalchemy database urls.

pandas.read_sql_table

table_name = 'my_prices'
df = pd.read_sql_table(table_name, engine)

pandas.read_sql_query

df = pd.read_sql_query("SELECT instrument, price, date FROM my_prices;", engine)

The old answer referenced read_frame, which has been deprecated (see the version history of this question for that answer).


It often makes sense to read first and then transform the result to your requirements, as such transformations are usually efficient and readable in pandas. In your example, you can pivot the result:

df.reset_index().pivot(index='date', columns='instrument', values='price')

Note: You can omit the reset_index if you don't specify an index_col when reading the data. Recent pandas versions require the pivot arguments to be passed by keyword, as above.
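To make the read-then-pivot step concrete, here is a minimal sketch. It uses the standard library's sqlite3 with an in-memory database instead of MySQL so that it runs without a server, and the four sample rows are made up for illustration:

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the real database (assumption: sqlite3 instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_prices (instrument TEXT, price REAL, date TEXT)")
conn.executemany(
    "INSERT INTO my_prices VALUES (?, ?, ?)",
    [
        ("inst_1", 10.0, "2021-01-01"),  # made-up sample rows
        ("inst_2", 20.0, "2021-01-01"),
        ("inst_1", 11.0, "2021-01-02"),
        ("inst_2", 21.0, "2021-01-02"),
    ],
)
conn.commit()

# Read the long-format result, parsing the date column into datetimes.
long_df = pd.read_sql_query(
    "SELECT instrument, price, date FROM my_prices",
    conn,
    parse_dates=["date"],
)

# One column per instrument, indexed on date.
wide_df = long_df.pivot(index="date", columns="instrument", values="price")
conn.close()

print(wide_df)
```

Note that pivot raises an error if the same (date, instrument) pair occurs more than once, so this sketch assumes one price per instrument per date.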




Answer 2:


You can pass the rows fetched from a cursor to the DataFrame constructor. For postgres:

import psycopg2
from pandas import DataFrame
conn = psycopg2.connect("dbname='db' user='user' host='host' password='pass'")
cur = conn.cursor()
cur.execute("select instrument, price, date from my_prices")
df = DataFrame(cur.fetchall(), columns=['instrument', 'price', 'date'])

Then set the index (set_index returns a new DataFrame, so assign the result):

df = df.set_index('date', drop=False)

or directly:

df.index = df['date']
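The column names do not have to be typed by hand: DB-API cursors expose them through the description attribute after a query has run. A minimal sketch, assuming sqlite3 as a stand-in for psycopg2 (both follow the DB-API) and a single made-up row:

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the postgres connection (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_prices (instrument TEXT, price REAL, date TEXT)")
conn.execute("INSERT INTO my_prices VALUES ('inst_1', 10.0, '2021-01-01')")

cur = conn.cursor()
cur.execute("select instrument, price, date from my_prices")

# Each entry of cur.description is a sequence whose first item is the column name.
columns = [col[0] for col in cur.description]

df = pd.DataFrame(cur.fetchall(), columns=columns)
df = df.set_index("date", drop=False)
conn.close()

print(df)
```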



Answer 3:


This connects pandas to a remote PostgreSQL database.

# CONNECT TO POSTGRES USING PANDAS
import psycopg2 as pg
import pandas.io.sql as psql

This establishes the connection to the postgres db:

connection = pg.connect("host=192.168.0.1 dbname=db user=postgres")

This reads the table from the postgres db:

dataframe = psql.read_sql("SELECT * FROM DB.Table", connection)



Answer 4:


import pandas as pd
import pandas.io.sql as sqlio
import psycopg2

# host, port, dbname, username and pwd are placeholders to be defined by the caller
conn = psycopg2.connect("host='{}' port={} dbname='{}' user={} password={}".format(host, port, dbname, username, pwd))
sql = "select count(*) from table;"
dat = sqlio.read_sql_query(sql, conn)
conn.close()  # close explicitly rather than relying on garbage collection

# Equivalent, calling read_sql_query from the top-level pandas namespace:
conn = psycopg2.connect("host='{}' port={} dbname='{}' user={} password={}".format(host, port, dbname, username, pwd))
sql = "select count(*) from table;"
dat = pd.read_sql_query(sql, conn)
conn.close()
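Connection cleanup can also be delegated to a context manager. A minimal sketch, using the standard library's sqlite3 in place of psycopg2 so that it runs without a server; the table name prices and its two rows are made up for illustration:

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls conn.close() when the block exits, even if the query raises.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE prices (instrument TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?)",
        [("inst_1", 10.0), ("inst_2", 20.0)],  # made-up sample rows
    )
    dat = pd.read_sql_query("SELECT count(*) AS n FROM prices", conn)

print(dat)
```

The DataFrame is fully materialised before the block exits, so dat remains usable after the connection is closed.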


Source: https://stackoverflow.com/questions/17156084/unpacking-a-sql-select-into-a-pandas-dataframe
