Python SQL loop variables through multiple queries

Question

I'm having trouble with a Python Teradata (tdodbc) query: I want to run the same query in a loop with different variables and merge the results. I received good direction in another post and ended up here. My issue now is that the DataFrame ends up containing only the results for the final variable in the loop, "state5". We have 5 states, each in its own database with the same schema. I can run the same query against each one, but I want to loop over the variables so I can run it for all 5 states and get back the appended results. This was easy with SAS macro variables and appending, but I need to bring the data into Python for EDA and data science.

import pandas as pd
import teradata as td

udaExec = td.UdaExec(appConfigFile="udaexec.ini")
with udaExec.connect("${dataSourceName}") as session:

    state_dataframes = []
    STATES = ["state1", "state2", "state3", "state4", "state5"]

    for state in STATES:

        query1 = """database my_db_{};"""

        query2 = """
            select top 10
            '{}' as state
            ,a.*
            from table_a a
            """

    # These lines are outside the for loop, so they run only once,
    # after the loop has finished, when state == "state5".
    session.execute(query1.format(state))
    session.execute(query2.format(state))

    state_dataframes.append(pd.read_sql(query2.format(state), session))
    all_states_df = pd.concat(state_dataframes)
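For comparison, here is a minimal, runnable sketch of the intended pattern: every statement that depends on `state` sits inside the loop body, `read_sql` receives the formatted query rather than the template, and `pd.concat` runs once after the loop. Since no Teradata server is assumed, an in-memory SQLite database (Python's standard `sqlite3` module) stands in for the five state databases; the attached `my_db_stateN` databases, the `dw_key`/`amount` columns, and the sample rows are all made up for illustration. SQLite has no `DATABASE` statement, so the sketch qualifies the table name instead.

```python
import sqlite3

import pandas as pd

STATES = ["state1", "state2", "state3", "state4", "state5"]

# Stand-in for the five Teradata databases (assumption): one SQLite connection
# with an attached in-memory database per state, all sharing the same schema.
conn = sqlite3.connect(":memory:")
for state in STATES:
    conn.execute(f"ATTACH DATABASE ':memory:' AS my_db_{state}")
    conn.execute(f"CREATE TABLE my_db_{state}.table_a (dw_key INTEGER, amount REAL)")
    conn.executemany(
        f"INSERT INTO my_db_{state}.table_a VALUES (?, ?)",
        [(k, k * 2.0) for k in range(3)],
    )
    conn.commit()  # SQLite refuses ATTACH while a transaction is open

# The table name is qualified per state instead of switching databases.
QUERY = """
    select '{state}' as state, a.*
    from my_db_{state}.table_a a
    limit 10
"""

state_dataframes = []
for state in STATES:
    # Everything that depends on `state` stays inside the loop body,
    # and read_sql receives the formatted query, not the template.
    state_dataframes.append(pd.read_sql(QUERY.format(state=state), conn))

# Concatenate once, after every state has been read.
all_states_df = pd.concat(state_dataframes, ignore_index=True)
```

With a real Teradata session, the same loop shape applies; only the connection object and the SQL dialect change.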

Answer 1:

I was finally able to get this to work, although it may not be the most elegant way to do it. I did try to put both DROP TABLE statements into a single variable, "query5", but received a DDL error. Once I moved each DROP TABLE into its own session.execute call, it worked.
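The fix for the DDL error (one `session.execute` call per DROP TABLE) can be wrapped in a small helper that splits a multi-statement string and sends each statement as its own request. This is a sketch under assumptions: `split_statements` and `execute_each` are hypothetical names, the split on `;` is naive (it would break on semicolons inside string literals), and the explanation that Teradata rejects a request in which a DDL statement is followed by another statement is a likely cause of the error, not something confirmed in the post.

```python
from typing import List, Protocol


class Session(Protocol):
    """Anything with an execute(sql) method, e.g. the UdaExec session above."""

    def execute(self, sql: str) -> object: ...


def split_statements(script: str) -> List[str]:
    """Naively split a SQL script on ';' (assumes no ';' inside literals)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def execute_each(session: Session, script: str) -> None:
    """Send each statement as a separate request instead of one batch."""
    for statement in split_statements(script):
        session.execute(statement)


# The combined variable that failed when sent as a single request.
query5 = """
DROP TABLE v_table;
DROP TABLE v_table_2;
"""
```

Inside the loop, `execute_each(session, query5)` would then issue the two DROP TABLE statements one at a time.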

import pandas as pd
import teradata as td

udaExec = td.UdaExec(appConfigFile="udaexec.ini")

with udaExec.connect("${dataSourceName}") as session:

    state_dataframes = []
    STATES = ["state1", "state2", "state3", "state4", "state5"]

    for state in STATES:

        # Switch the default database to this state's database.
        query1 = """database my_db_{};"""

        # First staging step: this state's rows, tagged with the state name.
        query2 = """
        create set volatile table v_table
        ,no fallback, no before journal, no after journal as
        (
        select top 10
        '{}' as state
        ,t.*
        from table t
        )
        with data
        primary index (dw_key)
        on commit preserve rows;
        """

        # Second staging step, built from the first volatile table.
        query3 = """
        create set volatile table v_table_2
        ,no fallback, no before journal, no after journal as
        (
        select t.*
        from v_table t
        )
        with data
        primary index (dw_key)
        on commit preserve rows;
        """

        query4 = """
        select t.*
        from v_table_2 t
        """

        session.execute(query1.format(state))
        session.execute(query2.format(state))
        session.execute(query3)
        state_dataframes.append(pd.read_sql(query4, session))
        # Drop each volatile table in its own request so the next
        # iteration can recreate them.
        session.execute("DROP TABLE v_table")
        session.execute("DROP TABLE v_table_2")

    # Concatenate once, after all states have been read.
    all_states_df = pd.concat(state_dataframes)
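The per-state flow above (stage, stage again, read, drop) can also be packaged as a function whose cleanup runs in a `finally` block, so the staging tables are dropped even if a step fails part-way. The sketch below is runnable without Teradata: it assumes SQLite `TEMP` tables as a rough analogue of Teradata volatile tables (both are scoped to the session), uses made-up sample data, and `make_demo_connection` and `load_state` are hypothetical names. `DROP TABLE IF EXISTS` is SQLite syntax used here for convenience.

```python
import sqlite3

import pandas as pd

STATES = ["state1", "state2", "state3", "state4", "state5"]


def make_demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for the per-state databases (assumption)."""
    conn = sqlite3.connect(":memory:")
    for state in STATES:
        conn.execute(f"ATTACH DATABASE ':memory:' AS my_db_{state}")
        conn.execute(f"CREATE TABLE my_db_{state}.table_a (dw_key INTEGER, amount REAL)")
        conn.executemany(
            f"INSERT INTO my_db_{state}.table_a VALUES (?, ?)",
            [(k, k * 1.5) for k in range(12)],
        )
        conn.commit()  # ATTACH is not allowed inside an open transaction
    return conn


def load_state(conn: sqlite3.Connection, state: str) -> pd.DataFrame:
    """Stage one state's rows in temp tables, read the result, then clean up."""
    try:
        conn.execute(
            f"CREATE TEMP TABLE v_table AS "
            f"SELECT '{state}' AS state, t.* FROM my_db_{state}.table_a t LIMIT 10"
        )
        conn.execute("CREATE TEMP TABLE v_table_2 AS SELECT t.* FROM v_table t")
        return pd.read_sql("SELECT t.* FROM v_table_2 t", conn)
    finally:
        # One DROP per execute() call; IF EXISTS covers a failure part-way through.
        conn.execute("DROP TABLE IF EXISTS v_table_2")
        conn.execute("DROP TABLE IF EXISTS v_table")


conn = make_demo_connection()
all_states_df = pd.concat([load_state(conn, s) for s in STATES], ignore_index=True)
```

Because the temp tables are dropped at the end of every call, the next state can recreate `v_table` and `v_table_2` under the same names, which is the same reason the answer drops its volatile tables inside the loop.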

Edit for clarity: correcting the query in the question only required proper indentation. In my Teradata environment I have limited spool space, which forces me to build many volatile tables to break queries apart. Since I spent a good amount of time trying to solve this, I added that fuller version to the answer to help others who may run into this scenario.
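When a query has to be broken into many volatile-table steps, the CREATE and DROP statements can be generated from a list of steps instead of being written out by hand for each one. This is a minimal sketch that only builds SQL strings: the DDL template copies the shape used in the answer, while `staging_plan`, `state_steps`, and the `(table_name, select_sql)` step format are hypothetical. The drops come back in reverse creation order, one statement each, so they can be sent with separate `session.execute` calls.

```python
from typing import List, Sequence, Tuple

# Same volatile-table DDL shape the answer uses for v_table and v_table_2.
VOLATILE_DDL = """create set volatile table {name}
,no fallback, no before journal, no after journal as
(
{select}
)
with data
primary index ({primary_index})
on commit preserve rows;"""


def staging_plan(
    steps: Sequence[Tuple[str, str]], primary_index: str = "dw_key"
) -> Tuple[List[str], List[str]]:
    """Build one CREATE per (table_name, select_sql) step, in order,
    plus one DROP per table in reverse order for cleanup."""
    creates = [
        VOLATILE_DDL.format(name=name, select=select, primary_index=primary_index)
        for name, select in steps
    ]
    drops = [f"DROP TABLE {name}" for name, _ in reversed(steps)]
    return creates, drops


def state_steps(state: str) -> List[Tuple[str, str]]:
    """Hypothetical two-step pipeline matching the answer's v_table / v_table_2."""
    return [
        ("v_table", f"select top 10 '{state}' as state, t.* from table t"),
        ("v_table_2", "select t.* from v_table t"),
    ]


creates, drops = staging_plan(state_steps("state1"))
```

With a live session, each string in `creates` would be executed in order, the final result read with `pd.read_sql("select t.* from v_table_2 t", session)`, and then each string in `drops` executed on its own. Adding a third staging step only means appending another tuple to `state_steps`.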

Source: https://stackoverflow.com/questions/60332300/python-sql-loop-variables-through-multiple-queries

Tags

python

sql

pandas

for-loop

teradata