Google Composer - How do I install Microsoft SQL Server ODBC drivers on environments

灰色年华 2021-01-16 09:37

I am new to GCP and Airflow and am trying to run my Python pipelines through a simple pyodbc connection using Python 3. However, I believe I have found what I need to install on t

4 Answers
  •  失恋的感觉 2021-01-16 10:28

    It's simple! No need for a Dockerfile, KubernetesPodOperator, LD_LIBRARY_PATH, etc. A basic PythonOperator will do.

    Points to consider

    • The Cloud Composer worker pod image is Ubuntu 16.04 (to check, run a basic PythonOperator that calls os.system('cat /etc/os-release'); see the sketch after the bucket layout below)
    • unixodbc-dev is already installed on the worker pod image
    • Composer creates a GCS bucket and mounts it into the Airflow workers
    • So just install pyodbc from the PyPI packages and pass the MS SQL ODBC driver as a parameter to the pyodbc connection method

    Here 'gs://bucket_created_by_composer' == '/home/airflow/gcs':

    gcs bucket created by composer ->
              -> data/
              -> dags/
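
    A minimal sketch of the worker-image check mentioned in the first bullet above, reusing the Airflow 1.10-style imports from the example DAG further down (the DAG id and task id are only illustrative):

    import os
    import airflow
    from airflow import DAG
    from airflow.operators import python_operator

    dag = DAG(
        'os_release_check',        # illustrative DAG id
        schedule_interval=None,
        default_args={'start_date': airflow.utils.dates.days_ago(0)}
    )

    def print_os_release(**kwargs):
        # prints the worker pod's OS details to the task log
        os.system('cat /etc/os-release')

    Tprint_os_release = python_operator.PythonOperator(
        task_id='Tprint_os_release',
        python_callable=print_os_release,
        provide_context=True,
        dag=dag
    )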
    

    Step-by-Step Approach

    Step 1: Install pyodbc and the MS SQL ODBC driver on any Ubuntu instance to get the driver files

    For illustration, let's do it on a GCP VM instance with an Ubuntu 18.04 image:

    # update the package lists
    sudo apt-get update -y
    # add the Microsoft package signing key and repository
    curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
    curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
    sudo apt-get update -y
    echo Installing mssql-tools and unixODBC developer...
    # mssql-tools pulls in the msodbcsql17 driver; it also ships sqlcmd and bcp, which we don't need here
    sudo ACCEPT_EULA=Y apt-get install -y mssql-tools unixodbc-dev
    # install pip3 and pyodbc
    sudo apt install -y python3-pip
    pip3 install pyodbc
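
    To confirm the driver landed where expected on the VM, a quick check from Python (the paths below assume the default install location of the msodbcsql17 package):

    import glob
    import pyodbc

    # shared objects installed by the msodbcsql17 package
    print(glob.glob('/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-*.so*'))
    # drivers registered in /etc/odbcinst.ini
    print(pyodbc.drivers())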
    

    Step 2: Get the driver files and upload them to the data folder of the GCS bucket created by Composer

    cd /opt/microsoft
    # there is one directory here, 'msodbcsql17' (the version may differ)
    # we need to upload this directory to the data folder of the gcs_bucket

    # use whichever approach suits you
    # copy the directory to /home for easier zipping/uploading to gcs
    cp -r msodbcsql17 /home/ # you may need sudo
    # upload /home//msodbcsql17 to any gcs_bucket
    gsutil cp -r /home//msodbcsql17 gs://
    

    Download this folder from the GCS bucket to your local machine, then upload it to the data folder of the GCS bucket created by Composer.

    Choose any approach/method; the main aim is to get the msodbcsql17 folder into the data folder of the GCS bucket created by Composer. One option is sketched below.
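
    As one option, a sketch using the google-cloud-storage Python client to copy the driver folder between buckets; the bucket names are placeholders, and the client library must be installed and authenticated:

    from google.cloud import storage

    client = storage.Client()
    src_bucket = client.bucket('my-staging-bucket')           # placeholder: bucket the VM uploaded to
    dst_bucket = client.bucket('bucket_created_by_composer')  # placeholder: Composer's own bucket

    # copy every object under msodbcsql17/ into the data/ folder of the Composer bucket
    for blob in client.list_blobs(src_bucket, prefix='msodbcsql17/'):
        src_bucket.copy_blob(blob, dst_bucket, new_name='data/' + blob.name)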

    Final structure:

    gcs bucket created by composer ->
              -> data/msodbcsql17/
              -> dags/
    

    Step 3: Use the msodbcsql17 driver in the pyodbc connection

    EXAMPLE DAG:

    import os
    import time
    import datetime
    import argparse
    import json
    from airflow import DAG
    import airflow
    
    from airflow.operators import python_operator
    
    
    default_dag_args = {
        'start_date': airflow.utils.dates.days_ago(0),
        'provide_context': True
    }
    
    
    
    dag = DAG(
            'pyodbc_test',
            schedule_interval=None, #change for composer
            default_args=default_dag_args
            )
    
    
    def check_connection(**kwargs):
        print('hello')
        driver='/home/airflow/gcs/data/msodbcsql17/lib64/libmsodbcsql-17.5.so.2.1'
        # this is the main driver file; the exact filename can be found under gcs_bucket/data,
        # or in the /etc/odbcinst.ini file of the Ubuntu instance where you installed the driver earlier
    
        def tconnection(ServerIp,LoginName,Password,mssql_portno):
            """ A method which return connection object"""
            import pyodbc
            pyodbc.pooling = False 
            try:   
                sql_conn = pyodbc.connect("DRIVER={4};SERVER={0},{1};UID={2};PWD={3}".format(ServerIp,mssql_portno,LoginName,Password,driver)) 
            except pyodbc.Error as ex:
                sqlstate = ex.args[1]
                raise
    
    
            return sql_conn
    
        con=tconnection('','','','1433')
        # recommendation: take the login and password from Airflow connections (see the sketch after this DAG)
        import pandas as pd
        q='select * from ..'
        df=pd.read_sql(q,con)
        print(df)
    
    Tcheck_connection = python_operator.PythonOperator(
            task_id='Tcheck_connection',
            python_callable=check_connection,
            dag=dag)
    
    
    #calling the task sequence
    Tcheck_connection 
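
    The comment about Airflow connections could look roughly like this; the connection id 'mssql_default' is an assumption and must first be created in the Airflow UI (Admin -> Connections):

    from airflow.hooks.base_hook import BaseHook

    def get_mssql_credentials():
        # 'mssql_default' is an assumed connection id configured in the Airflow UI
        conn = BaseHook.get_connection('mssql_default')
        return conn.host, conn.login, conn.password, str(conn.port or 1433)

    # inside check_connection, instead of hard-coded values:
    # ServerIp, LoginName, Password, mssql_portno = get_mssql_credentials()
    # con = tconnection(ServerIp, LoginName, Password, mssql_portno)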
    

    PyPI Packages

    pyodbc
    pandas
    

    Tested this on Composer recently.
