Google Cloud Composer: How do I install Microsoft SQL Server ODBC drivers on Composer environments?

灰色年华 2021-01-16 09:37

I am new to GCP and Airflow and am trying to run my Python pipelines through a simple pyodbc connection in Python 3. However, I believe I have found what I need to install on t

  •  刺人心 (OP)
    2021-01-16 10:23

    I was facing the same problem. The first solution that worked for me was building a Docker image that installs the drivers and then runs the code. Initially I tried to find a way to install the drivers on the cluster itself, but after many failures I read in the documentation that the Airflow image in Composer is curated by Google and changes to that image are not allowed. So here is my Dockerfile:

    FROM python:3.7-slim-buster
    #FROM gcr.io/data-development-254912/gcp_bi_baseimage
    #FROM gcp_bi_baseimage
    LABEL maintainer=" "
    ENV APP_HOME=/app
    WORKDIR $APP_HOME
    # copy the build context (app.py, requirements.txt, ...) into /app
    COPY . ./
    # install build tools for pyodbc, the unixODBC headers and Microsoft's ODBC Driver 17 for SQL Server
    RUN apt-get update \
        && apt-get install --yes --no-install-recommends \
            apt-utils \
            apt-transport-https \
            curl \
            gnupg \
            unixodbc-dev \
            gcc \
            g++ \
            nano \
        && curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
        && curl -fsSL https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list \
        && apt-get update \
        && ACCEPT_EULA=Y apt-get install --yes --no-install-recommends msodbcsql17 \
        && apt-get install --yes --no-install-recommends libgssapi-krb5-2 \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/* \
        && rm -rf /tmp/*
    RUN pip install --no-cache-dir -r requirements.txt
    CMD ["python", "app.py"]
    

    requirements.txt:

    pyodbc==4.0.28
    google-cloud-bigquery==1.24.0    
    google-cloud-storage==1.26.0
    

    You should be good from this point on.
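The Dockerfile's CMD runs an app.py that the answer does not show. As a rough sketch only, a helper module that app.py could import might look like this. It assumes msodbcsql17 registers its driver under the name "ODBC Driver 17 for SQL Server"; the MSSQL_* environment variable names and all function names are hypothetical.

```python
"""Hypothetical helper module for app.py: open a pyodbc connection to SQL Server.

Assumes the image above installed msodbcsql17, which registers the driver
as "ODBC Driver 17 for SQL Server". The MSSQL_* environment variable names
are placeholders, not something the original answer defines.
"""
import os
from typing import Iterable

DRIVER_NAME = "ODBC Driver 17 for SQL Server"


def _braced(value: str) -> str:
    """Wrap an ODBC value in braces; a literal '}' is escaped as '}}'."""
    return "{" + value.replace("}", "}}") + "}"


def build_conn_str(server: str, database: str, user: str, password: str,
                   port: int = 1433, driver: str = DRIVER_NAME) -> str:
    """Build a SQL Server ODBC connection string.

    SQL Server expects "host,port" (comma, not colon) in SERVER, and the
    password is braced so characters such as ';' cannot end the value early.
    """
    parts = {
        "DRIVER": _braced(driver),
        "SERVER": f"{server},{port}",
        "DATABASE": database,
        "UID": user,
        "PWD": _braced(password),
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def driver_installed(installed: Iterable[str], driver: str = DRIVER_NAME) -> bool:
    """True if `driver` is among the names reported by pyodbc.drivers()."""
    return driver in installed


def connect_from_env(timeout: int = 10):
    """Open a connection using the hypothetical MSSQL_* environment variables."""
    import pyodbc  # imported lazily so the string helpers work without pyodbc

    if not driver_installed(pyodbc.drivers()):
        raise RuntimeError(f"{DRIVER_NAME!r} is not registered; was msodbcsql17 installed?")
    conn_str = build_conn_str(
        server=os.environ["MSSQL_HOST"],
        database=os.environ["MSSQL_DATABASE"],
        user=os.environ["MSSQL_USER"],
        password=os.environ["MSSQL_PASSWORD"],
    )
    # `timeout` is pyodbc's login timeout in seconds
    return pyodbc.connect(conn_str, timeout=timeout)
```

In app.py you would call connect_from_env(), run queries through conn.cursor(), and call conn.close() when finished. Checking pyodbc.drivers() first turns a missing driver into a clear error message instead of a generic ODBC "data source name not found" failure.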

    Since then I have managed to set up an Airflow named connection to our SQL Server, and I now use mssql_operator or mssql_hook. I worked with a cloud engineer to get the networking configured correctly. What I found is that the named connection is much easier to use, but KubernetesPodOperator is still much more reliable.
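The answer does not say how its named connection was created. Besides the Admin > Connections UI, Airflow can read a connection from an AIRFLOW_CONN_<CONN_ID> environment variable whose value is a URI. The sketch below assumes that route; the connection id, host and credentials are placeholders. Note that, as far as I know, Airflow's MsSqlHook connects through pymssql rather than pyodbc, so this path does not rely on the ODBC driver installed above.

```python
"""Sketch: build an Airflow connection URI for a named SQL Server connection.

Airflow looks up a connection id such as "my_mssql" in the environment
variable AIRFLOW_CONN_MY_MSSQL, whose value is a URI of the form
conn_type://login:password@host:port/schema. All values here are placeholders.
"""
from urllib.parse import quote


def conn_env_var(conn_id: str) -> str:
    """Environment variable name Airflow checks for `conn_id`."""
    return "AIRFLOW_CONN_" + conn_id.upper()


def mssql_conn_uri(host: str, login: str, password: str,
                   schema: str, port: int = 1433) -> str:
    """Return mssql://login:password@host:port/schema.

    login, password and schema are percent-encoded with safe="" so that
    characters like '@', ':' and '/' do not break URI parsing; Airflow
    unquotes them when it parses the connection.
    """
    return "mssql://{}:{}@{}:{}/{}".format(
        quote(login, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(schema, safe=""),
    )


# Hypothetical usage inside a DAG once the variable is set
# (Airflow 1.10.x import path; Airflow 2 moved MsSqlHook to
# airflow.providers.microsoft.mssql.hooks.mssql):
#
#   from airflow.hooks.mssql_hook import MsSqlHook
#   rows = MsSqlHook(mssql_conn_id="my_mssql").get_records("SELECT 1")
```

The same host, login, password, port and schema can equally be typed into the connection form in the Airflow UI; the URI form is mainly convenient when connections are managed as configuration.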
