Apache Airflow - Connection issue to MS SQL Server using pymssql + SQLAlchemy

。_饼干妹妹 提交于 2021-02-05 05:56:05

问题


I am facing a problem to connect to an Azure MS SQL Server 2014 database in Apache Airflow 1.10.1 using pymssql. I want to use the MsSqlHook class provided by Airflow, for the convenience to create my connection in the Airflow UI, and then create a context manager for my connection using SqlAlchemy:

@contextmanager
def mssql_session(dt_conn_id):
    sqla_engine = MsSqlHook(mssql_conn_id=dt_conn_id).get_sqlalchemy_engine()
    session = sessionmaker(bind=sqla_engine)()
    try:
        yield session
    except:
        session.rollback()
        raise
    else:
        session.commit()
    finally:
        session.close()

But when I do that, I have this error when I run a request :

sqlalchemy.exc.InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnect)') (Background on this error at: http://sqlalche.me/e/rvf5)

It seems come from pyodbc whereas I want to use pymssql (and in MsSqlHook, the method get_conn uses pymssql !)

I searched in the source code of Airflow the cause. I noticed that the method get_uri from the class DbApiHook (from which is inherited MsSqlHook) builds the connection string passed to SqlAlchemy like this:

'{conn.conn_type}://{login}{host}/{conn.schema}'

But conn.conn_type is simply equal to 'mssql' whereas we need to specify the DBAPI as described here: https://docs.sqlalchemy.org/en/latest/core/engines.html#microsoft-sql-server (for example : 'mssql+pymssql://scott:tiger@hostname:port/dbname')

So, by default, I think it uses pyodbc. But how can I set properly the conn_type of the connection to 'mssql+pymssql' instead of 'mssql' ? In the Airflow IU, you can simply select SQL server in a dropdown list, but not set as you want :

To work around the issue, I overload the get_uri method from DbApiHook in a new class I created inherited from MsSqlHook, and in which I build my own connection string, but it's not clean at all...

Thanks for any help


回答1:


You're right. There's no easy, straightforward way to get Airflow to do what you want. Personally I would build the sqlalchemy engine inside of your context manager, something like create_engine(hook.get_uri().replace("://", "+pymssql://")) -- then I would toss the code somewhere reusable.




回答2:


You can create a connection by passing it as an environment variable to Airflow. See the docs. The value of the variable is the database URL in the format SqlAlchemy accepts.

The name of the env var follows the pattern AIRFLOW_CONN_ to which you append the connection ID. For example AIRFLOW_CONN_MY_MSSQL, in this case, the conn_id would be 'my_mssql'.



来源:https://stackoverflow.com/questions/53616630/apache-airflow-connection-issue-to-ms-sql-server-using-pymssql-sqlalchemy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!