Question
I am using an Azure Databricks notebook with the Azure library to list the files in Blob Storage. The task is scheduled: the cluster is terminated when the job finishes and started again for each new run.
I am using version 4.0.0 of the Azure library (https://pypi.org/project/azure/).
Sometimes I get this error message:
- AttributeError: module 'lib' has no attribute 'SSL_ST_INIT'
and, very rarely, also:
- AttributeError: cffi library '_openssl' has no function, constant or global variable named 'CRYPTOGRAPHY_PACKAGE_VERSION'
I found a workaround: uninstall the openssl or azure library, restart the cluster, and install the library again. However, restarting the cluster is not always possible, for example when it has to handle longer-running tasks.
I also tried to install/upgrade openSSL 16.2.0 in an initialization script, but that does not help, and it starts conflicting with another openSSL library that is on the Databricks cluster by default.
Is there anything else I can do about this?
Here is the code for getting the list of files from Blob Storage:
import pandas as pd
import re
import os
from pyspark.sql.types import *
import azure
from azure.storage.blob import BlockBlobService
import datetime
import time
r = []
marker = None
blobService = BlockBlobService(accountName,accountKey)
while True:
    result = blobService.list_blobs(sourceStorageContainer, prefix = inputFolder, marker=marker)
    for b in result.items:
        r.append(b.name)
    if result.next_marker:
        marker = result.next_marker
    else:
        break
print(r)
Thank you
Answer 1:
The solution for this issue is to downgrade the Azure library to 3.0.0.
It looks like Azure v4 conflicts with some of the libraries that are preinstalled on Databricks.
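To see which versions are actually present on a given cluster before and after changing the Azure library, a minimal sketch like the one below could be run in a notebook cell. The helper name `installed_versions` and the package list are illustrative assumptions, and `importlib.metadata` requires Python 3.8 or later, so older runtimes would need a different lookup.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages named in this thread, plus cryptography, which provides the
# '_openssl' cffi module mentioned in the second error message.
for name, version in installed_versions(
    ["azure", "azure-storage-blob", "pyOpenSSL", "cryptography"]
).items():
    print(name, version)
```

Comparing this output on a run that fails with one that succeeds may show whether the pyOpenSSL and cryptography versions differ between runs.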
Answer 2:
This issue is also linked to the pyOpenSSL package. Downgrading it to version 18.0.0 did the trick for me. I used the script below as an init script at cluster initialization:
dbutils.fs.put("/databricks/script/pyOpenSSL-install.sh","""
#!/bin/bash
/databricks/python/bin/pip uninstall pyOpenSSL -y
/databricks/python/bin/pip install pyOpenSSL==18.0.0
""", True)
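The same pattern can be generalized to other pins, such as the `azure==3.0.0` downgrade from the first answer. The sketch below is only an illustration: `build_pin_script` is a hypothetical helper, the path `/databricks/script/pin-packages.sh` is made up, and the `dbutils.fs.put` call is left commented out because `dbutils` exists only inside a Databricks notebook. This thread does not establish whether both pins are needed together.

```python
def build_pin_script(pins, pip_path="/databricks/python/bin/pip"):
    """Build a bash init script that reinstalls each package at a pinned version."""
    lines = ["#!/bin/bash"]
    for package, version in pins.items():
        # Remove whatever version is present, then install the pinned one.
        lines.append(f"{pip_path} uninstall {package} -y")
        lines.append(f"{pip_path} install {package}=={version}")
    return "\n".join(lines) + "\n"


script = build_pin_script({"pyOpenSSL": "18.0.0"})
print(script)

# Inside a Databricks notebook the script could then be written to DBFS
# (hypothetical path):
# dbutils.fs.put("/databricks/script/pin-packages.sh", script, True)
```

Generating the script text from a dictionary keeps the pinned versions in one place, so changing a pin does not require editing the shell lines by hand.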
Source: https://stackoverflow.com/questions/54984230/databricks-cluster-does-not-initialize-azure-library-with-error-module-lib-ha