Question
According to a note in the Firestore documentation about writing data:
For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes. You should use a server client library for bulk data operations and not a mobile/web SDK.
I'm trying to implement parallel writes to Firestore in a Python 3.7 Cloud Function. I have never written parallel code in Python, so I have been experimenting with modules such as joblib to parallelize loops, but the writes are not getting any faster (I am probably doing it wrong). Suppose the data I want to write to Firestore is in an iterable object "data". I'm trying to parallelize something like this:
for key in data:
    doc_ref = db.collection(u'my_collection').document(key.id)
    doc_ref.set({u'new_data': key.new_data})
How can I achieve this in a parallel call in a Python 3.7 Cloud Function?
EDIT: I tried this, using multiprocessing:
from google.colab import drive
import firebase_admin
from firebase_admin import credentials
from firebase_admin import firestore
import time
from multiprocessing import Pool
import multiprocessing
drive.mount('/content/drive/')
#Initialize Firestore client
cred = credentials.Certificate('/content/drive/My Drive/Colab Notebooks/my_credentials.json')
firebase_admin.initialize_app(cred)
db = firestore.client()
#I will omit some code here.
#I append data dictionaries to a list called "dict_all_users".
#I want to write all these dictionaries to Firestore
#Parallelized function:
def test(dict_all_users):
    doc_ref = db.collection(u'my_collection').document(dict_all_users['user_id'])
    doc_ref.set({u'data1': dict_all_users['data1'], u'data2': dict_all_users['data2']})
num_cores = multiprocessing.cpu_count()
p = Pool(num_cores)
start = time.time()
p.map(test,dict_all_users)
stop = time.time()
print(stop-start)
The code above took ~27 seconds to perform 500 writes. A separate version using batched writes took ~1 second.
According to the docs mentioned before, parallel writes are faster than batched writes. What am I doing wrong?
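One direction that may be worth trying, sketched here under assumptions rather than as a confirmed fix: each write mostly waits on the network, so a thread pool from the standard library's concurrent.futures might be a better fit than a process pool sized to the CPU count. The helper name write_all, the write_one callback and the max_workers value below are hypothetical and not part of the Firestore API. The Firestore call appears only in the commented usage, which reuses db and dict_all_users from the code above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_all(items, write_one, max_workers=32):
    """Call write_one(item) for every item on a pool of threads.

    Returns a list of (item, exception) pairs for the calls that raised,
    so failed writes can be inspected or retried by the caller.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every write up front; map each future back to its item.
        futures = {pool.submit(write_one, item): item for item in items}
        for future in as_completed(futures):
            exc = future.exception()
            if exc is not None:
                failures.append((futures[future], exc))
    return failures


# Hypothetical usage with the client and data from the question:
#
# def write_user(user):
#     doc_ref = db.collection(u'my_collection').document(user['user_id'])
#     doc_ref.set({u'data1': user['data1'], u'data2': user['data2']})
#
# failures = write_all(dict_all_users, write_user)
```

Whether this actually beats the ~1 second batched version would have to be measured. The max_workers value is a guess and would need tuning for the Cloud Function's memory and connection limits.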
Source: https://stackoverflow.com/questions/61460620/parallel-writes-to-firestore-using-a-python-3-7-cloud-function