Parallel writes to Firestore using a Python 3.7 Cloud Function


Question


As we can read in a note in the Firestore documentation about writing to Firestore:

For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes. You should use a server client library for bulk data operations and not a mobile/web SDK.

I'm trying to program parallel writes to Firestore in a Python 3.7 Cloud Function. I have never written any parallel code in Python, so I'm trying modules like joblib to parallelize loops, but I'm not getting fast writes (probably because I'm doing it wrong). Let's suppose I have the data I want to write to Firestore in an iterable object "data". I'm trying to parallelize something like this:

for key in data:
  doc_ref = db.collection(u'my_collection').document(key.id)
  doc_ref.set({u'new_data': key.new_data})

How can I achieve this in a parallel call in a Python 3.7 Cloud Function?
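
For illustration, this is roughly the kind of parallel call I imagine, sketched with a thread pool from the standard library (db and data as in the snippet above; max_workers is picked arbitrarily, and I don't know if this is the recommended approach):

from concurrent.futures import ThreadPoolExecutor

def write_item(key):
    # Each write is an independent network call, so threads can overlap the I/O.
    doc_ref = db.collection(u'my_collection').document(key.id)
    doc_ref.set({u'new_data': key.new_data})

# 50 workers is a guess; Firestore writes are I/O-bound, not CPU-bound.
with ThreadPoolExecutor(max_workers=50) as executor:
    list(executor.map(write_item, data))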

EDIT: I tried this, using multiprocessing:

from google.colab import drive
import firebase_admin
from firebase_admin import credentials
from firebase_admin import firestore
import time
from multiprocessing import Pool
import multiprocessing

drive.mount('/content/drive/')

#Initialize Firestore client
cred = credentials.Certificate('/content/drive/My Drive/Colab Notebooks/my_credentials.json')
firebase_admin.initialize_app(cred)
db = firestore.client()

#I will omit some code here.
#I append data dictionaries to a list called "dict_all_users".
#I want to write all these dictionaries to Firestore

#Parallelized function (each call writes one user's dictionary):
def test(user):
    doc_ref = db.collection(u'my_collection').document(user['user_id'])
    doc_ref.set({u'data1': user['data1'], u'data2': user['data2']})

num_cores = multiprocessing.cpu_count()
p = Pool(num_cores)
start = time.time()
p.map(test, dict_all_users)
stop = time.time()
print(stop - start)

The code above took ~27 seconds to perform 500 writes. Equivalent code using batched writes took ~1 second.
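
For reference, the batched-write comparison was along these lines (a rough sketch, not my exact code; a single Firestore batch is limited to 500 operations per commit):

batch = db.batch()
for user in dict_all_users:
    doc_ref = db.collection(u'my_collection').document(user['user_id'])
    batch.set(doc_ref, {u'data1': user['data1'], u'data2': user['data2']})
batch.commit()  # one commit for all 500 writes (the per-batch limit is 500 operations)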

According to the docs mentioned before, parallel writes are faster than batched writes. What am I doing wrong?

Source: https://stackoverflow.com/questions/61460620/parallel-writes-to-firestore-using-a-python-3-7-cloud-function
