Reading Data From Cloud Storage Via Cloud Functions

Submitted by 本小妞迷上赌 on 2021-02-18 22:47:40

Question


I am trying to do a quick proof of concept for building a data processing pipeline in Python. To do this, I want to build a Cloud Function which will be triggered when certain .csv files are dropped into Cloud Storage.

I followed this Google Cloud Functions Python tutorial, and while the sample code does trigger the function to create some simple logs when a file is dropped, I am really stuck on what call I have to make to actually read the contents of the file. I tried to search for an SDK/API guidance document but have not been able to find one.

In case this is relevant: once I process the .csv, I want to be able to publish some data that I extract from it to GCP's Pub/Sub.


Answer 1:


The function does not actually receive the contents of the file, just some metadata about it.
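
For reference, the event payload a storage-triggered function receives looks roughly like the following sketch (the values here are hypothetical; the fields come from the Cloud Storage object metadata):

# Abridged sketch of the event payload (hypothetical values) that a
# Cloud Storage trigger passes as `data` -- note there is no file body:
data = {
    "bucket": "my-bucket",          # bucket the object was uploaded to
    "name": "incoming/report.csv",  # object path within the bucket
    "contentType": "text/csv",
    "size": "2048",                 # object size in bytes, as a string
    "timeCreated": "2021-02-18T22:47:40.000Z",
}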

You'll want to use the google-cloud-storage client. See the "Downloading Objects" guide for more details.

Putting that together with the tutorial you're using, you get a function like:

from google.cloud import storage

# Instantiate the client once at module load so it is reused across invocations.
storage_client = storage.Client()

def hello_gcs_generic(data, context):
    # The event payload carries the bucket and object names, not the contents.
    bucket = storage_client.get_bucket(data['bucket'])
    blob = bucket.blob(data['name'])
    # Download the object into memory; download_as_string() returns bytes.
    contents = blob.download_as_string()
    # Process the file contents, etc...
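
Since the question mentions pushing extracted data into Pub/Sub, here is a minimal sketch of that next step. The project ID "my-project" and topic name "csv-rows" are placeholders, and parsing the bytes with the standard csv module is just one option:

import csv
import io

from google.cloud import pubsub_v1, storage

storage_client = storage.Client()
publisher = pubsub_v1.PublisherClient()
# "my-project" and "csv-rows" are placeholders -- substitute your own.
topic_path = publisher.topic_path("my-project", "csv-rows")

def csv_to_pubsub(data, context):
    bucket = storage_client.get_bucket(data['bucket'])
    blob = bucket.blob(data['name'])
    contents = blob.download_as_string()  # returns bytes

    # Parse the CSV in memory and publish one message per row.
    reader = csv.reader(io.StringIO(contents.decode('utf-8')))
    for row in reader:
        future = publisher.publish(topic_path, ",".join(row).encode('utf-8'))
        future.result()  # block until the message is accepted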



Answer 2:


This is an alternative solution using pandas:

Cloud Function Code:

import pandas as pd

def GCSDataRead(event, context):
    bucketName = event['bucket']
    blobName = event['name']
    # Build a gs:// URL; pandas can read it directly via the gcsfs package.
    fileName = "gs://" + bucketName + "/" + blobName

    dataFrame = pd.read_csv(fileName, sep=",")
    print(dataFrame)
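
Note that pd.read_csv can only open a gs:// URL when the gcsfs package is installed in the function's runtime, so the function's requirements.txt should include at least:

pandas
gcsfs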


Source: https://stackoverflow.com/questions/53347006/reading-data-from-cloud-storage-via-cloud-functions
