google-cloud-dataflow

Creating a Dict from CSV dataflow python

Submitted by 耗尽温柔 on 2020-12-12 12:32:51
Question: I am trying to make a dict from CSV data in Python. I do not want to use the traditional split(',') and then rename the rows to the headings I would like, because I will be receiving different CSV files with different amounts of information, and I will not be able to consistently target the rows I want with that method. THE HEADER NAMES WILL BE CONSISTENT, there may just be more headers in one file compared to another. Instead, I have been trying to formulate a list from the CSV file, then
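The excerpt cuts off before the attempted approach, but one common way to build header-keyed dicts without a hard-coded split(',') is to read the header row once and zip it against each parsed line. The sketch below is illustrative only: the input.csv path, the pipeline shape, and the helper names are assumptions, not the asker's code.

import csv
import io

import apache_beam as beam


def parse_csv_line(line, header):
    # csv.reader (rather than a bare split(',')) copes with quoted fields;
    # zip() keys each value by its header name, so a file with extra
    # columns still lets downstream steps look up fields by name.
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(header, values))


def run():
    # Read the header once so each file's own column names drive the dict keys.
    with open('input.csv') as f:
        header = next(csv.reader(f))

    with beam.Pipeline() as p:
        (p
         | 'Read rows' >> beam.io.ReadFromText('input.csv', skip_header_lines=1)
         | 'Rows to dicts' >> beam.Map(parse_csv_line, header=header))


if __name__ == '__main__':
    run()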

Dataflow fails when I add requirements.txt [Python]

Submitted by 与世无争的帅哥 on 2020-12-12 06:50:13
Question: When I try to run Dataflow with the DataflowRunner and include a requirements.txt that looks like this

google-cloud-storage==1.28.1
pandas==1.0.3
smart-open==2.0.0

it fails every time on this line:

INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://..../beamapp-.../numpy-1.18.2.zip...
Traceback (most recent call last):
  File "Database.py", line 107, in <module>
    run()
  File "Database.py", line 101, in run
    | 'Write CSV' >> beam.ParDo(WriteCSVFIle(options.output
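For context on where that upload comes from: passing a requirements file makes Beam download and stage every listed package plus its transitive dependencies (numpy arrives via pandas) to the staging bucket before the job launches, which is the GCS upload step the traceback interrupts. A minimal sketch of how that option is usually wired, with placeholder project, region, and bucket values:

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

# All values below are placeholders, not the asker's actual configuration.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',
    region='us-central1',
    temp_location='gs://my-bucket/temp',
)
# requirements_file triggers the download-and-stage step seen in the log;
# an alternative is to package dependencies with a setup.py and pass
# SetupOptions.setup_file instead.
options.view_as(SetupOptions).requirements_file = 'requirements.txt'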

ETL approaches to bulk load data in Cloud SQL

Submitted by 南笙酒味 on 2020-12-04 08:47:18
Question: I need to ETL data into my Cloud SQL instance. This data comes from API calls. Currently, I'm running custom Java ETL code in Kubernetes with CronJobs that makes requests to collect this data and load it into Cloud SQL. The problem comes with managing the ETL code and monitoring the ETL jobs. The current solution may not scale well when more ETL processes are incorporated. In this context, I need to use an ETL tool. My Cloud SQL instance contains two types of tables: common transactional
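One candidate the tag suggests is Dataflow itself: a Beam pipeline can replace the cron-driven Java job by pulling from the API and writing each record to Cloud SQL from a DoFn. The sketch below is a hypothetical illustration, not the asker's setup; the MySQL driver, connection parameters, and table schema are all assumed, and a real deployment would connect through the Cloud SQL Auth proxy or a private IP.

import apache_beam as beam
import pymysql  # assumed MySQL driver; psycopg2 would be the PostgreSQL analogue


class WriteToCloudSQL(beam.DoFn):
    # Hypothetical DoFn that inserts one record per element.
    def setup(self):
        # Placeholder credentials; in practice these would come from Secret
        # Manager, and the host would be the Cloud SQL proxy or a private IP.
        self.conn = pymysql.connect(
            host='127.0.0.1', user='etl', password='secret', database='app')

    def process(self, record):
        with self.conn.cursor() as cur:
            cur.execute(
                'INSERT INTO events (id, payload) VALUES (%s, %s)',
                (record['id'], record['payload']))
        self.conn.commit()

    def teardown(self):
        self.conn.close()


def run():
    with beam.Pipeline() as p:
        (p
         | 'API results' >> beam.Create([{'id': 1, 'payload': 'example'}])
         | 'Load into Cloud SQL' >> beam.ParDo(WriteToCloudSQL()))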
