I would like to read a CSV file and write it to BigQuery using Apache Beam on Dataflow. To do this, I need to present the data to BigQuery in the form of a dictionary.
As a supplement to Pablo's post, I'd like to share a little change I made myself to his sample. (+1 for you!)
Changed:

    reader = csv.reader(self._file)

to:

    reader = csv.DictReader(self._file)
csv.DictReader uses the first row of the CSV file as the dict keys. Each remaining row is used to populate a dict with its values, automatically mapping each value to the correct key based on column order.
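For illustration, here's a minimal standalone sketch of what DictReader yields (the file contents and column names below are made up, not from Pablo's sample):

    import csv
    import io

    # DictReader takes the header row as keys and yields one dict per data row.
    data = io.StringIO("id,name,score\n1,alice,9.5\n2,bob,7.0\n")

    for row in csv.DictReader(data):
        print(row)
    # {'id': '1', 'name': 'alice', 'score': '9.5'}
    # {'id': '2', 'name': 'bob', 'score': '7.0'}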
One little detail: every value in the dict is stored as a string. This may conflict with your BigQuery schema if you use e.g. INTEGER for some fields, so you need to take care of proper casting afterwards.
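As a sketch of that casting step, assuming a hypothetical schema with id as INTEGER, name as STRING and score as FLOAT (the cast_row helper and the field names are my own, not part of Pablo's sample):

    # Hypothetical helper: cast the string values from DictReader to the
    # types declared in the BigQuery schema.
    def cast_row(row):
        return {
            'id': int(row['id']),          # schema type: INTEGER
            'name': row['name'],           # schema type: STRING
            'score': float(row['score']),  # schema type: FLOAT
        }

    print(cast_row({'id': '1', 'name': 'alice', 'score': '9.5'}))
    # {'id': 1, 'name': 'alice', 'score': 9.5}

In a Beam pipeline, this could run as a simple beam.Map(cast_row) step between reading the rows and writing to BigQuery.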