I have a job that is failing with the error:
<Line:14222274 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.
I had a similar problem while trying to load a compressed file (stored in Google Cloud Storage) into BigQuery. These are the logs:
File: 0 / Offset:4563403089 / Line:328480 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328485 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328490 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328511 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328517 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
To resolve the problem, I removed the ASCII 0 characters from the compressed file. To do that, I ran the following command from a Compute Engine instance with the Cloud SDK installed:
gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -d '\000' | gsutil cp - gs://bucket_987234/uncompress_and_clean_file
By using pipes, I avoid having to hold the files on the local disk (1 GB compressed + 52 GB uncompressed). The first command downloads the compressed file from Storage, the second decompresses it, the third removes the ASCII 0 characters, and the fourth uploads the result back to Storage.
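If you want to confirm the NUL bytes are gone before loading, a streaming check along these lines should work (note it reads the whole object, so it will take a while on a 52 GB file):

# count remaining NUL bytes; 0 means the file is clean
gsutil cat gs://bucket_987234/uncompress_and_clean_file | tr -cd '\000' | wc -c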
I don't compress the result when uploading it back to Storage, because BigQuery loads an uncompressed file faster. After that, I can load the data into BigQuery without problems.
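For reference, the final load can then be done with something like the following (mydataset.mytable is a placeholder, and the CSV settings should be adjusted to your schema):

# hypothetical dataset.table; --autodetect infers the schema from the CSV
bq load --source_format=CSV --autodetect mydataset.mytable gs://bucket_987234/uncompress_and_clean_file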