I have a job that is failing with the error:
<Line:14222274 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.
I had a similar problem while trying to load a compressed file (stored in Google Cloud Storage) into BigQuery. These are the logs:
File: 0 / Offset:4563403089 / Line:328480 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328485 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328490 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328511 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
File: 0 / Offset:4563403089 / Line:328517 / Field:21: Bad character (ASCII 0) encountered: field starts with: (error code: invalid)
To resolve the problem, I removed the ASCII 0 characters from the compressed file. To do that, I ran the following command from a Compute Engine instance with the Cloud SDK installed:
gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -d '\000' | gsutil cp - gs://bucket_987234/uncompress_and_clean_file
By using pipes, I avoid having to hold the files on the local disk (1 GB compressed + 52 GB uncompressed). The first command downloads the compressed file from Storage, the second decompresses it, the third removes the ASCII 0 characters, and the fourth uploads the result back to Storage.
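If you want to confirm the NUL bytes are gone before loading, a streaming check along these lines should work (note it reads the whole object, so it will take a while on a 52 GB file):

# count remaining NUL bytes; 0 means the file is clean
gsutil cat gs://bucket_987234/uncompress_and_clean_file | tr -cd '\000' | wc -c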
I don't compress the result when uploading it back to Storage, because BigQuery loads an uncompressed file faster. After that, I can load the data into BigQuery without problems.
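For reference, the final load can then be done with something like the following (mydataset.mytable is a placeholder, and the CSV settings should be adjusted to your schema):

# hypothetical dataset.table; --autodetect infers the schema from the CSV
bq load --source_format=CSV --autodetect mydataset.mytable gs://bucket_987234/uncompress_and_clean_file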