BigQuery job fails with “Bad character (ASCII 0) encountered.”

孤街浪徒 2021-01-22 17:55

I have a job that is failing with the error:

Line:14222274 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.
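
A quick way to see what is actually on that line is to stream the file and dump its bytes (a sketch; the bucket and object names here are placeholders for wherever the source data lives). Any ASCII 0 bytes show up as \0 in the od output:

gsutil cp gs://my_bucket/my_file.csv - | sed -n '14222274{p;q}' | od -c | head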

3 Answers
  •  难免孤独
    2021-01-22 18:36

    I had a similar problem when trying to load a compressed file (stored in Google Cloud Storage) into BigQuery. These are the logs:

    File: 0 / Offset:4563403089 / Line:328480 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328485 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328490 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328511 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
    File: 0 / Offset:4563403089 / Line:328517 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid) 
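
    Before rewriting anything, you can confirm the NUL bytes really are there by counting them straight from Storage (a sketch; it streams the same file as the fix below — tr keeps only the ASCII 0 bytes and wc counts them):

    gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -dc '\000' | wc -c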
    

    To resolve the problem, I removed the ASCII 0 characters from the compressed file. To do it, I executed the following command from a Compute Engine instance with the Cloud SDK installed:

    gsutil cp gs://bucket_987234/compress_file.gz - | gunzip | tr -d '\000' | gsutil cp - gs://bucket_987234/uncompress_and_clean_file

    By using pipes, I avoid having to hold everything on the hard disk (1 GB compressed + 52 GB uncompressed). The first program gets the compressed file from Storage, the second decompresses it, the third removes the ASCII 0 characters, and the fourth uploads the result back to Storage.

    I don't compress the result when uploading it back to Storage, because BigQuery loads an uncompressed file faster. After that, I can load the data into BigQuery without problems.
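
    If it helps, the final load can be run from the same shell (a sketch; the dataset and table names and the --autodetect flag are assumptions on my part, not part of my original run):

    bq load --source_format=CSV --autodetect mydataset.mytable gs://bucket_987234/uncompress_and_clean_file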
