“java.lang.IllegalArgumentException: No filesystem found for scheme gs” when running dataflow in google cloud platform

Submitted by 淺唱寂寞╮ on 2021-02-19 05:01:47

Question


I am running a Google Dataflow job on Google Cloud Platform (GCP). The job works when I run it locally, but on GCP it fails with "java.lang.IllegalArgumentException: No filesystem found for scheme gs". I have access to that Google Cloud Storage URI: I can upload my jar file to it, and I can see temporary files there from my local run.

My job IDs in GCP:

2019-08-08_21_47_27-162804342585245230 (beam version:2.12.0)

2019-08-09_16_41_15-11728697820819900062 (beam version:2.14.0)

I have tried Beam versions 2.12.0 and 2.14.0; both fail with the same error.


java.lang.IllegalArgumentException: No filesystem found for scheme gs
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:456)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:526)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:689)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:125)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:284)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:206)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:190)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:169)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:412)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:381)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Answer 1:


This may be caused by a couple of issues if you build a "fat jar" that bundles all of your dependencies.

  1. You must include the dependency org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core, which provides the Beam GCS filesystem.
  2. Inside your fat jar, you must preserve the META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar file with a line org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar. You can find this file in the jar from step 1. You will probably have many files with the same name in your dependencies, each registering a different Beam filesystem. You need to configure Maven or Gradle to merge these files as part of your build; otherwise they overwrite each other and only one registration survives. With the Maven Shade plugin, ServicesResourceTransformer does this merge; with the Gradle Shadow plugin, mergeServiceFiles() does.
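
To check whether step 2 worked, it can help to list what the service files on the classpath actually contain. The following is a minimal sketch using only the JDK, not part of Beam: the class name ServiceFileCheck and the method listProviders are hypothetical names chosen for illustration. It reads every META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar resource visible to the class loader and reports whether the GCS registrar is among the listed providers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper (not a Beam class).
class ServiceFileCheck {
    // Service file and provider named in the answer above.
    static final String BEAM_FS_SERVICE = "org.apache.beam.sdk.io.FileSystemRegistrar";
    static final String GCS_REGISTRAR =
            "org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar";

    /** Returns every provider class name listed in META-INF/services/<serviceName>. */
    static List<String> listProviders(String serviceName) {
        List<String> providers = new ArrayList<>();
        ClassLoader loader = ServiceFileCheck.class.getClassLoader();
        try {
            // A fat jar has one merged file; an unmerged classpath may have several.
            for (URL url : Collections.list(
                    loader.getResources("META-INF/services/" + serviceName))) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        int hash = line.indexOf('#'); // strip service-file comments
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            providers.add(line);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return providers;
    }

    public static void main(String[] args) {
        List<String> providers = listProviders(BEAM_FS_SERVICE);
        System.out.println("Registered Beam filesystem registrars: " + providers);
        System.out.println(providers.contains(GCS_REGISTRAR)
                ? "GCS registrar is listed"
                : "GCS registrar is NOT listed; gs:// paths will likely fail to resolve");
    }
}
```

Compile it and run it with the fat jar on the classpath. If the GCS registrar is not listed, the service files were probably overwritten rather than merged during the build.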



Answer 2:


There is one more possible cause of this exception. Make sure you create the pipeline (e.g. Pipeline.create(options)) before you try to access any files.




Answer 3:


It's normal. On your computer, your tests use local files (/... on Linux, C:... on Windows). However, Google Cloud Storage isn't a local file system (by the way, it's not a file system at all), and thus "gs://" can't be interpreted.

Try TextIO.read().from(...).

You can use it for both local files and external ones such as GCS.

However, I ran into an issue some months ago when developing on Windows: C: wasn't a known scheme (the same error as yours). It may work now (I'm no longer on Windows, so I can't test). If not, you have this workaround pattern: set a variable in your config object and test it, like:

// isLocal: your own environment config variable (hypothetical name)
if (isLocal) {
    p.apply(FileSystems.getFileSystemInternal...);
} else {
    p.apply(TextIO.read().from(...));
}


Source: https://stackoverflow.com/questions/57438349/java-lang-illegalargumentexception-no-filesystem-found-for-scheme-gs-when-run
