DataflowRunner exits with “No files to stage has been found.”

Submitted by 为君一笑 on 2020-04-11 06:45:09

Question


I want to run the WordCount Java example from https://beam.apache.org/get-started/quickstart-java/, but I get an error saying that the ClasspathScanningResourcesDetector found no files to stage. I run the example exactly as described on the website:

 mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
                  --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
                  --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
     -Pdataflow-runner

which yields:

Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
    ... 5 more
Caused by: java.lang.IllegalArgumentException: No files to stage has been found.
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:281)
    ... 10 more

I am using the latest Beam version:

<beam.version>2.19.0</beam.version>

Do you know how to fix this?

EDIT: This is a bug in 2.19.0. It works in 2.18.0

EDIT: I am using Red Hat OpenJDK 8 on Windows.

EDIT: Also, some unit tests from the standard WordCount example are failing.

DebuggingWordCountTest fails with

org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: /Users/<redacted>/AppData/Local/Temp/junit7907687962995108435/junit2682353785908929665.tmp

    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:321)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)

Answer 1:


  • When you run a pipeline on Dataflow, the runner tries to find and upload its dependencies.
  • I assume you are getting the error "No files to stage has been found" due to a classpath issue.
  • Try the --filesToStage option to manually provide the jars or classes to stage.
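
As a rough illustration of the last point, the sketch below builds a --filesToStage argument from a classpath string. It uses only the Java standard library; the class name FilesToStageArg and the method buildArg are hypothetical helpers, not part of Beam. It assumes the option accepts a comma-separated list of paths, and it silently drops entries that do not exist on disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Beam): turns a classpath string into
// a --filesToStage argument, assuming a comma-separated list is accepted.
public class FilesToStageArg {

    static String buildArg(String classpath) {
        List<String> entries = new ArrayList<>();
        // Split on the platform separator (":" on Unix, ";" on Windows).
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            File file = new File(entry);
            // Skip empty segments and entries that are missing on disk.
            if (!entry.isEmpty() && file.exists()) {
                entries.add(file.getAbsolutePath());
            }
        }
        return "--filesToStage=" + String.join(",", entries);
    }

    public static void main(String[] args) {
        // Use an explicit classpath if one is passed in; otherwise fall back
        // to the JVM's own classpath property.
        String classpath = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        System.out.println(buildArg(classpath));
    }
}
```

The printed argument could then be appended to the pipeline options in -Dexec.args. Note that when a class is launched through mvn exec:java it runs inside Maven's own JVM, so the java.class.path property may not list the project's dependencies; passing the classpath explicitly as the first argument is the safer route. Whether explicit staging avoids the 2.19.0 behaviour described in the question is not confirmed here.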

Below are sample logs from a run that successfully staged 114 files, so you can compare them with your complete logs to understand the issue.

Mar 08, 2020 7:37:41 PM org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory create
INFO: No stagingLocation provided, falling back to gcpTempLocation
Mar 08, 2020 7:37:42 PM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 114 files. Enable logging at DEBUG level to see which files will be staged.
Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Uploading 114 files from PipelineOptions.filesToStage to staging location to prepare for execution.
Mar 08, 2020 7:37:48 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Staging files complete: 114 files cached, 0 files newly uploaded
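
The log above suggests enabling DEBUG logging to see which files will be staged. A minimal sketch of one way to do that follows. It assumes that logging is routed to java.util.logging (the timestamp format in the sample logs looks like its default formatter), that loggers are named after their classes, and that DEBUG maps to the FINE level. The class name StagingDebugLogging is hypothetical.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: raises java.util.logging verbosity for the Dataflow
// runner package. Assumes Beam's logging ends up in java.util.logging.
public class StagingDebugLogging {

    // Hold a strong reference; java.util.logging keeps loggers only weakly,
    // so the configured level could otherwise be lost.
    private static final Logger DATAFLOW_LOGGER =
            Logger.getLogger("org.apache.beam.runners.dataflow");

    static void enable() {
        DATAFLOW_LOGGER.setLevel(Level.FINE);
        // Handlers on the root logger default to INFO, so lower their
        // threshold too or FINE records will still be filtered out.
        for (Handler handler : Logger.getLogger("").getHandlers()) {
            handler.setLevel(Level.FINE);
        }
    }

    public static void main(String[] args) {
        enable();
        DATAFLOW_LOGGER.fine("FINE (debug) logging is now visible for the Dataflow runner package.");
    }
}
```

Calling StagingDebugLogging.enable() at the start of the pipeline's main method, before the runner is created, would be the natural place for it. If a different logging backend is on the classpath, this has no effect and that backend's own configuration would be needed instead.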

You can also generate a fresh copy of the example project with the command below and run the pipeline from it, so that the dependencies are staged from a clean setup.

mvn archetype:generate \
      -DarchetypeGroupId=org.apache.beam \
      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -DarchetypeVersion=2.8.0 \
      -DgroupId=org.example \
      -DartifactId=first-dataflow \
      -Dversion="0.1" \
      -Dpackage=org.apache.beam.examples \
      -DinteractiveMode=false

You can also try it in Qwiklabs for free: https://google.qwiklabs.com/focuses/7974?parent=catalog



Source: https://stackoverflow.com/questions/60586141/dataflowrunner-exits-with-no-files-to-stage-has-been-found
