Question
I want to run the WordCount Java example from https://beam.apache.org/get-started/quickstart-java/, but I get an error saying that the ClasspathScanningResourcesDetector found no files to stage. I run the example exactly as described on the website:
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
--gcpTempLocation=gs://<your-gcs-bucket>/tmp \
--inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
-Pdataflow-runner
which yields:
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
... 5 more
Caused by: java.lang.IllegalArgumentException: No files to stage has been found.
at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:281)
... 10 more
I am using the latest Beam version:
<beam.version>2.19.0</beam.version>
Do you know how to fix this?
EDIT: This is a bug in 2.19.0. It works in 2.18.0.
EDIT: I am using Red Hat OpenJDK 8 on Windows.
EDIT: Also, some unit tests from the standard WordCount example are failing.
DebuggingWordCountTest fails with:
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: /Users/<redacted>/AppData/Local/Temp/junit7907687962995108435/junit2682353785908929665.tmp
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:321)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
Answer 1:
- When you run a pipeline with the DataflowRunner, it tries to find the pipeline's dependencies on the classpath and upload them.
- I assume you are getting the error "No files to stage has been found" due to a classpath issue.
- Try using the --filesToStage option to manually provide the jars or class directories to stage.
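As a rough illustration of the last point, the sketch below builds a --filesToStage argument from a classpath string. FilesToStageArg and its methods are hypothetical names, not part of Beam, and the sketch assumes the option takes a comma-separated list of paths, as list-valued pipeline options generally do. The classpath string itself has to come from somewhere else, for example the output of mvn dependency:build-classpath plus the project's target/classes directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Beam): builds a value for --filesToStage. */
final class FilesToStageArg {

    /**
     * Splits a classpath string on the given separator and keeps only the
     * entries that exist on disk, preserving order and dropping duplicates.
     */
    static List<String> existingEntries(String classpath, String separator) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blanks left by leading or doubled separators
            }
            if (new File(trimmed).exists() && !result.contains(trimmed)) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Joins the entries with commas (assumed list format for the option). */
    static String toArgument(List<String> entries) {
        if (entries.isEmpty()) {
            throw new IllegalArgumentException("No existing classpath entries to stage");
        }
        return "--filesToStage=" + String.join(",", entries);
    }

    public static void main(String[] args) {
        // Pass the classpath as the first argument; otherwise fall back to the
        // JVM's own classpath, which may not be the project's under exec:java.
        String classpath = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        System.out.println(toArgument(existingEntries(classpath, File.pathSeparator)));
    }
}
```

The printed argument can then be appended to the other options inside -Dexec.args. Paths that themselves contain commas would not survive this format.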
For comparison, here are sample logs from a run that successfully staged 114 files. Compare them with your complete logs to narrow down the issue.
Mar 08, 2020 7:37:41 PM org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory create
INFO: No stagingLocation provided, falling back to gcpTempLocation
Mar 08, 2020 7:37:42 PM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 114 files. Enable logging at DEBUG level to see which files will be staged.
Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Uploading 114 files from PipelineOptions.filesToStage to staging location to prepare for execution.
Mar 08, 2020 7:37:48 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
INFO: Staging files complete: 114 files cached, 0 files newly uploaded
You can also generate a fresh copy of the example project with the command below, then run the pipeline again so that the dependencies are staged:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.beam \
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
-DarchetypeVersion=2.8.0 \
-DgroupId=org.example \
-DartifactId=first-dataflow \
-Dversion="0.1" \
-Dpackage=org.apache.beam.examples \
-DinteractiveMode=false
You can also try it for free on Qwiklabs: https://google.qwiklabs.com/focuses/7974?parent=catalog
Source: https://stackoverflow.com/questions/60586141/dataflowrunner-exits-with-no-files-to-stage-has-been-found