Bug: Apache beam DataFlow runner throwing setup error
We are building data pipeline using Beam Python SDK and trying to run on Dataflow, but getting the below error,
A setup error was detected in beamapp-xxxxyyyy-0322102737-03220329-8a74-harness-lm6v. Please refer to the worker-startup log for detailed information.
But could not find detailed worker-startup logs.
We tried increasing memory size, worker count etc, but still getting the same error.
Here is the command we use,
python run.py \
--worker_machine_type n1-standard-8 \
data = pipeline | "load data" >> beam.io.Read(
beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")
data | "filter data" >> beam.Filter(lambda x: x.get('column_name') == value)
Above pipeline is just loading the data from BigQuery and filtering based on some column value. This pipeline works like a charm in DirectRunner but fails on Dataflow.
Are we doing any obvious setup mistake? anyone else getting the same error? We could use some help to resolve the issue.