Cloud Dataflow

Welcome to the Google Cloud Dataflow idea forum. You can submit and vote on ideas here to tell the Google Cloud Dataflow team which features you’d like to see.

This forum is for feature suggestions. If you're looking for help with Cloud Dataflow, use the support and community help channels instead.

We can’t wait to hear from you!

How can we improve Cloud Dataflow?

If you've used all your votes, you won't be able to post a new idea, but you can still search and comment on existing ideas.

There are two ways to get more votes:

  • When an admin closes an idea you've voted on, you'll get your votes back from that idea.
  • You can remove your votes from an open idea you support.

To see ideas you have already voted on, select the "My feedback" filter and then "My open ideas".

Enter your idea and we'll search to see if someone has already suggested it.

If a similar idea already exists, you can support and comment on it.

If it doesn't exist, you can post your idea so others can support it.

  1. Python 3.x support

    Python 3.x support is overdue. Python 3.6+ is now very mature and brings significant speed improvements over 2.7.x.

    178 votes  ·  1 comment
  2. 76 votes  ·  under review  ·  9 comments
  3. 72 votes  ·  1 comment
  4. BigQuery to PubSub Dataflow Template

    I need a BigQuery to PubSub template in Dataflow. This would allow us to create a streaming job that exports BigQuery Gmail logs into our SIEM via our PubSub HTTPS endpoint.

    61 votes  ·  2 comments
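
    In the meantime, the same effect can be approximated with a small custom pipeline. The sketch below (Beam Python SDK) reads rows from BigQuery and publishes them as JSON from a DoFn; the project, query, and topic names are placeholders, not part of the original request.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    class PublishToPubSub(beam.DoFn):
        """Publishes each BigQuery row as a JSON message to a Pub/Sub topic."""

        def __init__(self, project, topic):
            self.project = project
            self.topic = topic

        def start_bundle(self):
            # Create the client lazily on the worker; clients are not picklable.
            from google.cloud import pubsub_v1
            self.publisher = pubsub_v1.PublisherClient()
            self.topic_path = self.publisher.topic_path(self.project, self.topic)

        def process(self, row):
            import json  # imported here so the worker does not rely on the main session
            self.publisher.publish(self.topic_path,
                                   json.dumps(row, default=str).encode('utf-8'))


    def run():
        options = PipelineOptions()  # pass --project, --runner, etc. on the command line
        with beam.Pipeline(options=options) as p:
            (p
             | 'ReadGmailLogs' >> beam.io.Read(beam.io.BigQuerySource(
                   query='SELECT * FROM my_dataset.gmail_logs',  # placeholder query
                   use_standard_sql=True))
             | 'Publish' >> beam.ParDo(PublishToPubSub('my-project', 'siem-export')))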
  5. Patch Apache Beam's XmlSource to allow templating in Dataflow

    Background

    In our project, a Cloud Function is used to start a Dataflow pipeline in batch mode to upload data to Elasticsearch. The source of the Dataflow pipeline is an XML file.

    The Dataflow template is used to upload the data into GCP.

    Problem

    Creating a template requires that pipeline options accept their values at runtime. This is implemented by wrapping the option type in the ValueProvider interface.

    The class for reading XML sources, XmlSource, did not use ValueProvider for its option parameters; we solved this by patching XmlSource and applying these changes to the class.

    Upload of dataflow template should be…

    32 votes  ·  0 comments
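
    For reference, the ValueProvider pattern described above looks roughly like the following. XmlSource itself lives in the Beam Java SDK; this is a Python SDK illustration of how a templatable option is declared and resolved only at run time, with placeholder option and path names.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    class XmlJobOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            # Declared as a ValueProvider so the value can be supplied when the
            # template is run, not only when it is created.
            parser.add_value_provider_argument(
                '--input_xml',
                type=str,
                help='GCS path of the XML file to process.')


    def run():
        options = PipelineOptions()
        xml_options = options.view_as(XmlJobOptions)
        with beam.Pipeline(options=options) as p:
            # A templatable source has to hold on to the ValueProvider itself and
            # call .get() only on the worker, once the runtime value is known.
            _ = (p
                 | 'Start' >> beam.Create([None])
                 | 'ResolvePathAtRuntime' >> beam.Map(
                       lambda _, path=xml_options.input_xml: path.get()))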
  6. Example Code is Maddeningly Incomplete

    The example code provided is maddeningly incomplete. The biggest issue I have is that the complete code for things like the default templates is not provided. I want to create a slight variation of the Pub/Sub -> BigQuery example template, but I can't find the code for that template anywhere. It would be nice if that code were available so that I could base a custom Dataflow job on it. This would provide a known working example of exactly what I want to build on.

    30 votes  ·  2 comments
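
    For what it's worth, a minimal Python equivalent of the Pub/Sub-to-BigQuery flow looks roughly like the sketch below; the Google-provided template itself is written in Java, and the topic and table names here are placeholders.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


    def run():
        options = PipelineOptions()
        options.view_as(StandardOptions).streaming = True
        with beam.Pipeline(options=options) as p:
            (p
             | 'ReadMessages' >> beam.io.ReadFromPubSub(
                   topic='projects/my-project/topics/my-topic')  # placeholder topic
             | 'ParseJson' >> beam.Map(json.loads)
             | 'WriteRows' >> beam.io.WriteToBigQuery(
                   'my-project:my_dataset.my_table',  # placeholder table, assumed to exist
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                   create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))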
  7. Show Total Memory and CPU Usage Alongside Worker Graph

    It would be amazing to see memory and CPU consumption directly alongside the worker graph. This would make it much easier to correlate and debug different stages, and it would also give some insight into which machine types perform best. It is possible to do this now by creating metrics in Stackdriver, but that is quite involved when a super simple graph would do the trick, just like in Kubernetes / Google Container Engine.

    27 votes  ·  1 comment
  8. Scheduling Dataflow pipeline code as a cloud job in Java

    How can we schedule Dataflow pipeline code as a job in the cloud using Java?

    26 votes  ·  1 comment
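
    One common approach is to stage the pipeline as a Dataflow template and then launch it on a schedule (from cron, App Engine, or a Cloud Function) through the Dataflow REST API. The sketch below uses the Python API client for brevity; the equivalent projects.templates.launch call is also available from the Java API client. Project, bucket, and parameter names are placeholders.

    from googleapiclient.discovery import build


    def launch_template(project_id, template_gcs_path, job_name, parameters):
        """Launches a previously staged Dataflow template via the REST API."""
        dataflow = build('dataflow', 'v1b3')  # Google API discovery client
        request = dataflow.projects().templates().launch(
            projectId=project_id,
            gcsPath=template_gcs_path,
            body={'jobName': job_name, 'parameters': parameters})
        return request.execute()


    if __name__ == '__main__':
        launch_template(
            project_id='my-project',
            template_gcs_path='gs://my-bucket/templates/my-template',
            job_name='scheduled-run',
            parameters={'input': 'gs://my-bucket/input/*.csv'})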
  9. Ability to load data to BigQuery Partitions from dataflow pipelines (Python)

    Python Dataflow pipelines fail in the _parse_table_reference function when you specify a BigQuery table name with a partition decorator for loading. This is a very important capability if you want to leverage BigQuery table partitioning.

    25 votes  ·  0 comments
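
    As a point of reference, later Beam Python releases expose an additional_bq_parameters argument on WriteToBigQuery that can request time partitioning without putting a "$" decorator in the table name. The sketch below assumes one of those newer SDK versions and uses placeholder table and schema names; it may not apply to the SDK version this idea was filed against.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def run():
        with beam.Pipeline(options=PipelineOptions()) as p:
            (p
             | 'MakeRows' >> beam.Create([{'event_date': '2018-01-01', 'value': 42}])
             | 'WritePartitioned' >> beam.io.WriteToBigQuery(
                   'my-project:my_dataset.events',  # placeholder table
                   schema='event_date:DATE,value:INTEGER',
                   # Ask BigQuery for a day-partitioned table instead of passing
                   # a "$YYYYMMDD" decorator in the table name.
                   additional_bq_parameters={'timePartitioning': {'type': 'DAY'}},
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                   create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))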
  10. Add Dataflow Job Logs to Cloud Logging API

    Dataflow Job Logs are separate from Cloud Logging, so you cannot see Job Logs under Cloud Logging, nor create a Stackdriver alert for failed Dataflow jobs.

    https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline

    21 votes  ·  0 comments
  11. Bug: Dataflow jobs are not shown consistently on cloud console

    Currently running jobs are sometimes not shown in the Cloud Console. After refreshing the page, they sometimes show up, only to disappear again a couple of seconds later.

    The behaviour is very inconsistent and therefore I have not found a way to replicate the issue. It seems mostly time dependent. A couple of weeks ago I noticed this issue for the first time. Today, I've been suffering from the issue the whole day, while yesterday everything was working fine.

    20 votes  ·  4 comments
  12. Bug: Apache Beam Dataflow runner throwing setup error

    We are building a data pipeline using the Beam Python SDK and trying to run it on Dataflow, but we are getting the error below:

    A setup error was detected in beamapp-xxxxyyyy-0322102737-03220329-8a74-harness-lm6v. Please refer to the worker-startup log for detailed information.

    However, we could not find detailed worker-startup logs.

    We tried increasing the memory size, worker count, etc., but we still get the same error.

    Here is the command we use:

    python run.py \
        --project=xyz \
        --runner=DataflowRunner \
        --staging_location=gs://xyz/staging \
        --temp_location=gs://xyz/temp \
        --requirements_file=requirements.txt \
        --worker_machine_type=n1-standard-8 \
        --num_workers=2

    Pipeline snippet:

    data = pipeline | "load data" >> beam.io.Read(
        beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100"))

    data…

    16 votes  ·  0 comments
  13. 15 votes  ·  0 comments
  14. Dataprep / Dataflow jobs region

    TL;DR: Dataprep creates Dataflow jobs in the US only. This is probably a bug.

    I'm trying to prep data from BigQuery (EU) to BigQuery (EU).

    But Dataprep creates the Dataflow job in the US, and because of that I get this error:

    Cannot read and write in different locations: source: EU, destination: US, error: Cannot read and write in different locations: source: EU, destination: US

    15 votes  ·  2 comments
  15. Provide more frequent Python updates

    The Python SDK is not up to date with the various cloud SDKs; the last update was in September...

    14 votes  ·  0 comments
  16. Ability to use our own Kubernetes cluster as a Dataflow runner

    It would be good to be able to use our own container cluster for running Dataflow workers, since Dataflow already uses Kubernetes for deploying workers. This could even take the user-supplied cluster's current workload into account and balance workers between the user-provided cluster and the Dataflow-managed cluster.

    13 votes  ·  0 comments
  17. Delete or cancel a job using the console

    Allow deleting or canceling a job from the Cloud Console.

    13 votes  ·  2 comments
  18. Labeling Dataflow jobs

    We should be able to assign labels to Dataflow jobs and filter by labels on the overview page.

    13 votes  ·  0 comments
  19. PubSub to BigQuery template should allow Avro and/or Protobuf

    BigQuery already supports loading files in Avro, so why not streaming them in from PubSub? This seems like an obvious feature to have, but I don't see any way to do it currently. The PubSub to BigQuery template is great, but it would be so much better with this one feature turned on.

    11 votes  ·  0 comments
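
    Until the template supports this, a custom pipeline can decode Avro-encoded Pub/Sub payloads itself. Below is a rough sketch using the Beam Python SDK and fastavro; the subscription, record schema, and table names are placeholders.

    import io

    import apache_beam as beam
    import fastavro
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    # Placeholder writer schema; in practice this would match the producer's schema.
    EVENT_SCHEMA = fastavro.parse_schema({
        'type': 'record',
        'name': 'Event',
        'fields': [
            {'name': 'user_id', 'type': 'string'},
            {'name': 'value', 'type': 'long'},
        ],
    })


    def decode_avro(message_bytes):
        """Decodes one schemaless Avro record from a Pub/Sub message payload."""
        return fastavro.schemaless_reader(io.BytesIO(message_bytes), EVENT_SCHEMA)


    def run():
        options = PipelineOptions()
        options.view_as(StandardOptions).streaming = True
        with beam.Pipeline(options=options) as p:
            (p
             | 'Read' >> beam.io.ReadFromPubSub(
                   subscription='projects/my-project/subscriptions/my-sub')
             | 'DecodeAvro' >> beam.Map(decode_avro)
             | 'Write' >> beam.io.WriteToBigQuery('my-project:my_dataset.events'))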
  20. Show cost of current job according to the new pricing structure.

    It would be good to see the cost of the job in the job view. I even thought of writing a Chrome extension for this, because it's pretty trivial given the reported data: vCPU seconds, RAM MB-seconds, PD MB-seconds, etc.

    10 votes  ·  0 comments
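
    As the idea notes, a rough figure only needs the resource counters the job already reports. A back-of-the-envelope sketch follows; the unit rates are placeholders rather than actual Dataflow prices, which also differ between batch and streaming.

    # Placeholder per-hour unit rates -- NOT actual Dataflow pricing.
    RATE_PER_VCPU_HOUR = 0.056
    RATE_PER_GB_MEMORY_HOUR = 0.0035
    RATE_PER_GB_PD_HOUR = 0.000054


    def estimate_job_cost(vcpu_seconds, memory_mb_seconds, pd_mb_seconds):
        """Rough job cost estimate from the resource metrics shown in the job view."""
        vcpu_hours = vcpu_seconds / 3600.0
        memory_gb_hours = memory_mb_seconds / 1024.0 / 3600.0
        pd_gb_hours = pd_mb_seconds / 1024.0 / 3600.0
        return (vcpu_hours * RATE_PER_VCPU_HOUR
                + memory_gb_hours * RATE_PER_GB_MEMORY_HOUR
                + pd_gb_hours * RATE_PER_GB_PD_HOUR)


    if __name__ == '__main__':
        # Hypothetical example: 10 workers for 1 hour with 4 vCPUs, 15 GB RAM,
        # and 250 GB of persistent disk each.
        print(estimate_job_cost(
            vcpu_seconds=10 * 4 * 3600,
            memory_mb_seconds=10 * 15 * 1024 * 3600,
            pd_mb_seconds=10 * 250 * 1024 * 3600))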