Cloud Dataflow

Welcome to the Google Cloud Dataflow idea forum. You can submit and vote on ideas here to tell the Google Cloud Dataflow team which features you’d like to see.

This forum is for feature suggestions. If you’re looking for help forums, look here:

We can’t wait to hear from you!

  1. Is there a way to get notified whenever there is a new dataflow release?

    Can we subscribe to an email list and get notified whenever there is a new Dataflow release?

    4 votes · 0 comments
  2. 2 votes · 0 comments
  3. Provide more frequent Python updates

    The Python SDK is not up to date with the various cloud SDKs; the last update was in September...

    17 votes · 0 comments
  4. I have to dump data into a table by reading a file from a bucket, then fire 200 queries on it

    I have to read a file from a bucket, dump that file's data into a BigQuery table, and then run 200 queries on the dumped table. The whole process must run one step at a time, but because the work runs in parallel, it does not complete correctly. I need to synchronize the work so that one job finishes before the next is triggered.
    Can anyone help me?

    1 vote · 0 comments
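A minimal sketch of the synchronization being asked for, assuming the google-cloud-bigquery pattern of blocking on each job before starting the next; the stand-in callables below are hypothetical so the sketch runs without GCP credentials.

```python
# Sketch: each step starts only after the previous one finishes.
# With google-cloud-bigquery the same shape is:
#   client.load_table_from_uri(uri, table).result()   # blocks until the load is done
#   for sql in queries: client.query(sql).result()    # then run queries one by one

def run_sequentially(jobs):
    """Run each job to completion before starting the next; collect results."""
    results = []
    for job in jobs:
        results.append(job())  # with a real BigQuery job this would be job.result()
    return results

# Hypothetical stand-ins: one "load" step followed by numbered "queries".
steps = [lambda: "loaded"] + [lambda i=i: f"query {i} done" for i in range(3)]
outcomes = run_sequentially(steps)
```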
  5. 84 votes · 1 comment
  6. I really need Python examples.

    How to read a file from GCS and load it into Bigquery, with Python.

    1 vote · 0 comments
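A minimal sketch of the requested example, assuming the Beam Python SDK (apache-beam[gcp]); the bucket path, table name, and schema are placeholders. The parse step is a plain function so it can be checked without GCP access.

```python
import csv
import io

def to_row(line):
    """Parse one CSV line ("name,score") into a BigQuery-compatible dict."""
    name, score = next(csv.reader(io.StringIO(line)))
    return {"name": name, "score": int(score)}

def run(argv=None):
    """Read a file from GCS and load it into BigQuery (needs apache-beam[gcp])."""
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.csv")  # placeholder path
         | "Parse" >> beam.Map(to_row)
         | "Write" >> beam.io.WriteToBigQuery(
               "my-project:my_dataset.my_table",        # placeholder table
               schema="name:STRING,score:INTEGER",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```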
  7. Log GCE worker setup process and exception

    We had trouble running even a simple word-count job with the DataflowRunner.

    The job was stuck for an hour, but the log said nothing.

    ==========================================
    INFO:root:2017-08-28T07:26:07.709Z: JOBMESSAGEBASIC: (5296cce74062ca91): Starting 1 workers in asia-northeast1-c...
    INFO:root:2017-08-28T07:26:07.731Z: JOBMESSAGEDEBUG: (f4c12c649707e205): Value "group/Session" materialized.
    INFO:root:2017-08-28T07:26:07.745Z: JOBMESSAGEBASIC: (f4c12c649707e994): Executing operation read/Read+split+pairwithone+group/Reify+group/Write

    It stayed stuck for an hour, so we cancelled it.

    The reason was that we needed to set the [network] and [subnetwork] options explicitly and correctly.
    After that, the job worked.

    It would be very helpful to know what is happening, or what is blocking the job, while the workers are being set up.…

    6 votes · 0 comments
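For reference, --network and --subnetwork are existing Dataflow pipeline options; a sketch of assembling them as an argv list (all values below are placeholders):

```python
def dataflow_args(project, region, network, subnetwork):
    """Assemble the Dataflow flags the post found were required, as an argv list."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--network={network}",
        # the short-form subnetwork path: regions/REGION/subnetworks/SUBNET
        f"--subnetwork=regions/{region}/subnetworks/{subnetwork}",
    ]

args = dataflow_args("my-project", "asia-northeast1", "my-network", "my-subnet")
```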
  8. Python 3.x support

    Python 3.x support is overdue. Python 3.6+ is now very mature and adds serious speed improvements over 2.7.

    188 votes · 1 comment
  9. You could update the screenshots to show the current console.

    The screenshots are old and outdated, so they cannot be used for troubleshooting. Permissions, for example, is no longer on the left.

    The permission error messages fail to say which account is having access issues, so I have spent hours trying different random combinations and still have never gotten even the most basic Dataflow examples to work.

    1 vote · 0 comments
  10. Iterative parallel processing

    Some more complex flows require one transformation to be applied multiple times until a certain condition is met (such as traversing a tree). Currently Dataflow does not allow doing that in a parallel way.

    3 votes · 0 comments
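The requested pattern, sketched as a plain-Python driver loop; today such a loop would have to live outside the pipeline, launching one job per round, while native support could fuse the rounds. The toy transform below is hypothetical.

```python
def iterate_until(transform, done, data, max_rounds=100):
    """Re-apply `transform` to the whole collection until `done(data)` holds.

    The list comprehension is the parallelizable step (one element at a time);
    the surrounding while-loop is the iteration Dataflow lacks natively.
    """
    rounds = 0
    while not done(data) and rounds < max_rounds:
        data = [transform(x) for x in data]
        rounds += 1
    return data, rounds

# Toy example: halve every element until all values are below 10.
result, rounds = iterate_until(
    lambda x: x // 2,
    lambda xs: all(x < 10 for x in xs),
    [80, 5, 33],
)
```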
  11. JavaScript

    It would be great to add JavaScript as a compute language in addition to Python and Java.

    2 votes · 0 comments
  12. Pin products on console drawer by project

    As of today, when you pin a product (e.g. Google BigQuery) on the console drawer, it is pinned independently of the selected Google project. Since different projects may use different sets of products, it would be nice if pinned products were scoped by project.

    2 votes · 0 comments
  13. Bug: Nullpointer when reading from BigQuery

    I believe I'm experiencing a bug in BigQuerySource for Apache Beam when running on Google Dataflow. I described it in detail on Stack Overflow: https://stackoverflow.com/questions/44718323/apache-beam-with-dataflow-nullpointer-when-reading-from-bigquery/44755305#44755305

    Nobody seems to be able to respond to it, so I am posting it here as a potential bug.

    1 vote · 0 comments
  14. autodetect

    Enable --autodetect for BigQuery loads, consistent with bq load --autodetect on the command line.

    1 vote · 0 comments
  15. 3 votes · 0 comments
  16. Ability to run dataflow pipeline from deployed flex python app-engine service without gcloud

    I have defined a batch Python pipeline inside a flex App Engine Python service.

    Once deployed on GCP, the pipeline cannot be compiled and started.
    The current workaround is to install the gcloud SDK in the service's Dockerfile.

    It would be great not to have to install the gcloud SDK on the deployed service.
    It would also be great to have documentation on best practices for running a Python pipeline from a deployed service.

    3 votes · 1 comment
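One gcloud-free route is the Dataflow templates REST API (projects.locations.templates.launch); the JSON body it expects can be built in plain Python. The method name is real, but the project, template path, and parameters below are placeholders, and actually sending the request needs an authenticated HTTP client.

```python
def launch_template_body(job_name, parameters, zone=None):
    """Build the JSON body for the Dataflow templates.launch REST method."""
    body = {"jobName": job_name, "parameters": parameters}
    if zone:
        body["environment"] = {"zone": zone}
    return body

# POST this body to (placeholders for project/location/template path):
# https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/templates:launch
#   ?gcsPath=gs://my-bucket/templates/my-template
body = launch_template_body("nightly-batch", {"input": "gs://my-bucket/in.csv"})
```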
  17. Patch Apache Beam's XmlSource to allow templating in Dataflow

    Background

    In our project, a Cloud Function is used to start a Dataflow pipeline in batch mode to upload data to ElasticSearch. The source for the Dataflow is an XML file.

    The Dataflow template is used to upload the data into GCP.

    Problem

    Uploading templates requires option parameters that accept their values at runtime. This is implemented by using the ValueProvider interface to wrap the option type.

    The class for reading XML sources, XmlSource, did not use ValueProvider for its option parameters; we solved this by patching XmlSource and applying these changes to the class.

    Upload of the Dataflow template should be…

    32 votes · 0 comments
  18. Show avg/median per-element processing time in monitoring interface

    When selecting a transform in the Dataflow monitoring interface, you can currently see the number of elements that have been processed as well as the total execution time.

    It would be nice to be able to see the per-element processing time: either a simple average or better yet, a histogram. This would allow much easier performance monitoring.

    3 votes · 0 comments
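For what it's worth, the simple average can already be computed by hand from the two figures the interface does show; a sketch of that arithmetic (the numbers below are made up):

```python
def avg_element_time_ms(total_execution_s, elements_processed):
    """Average per-element processing time, in milliseconds."""
    return total_execution_s * 1000.0 / elements_processed

# e.g. a transform that ran 120 s total over 60,000 elements
avg = avg_element_time_ms(120.0, 60_000)
```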
  19. Example Code is Maddeningly Incomplete

    The example code provided is maddeningly incomplete. The biggest issue I have is that the complete code for things like the default templates is not provided. I want to create a slight variation of the Pub/Sub -> BigQuery example template, but I can't find the code for that template anywhere. It would be nice if that code were available so that I could base a custom Dataflow job on it. That would provide a known working example of exactly what I want to build on.

    30 votes · 2 comments
  20. Ability to Downscale Google Provided Templates

    We have a job for which the basic templated Dataflow job works fairly well, but so far we cannot see a way to make it use fewer machines. Our data ingestion is large and growing, but not yet extremely large. The three 4-vCPU, 15 GB RAM machines that are started to process our volume of data are very much overkill. I do not see any way to use these basic templates while also setting the max_workers setting.

    4 votes · 1 comment