Cloud Dataflow

Welcome to the Google Cloud Dataflow idea forum. You can submit and vote on ideas here to tell the Google Cloud Dataflow team which features you’d like to see.

This forum is for feature suggestions; for help with existing features, please use the Google Cloud Dataflow support and community channels instead.

We can’t wait to hear from you!

  1. Error reporting via Stackdriver using logging appenders

    My idea is to add labels to Dataflow errors. I am trying to add more information to the exceptions in a Dataflow step using SLF4J and Logback. I have updated the logger errors to include marker text so they are easy to identify in GCP Stackdriver. I have done the following:

    Added logback.xml to src/main/resources (on the classpath).

    Created a LoggingEventEnhancer and an enhancer class to add new labels.
    Added markers to the logger errors, to identify the type of error in Stackdriver.
    But the logs in Stackdriver don't have the new labels (or markers) added via the logging appender. I think logback.xml is not being picked up by the Maven build.
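    For reference, a minimal logback.xml sketch (the appender name and pattern are illustrative; `%marker` is Logback's standard conversion word for printing markers). If the pipeline is packaged with the Maven Shade plugin, the file must survive into the shaded jar's root for Logback to find it:

    ```xml
    <!-- src/main/resources/logback.xml -->
    <configuration>
      <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
          <!-- %marker prints any markers attached to the logging event -->
          <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} %marker - %msg%n</pattern>
        </encoder>
      </appender>
      <root level="INFO">
        <appender-ref ref="CONSOLE"/>
      </root>
    </configuration>
    ```

    Note also that Dataflow workers install their own logging backend, so a custom logback.xml that works locally may still be ignored on the workers.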
    1 vote
    0 comments
  2. Custom job templates

    1 vote
    0 comments
  3. Custom job templates

    0 votes
    0 comments
  4. Ability to select an existing job and create a new one from this job (like custom job template)

    2 votes
    0 comments
  5. Ability to re-run a failed / stopped / canceled job

    3 votes
    0 comments
  6. Set Proxy Host

    When trying to run a Dataflow driver program behind a firewall, it needs to use a proxy to connect to GCP, but there does not seem to be a way to specify that HTTPS traffic should go through a proxy.

    4 votes
    0 comments
  7. Dataflow templates broken with Python 3

    Beam fails to stage a Dataflow template with Python 3. It looks like Beam is trying to access a RuntimeValueProvider during staging, causing a 'not accessible' error.

    The template stages fine with Python 2.7.

    Repo with code to reproduce the issue and stack trace: https://github.com/firemuzzy/dataflow-templates-bug-python3
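    For context, the usual Python-template pattern defers such parameters via ValueProviders, which must not be resolved during staging; a minimal sketch (the option class and `--input` argument are illustrative, not from the linked repo):

    ```python
    from apache_beam.options.pipeline_options import PipelineOptions

    class TemplateOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            # ValueProvider arguments are resolved when the template is *run*,
            # so staging must not call .get() on them.
            parser.add_value_provider_argument(
                '--input', type=str, help='GCS path to read from')

    # The runner supplies the value at template run time; .get() should only
    # be called at execution time (e.g. inside a DoFn).
    options = TemplateOptions(['--input', 'gs://my-bucket/data.txt'])
    print(options.input.get())
    ```
    
    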

    3 votes
    0 comments
  8. BUG: google-cloud-firestore not working with apache_beam[gcp] 2.16.0, but works with 2.13.0

    Running

        python pipeline --runner DataflowRunner ... --requirements_file requirements.txt

    throws an error when requirements.txt contains

        apache_beam[gcp]==2.16.0
        google-cloud-firestore

    but not when it contains

        apache_beam[gcp]==2.13.0
        google-cloud-firestore

    Part of the error:

        ModuleNotFoundError: No module named 'Cython'
        ----------------------------------------
        Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/h4/n9rzy8z52lqdh7sfkhr96nnw0000gn/T/pip-download-lx28dwpv/pyarrow/

    See also: https://stackoverflow.com/questions/57286517/importing-google-firestore-python-client-in-apache-beam

    Where to report bugs?
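    One commonly reported workaround (an assumption, not a confirmed fix): Beam stages requirements by downloading source distributions, so build-time dependencies such as Cython may need to be available explicitly, e.g. listed first in requirements.txt:

    ```text
    Cython
    apache_beam[gcp]==2.16.0
    google-cloud-firestore
    ```

    If the failure happens while launching the pipeline (as in the traceback above), installing Cython into the launching environment itself is the equivalent local workaround.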

    1 vote
    0 comments
  9. Deployment Manager template for Dataflow

    Please provide a Deployment Manager (GDM) template to create/read/update/delete Dataflow templates. Currently there is no Deployment Manager support for Dataflow. Since Dataflow uses GCE and GCS under the hood, rich GDM support would add tremendous value. GDM would allow mass/concurrent deployment of Dataflow templates.

    9 votes
    1 comment
  10. Support MapState from Beam API

    Currently the DataflowRunner does not support MapState for stateful DoFns

    3 votes
    0 comments
  11. Codegen

    I would like to take a valid job and generate at least some of the code needed to re-create it in Python using the client API. I want to be able to do this for my historic jobs as well.

    1 vote
    0 comments
  12. Button: "copy dataflow job"

    I would like to be able to copy a Dataflow job so that I can tweak the parameters and run it again without having to enter them all in manually.

    1 vote
    0 comments
  13. Cannot specify diskSizeGb when launching a template

    When I create a template, it's possible to specify --diskSizeGb, but if I don't specify it at template creation time, it's not possible to pass it as a parameter later.

    1 vote
    0 comments
  14. Cannot specify location when starting a dataflow job from REST API

    I'm using a Dataflow template to export data from Bigtable. Using the command-line API, I'm able to specify a region to run the job in (europe-west1). But when it comes to the REST API, I can't specify any region except us-central1. The error is:

        The workflow could not be created, since it was sent to an invalid regional endpoint (europe-west1). Please resubmit to a valid Cloud Dataflow regional endpoint.
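    For reference, region-aware template launches go through the `projects.locations.templates.launch` REST method rather than the older `projects.templates.launch`, which defaults to us-central1. A small sketch of the URL shape (project and region values are placeholders):

    ```python
    def launch_url(project: str, region: str) -> str:
        """Build the region-aware templates:launch REST URL.

        projects.locations.templates.launch routes the job to the given
        regional endpoint; the older projects.templates.launch method
        defaults to us-central1.
        """
        return (
            "https://dataflow.googleapis.com/v1b3/"
            f"projects/{project}/locations/{region}/templates:launch"
        )

    print(launch_url("my-project", "europe-west1"))
    ```
    
    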

    4 votes
    1 comment
  15. Pub/Sub to Cloud SQL template

    There are lots of useful templates. One that would be useful to me is Pub/Sub to Cloud SQL.

    5 votes
    0 comments
  16. 1 vote
    0 comments
  17. 1 vote
    0 comments
  18. Is there a plan for a Go SDK for Dataflow? Apache Beam has a Go SDK now.

    I want to build a pipeline using the Go SDK. Java is too slow for my application, and I don't want to waste money on more CPUs to compensate for the slowness of Java/Python. Now that Apache Beam supports a Go SDK, is there a plan to support it in Dataflow?

    6 votes
    2 comments
  19. Meaningful information about steps output collection in UI

    In the UI, when clicking on a step, showing the output collections' tag names or OutputTag IDs (when available) instead of "out" + index would be more meaningful.

    3 votes
    0 comments
  20. Allow custom logger appenders

    Using a custom log appender (e.g. with logback) inside Dataflow is impossible at the moment. Any logging settings I have seem to be superseded by Google's own appender and just show up in the Dataflow logs in Stackdriver. I want to send my logs to an Elasticsearch cluster, since the rest of my logs which are generated by other non-Dataflow systems are there as well.

    1 vote
    0 comments