Cloud Dataflow

Welcome to the Google Cloud Dataflow idea forum. You can submit and vote on ideas here to tell the Google Cloud Dataflow team which features you’d like to see.

This forum is for feature suggestions; if you're looking for help and support, please use the Cloud Dataflow help forums instead.

We can’t wait to hear from you!

  1. Codegen

    I would like to take a valid job and generate at least some of the code needed to re-create it in Python using the client API. I would also like to be able to do this for my historic jobs.

    1 vote  ·  0 comments
  2. Button: "copy dataflow job"

    I would like to be able to copy a Dataflow job so that I can tweak the parameters and run it again without having to enter them all manually.

    1 vote  ·  0 comments
  3. Cannot specify diskSizeGb when launching a template

    When creating a template, it's possible to specify --diskSizeGb; but if you don't specify it at template creation time, it's not possible to pass it as a parameter at launch time.

    1 vote  ·  0 comments
  4. Cannot specify location when starting a Dataflow job from the REST API

    I'm using a Dataflow template to export data from Bigtable. Using the command-line API, I'm able to specify a region to run the job (europe-west1). But when it comes to the REST API, I can't specify any region except us-central1. The error is:

    "The workflow could not be created, since it was sent to an invalid regional endpoint (europe-west1). Please resubmit to a valid Cloud Dataflow regional endpoint."

    1 vote  ·  1 comment
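    For context on the regional form of the call: in the v1b3 REST API, the projects.locations.* methods put the location in the request path rather than in the job body. The sketch below just builds such a URL; the helper name and project value are my own, for illustration:

```python
# Sketch: the region must appear in the request path of the
# projects.locations.* REST methods, otherwise Dataflow falls back
# to us-central1. Helper name and arguments are illustrative.

def launch_template_url(project: str, location: str) -> str:
    """Build the regional templates:launch URL for the Dataflow v1b3 REST API."""
    return (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project}/locations/{location}/templates:launch"
    )

print(launch_template_url("my-project", "europe-west1"))
```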
  5. Pub/Sub to Cloud SQL template

    There are lots of useful templates. One that would be useful to me is Pub/Sub to Cloud SQL.

    4 votes  ·  0 comments
  8. Is there a plan for a Go SDK for Dataflow? It seems Apache Beam has a Go SDK now.

    I want to build pipelines using the Go SDK. Java is too slow for my application, and I don't want to waste money on more CPUs to pay for the slowness of Java/Python. Now that Apache Beam supports a Go SDK, is there a plan to support it in Dataflow?

    6 votes  ·  2 comments
  9. Meaningful information about steps output collection in UI

    In the UI, when clicking on a step, showing the output collections' tag names or OutputTag IDs (when available) instead of "out" + index would be more meaningful.

    3 votes  ·  0 comments
  10. Allow custom logger appenders

    Using a custom log appender (e.g. with Logback) inside Dataflow is impossible at the moment. Any logging settings I configure seem to be superseded by Google's own appender, and my logs just show up in the Dataflow logs in Stackdriver. I want to send my logs to an Elasticsearch cluster, since the rest of my logs, generated by other non-Dataflow systems, are there as well.

    1 vote  ·  0 comments
  11. Add a more helpful error message to "projects.jobs.create"

    I'm trying to launch a batch pipeline from outside the project and the "projects.jobs.create" API is returning:

    {
      "error": {
        "code": 400,
        "message": "Request contains an invalid argument.",
        "status": "INVALID_ARGUMENT"
      }
    }

    With no indication of which argument is invalid. This wouldn't be such a big deal, except that the documentation for this (https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/create?authuser=1) does not indicate which fields are required. :(

    3 votes  ·  0 comments
  12. Wait.on() PDone instead of PCollection

    I would like to be able to publish an event to Pub/Sub after writing data to BigQuery. Currently the Wait.on() transform is intended for this situation; however, Wait.on() requires a PCollection as the input to wait on, while BigQueryIO returns a PDone. As such, I would like to be able to use Wait.on() with a PDone before applying a transform.

    4 votes  ·  0 comments
  13. delete or cancel job using the console

    Allow deleting or cancelling a job using the console.

    13 votes  ·  2 comments
  14. It would be great if the Dataflow SDK worked with a newer version of Google Datastore than v1

    Whenever I want to use Dataflow together with Google Datastore, I have to use the old Datastore v1 version. In this version, it seems I have to encode and decode entities of different kinds manually, by extracting the values (and knowing each value's type) and setting them on a new object. When I compare this with newer versions of Datastore or the Node.js implementation, the handling of Datastore objects there is a dream (e.g. Node.js just gives you the JSON representation). Would it be possible to retrieve entities by "selecting" a class type, like:
    MyObject mObj = entity.to(MyObject.class)
    or what would…

    1 vote  ·  0 comments
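    A hypothetical sketch (plain Python, with a dict standing in for a Datastore entity) of the ergonomics being requested; every name below is made up for illustration:

```python
from dataclasses import dataclass, fields

# Hypothetical sketch of the requested ergonomics: map a raw entity's
# properties onto a typed object, instead of hand-extracting each value.
# A plain dict stands in for the Datastore entity here.

@dataclass
class MyObject:
    name: str
    count: int

def entity_to(entity_props: dict, cls):
    """Pick out the fields cls declares and build an instance from them."""
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in entity_props.items() if k in names})

obj = entity_to({"name": "a", "count": 3, "extra": True}, MyObject)
print(obj)
```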
  15. Painful and sometimes impossible to change pipeline options using Template

    Not all parameters can be configured at runtime, and the docs should tell users which ones.

    For example, AWS credentials/regions are configured at template construction time. There is no way (at least that I found) to change them at run time, e.g. to pass in credentials when launching the template.

    Another case: in order to accept dynamic BigQuery queries at run time, extra code is needed to override the default query.
    What confuses me is that a query/table has to be provided during template construction, otherwise BigQueryIO complains.

    1 vote  ·  0 comments
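    For background, Beam's mechanism for runtime template parameters is the ValueProvider interface; a transform that only accepts a plain value at construction time is exactly what makes an option unconfigurable at run time. The classes below are a simplified stand-in for that pattern, not the real apache_beam classes:

```python
# Simplified stand-in for Beam's ValueProvider pattern (not the real
# apache_beam classes): a template records *where* a value will come
# from at construction time, and only resolves it at run time.

class StaticValueProvider:
    """A value fixed when the template is built."""
    def __init__(self, value):
        self._value = value
    def get(self):
        return self._value

class RuntimeValueProvider:
    """A value resolved from launch options when the job actually runs."""
    def __init__(self, option_name, runtime_options):
        self._name = option_name
        self._options = runtime_options  # filled in when the job launches
    def get(self):
        return self._options[self._name]

# Construction time: one option is fixed, the query is not known yet.
launch_options = {}
batch_size = StaticValueProvider(500)
query = RuntimeValueProvider("query", launch_options)

# Run time: the launcher supplies the real value, and get() sees it.
launch_options["query"] = "SELECT 1"
print(batch_size.get(), query.get())
```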
  16. gradle

    I would love an example Gradle project. Gradle is very popular in the Java community, so it is odd that there is only documentation on how to use Dataflow with Maven.

    3 votes  ·  0 comments
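    A minimal build.gradle sketch of what such documentation might show; the Beam version placeholder and the main class are assumptions, not values from any official guide:

```groovy
// Minimal sketch of a build.gradle for a Dataflow pipeline.
// The Beam version below is a placeholder; substitute a current release.
plugins {
    id 'java'
    id 'application'
}

repositories {
    mavenCentral()
}

def beamVersion = '2.x.y'  // placeholder
dependencies {
    implementation "org.apache.beam:beam-sdks-java-core:${beamVersion}"
    runtimeOnly "org.apache.beam:beam-runners-google-cloud-dataflow-java:${beamVersion}"
}

application {
    mainClass = 'com.example.MyPipeline'  // assumed entry point
}
```

    The runner artifact is the same beam-runners-google-cloud-dataflow-java dependency the Maven instructions reference; with the application plugin, recent Gradle versions can pass pipeline options via `gradle run --args`.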
  17. Beam SDK for Python: support for Triggers

    The Beam Python SDK doesn't support triggers (or state, or timers).

    3 votes  ·  0 comments
  18. Add numWorkers and autoscalingAlgorithm parameters on Google-provided templates?

    I'm using the "Datastore to GCS Text" provided template, but with high load (~8M entities) Dataflow takes too long to scale up. If we could provide numWorkers and autoscalingAlgorithm parameters, jobs would take less time to execute.

    1 vote  ·  0 comments
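    For what it's worth, the templates.launch request in the v1b3 REST API does accept an environment block with worker settings such as numWorkers and maxWorkers; whether a given Google-provided template honors a particular autoscaling setting is a separate question. A sketch of such a request body (the job name, template parameter, and bucket paths are made-up examples):

```python
import json

# Sketch of a templates:launch request body with worker settings in the
# runtime environment. Environment field names follow the v1b3
# RuntimeEnvironment; the parameter names and values are illustrative.
body = {
    "jobName": "datastore-to-gcs-example",
    "parameters": {"datastoreReadGqlQuery": "SELECT * FROM MyKind"},
    "environment": {
        "numWorkers": 10,
        "maxWorkers": 50,
        "tempLocation": "gs://my-bucket/temp",
    },
}
print(json.dumps(body["environment"], indent=2))
```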
  19. Bug: Apache Beam Dataflow runner throwing setup error

    Hi,
    We are building a data pipeline using the Beam Python SDK and trying to run it on Dataflow, but we get the error below:

    A setup error was detected in beamapp-xxxxyyyy-0322102737-03220329-8a74-harness-lm6v. Please refer to the worker-startup log for detailed information.

    However, we could not find detailed worker-startup logs.

    We tried increasing the memory size, worker count, etc., but we still get the same error.

    Here is the command we use:

    python run.py \
        --project=xyz \
        --runner=DataflowRunner \
        --staging_location=gs://xyz/staging \
        --temp_location=gs://xyz/temp \
        --requirements_file=requirements.txt \
        --worker_machine_type=n1-standard-8 \
        --num_workers=2

    Pipeline snippet:

    data = pipeline | "load data" >> beam.io.Read(
        beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")
    )

    data…

    16 votes  ·  0 comments