Cloud Dataflow

Welcome to the Google Cloud Dataflow idea forum. You can submit and vote on ideas here to tell the Google Cloud Dataflow team which features you’d like to see.

This forum is for feature suggestions. If you're looking for help, please use the support forums instead.

We can’t wait to hear from you!

  1. Load a file from a bucket into a BigQuery table, then run 200 queries on it, one job at a time

    I have to read a file from a bucket, load that file's data into a BigQuery table, and then run 200 queries against that table. The whole process has to happen one step at a time, but because the work runs in parallel it doesn't finish correctly, so I need to synchronize it such that one job finishes before the next one is triggered.
    Can anyone help me?
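
    One way to get the strict ordering described here is to drive the steps with the google-cloud-bigquery client library instead of relying on parallel jobs, since result() blocks until a job completes. A minimal sketch, assuming a CSV input; all project, bucket, and table names below are placeholders:

      from google.cloud import bigquery

      client = bigquery.Client()
      table_id = "my-project.my_dataset.my_table"  # placeholder table

      # Step 1: load the file from the bucket and wait for the load to finish.
      load_config = bigquery.LoadJobConfig(
          source_format=bigquery.SourceFormat.CSV,  # assuming a CSV file
          autodetect=True,
      )
      load_job = client.load_table_from_uri(
          "gs://my-bucket/my-file.csv",  # placeholder path
          table_id,
          job_config=load_config,
      )
      load_job.result()  # blocks until the load job completes

      # Step 2: only after the load finishes, run the queries one at a time.
      queries = ["SELECT COUNT(*) FROM `my-project.my_dataset.my_table`"]  # the 200 statements
      for sql in queries:
          client.query(sql).result()  # waits for each query before firing the next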

  2. Does your worker machine type affect scheduling?

    Please update the docs to describe how machine type affects jobs.

    If you have a serial pipeline and don't do any native threading in your DoFn, is an n1-standard-8 going to be any faster than an n1-standard-1?

    If you have parallel stages and set a max of 50 workers, will work get done faster on an n1-standard-8 than on an n1-standard-1, i.e. will the job use 400 worker cores instead of 50?

    [For this discussion, please ignore that the n1-standard-8 has more RAM, which may help groupBy.]
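
    For context, a sketch of how the two knobs in question are set on a Beam Python pipeline; --worker_machine_type and --max_num_workers are the Beam worker options, and the project and bucket names are placeholders:

      import apache_beam as beam
      from apache_beam.options.pipeline_options import PipelineOptions

      # The combination whose scheduling behavior the docs should spell out:
      # 50 workers x 8 vCPUs each -- is that 400 usable cores, or 50?
      options = PipelineOptions([
          "--runner=DataflowRunner",
          "--project=my-project",                 # placeholder
          "--temp_location=gs://my-bucket/tmp",   # placeholder
          "--worker_machine_type=n1-standard-8",  # 8 vCPUs per worker
          "--max_num_workers=50",
      ])

      with beam.Pipeline(options=options) as p:
          _ = p | beam.Create([1, 2, 3])  # stand-in for the real pipeline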

  3. You could update the screenshots to match the current console.

    The screenshots are old and outdated, so they can't be used for troubleshooting. For example, Permissions is no longer on the left.

    The permission error messages don't say which account is having access issues, so you end up trying random combinations; I have spent hours on this and still have never gotten even the most basic Dataflow examples to work.

  4. Painful and sometimes impossible to change pipeline options when using a template

    Not all parameters can be configured at runtime, and the docs should tell users which ones.

    For example, AWS credentials/regions are configured at template construction time. There is no way (at least that I have found) to change them at runtime, e.g. to pass in credentials when launching the template.

    In another case, extra code is needed to override the default query in order to accept dynamic BigQuery queries at runtime.
    What confuses me is that a query/table still has to be provided at template construction time, or BigQueryIO complains.
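
    For the BigQuery case, the runtime-parameter pattern in current Beam Python SDKs looks roughly like the sketch below: add_value_provider_argument defers resolution of --query until the template is launched, which is also why a placeholder default is needed at construction time (all names are illustrative):

      import apache_beam as beam
      from apache_beam.options.pipeline_options import PipelineOptions

      class TemplateOptions(PipelineOptions):
          @classmethod
          def _add_argparse_args(cls, parser):
              # Runtime parameter: resolved when the template is launched,
              # not when it is constructed.
              parser.add_value_provider_argument(
                  "--query",
                  default="SELECT 1",  # placeholder so template construction succeeds
                  help="BigQuery SQL supplied at launch time.",
              )

      options = TemplateOptions()

      with beam.Pipeline(options=options) as p:
          rows = p | "Read" >> beam.io.ReadFromBigQuery(
              query=options.query,  # a ValueProvider, not a plain string
              use_standard_sql=True,
          )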

  5. I really need Python examples.

    How to read a file from GCS and load it into BigQuery, with Python.
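
    A minimal sketch of such an example with the Beam Python SDK, assuming a two-column CSV input; the paths, table name, and schema are placeholders:

      import apache_beam as beam
      from apache_beam.options.pipeline_options import PipelineOptions

      def parse_csv_line(line):
          # Turn "alice,10" into a row dict matching the table schema below.
          name, score = line.split(",")
          return {"name": name, "score": int(score)}

      options = PipelineOptions()  # add --runner=DataflowRunner etc. to run on Dataflow

      with beam.Pipeline(options=options) as p:
          (
              p
              | "ReadFromGCS" >> beam.io.ReadFromText("gs://my-bucket/input.csv")
              | "Parse" >> beam.Map(parse_csv_line)
              | "WriteToBQ" >> beam.io.WriteToBigQuery(
                  "my-project:my_dataset.my_table",
                  schema="name:STRING,score:INTEGER",
                  write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                  create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
              )
          )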

  6. It would be great if the Dataflow SDK worked with a newer version of Google Datastore than v1

    Whenever I want to use Dataflow together with Google Datastore, I have to use the old Datastore v1 API. With it, I seem to have to encode and decode entities of different kinds manually, extracting each value (and knowing its type) and setting it on a new object. Compared with newer versions of Datastore or the Node.js client, where handling Datastore objects is a dream (Node.js just gives you the JSON representation), this is painful. Would it be possible to retrieve entities by "selecting" a class type, like:
    MyObject mObj = entity.to(MyObject.class)
    or what would…
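
    On the Python side, an approximate equivalent of the requested convenience already exists in the google-cloud-datastore helpers; a sketch, assuming the pipeline yields v1 Entity protobufs as the older connector does:

      import apache_beam as beam
      from google.cloud.datastore.helpers import entity_from_protobuf

      def to_friendly_entity(entity_pb):
          # Converts a v1 Entity protobuf into a google.cloud.datastore.Entity,
          # which behaves like a dict (entity["field"]) instead of requiring
          # manual, type-aware value extraction.
          return entity_from_protobuf(entity_pb)

      # Usage inside a pipeline: entities | beam.Map(to_friendly_entity)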

  7. Custom job templates

  8. Allow custom logger appenders

    Using a custom log appender (e.g. with logback) inside Dataflow is currently impossible. Any logging settings I supply seem to be superseded by Google's own appender, so my logs just show up in the Dataflow logs in Stackdriver. I want to send my logs to an Elasticsearch cluster, since the logs generated by my other, non-Dataflow systems are already there.

  9. Customizable Columns in Overview Page

    Ability to show total worker time, max number of workers, and zone information on the overview page. The columns should be customizable, similar to the App Engine versions page.

  10. Dataprep: enable deleting more than one dataset

    In the DATASETS tab in Dataprep, it would be extremely helpful to be able to select multiple datasets and delete them together.

  11. Autodetect for BigQuery loads

    Enable --autodetect for BigQuery loads, consistent with bq load --autodetect on the command line.
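
    For reference, a sketch of the requested behavior as it exists today in the google-cloud-bigquery Python client; the URI and table name are placeholders:

      from google.cloud import bigquery

      client = bigquery.Client()

      # Schema auto-detection, i.e. the behavior of `bq load --autodetect`.
      job_config = bigquery.LoadJobConfig(
          autodetect=True,
          source_format=bigquery.SourceFormat.CSV,
      )
      load_job = client.load_table_from_uri(
          "gs://my-bucket/input.csv",        # placeholder
          "my-project.my_dataset.my_table",  # placeholder
          job_config=job_config,
      )
      load_job.result()  # waits for the load to finish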

  12. Cannot specify diskSizeGb when launching a template

    When I create a template, it's possible to specify --diskSizeGb, but if you don't specify it then, it's not possible to pass it as a parameter when launching the template.
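
    A sketch of the current workaround in the Beam Python SDK: fix the disk size when the template is built, since it cannot be passed at launch time. Here --disk_size_gb is the Python spelling of the Java --diskSizeGb option, and all paths are placeholders:

      from apache_beam.options.pipeline_options import PipelineOptions

      # Worker disk size has to be fixed when the template is built.
      options = PipelineOptions([
          "--runner=DataflowRunner",
          "--project=my-project",                                       # placeholder
          "--temp_location=gs://my-bucket/tmp",                         # placeholder
          "--template_location=gs://my-bucket/templates/my-template",   # placeholder
          "--disk_size_gb=100",  # baked into the template at creation time
      ])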

  13. Bug: NullPointerException when reading from BigQuery

    I believe I'm experiencing a bug in BigQuerySource for Apache Beam when running on Google Dataflow. I described it in detail on Stack Overflow: https://stackoverflow.com/questions/44718323/apache-beam-with-dataflow-nullpointer-when-reading-from-bigquery/44755305#44755305

    Nobody seems to be able to respond to it, so I'm posting it here as a potential bug.

  14. Button: "Copy Dataflow job"

    I would like to be able to copy a Dataflow job so that I can tweak the parameters and run it again without having to enter them all manually.
