Cloud Dataflow

Welcome to the Google Cloud Dataflow idea forum. You can submit and vote on ideas here to tell the Google Cloud Dataflow team which features you’d like to see.

This forum is for feature suggestions. If you’re looking for help forums, look here:

We can’t wait to hear from you!

  1. 83 votes
    Vote
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    under review  ·  10 comments  ·  Flag idea as inappropriate…  ·  Admin →
  2. scala friendly api / library

    Current darkjh/scalaflow library is pretty basic and the DoFn etc is pretty messy. It would be nice if scala was natively supported.

    8 votes
    Vote
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    1 comment  ·  Flag idea as inappropriate…  ·  Admin →
  3. Add the ability to sort jobs by status (e.g. running vs closed)

    I would like to be able to quickly see the number of jobs that are currently running. Sometimes streaming jobs that have been running for weeks get buried below batch or testing jobs.

    6 votes
    Vote
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    0 comments  ·  Flag idea as inappropriate…  ·  Admin →
  4. Improve cost effectiveness of autoscaling algorithm

    I'm currently processing log data from multiple days with Cloud Dataflow. According to the defined options it uses 10 to 100 workers and the throughput-based autoscaling algorithm. At the moment there are still 64 workers active, while only one job is still running with around 1500 elements per second. If you look at the CPU graph of the workers you see, that almost all of them are idle for the last 30 minutes. I would prefer a more carefree autoscaling, where I know I always get the optimal cost effectiveness.

    5 votes
    Vote
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    1 comment  ·  Flag idea as inappropriate…  ·  Admin →

    Hi Max,

    We’ve done a few performance optimizations lately that should result in a much improved experience. Could you share a jobID for us to take a look at? (I’m curious to examine the experience you describe).

    Thanks!

  5. Display execution parameters on Job info screen

    On the main info screen for a particular job, a tab for execution parameters would be very useful for debugging and quantifying job performance.

    Pretty much the whole suite of:

    --input
    --stagingLocation
    --dataflowJobFile
    --maxNumWorkers
    --numWorkers
    --diskSizeGb
    --jobName
    --autoscalingAlgorithm

    that dataflow supports as execution parameters would be great to have to the right of "Step" on a tab called "Job".

    3 votes
    Vote
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    0 comments  ·  Flag idea as inappropriate…  ·  Admin →
  6. 1 vote
    Vote
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)
    You have left! (?) (thinking…)
    2 comments  ·  Flag idea as inappropriate…  ·  Admin →

    Hello!

    I’m not sure I understood the suggestion — perhaps the post is incomplete?

    If you could elaborate further, I’ll be happy to take a look. Thanks!

  • Don't see your idea?

Cloud Dataflow

Categories

Feedback and Knowledge Base