Cloud Dataflow

Welcome to the Google Cloud Dataflow idea forum. You can submit and vote on ideas here to tell the Google Cloud Dataflow team which features you’d like to see.

This forum is for feature suggestions. If you’re looking for help forums, look here:

We can’t wait to hear from you!

How can we improve Cloud Dataflow?

  1. Python 3.x support

    Python 3.x support is overdue. Python 3.6+ is now very mature and brings some serious speed improvements over 2.7.x.

    166 votes
    1 comment
    • BigQuery to PubSub Dataflow Template

      I need a BigQuery to PubSub Template in Dataflow. This will allow us to create a streaming job to export BigQuery Gmail Logs into our SIEM using our PubSub HTTPS Endpoint.

      57 votes
      1 comment
      • 54 votes
        under review  ·  6 comments
        • 51 votes
          0 comments
          • Patch Apache Beam's XmlSource to allow templating in Dataflow

            Background

            In our project, a Cloud Function is used to start a Dataflow pipeline in batch mode to upload data to ElasticSearch. The source for the Dataflow pipeline is an XML file.

            The Dataflow template is used to upload the data into GCP.

            Problem

            Uploading templates requires pipeline options that can accept their values at runtime. This is implemented by using the ValueProvider interface to wrap the option type.

            XmlSource, the class for reading XML sources, did not use ValueProvider for its option parameters. We solved this by patching the XmlSource class.

            Upload of dataflow template should be…
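
The idea behind the ValueProvider approach described in this post can be illustrated with a minimal, library-free sketch. It is not Apache Beam's implementation and not the poster's patch: `StaticValue`, `RuntimeValue`, and `XmlReader` are hypothetical names, and the plain `dict` standing in for runtime options is an assumption. The point it shows is that a source keeps the provider object and only calls `get()` when the job actually runs, so the template can be built before the value is known.

```python
from typing import Dict, Generic, TypeVar

T = TypeVar("T")


class StaticValue(Generic[T]):
    """Hypothetical provider for a value already known at template build time."""

    def __init__(self, value: T) -> None:
        self._value = value

    def get(self) -> T:
        return self._value


class RuntimeValue(Generic[T]):
    """Hypothetical provider resolved from options supplied when the job starts."""

    def __init__(self, name: str, runtime_options: Dict[str, T]) -> None:
        self._name = name
        self._runtime_options = runtime_options

    def get(self) -> T:
        if self._name not in self._runtime_options:
            raise RuntimeError(f"option {self._name!r} is not available yet")
        return self._runtime_options[self._name]


class XmlReader:
    """Hypothetical source: stores the provider and resolves it lazily."""

    def __init__(self, file_pattern) -> None:
        self._file_pattern = file_pattern  # stored as a provider, not resolved here

    def describe(self) -> str:
        # The value is looked up only at this point, i.e. at execution time.
        return f"reading XML from {self._file_pattern.get()}"


# Template build time: the input path is not known yet.
runtime_options: Dict[str, str] = {}
reader = XmlReader(RuntimeValue("input", runtime_options))

# Job start: the caller supplies the parameter, and only now is it read.
runtime_options["input"] = "gs://example-bucket/data.xml"  # placeholder path
print(reader.describe())
```

A source that calls `get()` in its constructor instead would fail at template build time, which is the kind of problem the post describes for sources that take plain option values.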

            32 votes
            0 comments
            • Example Code is Maddeningly Incomplete

              The example code provided is maddeningly incomplete. The biggest issue I have is that things like the complete code for the default templates is not provided. I want to create a slight variation of the Pub/Sub -> BigQuery example template, but I can't find the code for that template anywhere. It would be nice if that code were available so that I could base a custom dataflow job off of it. This would provide a known working example of exactly what I want from which to build on.

              26 votes
              1 comment
              • Scheduling Dataflow pipeline code as a cloud job in Java

                How can we schedule Dataflow pipeline code to run as a job in the cloud using Java?

                26 votes
                0 comments
                • Ability to load data to BigQuery Partitions from dataflow pipelines (Python)

                  Python Dataflow pipelines fail in the _parse_table_reference function when you specify a BigQuery table name with a partition decorator for loading. This matters if you want to take advantage of BigQuery table partitioning.
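
To make the request concrete, here is a minimal sketch of a table-spec parser that tolerates a partition decorator. It is not the SDK's `_parse_table_reference`; the function name `parse_table_spec`, the `TableRef` type, and the assumed spec shape `[project:]dataset.table[$YYYYMMDD]` are illustrative only.

```python
import re
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    project: Optional[str]    # None when the spec has no "project:" prefix
    dataset: str
    table: str
    partition: Optional[str]  # "YYYYMMDD" from a "$YYYYMMDD" decorator, else None


# Assumed shape: [project:]dataset.table[$YYYYMMDD]
_TABLE_SPEC = re.compile(
    r"(?:(?P<project>[\w-]+):)?"
    r"(?P<dataset>\w+)\."
    r"(?P<table>\w+)"
    r"(?:\$(?P<partition>\d{8}))?"
)


def parse_table_spec(spec: str) -> TableRef:
    """Split a table spec into its parts, keeping any partition decorator."""
    match = _TABLE_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized table spec: {spec!r}")
    # Groups that did not participate in the match come back as None.
    return TableRef(**match.groupdict())
```

With this sketch, `parse_table_spec("logs.events$20180101")` keeps the date as a separate `partition` field instead of rejecting the name.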

                  25 votes
                  0 comments
                  • Show Total Memory and CPU Usage Alongside Worker Graph

                    It would be amazing to see memory and CPU consumption directly, in addition to the worker graph. This would make it much easier to correlate and debug different stages, and would also give some insight into which machine types perform best. It is possible to do this now by creating metrics in Stackdriver, but that is very involved, especially when a super simple graph could do the trick, just like in Kubernetes/Google Container Engine.

                    24 votes
                    1 comment
                    • Bug: Dataflow jobs are not shown consistently on cloud console

                      Currently running jobs are sometimes not shown on cloud console. After refreshing the page, they sometimes show up, just to disappear again a couple of seconds later.

                      The behaviour is very inconsistent and therefore I have not found a way to replicate the issue. It seems mostly time dependent. A couple of weeks ago I noticed this issue for the first time. Today, I've been suffering from the issue the whole day, while yesterday everything was working fine.

                      20 votes
                      4 comments
                      • Add Dataflow Job Logs to Cloud Logging API

                        Dataflow Job Logs are separate from Cloud Logging, so you cannot see Job Logs under Cloud Logging, nor create a Stackdriver alert for failed Dataflow jobs.

                        https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline

                        18 votes
                        0 comments
                        • 15 votes
                          0 comments
                          • Bug: Apache beam DataFlow runner throwing setup error

                            Hi,
                            We are building a data pipeline using the Beam Python SDK and trying to run it on Dataflow, but we get the error below:

                            A setup error was detected in beamapp-xxxxyyyy-0322102737-03220329-8a74-harness-lm6v. Please refer to the worker-startup log for detailed information.

                            But we could not find detailed worker-startup logs.

                            We tried increasing the memory size, worker count, etc., but still get the same error.

                            Here is the command we use:
                            python run.py \
                            --project=xyz \
                            --runner=DataflowRunner \
                            --staging_location=gs://xyz/staging \
                            --temp_location=gs://xyz/temp \
                            --requirements_file=requirements.txt \
                            --worker_machine_type n1-standard-8 \
                            --num_workers 2

                            Pipeline snippet:

                            data = pipeline | "load data" >> beam.io.Read(
                                beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")
                            )

                            data…

                            15 votes
                            0 comments
                            • Dataprep / Dataflow jobs region

                              TL;DR: Dataprep creates jobs in the US only. This is probably a bug.

                              I'm trying to prep data from BQ (EU) to BQ (EU).

                              But Dataprep creates the Dataflow job in the US, and because of that I get an error:

                              Cannot read and write in different locations: source: EU, destination: US, error: Cannot read and write in different locations: source: EU, destination: US

                              15 votes
                              1 comment
                              • Ability to use our own Kubernetes cluster as Dataflow runner

                                It would be good to be able to use our own container cluster for running Dataflow workers, since Dataflow already uses Kubernetes for deploying workers. This could even take the user-supplied cluster's current workload into account and balance workers between the user-provided cluster and the Dataflow cluster.

                                13 votes
                                0 comments
                                • Provide more frequent Python updates

                                  The Python SDK is not up to date with the various cloud SDKs; the last update was in September...

                                  12 votes
                                  0 comments
                                  • Labeling Dataflow jobs

                                    We should be able to assign labels to Dataflow jobs and filter by label on the overview page.

                                    11 votes
                                    0 comments
                                    • Show cost of current job according to the new pricing structure.

                                      It would be good to see the cost of the job in the job view. I even thought of writing a Chrome extension for this, because it's pretty trivial given data such as vCPU sec, RAM MB sec, PD MB sec, etc.
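
The calculation the poster calls trivial might look like the sketch below. The `Rates` type, the `estimate_cost` function, and the conversion of 1 GB = 1024 MB are assumptions for illustration, and the rates must be supplied by the caller from the current price list; none are hard-coded here because this sketch does not know the actual prices.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600
MB_PER_GB = 1024  # assumption: the job metrics count binary megabytes


@dataclass(frozen=True)
class Rates:
    """Hourly prices, to be filled in from the published price list."""
    vcpu_hour: float    # price per vCPU per hour
    ram_gb_hour: float  # price per GB of memory per hour
    pd_gb_hour: float   # price per GB of persistent disk per hour


def estimate_cost(vcpu_sec: float, ram_mb_sec: float, pd_mb_sec: float,
                  rates: Rates) -> float:
    """Convert the job's resource-seconds into hours and apply the rates."""
    vcpu_hours = vcpu_sec / SECONDS_PER_HOUR
    ram_gb_hours = ram_mb_sec / MB_PER_GB / SECONDS_PER_HOUR
    pd_gb_hours = pd_mb_sec / MB_PER_GB / SECONDS_PER_HOUR
    return (vcpu_hours * rates.vcpu_hour
            + ram_gb_hours * rates.ram_gb_hour
            + pd_gb_hours * rates.pd_gb_hour)
```

Keeping the rates in a separate object means the same function works for batch and streaming jobs, which may be priced differently.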

                                      10 votes
                                      0 comments
                                      • PubSub to BigQuery template should allow Avro and/or Protobuf

                                        BigQuery already has support for loading files in Avro, why not streaming them in from PubSub? This seems like an obvious feature to have but I don't see any way to do it currently. The PubSub to BigQuery template is great, but it would be so much better with this one feature turned on.

                                        9 votes
                                        0 comments
                                        • Add the ability to sort jobs by status (e.g. running vs closed)

                                          I would like to be able to quickly see the number of jobs that are currently running. Sometimes streaming jobs that have been running for weeks get buried below batch or testing jobs.
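
Until such a sort option exists, a client-side workaround could look like the following sketch. It assumes the jobs have already been fetched as a list of dictionaries with `name` and `currentState` fields, shaped like the job records returned by the Dataflow API; the helper names `running_first` and `count_running` and the sample job names are hypothetical.

```python
from typing import Dict, List

RUNNING = "JOB_STATE_RUNNING"


def running_first(jobs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return jobs with running ones first, otherwise keeping the input order."""
    # sorted() is stable, and False sorts before True, so running jobs move
    # to the front without reordering jobs that share the same status.
    return sorted(jobs, key=lambda job: job["currentState"] != RUNNING)


def count_running(jobs: List[Dict[str, str]]) -> int:
    """Count the jobs that are currently running."""
    return sum(1 for job in jobs if job["currentState"] == RUNNING)


# Made-up sample data for illustration.
jobs = [
    {"name": "nightly-batch", "currentState": "JOB_STATE_DONE"},
    {"name": "test-run", "currentState": "JOB_STATE_FAILED"},
    {"name": "events-stream", "currentState": "JOB_STATE_RUNNING"},
]
for job in running_first(jobs):
    print(job["currentState"], job["name"])
```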

                                          6 votes
                                          0 comments