It would be nice if Go were natively supported. (6 votes)
Would like a way to create a Read Transform that can be scheduled to upload an FTP payload to Google Storage for further processing. (4 votes)
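In the meantime, a rough sketch of what such a transform could look like, assuming the Apache Beam Java SDK (the bucket path and class name are hypothetical; java.net.URL's built-in ftp:// handler does the download, and Beam's FileSystems API does the upload):

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.ByteBuffer;
    import java.nio.channels.WritableByteChannel;
    import org.apache.beam.sdk.io.FileSystems;
    import org.apache.beam.sdk.io.fs.ResourceId;
    import org.apache.beam.sdk.transforms.DoFn;

    // Hypothetical DoFn: takes an ftp:// URL and copies the payload to GCS.
    public class FtpToGcsFn extends DoFn<String, String> {
      @ProcessElement
      public void processElement(ProcessContext c) throws Exception {
        String ftpUrl = c.element();  // e.g. "ftp://host/path/file.csv"
        String gcsPath = "gs://my-bucket/staging/"  // placeholder bucket
            + ftpUrl.substring(ftpUrl.lastIndexOf('/') + 1);
        ResourceId dest = FileSystems.matchNewResource(gcsPath, false /* isDirectory */);
        try (InputStream in = new URL(ftpUrl).openStream();
             WritableByteChannel out = FileSystems.create(dest, "application/octet-stream")) {
          byte[] buf = new byte[64 * 1024];
          int n;
          while ((n = in.read(buf)) != -1) {
            out.write(ByteBuffer.wrap(buf, 0, n));
          }
        }
        c.output(gcsPath);  // downstream steps can now read from GCS
      }
    }

The scheduling part could then be handled by periodically launching a pipeline that feeds the FTP URLs into this DoFn.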
Please update the docs to describe how machine type affects jobs.
If you have a serial pipeline and don't do any native threading in your DoFn, is an n1-standard-8 going to be any faster than an n1-standard-1?
If you have parallel stages and set a max of 50 workers, will you get work done faster on an n1-standard-8 than an n1-standard-1? I.e., will it use 400 cores for workers instead of 50?
[Please ignore that n1-standard-8 has more RAM and may help groupBy for this discussion.] (1 vote)
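For context, here is how those two knobs are set in a pipeline, assuming the Apache Beam Dataflow runner (the setters come from DataflowPipelineWorkerPoolOptions, which DataflowPipelineOptions extends; the class name is just for illustration):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class WorkerSizingExample {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setWorkerMachineType("n1-standard-8");  // 8 vCPUs per worker VM
        options.setMaxNumWorkers(50);                   // 50 VMs = 400 vCPUs total
        // Whether all 400 vCPUs get work, or only one thread per VM does,
        // is exactly the behavior the docs should spell out.
      }
    }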
Currently, when BigQueryIO.Write tries to insert something invalid in streaming mode, the log contains only the call stack, not the reason for the error (like a wrong format or a wrong field name):
exception: "java.lang.IllegalArgumentException: timeout value is negative
at java.lang.Thread.sleep(Native Method)
job: "2016-10-061514_44-8205894049986167886"3 votes
Dataflow Job Logs are separate from Cloud Logging, so you cannot see Job Logs under Cloud Logging, nor create a Stackdriver alert for failed Dataflow jobs. (21 votes)
I’m not sure I understood the suggestion — perhaps the post is incomplete?
If you could elaborate further, I’ll be happy to take a look. Thanks!
How can we schedule Dataflow pipeline code as a job in the cloud, in Java? (26 votes)
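A minimal launcher sketch, assuming the Apache Beam Java SDK with the Dataflow runner (project id and bucket are placeholders). main() submits the job to the Dataflow service, so the actual scheduling can come from anything that can run a JVM on a timer, e.g. cron:

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class ScheduledLauncher {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);        // run on the Dataflow service
        options.setProject("my-project");               // placeholder project id
        options.setTempLocation("gs://my-bucket/tmp");  // placeholder staging bucket

        Pipeline p = Pipeline.create(options);
        // ... attach your transforms here ...
        p.run();  // submits and returns; p.run().waitUntilFinish() would block instead
      }
    }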
I'm currently processing log data from multiple days with Cloud Dataflow. According to the defined options, it uses 10 to 100 workers and the throughput-based autoscaling algorithm. At the moment 64 workers are still active, while only one job is still running at around 1500 elements per second. If you look at the CPU graph of the workers, you can see that almost all of them have been idle for the last 30 minutes. I would prefer a more hands-off autoscaling, where I know I always get optimal cost effectiveness. (5 votes)
We’ve done a few performance optimizations lately that should result in a much improved experience. Could you share a jobID for us to take a look at? (I’m curious to examine the experience you describe).
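For reference, the autoscaling setup described above maps to options roughly like these (assuming the Apache Beam Dataflow runner; the setters live on DataflowPipelineWorkerPoolOptions, and the class name is just for illustration):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class AutoscalingSetup {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setAutoscalingAlgorithm(AutoscalingAlgorithmType.THROUGHPUT_BASED);
        options.setNumWorkers(10);      // initial worker count
        options.setMaxNumWorkers(100);  // ceiling; when idle workers are released
                                        // again is decided by the service
      }
    }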
It would be nice to have a direct link to the logs of the job from the overview page.
At the moment, you have to:
- Click the job
- Wait for it to load
- Click "Worker Logs"
- Wait for it to load
It should be just one click to get the logs :-) (4 votes)
Thanks for the feedback!
We are working on improving this experience along these lines. We'll be happy to discuss more details the next time we sync.
On the main info screen for a particular job, a tab for execution parameters would be very useful for debugging and quantifying job performance.
Pretty much the whole suite of execution parameters that Dataflow supports would be great to have to the right of "Step", on a tab called "Job". (3 votes)
Thanks for the suggestion!
I would like to be able to quickly see the number of jobs that are currently running. Sometimes streaming jobs that have been running for weeks get buried below batch or testing jobs. (6 votes)
Thanks Andrea, we’re looking into it…
The current darkjh/scalaflow library is pretty basic, and the DoFn etc. is pretty messy. It would be nice if Scala were natively supported. (8 votes)
Thanks for the suggestion, Ankur!