Improve cost effectiveness of autoscaling algorithm
I'm currently processing several days' worth of log data with Cloud Dataflow. Per the configured options, the job uses 10 to 100 workers and the throughput-based autoscaling algorithm. At the moment 64 workers are still active, even though only one job is still running at around 1,500 elements per second. The CPU graphs show that almost all of these workers have been idle for the last 30 minutes. I would prefer autoscaling I don't have to babysit, where I can rely on getting optimal cost effectiveness.
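For context, the worker bounds and algorithm described above are set via pipeline execution parameters. A minimal sketch of such an invocation, assuming the Dataflow Java SDK and a hypothetical pipeline class `com.example.MyPipeline` (the flag names follow the Dataflow SDK's execution parameters; the project and staging values are placeholders):

```shell
# Hypothetical invocation; com.example.MyPipeline and the project/staging
# values are placeholders. Flags set 10-100 workers with throughput-based
# autoscaling, matching the configuration described above.
mvn compile exec:java -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=DataflowPipelineRunner \
               --project=my-project \
               --stagingLocation=gs://my-bucket/staging \
               --numWorkers=10 \
               --maxNumWorkers=100 \
               --autoscalingAlgorithm=THROUGHPUT_BASED"
```

With these settings the service decides when to scale between the two bounds; the question is about how quickly it scales back down once throughput drops.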
We’ve made a few performance optimizations recently that should result in a much improved experience. Could you share a job ID so we can take a look? (I’m curious to examine the behavior you describe.)
Thank you for your answer. The job ID is 2016-02-26_07_51_26-331422378274284858