How can we improve Compute Engine?

Add NAT routing as a service

Allow instances without an external IP address to access the Internet via NAT, using a single external IP address for all instances.

We often need to connect to client services which are restricted by IP address. We can't whitelist the instances' external addresses (even if they had them), because we use managed instance groups and frequently delete and recreate instances when deploying updates, which means the external IP addresses would change.

We can work around this by provisioning an instance to act as a NAT router, but the difficulty of doing HSRP-style IP failover on GCE means this adds a single point of failure.

402 votes
    Felicity Tarnell shared this idea

    21 comments

      • Peter Vandenabeele commented

        Is this NAT service not the solution for this? https://console.cloud.google.com/net-services/nat
        I am trying it now. It states:

        ```
        Cloud NAT lets your VM instances and container pods communicate with the internet using a shared, public IP address.

        Cloud NAT uses NAT gateway to manage those connections. A NAT gateway is region and VPC network specific. If you have VM instances in multiple regions, you’ll need to create a NAT gateway for each region.
        ```
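
        If it helps anyone, the setup reduces to roughly the following gcloud commands; the router/NAT names, network and region below are placeholders I'm using, and the exact flags may differ by gcloud version:

        ```
        # Create a Cloud Router in the region and VPC network to be served
        # (names and region are placeholders).
        gcloud compute routers create nat-router \
            --network=my-gcp-network \
            --region=us-central1

        # Attach a Cloud NAT configuration to that router; Google allocates the
        # shared external IP and NATs all subnet ranges in the region.
        gcloud compute routers nats create nat-config \
            --router=nat-router \
            --region=us-central1 \
            --auto-allocate-nat-external-ips \
            --nat-all-subnet-ip-ranges
        ```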

      • Kieran P commented

        This is a must. We're having to roll our own NAT gateway servers right now. GCP needs a NAT gateway service, just like AWS has, to handle this for us and scale up as the rate of outgoing data increases.

      • Dario Nieuwenhuis commented

        +1000 to this.

        Another use case is reducing the number of public IPs for PCI DSS compliance. PCI requires you to get quarterly scans of all your public IPs from an "approved scan vendor", which usually charges a fixed (and quite high!) amount per IP.

        This gets expensive quickly in cloud environments where you have lots of instances.

      • Rob Archibald commented

        I would also love to see this on GKE! I need to connect to external services which use a whitelist, and it's critical that I have a stable IP address that I can provide. Today I just have to give the IP addresses of my GKE instances and update the whitelist whenever there is a change.

      • Joe commented

        I wonder if setting up an nginx proxy, rather than a NAT gateway, would also help?
        Any ideas about using an nginx proxy to solve this problem?

      • Anonymous commented

        Would love to see this. I'm using GKE to run a few clusters and we need a single public egress IP to connect to some third-party IPs which use IP whitelisting. A NAT-as-a-service offering would help here to push the traffic from the node pools through this NAT service for specific routes.

      • Blake Acheson commented

        It's crazy that this doesn't exist... doing this manually is not scalable with hundreds of instances.

      • Paul Nash (Admin, Product Manager, GCE, Google) commented

        Felicity, thank you very much for posting the detailed information on your workaround! I'm sharing this with the networking and support teams for their information, and I'm sure other users will find this helpful too until we can hopefully provide a service like this. Thank you for helping build our user community here on Uservoice!

      • Felicity Tarnell commented

        We have worked around this for now using two instances running Pacemaker. Since GCE doesn't support multicast, Corosync has to be configured with UDP unicast (udpu) and a static member list.
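
        For reference, the relevant corosync.conf bits look roughly like this; the addresses and node IDs are placeholders rather than our real values:

        ```
        totem {
            version: 2
            transport: udpu
        }

        nodelist {
            node {
                ring0_addr: 10.0.0.2
                nodeid: 1
            }
            node {
                ring0_addr: 10.0.0.3
                nodeid: 2
            }
        }
        ```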

        To manage the default route, I wrote a custom resource type called "gcproute": https://github.com/torchbox/pcmk-gcproute

        The Pacemaker configuration looks like this:

        ```
        primitive ip-default-route ocf:tbx:gcproute \
            params name=my-default-route network=my-gcp-network prefix=0.0.0.0 prefix_length=0 \
            op monitor interval=10s timeout=30s \
            op start timeout=30s interval=0 \
            op stop timeout=30s interval=0 \
            meta target-role=Started
        ```

        This creates a route for 0.0.0.0/0 (i.e., a default route) called my-default-route, with the active instance as the next hop. Pacemaker will ensure that the resource fails over if a node fails or is down for maintenance.
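
        What the resource agent does is roughly equivalent to the following gcloud command, created and torn down as the resource moves between nodes (the instance name and zone here are placeholders):

        ```
        # Roughly what gcproute manages: a default route whose next hop is the
        # currently active router instance (name and zone are placeholders).
        gcloud compute routes create my-default-route \
            --network=my-gcp-network \
            --destination-range=0.0.0.0/0 \
            --next-hop-instance=router-1 \
            --next-hop-instance-zone=us-central1-a
        ```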

        For this to work, you need to delete the default route that GCP creates automatically, tag your router instances with some tag (like "router"), and create a second static default route:

        ```
        kind: compute#route
        description: Router traffic via GCP gateway
        destRange: 0.0.0.0/0
        name: router-default-route
        network: https://www.googleapis.com/compute/v1/projects/myproject/global/networks/my-gcp-network
        nextHopGateway: https://www.googleapis.com/compute/v1/projects/myproject/global/gateways/default-internet-gateway
        priority: 0
        tags:
        - router
        ```
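
        If you prefer the CLI to the API resource above, the equivalent gcloud command should look something like this (same placeholder names as above):

        ```
        # Tag-restricted default route so the router instances themselves egress
        # via GCP's default internet gateway.
        gcloud compute routes create router-default-route \
            --network=my-gcp-network \
            --destination-range=0.0.0.0/0 \
            --next-hop-gateway=default-internet-gateway \
            --priority=0 \
            --tags=router
        ```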

        This means traffic from the router instances will use GCP's default gateway (which is fine, as they have external IP addresses), while other instances will use the Pacemaker-managed default route that goes via the router instances.

        Of course the router instances also need to be configured to do NAT with iptables.
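
        That part is the usual masquerading setup, something along these lines (the interface name eth0 is an assumption; use whatever your primary NIC is called):

        ```
        # Allow the router instance to forward packets.
        sysctl -w net.ipv4.ip_forward=1

        # Masquerade traffic leaving via the primary interface so it leaves with
        # the router's external address (eth0 is an assumption).
        iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
        ```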

        I don't really like this solution: failover is quite slow (sometimes on the order of 30 seconds), and it's more complicated than I feel it should be. But until GCP provides this as a service, it's a reasonable alternative.

      • Omar Juma commented

        Yes, my team would love this. Right now we have created our own instances to manage this, but it is a pain: there is no failover, and it must be recreated for each project and network.

      • Paul Nash (Admin, Product Manager, GCE, Google) commented

        Thanks, Felicity. I didn't see your comment, but I moved the idea here anyway. I also merged an older, less complete idea into this one. Stay tuned: we might have some news here soon, as I know the product team has been exploring this due to customer demand.

      • Felicity Tarnell commented

        I posted this in load balancing by accident, but I don't see a way to move it... it's meant to be in Compute Engine.
