This week’s Ask the Expert is answered by Ameet Naik, technical marketing manager at ThousandEyes.
Ask the Expert: To direct connect, or not direct connect to the cloud?
The cloud is a complex, distributed system that is incredibly hard to debug. You don’t own the infrastructure, but as an IT admin or network engineer you still own the outcome. To that end, many enterprises take extra steps to try to control as much of their application or service delivery as possible. For example, many opt to build in what they believe to be a guarantee of delivery by purchasing a direct connection between their own enterprise networks and their IaaS or PaaS of choice. But is a direct connection enough to guarantee cloud performance?
Lessons Learned, the Hard Way
On Friday 2nd March 2018, Amazon AWS’ US-East-1 region, located in Ashburn, VA, experienced a severe outage that impacted not only Amazon’s own Alexa but also dozens of business-critical apps and services hosted on the IaaS provider, including Slack, Twilio and Atlassian JIRA. Although the infrastructure recovered quickly from what turned out to be a weather-related power outage, the effects were felt globally, cascading through the many software applications and services running on AWS.
As it turned out, the outage mostly affected customers relying on AWS Direct Connect, the very service many believe to be a safeguard against outages. Internet access, by contrast, recovered quickly, and Amazon’s suggested workaround was for customers to use its IPSec VPN service over the Internet.
So is a direct connection the solution to all your problems? The short answer is obviously no. So what’s an enterprise to do?
Cloud Connectivity Options, Explained
IaaS and PaaS services like Amazon AWS, Microsoft Azure and Google Cloud Platform let you create virtual server instances on demand. These instances live in a virtual private cloud (VPC), typically an isolated private network. There are three ways you can reach applications running in these VPCs.
The first option is to assign public IPs to these servers so they can communicate with the wider Internet. While this approach is great for external access (typically the web layer of public-facing apps), it’s not so great for internal servers such as databases, which you don’t want exposed to the Internet.
The second option is to build an IPSec VPN tunnel from your enterprise network into the IaaS and make the private address space routable within your enterprise. This option works well for microservices APIs and internal applications that are only accessed from within the corporate network. On the flip side, IPSec VPN tunnels require costly encryption hardware and can add unwanted latency to application flows. This option also relies on the Internet as the underlying transport, which, as we know, is inherently dynamic.
The third option is to establish a private connection between your enterprise network and the cloud provider, so your cloud network addresses are routable from within your enterprise networks and vice versa. AWS calls this Direct Connect, Microsoft Azure calls it ExpressRoute, and Google calls it Cloud Interconnect. The three differ in access methods and redundancy options, but each makes your cloud resources routable from your enterprise network.
Regardless of which provider you choose, all three services involve establishing a connection to, and peering with, your cloud provider at one of many available exchange points. These connections can be capped at a given bandwidth tier or billed on actual usage. Most also offer redundancy options, so that the failure of a single link or router does not take down the connection.
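A quick way to catch the problem case is to audit your inventory for internal-tier servers that have picked up a public address. The sketch below assumes a hypothetical in-memory inventory with made-up tier names (`database`, `cache`, `internal-api`) and documentation-range IP addresses; in practice you would pull this data from your cloud provider’s API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical tier names for servers that should only be reachable privately.
INTERNAL_TIERS = {"database", "cache", "internal-api"}


@dataclass
class Instance:
    name: str
    tier: str
    private_ip: str
    public_ip: Optional[str] = None  # None means no public address assigned


def exposed_internal_instances(instances: List[Instance]) -> List[str]:
    """Return the names of internal-tier instances that have a public IP."""
    return [
        inst.name
        for inst in instances
        if inst.tier in INTERNAL_TIERS and inst.public_ip is not None
    ]


# Example inventory using documentation-range (RFC 5737) public addresses.
fleet = [
    Instance("web-1", "web", "10.0.1.10", "203.0.113.10"),      # public web tier: expected
    Instance("db-1", "database", "10.0.2.20", "198.51.100.7"),  # internal tier with public IP
    Instance("db-2", "database", "10.0.2.21"),                  # private only
]
print(exposed_internal_instances(fleet))  # → ['db-1']
```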
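The value of redundancy is easy to estimate if you assume links fail independently: the connection is down only when every link is down at once. The sketch below computes that; the 99.9% per-link figure is a hypothetical input, not a provider SLA.

```python
from typing import Iterable


def combined_availability(link_availabilities: Iterable[float]) -> float:
    """Probability that at least one of several redundant links is up.

    Assumes links fail independently. Links that share a facility, power
    feed or router can fail together, so treat the result as optimistic.
    """
    prob_all_down = 1.0
    for a in link_availabilities:
        prob_all_down *= 1.0 - a  # every link must be down simultaneously
    return 1.0 - prob_all_down


single = 0.999  # hypothetical availability of one link
print(f"one link: {single:.4%}, two links: {combined_availability([single, single]):.4%}")
# → one link: 99.9000%, two links: 99.9999%
```

The independence assumption is the weak spot: two links that depend on the same power feed, for example, can go down together, and then the second link adds far less than the arithmetic suggests.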
The “Plain Internet” Alternative to Direct Connections
Then there’s the alternative to direct connections: IPSec over the Internet. How does it compare? Performance-wise, once you reach a certain bandwidth level, IPSec VPN tunnels become prohibitively expensive and start to throttle your cloud bandwidth.
By contrast, private connections let you scale up seamlessly as you need more bandwidth. Consistency-wise, you have far more control over network paths, which are less likely to change over time than paths across the constantly changing Internet. And finally, cost-wise, direct connect bandwidth is typically cheaper per Gbps than Internet bandwidth.
All that said, private peering connections are no silver bullet. One of the biggest advantages of the Internet is its resiliency: thanks to its high degree of connectedness, data will usually find a path from point A to point B. Keep in mind that the Internet was designed during the Cold War, with enough robustness to survive a major catastrophe. But Internet routing protocols don’t always find the fastest or most optimal path, which is what today’s microservices apps demand. And then there’s the simple fact that you have to share that path with countless other traffic streams.
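To see how per-unit pricing plays out, you can model each option as a flat monthly fee plus a per-GB transfer rate and solve for the monthly volume at which both cost the same. This is a simplified, hypothetical pricing model; the fees and rates in the example are placeholders, not any provider’s actual prices.

```python
from typing import Optional


def monthly_cost(fixed_fee: float, per_gb_rate: float, gb_per_month: float) -> float:
    """Hypothetical pricing model: a flat monthly fee plus a per-GB transfer rate."""
    return fixed_fee + per_gb_rate * gb_per_month


def break_even_gb(dx_fixed: float, dx_per_gb: float,
                  inet_fixed: float, inet_per_gb: float) -> Optional[float]:
    """Monthly volume (GB) above which the private connection is cheaper.

    Solves dx_fixed + dx_per_gb * V == inet_fixed + inet_per_gb * V for V.
    Returns None when there is no crossover (the private per-GB rate is not
    lower), and 0.0 when the private connection is cheaper at any volume.
    """
    if dx_per_gb >= inet_per_gb:
        return None
    return max(0.0, (dx_fixed - inet_fixed) / (inet_per_gb - dx_per_gb))


# Placeholder numbers for illustration only, not real provider pricing.
volume = break_even_gb(dx_fixed=500.0, dx_per_gb=0.02, inet_fixed=0.0, inet_per_gb=0.07)
print(f"private connection is cheaper above roughly {volume:,.0f} GB/month")
```

Plugging in your own contract terms shows where the fixed cost of a private port is paid back by a lower transfer rate.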
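The gap between “a path” and “the fastest path” is easy to demonstrate with a toy topology. The sketch below runs Dijkstra’s algorithm over a hypothetical graph twice, once counting hops and once summing link latency, then removes a link to show that traffic can still get through. Hop count is only a stand-in here: real Internet routing (BGP) selects routes based on policy and AS-path length rather than measured latency, which is why the chosen path and the fastest path can diverge.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical topology: (node, node) -> one-way link latency in ms.
LINKS = {
    ("enterprise", "transit-a"): 5,
    ("transit-a", "cloud"): 60,   # 2 hops to the cloud, but slow
    ("enterprise", "transit-b"): 4,
    ("transit-b", "ix"): 3,
    ("ix", "cloud"): 6,           # 3 hops to the cloud, but fast
}


def build_graph(links) -> Dict[str, List[Tuple[str, int]]]:
    """Turn the link table into an undirected adjacency list."""
    graph: Dict[str, List[Tuple[str, int]]] = {}
    for (u, v), ms in links.items():
        graph.setdefault(u, []).append((v, ms))
        graph.setdefault(v, []).append((u, ms))
    return graph


def best_path(graph, src: str, dst: str,
              weight: Callable[[int], float]) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm; weight(ms) gives each link's cost."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, ms in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight(ms), nxt, path + [nxt]))
    return None  # destination unreachable


graph = build_graph(LINKS)
print(best_path(graph, "enterprise", "cloud", lambda ms: 1))   # fewest hops
print(best_path(graph, "enterprise", "cloud", lambda ms: ms))  # lowest latency

# Fail the fast link: a path still exists, just a slower one.
degraded = {k: v for k, v in LINKS.items() if k != ("ix", "cloud")}
print(best_path(build_graph(degraded), "enterprise", "cloud", lambda ms: ms))
```

The fewest-hop route here takes 65 ms end to end, while the lowest-latency route takes 13 ms over an extra hop; after the failure, traffic falls back to the 65 ms path rather than being cut off.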
The Case for Both, Plus Visibility
By now it should be clear that the cloud is critical to your business, so you don’t want your private peering connections to become a single point of failure, as they unfortunately did for so many applications on 2nd March. Those cloud applications relied solely on a direct connection to AWS and simply failed to detect and recover from the loss of back-end connectivity. Neither direct connections nor IPSec VPN over the Internet is fail-proof on its own, so the winning formula is a combination of the two. Direct connections offer better performance and lower latency, while the Internet, despite its dynamic nature, is a highly resilient fallback path that helps you maintain service availability.
But perhaps most importantly, regardless of which cloud application delivery architecture you choose, the dynamic and uncertain nature of both options underscores the need to monitor your cloud applications at multiple layers of the protocol stack. Quite simply, without that visibility it is incredibly difficult to determine the scope and root cause of an outage like this one, which leads to unnecessary swivel-chair troubleshooting.
Operations teams typically spend over 70% of their time figuring out where a problem lies, and only then can they begin to implement a fix. In the cloud, this ratio can get even worse unless you have sufficient insight into the correlation between application performance, network paths and Internet routing. Putting a visibility solution in place to monitor your ever-changing enterprise network won’t prevent outages, but it will help you identify issues much faster and, ultimately, repair them sooner.
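In practice this fallback is usually configured on your routers, for example by advertising the same prefixes over both the direct connection and the VPN and using BGP attributes to prefer the direct path. The Python sketch below only models that decision: longest-prefix match across whichever paths are currently healthy, with ties broken by a preference value. The route table, path labels and preference numbers are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str      # destination CIDR reachable over this path
    path: str        # hypothetical labels: "direct-connect" or "ipsec-vpn"
    preference: int  # lower wins when prefixes are equally specific


# Same VPC prefix reachable both ways; the direct path is preferred.
ROUTES = [
    Route("10.20.0.0/16", "direct-connect", 100),
    Route("10.20.0.0/16", "ipsec-vpn", 200),
]


def select_path(dst: str, routes: List[Route], healthy: Dict[str, bool]) -> Optional[str]:
    """Longest-prefix match over healthy paths, then lowest preference value."""
    addr = ip_address(dst)
    candidates = [
        r for r in routes
        if healthy.get(r.path, False) and addr in ip_network(r.prefix)
    ]
    if not candidates:
        return None  # no healthy path covers this destination
    best = max(candidates, key=lambda r: (ip_network(r.prefix).prefixlen, -r.preference))
    return best.path


print(select_path("10.20.5.9", ROUTES, {"direct-connect": True, "ipsec-vpn": True}))
print(select_path("10.20.5.9", ROUTES, {"direct-connect": False, "ipsec-vpn": True}))
```

With both paths up, traffic uses the direct connection; when it is marked unhealthy, the VPN takes over. Note that a more specific prefix wins regardless of preference, which is one reason to advertise identical prefixes over both paths when you want preference alone to decide.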
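One building block for that kind of multi-layer monitoring is a simple transport-layer probe that times a TCP handshake to a service. The sketch below uses only Python’s standard library; which host and port you probe is up to you, and a fuller setup would pair it with application-layer checks (such as an HTTP request) and path traces so results can be correlated across layers.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Time a TCP handshake to host:port in milliseconds; None if it fails.

    This only checks the transport layer. Application-layer checks and
    path traces are separate probes whose results you would correlate.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # handshake completed; close immediately
    except OSError:  # refused, unreachable, timed out, DNS failure, etc.
        return None
    return (time.perf_counter() - start) * 1000.0
```

For example, running `tcp_connect_ms("10.20.5.9", 443)` on a schedule from several sites, where the address is a hypothetical private VPC endpoint, would show whether a service became unreachable from everywhere or only from the sites behind one connection.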