18.4.2016

The risk elephant in the room: the vulnerability of cloud automation beyond the first line of defense.

While a lot of people lately seem to agree that cloud security is now getting good enough to host sensitive data (really? why now? what has matured to support such general statements?), there is a disturbing lack of attention to one of the most pressing issues - security of cloud fabric automation, a vital part of the cloud service provider's space of responsibility.

Remember the times when we did everything (and then some) to protect the management interfaces of our infrastructure - two-factor authentication for all sessions, special out-of-band management networks, firewalls and VPNs dedicated to management traffic, dedicated management interface IPS sensors and whatnot? We did this because we understood that if an attacker managed to compromise our infrastructure or application management systems, the consequences would be catastrophic and system-wide. We designed a strong layer of defense for the management system, and then, because we assumed that it could still be compromised, we designed the devices and apps it managed not to trust it fully, using RBAC and similar tools.
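To make the "do not fully trust the management system" idea concrete, here is a minimal sketch of a managed device checking a command against its own role table before acting on it. The role names, action names and the `device_accepts` helper are all hypothetical, invented for illustration; a real RBAC deployment would be considerably richer.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table, held by the managed device itself
# rather than by the management system that sends the commands.
ROLE_PERMISSIONS = {
    "monitor": {"read_status"},
    "operator": {"read_status", "restart_service"},
    "admin": {"read_status", "restart_service", "change_config"},
}


@dataclass(frozen=True)
class Command:
    issuer_role: str  # role the management system claims to act under
    action: str       # what it asks the device to do


def device_accepts(command: Command) -> bool:
    """Return True only if the issuer's role explicitly permits the action.

    Unknown roles get an empty permission set, so the default is to refuse.
    """
    allowed = ROLE_PERMISSIONS.get(command.issuer_role, set())
    return command.action in allowed
```

The point of the sketch is the placement of the check: even if the management system is compromised, a session running under the "monitor" role still cannot change configuration, because the device decides for itself.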


Today, we expose an entire cloud's API to the Internet, and through it we aim to provide as much cloud automation functionality as possible.

Think about this for a minute.

What Is Cloud Automation?

Cloud automation is a great thing - we can build fantastic levers and wheels in the cloud to accomplish complex tasks without human intervention, and expose their push-buttons using a cloud API or a portal. However, as safety engineers will attest, automation can also be extremely effective in causing catastrophic system failures (consider the safety-related cascading failures of Qantas Flight 32, the AT&T frame-relay network meltdown or the Amazon AWS storage network/EBS outage), and that very much applies to catastrophic cloud security incidents as well. While a cloud provider should design their fabric to be resistant to cascading failures, consider that such safety failures turn into security failures far more readily when they are triggered deliberately by a focused attacker than when they occur by chance.

In order for cloud automation software to be cost effective, it needs to control A LOT of the cloud to avoid manual labour. A lot of control means a lot of potential damage when things go wrong. And then, maliciously-induced automation failures do not only result in denial of service: they can easily be used to expose hosted systems and data, to extract sensitive information, or to scalably inject back-doors of all sorts into an enormous number of cloud workloads and data. 

The Current State of Cloud Automation Security

To be fair, there is some awareness of cloud API security, but everyone only looks at the façade: how we control interaction of cloud users with the outermost layer of the cloud - authenticating and authorizing API/portal users, cryptographically protecting communications with the API/portal, and auditing. If, or better, WHEN these controls fail, the soft, white underbelly of the cloud (the automation routines at the attacker's disposal) is exposed and the attacker can wreak havoc on a HUGE collection of data and workloads, if automation is not DESIGNED to prevent it using smart defense-in-depth. Remember that thing called "perimeter security" and the terrible realization of its limitations in the late 1990s? This is exactly the same when it comes to cloud fabric security.
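As a rough illustration of what a control INSIDE the automation layer might look like, here is a minimal sketch of a guard that an automation routine could run on every request, regardless of what the API front end has already authenticated. Everything in it is an assumption made for the example: the `inner_guard` function, the per-request limit of 10 targets, and the idea that workload ownership is available as a simple mapping.

```python
class BlastRadiusExceeded(Exception):
    """Raised when one request tries to touch too many workloads."""


class TenantMismatch(Exception):
    """Raised when a request targets a workload the caller does not own."""


# Assumed limit for illustration; a real value would depend on the service.
MAX_TARGETS_PER_REQUEST = 10


def inner_guard(caller_tenant, targets, owners, max_targets=MAX_TARGETS_PER_REQUEST):
    """Re-check a request inside the automation layer.

    caller_tenant: tenant identifier the request runs under
    targets:       workload identifiers the routine is asked to act on
    owners:        mapping of workload identifier -> owning tenant
    Returns the list of targets if every check passes, otherwise raises.
    """
    targets = list(targets)
    # Cap the damage a single request can do, even a fully authorized one.
    if len(targets) > max_targets:
        raise BlastRadiusExceeded(f"{len(targets)} targets exceeds {max_targets}")
    # Do not rely on the outer API having checked ownership already.
    for workload in targets:
        if owners.get(workload) != caller_tenant:
            raise TenantMismatch(workload)
    return targets
```

A guard like this does not replace authentication and authorization at the façade. It simply means that when the façade fails, the attacker inherits a routine that still refuses cross-tenant and bulk operations.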


It Is Just Software, Right?

Cloud automation is a collection of software that controls cloud resources. Like any software, it contains design flaws and implementation errors TODAY, resulting in vulnerabilities that are waiting to be discovered and exploited. Sure, we can test and patch our APIs using OWASP Top-Ten band-aids, but shouldn't critical software require a better approach, perhaps one mirroring the lessons learned in safety engineering of complex systems (and of course, upgrading these lessons to account for malicious intent)?

Solutions

As Slavoj Žižek once famously said, we first and foremost need to ask the right questions related to complex problems. How can automation fail security-wise? How can we limit the impact of security failures? Which existing lessons can we apply to cloud automation security? What are the most cost-effective controls? Stay tuned, as I will provide guidance and examples in the next instalment of Loose Lips.

Feedback

Have you witnessed an automation-related cloud security collapse? Have you designed security controls INSIDE a cloud automation framework? I'd love to hear from you in the comments section!