Matthew Garrett, principal security engineer at CoreOS, is one of the most respected names in the Linux world when it comes to security and cloud computing. I sat down with Garrett at CoreOS Fest to discuss the risks around containers and Linux in general. Here is an edited version of that discussion.
Lately we’ve been hearing a lot about container security. What are the risks involved with containers and what are open source projects like CoreOS doing to mitigate them?
Security is a special concern with containers because many people think of containers in the same way they think of VMs (virtual machines). The isolation between containers is somewhat weaker than the isolation between virtual machines, so from that perspective it’s very easy to fall into the mindset that containers reduce security. In reality this depends very much on what you’re trying to do with containers, and containers do improve isolation compared to processes that are simply running on the same host.
Traditionally, if you had a Web server and a database running on the same system and someone compromised the Web server, they were in a fairly strong position to attack the database. If the Web server and the database are running in different containers, there’s still a high level of isolation between them and [hacking into the database is] much more difficult.
Now people want to do more with containers. People want to use containers as a way to run untrusted code and to distribute applications that may have very different levels of security. From our perspective, we want to improve the security of containers to the point where, ideally, people can have the same expectation of security that they have with VMs. There’s certainly much more work to be done in that space before I think we can say, “Yes, you can confidently run untrusted code in one container on the same system where you’re running mission-critical code.”
There are things we’ve done that bring us closer to this. We are now generally using SELinux as an initial level of isolation. So even if someone is able to take advantage of a flaw in the kernel that allows them to escape from the container, they are still confined by SELinux: they are still unable to run any executable that is not in the container they started in, and they’re not able to access any files they were not already able to access. That gets us much closer. The downside is that we’re still relying on the kernel to provide strong isolation. If there’s a flaw in the kernel, it’s still conceivably possible for someone to elevate their privileges and break out of not only the container but also the SELinux confinement. That’s a real, genuine concern.
Mitigating that is a multifaceted problem; there is no single way to avoid it. As Greg Kroah-Hartman mentioned in his keynote yesterday, keeping the kernel updated is a vital part of that, but we also need to get better at improving the security of the kernel itself. Kees Cook and other developers have been working to bring more mitigation features into the kernel, making it more likely that if there is a kernel bug, it will be blocked by some other piece of technology rather than giving an attacker direct access to elevated privileges. That’s massively important work, and it’s great to see that it’s happening now. It’s disappointing that it’s taken us this long to start caring about it, but the fact that people are working on this gives me much more hope for the future of security in the container space and in free software in general.
Many vendors are still using very old, unmaintained versions of the Linux kernel. There is a traditional notion that once a machine is up and running, you shouldn’t touch it. How do you change that mindset?
One part of this that I think is especially interesting from the container world is that single machines shouldn’t matter that much. In the more traditional deployment mindset we care very strongly about the stability and functionality of a single machine.
Once you get into the mindset that instead of having a Web server you have a set of machines, and being a Web server is just one of their functions, decoupling that functionality from the operating system makes updates much less of a problem. If you’re worried that a new kernel will be fine in testing and then fail in deployment, with CoreOS, for instance, you can update a subset of your systems to the new version and boot them. If they fall over, sure, you’ve lost some of your Web server containers, but all you’ve lost is some performance, and you can immediately roll those systems back to the old version. You can maintain a setup where some systems run the new kernel and some run the old kernel, test them in the real world, and then, once you’re confident, move everything over to the new kernel.
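The staged rollout described in that answer can be sketched in a few lines. This is a minimal, hypothetical illustration, not CoreOS’s update tooling: the `Machine` type, the `staged_rollout` function, the canary fraction, and the `health_check` callback are all assumptions standing in for real fleet management and monitoring.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Machine:
    name: str
    kernel: str  # kernel version currently booted (hypothetical label)


def staged_rollout(fleet: List[Machine], new_kernel: str, canary_fraction: float,
                   health_check: Callable[[Machine], bool]) -> bool:
    """Update a canary subset first; roll it back if any canary is unhealthy,
    otherwise move the rest of the fleet to the new kernel."""
    previous = {m.name: m.kernel for m in fleet}
    n_canary = max(1, int(len(fleet) * canary_fraction))
    canaries, rest = fleet[:n_canary], fleet[n_canary:]

    # Step 1: boot a subset on the new kernel; the rest keep serving on the old one.
    for m in canaries:
        m.kernel = new_kernel

    # Step 2: if any canary falls over, the cost is some lost capacity; roll back.
    if not all(health_check(m) for m in canaries):
        for m in canaries:
            m.kernel = previous[m.name]
        return False

    # Step 3: once confident, move the remaining machines over.
    for m in rest:
        m.kernel = new_kernel
    return True
```

For example, `staged_rollout(fleet, "new", 0.25, check)` on a four-machine fleet touches one machine first and only updates the other three if `check` passes for it. A real deployment would also watch the later batches, but the point is the same: one bad kernel costs capacity, not the service.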
This is, obviously, not how we’ve been doing things for the past 20 years, and it is going to take time for people to start trusting this, and it is going to take time for people to re-architect their processes to change the way they’ve set up their systems. It isn’t impossible. I think this kind of distributed computing is going to be a strong weapon in the fight against vulnerable pieces of software. Using containers this way does, in fact, make it easier to deploy security updates.
This is exactly how CoreOS works. When you do a CoreOS upgrade, the old version remains on disk, and if you want to revert to it, you can. The only downtime is the time it takes the system to reboot.
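Keeping the old version on disk is commonly done with two system partitions, often called A and B: the update is written to the inactive one, and reverting just means booting the other. The toy model below assumes that scheme; `ABUpdater` and its methods are hypothetical names, and the sketch does not touch real disks or bootloaders.

```python
from typing import Dict, Optional


class ABUpdater:
    """Toy model of a dual-partition (A/B) update scheme."""

    def __init__(self, version: str) -> None:
        self.slots: Dict[str, Optional[str]] = {"A": version, "B": None}
        self.active = "A"

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def stage_update(self, version: str) -> None:
        # Write the new version to the inactive slot; the running system is untouched.
        self.slots[self.inactive] = version

    def _switch(self) -> str:
        # "Reboot" into the other slot; in this model that reboot is the only downtime.
        target = self.inactive
        if self.slots[target] is None:
            raise RuntimeError(f"slot {target} is empty")
        self.active = target
        return self.slots[target]

    def boot_new(self) -> str:
        return self._switch()

    def rollback(self) -> str:
        # The previous version is still on disk, so reverting is just another switch.
        return self._switch()
```

Because staging never overwrites the active slot, a failed upgrade can always be undone with one reboot, which is the property the answer relies on.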
What are the new challenges in the container space? CoreOS announced several new open source technologies, such as Stackanetes and rktnetes.
It will be pretty interesting to bring some of the other security work we are doing around CoreOS into the distributed Trusted Computing work that allows us to verify that systems have not been tampered with and that they are still running the original firmware. Bringing that into Stackanetes means it’s going to be very straightforward to deploy OpenStack in a completely trusted environment, where you can verify the full state of the systems before bringing them into an OpenStack cluster.
We still have to deal with the fact that we are relying on the security of Linux, and that’s a huge quantity of code. I think we’re going to be spending more time helping develop various mitigation features, and potentially looking at new technologies that allow us to have even greater confidence in the kernel’s ability to isolate us, or at the very least to automatically detect that the system has been compromised or tampered with and is no longer in a trustworthy state.
CoreOS recently open sourced another project, called Clair, that helps monitor the security of containers. How does this approach differ from other solutions?
Clair is our fully free software implementation for scanning containers and using the associated package metadata to identify containers that include old versions of software with known CVEs (Common Vulnerabilities and Exposures).
Approaching it this way, by looking at the package data, means that as long as the distributions provide accurate data about which versions contain which security issues, we can give users a strong level of confidence that there are no known security issues in their containers, or identify which images need to be updated, in a fully automated way.
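A minimal sketch of metadata-based matching along the lines described above: compare each installed package version with the first version a distribution lists as fixed. This is not Clair’s code or API; the `Advisory` record, the placeholder `EXAMPLE-n` identifiers, and the simplified dotted-integer version comparison are all assumptions (real dpkg and RPM version ordering has many more rules).

```python
from typing import Dict, List, NamedTuple, Tuple


class Advisory(NamedTuple):
    vuln_id: str        # placeholder identifier, not a real CVE
    package: str
    fixed_version: str  # first version in which the distribution fixed the issue


def parse_version(v: str) -> Tuple[int, ...]:
    # Simplified assumption: versions are dotted integers only.
    return tuple(int(part) for part in v.split("."))


def find_vulnerabilities(installed: Dict[str, str],
                         advisories: List[Advisory]) -> List[Tuple[str, str, str]]:
    """Return (package, installed_version, vuln_id) for every installed package
    older than the version in which an advisory says the issue was fixed."""
    hits = []
    for adv in advisories:
        version = installed.get(adv.package)
        if version is not None and parse_version(version) < parse_version(adv.fixed_version):
            hits.append((adv.package, version, adv.vuln_id))
    return hits
```

Given an image whose package database lists `openssl` at `1.0.1` and an advisory marking the issue fixed in `1.0.2`, this reports the image as needing an update. The quality of the answer depends entirely on the distribution’s advisory data, which is exactly the caveat in the answer above.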
There are theoretically other things we could do here. Static code analysis could potentially identify individual binaries with security issues, but the problem with that approach is this: if it says you have a vulnerability, that’s great. You probably do have a vulnerability. If your scanning tool says you don’t have a vulnerability, that’s not particularly useful information. You don’t know whether that is because there are no known vulnerabilities or because the tool was unable to identify one.
Approaches like Clair will not cover as many situations, but the situations they do cover give you results you can be confident in.
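The trade-off in the last two answers is easier to see if a scanner keeps “no known issues” separate from “no data.” The hypothetical `classify` function below (not part of Clair) only reports a package as clean when it is covered by the distribution’s security data; here `covered` is an assumed set of package names that the security feed tracks, and anything outside it is reported as unknown rather than silently passing.

```python
from enum import Enum
from typing import Set, Tuple


class Verdict(Enum):
    VULNERABLE = "vulnerable"
    NO_KNOWN_ISSUES = "no known issues"
    UNKNOWN = "unknown"  # no data: the absence of findings proves nothing


def classify(package: str, version: str, covered: Set[str],
             vulnerable: Set[Tuple[str, str]]) -> Verdict:
    # A confirmed match is reported regardless of coverage.
    if (package, version) in vulnerable:
        return Verdict.VULNERABLE
    # Only packages the security data actually covers can be called clean.
    if package in covered:
        return Verdict.NO_KNOWN_ISSUES
    # Everything else is reported honestly as "don't know".
    return Verdict.UNKNOWN
```

A tool built this way covers fewer cases, but a “no known issues” verdict actually means something, which is the property the answer values.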
This is really important because one of the things people have been very worried about in the container space is identifying which container images contain, for instance, old versions of OpenSSL. If you are running such containers, are you putting your system’s security at risk? I think this kind of tool gives system administrators the confidence they need to deploy containers without having to worry about that.