In 2007, David Snowden and Mary Boone published an article in Harvard Business Review called “A Leader's Framework for Decision Making.” In it, they describe a way of looking at different classes of problems, and how the methods used to solve those problems differ depending on the context in which you are operating. They called this framework “Cynefin,” a Welsh word that describes the often-unforeseen factors that influence our decisions.
When learning about this framework, I could not help but think of their descriptions in the context of the many problems I’ve had to solve over the course of my career in Production Operations and Engineering. The authors even describe Cynefin in terms that will look very familiar to those who have been in this same role:

“Leaders who understand that the world is often irrational and unpredictable will find the Cynefin framework particularly useful.”
–David Snowden and Mary Boone, "A Leader's Framework for Decision Making" (Harvard Business Review, 2007)

Irrational and unpredictable? How many production outages have I been involved with that appeared irrational and unpredictable? Most of them! I began to think about how Cynefin could be applied in a DevOps context.
What is Cynefin?
[Figure: The Cynefin framework (IDG)]

As can be seen in the diagram, Cynefin separates our problem or decision types into four distinct quadrants: Simple, Complicated, Complex and Chaotic. Each quadrant calls for different leadership skills to navigate it successfully.
Simple/Obvious
The Simple or Obvious domain is characterized by simple inputs and outputs. A simple input leads to a well-defined output. There is no ambiguity. The right responses in this domain are often characterized as best practice.
If I need to make fries at a fast food restaurant, there is a specific volume of fries that must be cooked for a specific amount of time at a specific temperature. Any deviations from the norm should be minor and easily handled by the operator.
Complicated
In the realm of the Complicated, some expert knowledge must be applied to the problem in order to arrive at a decision. The authors call this “good practice”: the problem has to be interpreted before a decision is made. It is not simply a matter of deciding which “best practice” applies to each situation. An answer is definitely achievable, but it will not necessarily be immediately obvious.

“Reaching decisions in the complicated domain can often take a lot of time, and there is always a trade-off between finding the right answer and simply making a decision.”
–David Snowden and Mary Boone, "A Leader's Framework for Decision Making" (Harvard Business Review, 2007)

Complex
According to Snowden and Boone, many problems in organizations can be characterized as complex. These are situations for which there is no clear, well-defined outcome, and the problem must be probed in order to ascertain the correct path forward. I have seen many production environments that I would consider to be complex. “Complex distributed systems” is a very common phrase in our profession. Many outages have happened because some input to the system had a completely unexpected outcome and resulted in a major problem.
The Knight Capital disaster is a classic example. No one had predicted that a deviation on one system would lead to a catastrophic outcome for the company.
When dealing in the realm of the Complex, caution is warranted and decisions should be made based on evidence, not simply past experience.

“Most situations and decisions in organizations are complex because some major change...introduces unpredictability and flux. In this domain, we can understand why things happen only in retrospect. Instructive patterns, however, can emerge if the leader conducts experiments that are safe to fail.”
–David Snowden and Mary Boone, "A Leader's Framework for Decision Making" (Harvard Business Review, 2007)

Chaotic
The Chaotic is the area of unknown unknowns. As the authors describe it, the only objective in the Chaotic arena is to get out of that arena as quickly as possible. Leaders in this area are advised to make a decision and try to move to another quadrant, any quadrant, from which a definitive path forward can be taken.
Cynefin in practice
So, how can we apply Cynefin in a DevOps context? What can we recognize about these four domains that is applicable to our responsibilities of keeping the site up and keeping developers moving as fast as possible?

“...then sense where stability is present and from where it is absent, and then respond by working to transform the situation from chaos to complexity, where the identification of emerging patterns can both help prevent future crises and discern new opportunities.”
–David Snowden and Mary Boone, "A Leader's Framework for Decision Making" (Harvard Business Review, 2007)

What I came to realize was that our job in operations is to move problems clockwise around the Cynefin diagram, trying to make most problems faced by developers simple. For example, if I want a new virtual machine in AWS, a simple, well-defined API call makes that happen. All the inputs are well defined, and all the outputs are well understood.
Exactly like the bottom right quadrant.
Damon Edwards likes to say that “Operations provides a platform.” If that is the case, then part of our job in Operations is to provide a platform, similar to that presented by the AWS API, which enables self-service activity by the development teams, so that the tasks they are trying to accomplish are simple and obvious and do not require them to apply any expert knowledge to get their work done. I once worked with an engineering team that estimated they spent more than 60% of their time on “plumbing,” or wiring up the virtual hardware necessary for them to accomplish their task: work that could be provided by a platform developed by Operations. Coaching these teams to a new way of working provided some very quick ROI for that client!
If our goal is to be as close to the Simple quadrant as possible, we can look at some examples where this is not the case, and some ways in which we can do better.
Environments
I have often worked with clients who have made a large effort to build out production environments where everything is very clean and well defined. That does not mean that the environment is trivial (or obvious) to understand, but they make understanding it possible. They use infrastructure as code, they package everything into containers, they do regular deployments and there is plenty of documentation. I would characterize those environments as Cynefin Complicated. They do require some expert knowledge to understand, but we can reason about them.
When it comes to their staging environments, however, these same clients have left everything a mess. In a misplaced effort to “save money,” the staging environment is where all the corners are cut. Instead of 5 separate web tiers like production, there are 5 web configurations jammed into one host on different ports.
Instead of an Oracle RAC database, there is a single Postgres instance that is “close enough.” Of course, as this environment looks nothing like production, it’s basically worthless for testing, and because it’s such a hack of previously isolated things jammed together, we’ve actually moved from Complicated to Complex and have a much harder time maintaining the environment.
A simpler way to deal with the problem (and save money) is to run smaller instances of the production tiers in the staging environment and use the exact same logic to build both. If we are running on a c4.4xlarge instance in production, then we can use a c4.large instance in staging (or whatever is appropriate). This way, the environments are basically identical, except for the load they can handle. This also means that any code intended to manage production can be tested in the staging environment first. As Gene Kim says, the ability to build representative test environments on demand is one of the strongest indicators of high performing IT teams.
We may not have moved all the way to Simple in this case, but we’re in a much better place than when operating in the Complex.
Deployments
Another example of Cynefin in action can be found in our deployment processes. For many years, deployments have been nightmares for Operations teams. Deployments that happen infrequently batch up large amounts of changes just waiting to interact in new and exciting ways under production load.
Often these infrequent releases involve multiple teams executing a series of steps, all designed to work together over multiple hours, until the deployment is finally complete. If there are any problems, there are complicated rollback procedures, only some of which have been tested. Generally, each application will have its own deployment procedure depending on its age, coding language, development team, etc.
This is definitely in the realm of the Complex: not only do we need to apply expert knowledge, as in the Complicated, but because every procedure is a unique snowflake, we don’t know what effect any one action may have on any other system.
The first step in moving to the Complicated would be to try to align all the different deployment schemes around a common pattern or three. In this way, for any one deployment, we only have a limited number of possibilities to reason about. This can bring us into the area of “good practice,” where we do not need to consider a bunch of anomalous outliers.
If we wish to make the final jump to Simple, we need to create an environment where developers have a self-service platform that is constructed with well-defined inputs and outputs. We can use a chatbot like Hubot, Lita or Errbot to make the inputs, the interface, uniform for any type of deployment. Regardless of the deployment itself, the interface to the chatbot will make everything appear the same and return the same well-defined output, even as the actual mechanisms for deployment are hidden from the end user. Thankfully, even in this case, the documentation of how the actual deployment is done is the source code itself, so the mechanisms can be explored and understood as well. In this case, we’ve moved our deployments from Complex to Simple. There is no question: this is a large but worthwhile investment.
Cynefin as a leadership tool
Often as leaders, we are asked to make decisions about which is the right path forward. Depending on the context of the situation, different choices may be appropriate. The Cynefin framework gives us a way to look at these situations and decide on the appropriate response.
By applying this same framework to Operations work, we can move toward more self-sufficient, high performing engineering teams.
As we create platforms that present engineers with interfaces that are Simple, that are well defined and don’t require a lot of creativity and expertise to use, we allow them to focus on the things that do require those skills, like writing code and growing our businesses.
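
To make the idea of a Simple interface a little more concrete, here is a minimal sketch in Python of the uniform deployment command described under Deployments. Everything named here is hypothetical and invented for illustration: the application names, the two deployer functions, the command grammar, and the `DeployResult` shape. A real chatbot such as Hubot, Lita or Errbot would hook a handler like this into its own plugin mechanism, which is not shown, and the deployer functions would call whatever tooling each application already uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DeployResult:
    """The single, well-defined output that every deployment returns."""
    app: str
    version: str
    environment: str
    ok: bool
    detail: str


# Hypothetical per-application mechanisms. They differ internally, but the
# caller never sees that difference.
def deploy_container(app: str, version: str, environment: str) -> str:
    return f"rolled out image {app}:{version} to {environment}"


def deploy_package(app: str, version: str, environment: str) -> str:
    return f"installed package {app}-{version} on {environment} hosts"


# Hypothetical registry mapping each application to its mechanism.
DEPLOYERS: Dict[str, Callable[[str, str, str], str]] = {
    "storefront": deploy_container,
    "billing": deploy_package,
}

ENVIRONMENTS = ("staging", "production")
USAGE = "usage: deploy <app> <version> to <environment>"


def handle_command(text: str) -> DeployResult:
    """Parse 'deploy <app> <version> to <environment>' and run the deployment."""
    words = text.split()
    if len(words) != 5 or words[0] != "deploy" or words[3] != "to":
        return DeployResult("", "", "", False, USAGE)

    _, app, version, _, environment = words
    if app not in DEPLOYERS:
        return DeployResult(app, version, environment, False, f"unknown app: {app}")
    if environment not in ENVIRONMENTS:
        return DeployResult(
            app, version, environment, False, f"unknown environment: {environment}"
        )

    # The mechanism is looked up, not chosen by the developer.
    detail = DEPLOYERS[app](app, version, environment)
    return DeployResult(app, version, environment, True, detail)
```

A developer would type the same thing for any application, for example `handle_command("deploy billing 2.0 to production")`, and always get back the same shape of answer. Adding a new deployment mechanism means adding one entry to the registry, and the interface the developer sees does not change.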