Banks, especially retail banks, have to be very risk-averse, particularly when it comes to their IT ecosystems. Anyone who has ever worked in IT at a retail bank knows how notorious the deployment model is. Firstly, system changes are implemented only six to ten times a year, usually with a “freeze” over periods of high retail activity, such as Christmas and Thanksgiving.
Secondly, the path from development to production is long and slow; some banks have as many as three preproduction environments, and changes can sit in each for up to a month before progressing to the next. Finally, despite all these precautions, the rate of system failures once the changes reach production is disturbingly high, with predictably adverse reactions from the bank’s customers and the potential for financial loss.
The irony is that all the safeguards that are put in place to avoid risk create situations of high risk.
How being ultra-cautious creates risk
There is an old saying, “practice makes perfect.” This is one of the fundamentals of Lean manufacturing or the “Toyota Way.” Frequent repetition improves the ability to perform a task.
On the face of it, the practices mentioned above would seem to be sound. They are the tried and tested ways of implementing changes using the “waterfall” or traditional IT project approach and the system development lifecycle. Requirements are painstakingly defined, interpreted as technical specifications, and handed off to developers to write the code. The code then passes through a testing hierarchy that runs from unit testing to user acceptance testing. Change Management then decides whether to accept the change, after which it is scheduled to move into production. This has historically been done by taking the system offline at a quiet time, such as from midnight on Saturday to 5 a.m. on Sunday.
While this seems a proper and orderly way to implement change, it is full of systemic risk:
- If the changes are being applied to one of the bank’s legacy suites, the application is very large and complex. Patches and modifications have been made over the years, and, if the bank is very lucky, there are still one or two “heroes” who have a thorough understanding of the application. Chances are, there is no one who entirely understands the application.
- The time lapse from original specification of requirements to delivering a finished product can be multi-year, by which time the requirement has changed.
- In the time that has elapsed, at least some of the original team have moved on; no amount of documentation can replace the tacit knowledge gathered during development.
- Even the team members who have been on the project since inception may have been waiting for three months for the change and have forgotten much of the rationale behind what was developed.
- There are always a few “minor” changes that are piggybacked into the main request when deployment is infrequent. These add high risk because their impact on the original system change is often not understood, and comprehensive integration testing was not done.
- The test environment is hardly ever a mirror image of production. Even where the test and production environments seem identical, complex systems can behave differently in each.
- Because changes are only permitted a few times a year, neither Development nor Ops is practised in implementing a change without a hitch.
- A poor understanding of the system architecture delays finding the root cause of any errors and will probably force a roll-back, because there is not enough time to fix what ails the application. It must be restored to the version before the changes were applied, if possible, and implementation must be rescheduled after the repairs and testing have been done. Even the roll-back is not always completed within the scheduled timeframe.
When one considers these risks, it is no wonder that deployment creates anxiety and tension among all involved.
Banks are agile, too
It must be stressed that this approach is usually limited to the legacy core banking systems and that most banks have different approaches to customer-facing products such as internet banking and mobile apps. These are developed using agile techniques and may even operate under different change management processes. Some banks choose to outsource this work; others build in-house capabilities, but in both cases the bank’s IT marches to different drums. There is a general recognition that the core banking systems have to be replaced in order to have a unified and stable IT infrastructure, but this is not an overnight job, mainly because it involves a major overhaul of all the processes involved.
Agile is not enough. Introducing devops
Even if a bank has managed to unify their IT development, they need to take the next leap to continuous delivery. Agile works up to actual deployment; then the wheels come off, because the problem now moves to Operations, who have largely been excluded from the development process. The merging of Development and Operations (not forgetting Security) creates an environment where continuous development, integration and deployment become viable. This is how Amazon, Facebook and other websites that do not have the luxury of downtime operate. They are able to achieve multiple deployments per day of which their customers are totally unaware. While there is a great deal of automation required to achieve this, devops is predominantly a culture shift.
Banks that still want to be around in the next ten years have to mature to a devops culture. The challenge is to introduce the new way of thinking, which is best managed by bringing in devops-as-a-service to gradually implement the new processes and instil the culture change. Where banks have taken the plunge, they have achieved remarkable breakthroughs.
Some banks that are changing
Banks across the globe have taken up the challenge.
In 2011, ING Bank had a well-established CMMI and ITIL environment, using Prince2 for project management, but was experiencing instability despite all the governance these best practices provided. The bank gradually moved to Agile, with mixed results initially, and then introduced devops, with spectacular results. Introducing Agile allowed it to implement more changes, but it still encountered the same number of incidents. Its implementation risk remained high until it introduced devops.
What was more impressive was that, although the number of incidents stayed the same, because the frequency of changes increased, the number of incidents per change decreased dramatically. What ING do emphasize is the scarcity of the best skills for the job, and how important it is to acquire them.
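The arithmetic behind this point can be made concrete with a short sketch. The figures below are purely hypothetical and are not ING's actual numbers; the function name `incidents_per_change` is likewise an illustrative assumption. The sketch only shows how a constant incident count spread over more changes lowers the risk carried by each individual change.

```python
def incidents_per_change(incidents: int, changes: int) -> float:
    """Return the average number of incidents per deployed change."""
    if changes <= 0:
        raise ValueError("changes must be a positive number")
    return incidents / changes


# Hypothetical figures for illustration only (not taken from ING).
INCIDENTS_PER_YEAR = 20      # assumed to stay constant
CHANGES_BEFORE = 10          # assumed infrequent, large releases
CHANGES_AFTER = 200          # assumed frequent, small releases

before = incidents_per_change(INCIDENTS_PER_YEAR, CHANGES_BEFORE)
after = incidents_per_change(INCIDENTS_PER_YEAR, CHANGES_AFTER)

print(f"Incidents per change before: {before:.2f}")
print(f"Incidents per change after:  {after:.2f}")
```

With these assumed inputs the total number of incidents is unchanged, yet each individual change is far less likely to be the one that causes an incident.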
Otkritie FC Bank, one of Russia’s largest financial corporations, optimized its internal systems and improved business processes by shifting to devops. Its traditional infrastructure was typical of most banks, with quarterly releases, until a major outage compelled Otkritie FC Bank to make the change. The bank took the plunge, changing its business processes and focussing on automated testing and a robust approach to changing old habits. It achieved remarkable success in transitioning into a digital bank. Kirill Menshov, VP and IT Director of Otkritie FC Banking Group, attributes much of this achievement to the automation of testing and deployment.
In Singapore, DBS is a major bank that also experienced a massive outage, in 2010. Although IBM admitted liability for the failure, this did not reassure the bank’s clients. It also did not impress the Monetary Authority of Singapore, which instructed DBS to get its house in order. Seven years later, DBS has made the shift to devops and is optimistic about a brighter and more stable future, both for its 4.5 million customers and for itself. Its focus is on microservices, which will help it deploy faster and better, and it is obviously doing something right, as it was recognised as the world’s “Best Digital Bank” by Euromoney in 2016.
What’s next: Google and SRE (Site Reliability Engineering)
Devops is almost a decade old now, and ten years is a long time in IT and technology in general. Google state that they have moved beyond devops to SRE and have published a book describing what SRE is and how to get there. For those banks still struggling with their core banking systems, SRE must seem as attainable as a distant planet, but once they are confident in their devsecops capability, they can aim for the new target.