IT is synonymous with business operations for just about any company of any size. So when tech goes down, the company can go down with it.

IT failure, whether of a complex system or a major project, is increasingly shooting to the top of the business news section, where its impact can become even more detrimental, and more embarrassing.

We've gathered eight of the biggest tech crises of 2021 to spotlight the kinds of near-catastrophic IT issues that can not only arise but have an outsize impact on your business. Beyond schadenfreude, we hope these tales of IT disaster hold lessons for you, even if your organization is nowhere near as big, or the stakes as high, as for the protagonists here.

Why you should design better UIs (and not make your creditors mad)

Many companies take an "if it ain't broke, don't fix it" attitude toward their IT tools, and if you've ever been part of a botched upgrade or rollout, you know why. But that can leave some truly outdated systems in production use, with UIs dating from the earliest days of the software industry, which in turn can mean usability problems with real-world consequences.

One of Citibank's back-end systems is a good example of this trend, and it was one of the main causes of a half-billion-dollar screwup. The story goes like this: Citibank was attempting to send a $7.8 million interest payment on behalf of Revlon, one of its customers, to several of Revlon's creditors. Doing that in Flexcube, an aging piece of banking software Citibank uses, was a particularly clunky process: Citibank's employees had to set up the transaction as if they were paying off the whole loan so that the interest could be calculated correctly, then check multiple boxes so that the bulk of the payment went to an internal Citibank account while only the interest portion went out to creditors.
Despite the fact that three different people signed off on the transaction, it went through without all the proper boxes checked, and $900 million, most of which wasn't due to creditors until 2023, was sent out.

Surprisingly, this sort of mistake isn't unheard of, and the benefiting party usually returns the money sent in error to the company that made the goof. But this time things went differently: More than half the money went to hedge funds still bitter that the terms of the loan had previously been renegotiated to Revlon's advantage. They said they regarded the money as an early payment of the debt they were owed, and this year a judge ruled that they didn't have to give it back.

The big lesson here is to modernize your UIs so employees can do their jobs in a streamlined, coherent fashion. A secondary one: mistakes hurt less when the people on the receiving end aren't angry enough to take advantage of them.

Sacré bleu! French bank customers see each other's accounts

Customers of the French bank LCL logged in to their banking app on Feb. 23 only to find they were looking at someone else's information. Word quickly spread on Twitter, and many speculated that a cyberattack was to blame. According to the bank itself, however, it was the result of a software error, one corrected within a day.

Of course, these sorts of development mistakes signal internal failures at the companies where they occur, and they especially shouldn't happen in banking. The fallout illustrated the typical dance that follows such mistakes, with the company at fault minimizing matters: LCL said that no personal information was revealed, that customers could only see other customers' accounts but not transfer money, and that perhaps only a few hundred customers were affected.
Others pointed out that transaction information could have been used to suss out customer identities, and that potentially tens of thousands of users were logging in while the buggy code was live. In the end, LCL had to scramble to avoid a massive fine from European privacy regulators.

When software keeps the cell door locked

In 2019, the Arizona Legislature passed a law allowing certain inmates convicted of nonviolent offenses to complete programming in state prisons that would accelerate their release. But whistleblowers revealed in February that, more than a year later, the software that tracks prisoner release eligibility still hadn't been updated to accommodate the new law. While the state insisted that eligible prisoners can and do have their sentences recalculated manually, the truth is that many may not know they're eligible for release, or don't have advocates on the outside to press their case, and so are languishing in prison when by law they have the right to go free.

There are several lessons for IT here. One is the importance of building flexibility and extensibility into any system. Another is that software isn't just software: It has real and profound impacts on human lives.
Finally, there's the question of how law can be implemented in the form of code, and whether the algorithms for enforcing a law should be developed during the legislative process rather than left to be written after the law is already on the books.

Maine's ancient HR system limps on

The state of Maine's HR and payroll is, as the Portland Press Herald describes it, run by a "40-year-old system programmed in an obsolete language only one state employee knows how to use." The system had already outlasted a 2016 attempt to replace it that flopped; another attempt, which was supposed to wrap up in 2020, imploded in mutual acrimony this past March, when Workday, the company hired to roll out a new cloud-based system for Maine, walked away from the project.

Rollouts of ERP systems and similar platforms are notoriously disaster-prone, and Maine's payroll needs were devilishly complex (state police were paid different hourly rates depending on whether they carried a weapon, worked with a K9, or wore scuba gear, for instance). At the core of the dispute is a story that should sound familiar to anyone who's been involved in a big project like this: Maine said the system came online with a 50% error rate, while Workday said Maine's data as imported into the system was hopelessly riddled with errors. More fundamentally, it seems Maine was staffing the project with workers who didn't have the needed skills, and the state wasn't willing to pay enough to find ones who could make the grade. Throw in some accusations of nepotism and sexual harassment and you have a real IT management mess.
Maine is still using its 40-year-old HR system.

Amazon's leave problems

If your takeaway from the previous two items is that government is incapable of competent project management, we regret to inform you that a not dissimilar crisis came to light this year in the private sector, and not at just any company, but at Amazon, the archetype of the hyperefficient new economy that IT and the web made possible.

A New York Times investigation revealed that Amazon's internal processes for offering various types of leave to its employees are badly broken. The result is a litany of horror stories affecting white- and blue-collar workers alike: employees fired for not showing up to work even though they were on approved leave, new mothers on maternity leave seeing mysterious cuts in their paychecks, and an injured worker on disability forced to sell his wedding ring for cash because his checks simply stopped arriving.

It turns out Amazon manages leave using multiple software products from a variety of vendors, a legacy of its rapid initial growth, so perhaps the lesson here is that the choices you make early in a company's history can reverberate years or decades later. Like the Arizona prison system, Amazon tries to make up for IT dysfunction with human labor: 67 full-time employees are dedicated to inputting data on employee leave, a job so stressful that many end up needing leaves of absence themselves.

Eating too much of your own dog food

On Oct. 4, people all over the world were unable to access Facebook, Instagram, or WhatsApp, as all the services run by the company now known as Meta were disconnected from the internet. We won't get too deep into the actual cause of the crisis, in which a Border Gateway Protocol configuration error withdrew the routes to Facebook's DNS servers, essentially severing Facebook's services from the rest of the internet.
Instead, we want to focus on one detail that should be relevant to any IT shop, even those that aren't part of one of the largest tech companies in the world.

Early in the outage, New York Times tech reporter Sheera Frenkel reported that Facebook employees couldn't enter company HQ because their ID badges no longer opened the doors, which in turn kept techs from getting physical access to the servers they needed to fix the overall problem. Improbably, Facebook's electronic door locks were powered by ... Facebook. The company is famously keen on running its internal systems on its own infrastructure, which meant its in-house communications tools were also down just when they were needed to coordinate a response. The industry term for this practice is "eating your own dog food," and it's generally seen as a vote of confidence in your own products, but Facebook's disaster goes to show that you need a backup food supply handy.

A lurking bug takes down Fastly

On June 8, millions of internet users trying to access sites ranging from Reddit to important UK government departments were confronted by 503 errors, the HTTP status code indicating that a server can't handle the request. (Twitter was still working but, tragically, could no longer display emojis.) How could so many different sites go offline at once? The answer lies in the rise of content delivery networks (CDNs), which deploy proxy servers at strategic points across the internet to ensure superfast load times for their clients. Nearly every big content site uses a CDN these days, and there aren't many players in the space, so when one goes down, a big chunk of the internet can go with it.

In this case, the single point of failure was Fastly, an edge computing provider with a booming CDN business.
Fastly had rolled out a software update on May 12 that included a bug that could be triggered by a specific customer configuration under just the right conditions. On June 8, a customer unwittingly updated their configuration and set off a crisis that lay at the intersection of software development and industry consolidation.

Shooting the messenger

In October, a reporter from the St. Louis Post-Dispatch, working with security expert Shaji Khan, discovered that a website that let the public search teachers' certifications and credentials also inadvertently revealed those teachers' Social Security numbers. While the numbers weren't displayed on the search results page itself, they sat in clear text in the page's HTML, making them trivially easy to find. The Post-Dispatch informed the state education department about the flaw before publishing the story, giving it time to correct the problem, and if matters had stood there we probably wouldn't be talking about this story now.

But two days after an Education Department spokesperson began drafting a (never sent) statement thanking the media for bringing the matter to its attention, the governor publicly accused the paper of hiring "hackers" to embarrass him and the state government, and promised to launch a criminal investigation. After he doubled down, the backlash and ridicule came, including from members of his own political party, and we definitely are talking about the story now. So maybe the lesson here is that how you deal with the fallout from an IT disaster matters almost as much as the disaster itself.