In your rush to resolve incidents, don’t forget the most important element: communication. When one of your IT services is on fire, there is no time to waste, especially if that fire is blocking your users from getting work done. Rapid resolution tends to eclipse everything else during an incident, often causing your team to ignore or forget parts of the incident response process, like keeping people in the loop. It’s one of those little problems that compounds into a big one if not handled correctly. Pretty soon, you’re stuck in an endless loop of shoulder-taps and email threads, trying to explain to the CEO why things went wrong.

While there’s no shortage of tools to help your team detect, alert on, swarm on, and resolve incidents, even the best tools can’t replace clear communication with internal and external stakeholders. And let’s be real: the stakes can be very high. Damage to your reputation, customer attrition, and time lost to damage control are just a few of the risks.

Luckily, downtime doesn’t have to turn into a customer service nightmare. Informed users are happy users. But first you need to know whom to communicate with, how to reach them, and how to do it with the least friction and the fewest resources possible.

Communication during an incident is like the ripples from a rock tossed into a pond. The circles closest to the incident get the biggest, most frequent, and most immediate feedback. This is your core on-call team, the people who need to identify and fix the problem. It’s a small circle, but the ripples (communication) need to be big, immediate, and frequent. As you move further from the core circle, out to adjacent IT teams, managers, the organization as a whole, end users, and the general public, the audience gets bigger, but the ripples get smaller and less frequent.
While every organization is different, in general it helps to think of these audiences as five distinct groups that need to be kept informed:

- Core on-call team: The first to know something is wrong, almost immediately upon impact (usually from monitoring and alerting tools).
- Front-line support team: The people who will be directly answering questions and giving customers updates during the incident. It’s an incredibly important role, so this team must get the right information to pass along to end users.
- Managers and executive team: The core team needs to tell this group what’s going on, the potential impact on the next two groups, and, ideally, an estimate of how long the incident could last.
- General employee population: Employees need to be kept informed as the services they rely on go down and come back up. Proactively communicating with these users means fewer “what’s the status of this?” questions, fewer duplicate IT support tickets, and more focus on fixing the problem at hand.
- External customers: If the incident affects external customers, you need to send out communication explaining the problem and when they can expect a fix, or at least an update at a regular interval.

For issues that are still affecting your customers’ ability to use your product, we recommend never going more than one hour without sending an update. You should also always say when to expect the next update. If the incident is severe enough, especially one involving security or data loss, you will want to expedite external communications and pull in the other teams you need (legal, HR, security, etc.).

xMatters and StatusPage are tools that sit at an interesting intersection: they integrate solutions across your technology stack and then communicate status information outward to drive workflow.
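The one-hour rule and the advice to always announce the next update are easy to enforce mechanically. What follows is a minimal Python sketch under stated assumptions: `IncidentUpdate`, `next_update_at`, `update_overdue`, and the "Checkout" service are hypothetical names chosen for illustration, not part of xMatters or StatusPage. It caps every promised follow-up at one hour and prints the next-update time in each message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# The article's guidance: never go more than one hour without an update.
MAX_UPDATE_GAP = timedelta(hours=1)


@dataclass
class IncidentUpdate:
    """One customer-facing status update (hypothetical structure)."""
    service: str
    message: str
    posted_at: datetime
    next_update_in: timedelta = MAX_UPDATE_GAP

    def next_update_at(self) -> datetime:
        # Never promise a follow-up later than the maximum allowed gap.
        return self.posted_at + min(self.next_update_in, MAX_UPDATE_GAP)

    def render(self) -> str:
        # Every update states when the next one will arrive.
        when = self.next_update_at().astimezone(timezone.utc).strftime("%H:%M UTC")
        return f"[{self.service}] {self.message} Next update by {when}."


def update_overdue(last: IncidentUpdate, now: datetime) -> bool:
    """True once the promised next-update time has passed without a new update."""
    return now > last.next_update_at()


if __name__ == "__main__":
    start = datetime(2017, 6, 1, 14, 0, tzinfo=timezone.utc)
    update = IncidentUpdate("Checkout", "We are investigating elevated error rates.",
                            start, timedelta(minutes=30))
    print(update.render())
    # [Checkout] We are investigating elevated error rates. Next update by 14:30 UTC.
    print(update_overdue(update, start + timedelta(minutes=45)))  # True
```

If someone asks for a two-hour gap, `next_update_at` quietly clamps it to one hour, so the promise in the message never exceeds the recommended cadence. A real setup would feed `update_overdue` into whatever reminder or alerting tool the team already uses.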
With some of the biggest cloud companies as customers, we’ve seen how the highest-performing IT teams resolve incidents more efficiently while keeping users happier through a solid incident communication plan.

Creating your own incident communication plan:

Before an incident:
- Define priority/severity levels (based on how many users are affected, how long the incident lasts, etc.).
- Create incident templates for common issues to save time between detection and communication.
- Document the roles defined for an incident (how to identify the incident commander, who owns communication, etc.).
- Determine how to communicate with affected users (which channels will be used for each priority level, etc.).

During an incident:
- Communication with first responders: Alert the people who are on call and make sure they know where to go for more information about the problem. A tool like xMatters can help drive resolution by relaying data between systems while engaging the right people. This way, you never have to worry about keeping your technology infrastructure aligned with key resolution processes.
- Communication with affected users (both internal and external) and other stakeholders (e.g., executives): Use your predetermined channel(s) to tell users what’s going on. This may be email, a blog, Twitter, or a status page where they can subscribe to notifications about the services they care about most. Whatever tools you use, we recommend choosing one as your primary communication vehicle and funneling everyone there from the other channels. For example, we have a dedicated status page, but we also tweet updates and display a notice in our web app during downtime. The tweets and in-app notices send users back to the status page for the full story.

After an incident:
- Hold a retrospective on the incident and figure out what (if any) post-incident communication is necessary, as well as what you can do to prevent similar incidents from happening again.
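The “before an incident” steps (severity levels, templates for common issues, channels per level, and a single primary channel) can be captured as plain data. The sketch below shows one way to do that in Python. The severity thresholds, template wording, channel names, and the placeholder URL `status.example.com` are all assumptions for illustration; your own levels and channels will differ. Secondary channels always link back to the status page, mirroring the funneling approach described above.

```python
from enum import IntEnum


# Hypothetical severity levels; SEV1 is the most severe.
class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


def classify(affected_users: int, total_users: int, minutes_down: int) -> Severity:
    """Pick a severity from impact breadth and duration (illustrative thresholds)."""
    share = affected_users / total_users if total_users else 0.0
    if share >= 0.5 or minutes_down >= 60:
        return Severity.SEV1
    if share >= 0.1 or minutes_down >= 15:
        return Severity.SEV2
    return Severity.SEV3


# Pre-written templates for common issues (hypothetical wording).
TEMPLATES = {
    "degraded": "{service} is experiencing degraded performance. We are investigating.",
    "outage": "{service} is currently unavailable. Our team is working on a fix.",
}

# Assumed channel plan: the status page is the primary vehicle at every level.
PRIMARY = "status page"
CHANNELS = {
    Severity.SEV1: [PRIMARY, "email", "twitter", "in-app banner"],
    Severity.SEV2: [PRIMARY, "in-app banner"],
    Severity.SEV3: [PRIMARY],
}
STATUS_URL = "https://status.example.com"  # placeholder, not a real status page


def messages_for(severity: Severity, template: str, service: str) -> dict[str, str]:
    """Render one message per channel; secondary channels point back to the status page."""
    summary = TEMPLATES[template].format(service=service)
    messages = {}
    for channel in CHANNELS[severity]:
        if channel == PRIMARY:
            messages[channel] = summary
        else:
            messages[channel] = f"{summary} Full details: {STATUS_URL}"
    return messages
```

For example, `messages_for(classify(600, 1000, 5), "outage", "Checkout")` classifies the incident as SEV1 and returns four messages, three of which end with the status page link. Keeping the plan as data means it can be reviewed and agreed on before an incident, rather than improvised during one.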
If necessary, send your postmortem to affected users. A good postmortem can actually generate a lot of goodwill with your customers. Ideally, it will let you:
- Apologize personally
- Explain exactly what happened and how your team fixed it
- Describe your plan to avoid a similar situation in the future

Even 99.99% uptime still allows about 52.6 minutes of downtime a year, and every IT team should be prepared for those minutes. Providing legendary service isn’t just about resolving incidents quickly; it’s also about keeping users informed while you do.

Learn more about using xMatters for IT alerting and StatusPage for IT incident communication, and see how they can work together to increase transparency.
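The uptime figure quoted earlier is simple arithmetic: a 365-day year has 525,600 minutes, and the downtime budget is the unavailable fraction of that. A short Python sketch, where `downtime_budget_minutes` is a hypothetical helper name and leap years are ignored:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Print the yearly budget for a few common availability targets.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_budget_minutes(pct):.1f} min/year")
```

For 99.99%, this works out to about 52.6 minutes per year; each additional nine cuts the budget by a factor of ten.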