10 Secrets to Troubleshooting Technology Problems

Years of experience have revealed some key approaches to resolving problems like network outages.

By John D. Halamka
Mon, October 04, 2010

Computerworld — I recently joined my team in troubleshooting a complex infrastructure problem affecting the private cloud that hosts our community electronic health records system. The incident put me in mind of the things I have learned from such experiences over the years.


1. Once the problem is identified, ascertain the scope. Call the users and ask them what they are experiencing. Test the application or infrastructure yourself. Do not trust the monitoring tools if they indicate all is well but the users are complaining.

2. If the scope of the outage is large and the root cause is unknown, raise alarm bells early. It's far better to make an early all-hands intervention with occasional false alarms than to intervene too late and have an extended outage because of a slow response.

3. Bring visibility to the process by having hourly updates, frequent bridge calls and multiple eyes on the problem. Sometimes technical people become so focused, they do not have a sense of time passing or insight into what they do not know. A multidisciplinary approach with predetermined progress reports prevents working in isolation and the pursuit of solutions that are unlikely to succeed.

4. Although frequent progress reports are important, you must allow the technical people to do their work. Senior management feels a great deal of pressure to resolve the situation. However, if 90% of the incident response effort is spent informing senior management and managing hovering stakeholders, then the heads-down work to resolve the problem cannot get done.

5. Remember Occam's razor: The simplest explanation is usually the correct one. In our recent incident, all the evidence pointed to a malfunctioning firewall component. But all vendor testing and diagnostics indicated the firewall was functioning perfectly. Some hypothesized that we had a very specific denial-of-service attack. Others suggested a failure of Windows networking components within the operating systems of the servers. Others thought we had an unusual virus attack. We tested the simplest explanation by removing the firewall from the network, and everything came back up instantly. It's generally true that complex problems can be explained by a single simple failure.

6. It's very important to set deadlines in the response plan to avoid the "just one more hour and we'll solve it" problem. This is especially true if the outage is the result of a planned infrastructure change. Set a backout deadline and stick to it. This is similar to when I climb or hike; I set a time to turn around. Summiting is optional, but returning to the car is mandatory. Setting milestones for changes in course and sticking to your plan regardless of emotion is key.
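The backout deadline in item 6 can be made mechanical, so that the decision depends on the clock and not on how close the team feels to a fix. The following is a minimal sketch under stated assumptions, not a description of the tooling used in the incident above: the function names, the two-hour budget and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def backout_deadline(change_start: datetime, budget: timedelta) -> datetime:
    """Return the moment at which the change must be rolled back."""
    return change_start + budget


def next_action(now: datetime, deadline: datetime, resolved: bool) -> str:
    """Decide the next step from the clock and the agreed deadline only."""
    if resolved:
        return "done"
    if now >= deadline:
        # The deadline was fixed in advance; optimism does not extend it.
        return "back out"
    return "keep working"


# Hypothetical change window: starts at 22:00 with a two-hour budget.
start = datetime(2010, 10, 4, 22, 0)
deadline = backout_deadline(start, timedelta(hours=2))

print(next_action(datetime(2010, 10, 4, 23, 30), deadline, resolved=False))
print(next_action(datetime(2010, 10, 5, 0, 0), deadline, resolved=False))
```

The first call falls inside the window and the second lands exactly on the deadline, so the sketch prints "keep working" and then "back out". Keeping the deadline as a value computed once at the start of the change, and never recomputed during the incident, is the point of the design.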

Originally published on www.computerworld.com.