IT Troubleshooting: Quickly Identifying and Solving Software Bugs

No software is perfect (who hasn't had a user uncover some hidden flaw?), but these tips will help you debug efficiently.

By Dennis Jones
Wed, May 14, 2008

CIO: Nearly every IT project manager, designer, DBA and developer wants to build the perfect software application: the seamless union of hardware and software, intuitive and robust, with eye-popping performance and rock-solid logic. While this pinnacle is difficult to reach, and flaws will be found, there are steps you can take to resolve them more quickly.

Countless hours can be spent gathering requirements, creating meticulous database and program designs, and using the very latest development tools and techniques. We can employ a seemingly endless array of unit, system, integration and regression test scripts, along with the finest implementation and training plans and procedures. Yet all of this effort is simply no match for the one entity that reigns supreme when it comes to finding and exposing the best-hidden bug: the end user. Our customer. We might as well face the fact that no matter how many hours are spent bulletproofing code, end users are going to find problems. The tips below give developers and technical support staff methods to quickly identify, verify, isolate and ultimately resolve such problems.

Oh *#@$&!, We've Got a Production Problem!!

These are the words we all hate to hear, yet are destined to hear at some point. What to do? First, there must be a procedure in place that lets the user properly describe and document the problem. Every production application should have a central help desk or support desk to contact when user issues arise. Help desk staff are critical to a thorough and complete resolution, so they should be functional experts on the system being supported and able to talk intelligently with users. They must gather information and documentation on the entire issue, not just the error condition or message the user ultimately received. What steps were taken, and in what order? Obtain a screen print of any error messages. These can prove invaluable when a developer is trying to piece together exactly which portions of code were executed, and in what order. Users sometimes leave out details that are second nature to them, and a screen print may reveal those details to the support person.
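
To make "the entire issue" concrete, here is a minimal Python sketch of the kind of record a help desk might fill in for each incident. The IncidentReport class, its field names and the missing_details() check are hypothetical illustrations, not part of any particular help-desk product; the idea is simply that the steps taken, the exact error text and the screen prints are captured before the issue reaches a developer.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class IncidentReport:
    """Hypothetical help-desk ticket that captures the whole story, not just the final error."""
    user: str
    application: str
    reported_at: datetime
    steps_taken: List[str] = field(default_factory=list)      # every action, in order
    error_messages: List[str] = field(default_factory=list)   # exact wording, not a paraphrase
    screen_prints: List[str] = field(default_factory=list)    # file names of captured screenshots
    other_processes: List[str] = field(default_factory=list)  # anything else running at the time

    def missing_details(self) -> List[str]:
        """Return the gaps to fill before handing the issue to a developer."""
        gaps = []
        if not self.steps_taken:
            gaps.append("steps taken")
        if not self.error_messages:
            gaps.append("exact error message text")
        if not self.screen_prints:
            gaps.append("screen print of the error")
        return gaps
```

For a ticket that lists the steps and the error message but has no attached screenshot, missing_details() returns ["screen print of the error"], which tells the support person exactly what to go back and ask for.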

Lucky You, This Is Now Your Issue to Resolve!

Hopefully it's not a Friday afternoon, because you will have a lot of documentation to go through. Read all of it, and ask for clarification of anything ambiguous. You need to know exactly what steps were taken and the exact wording of any error message(s).

This is where all the time spent coding and testing that pesky error-handling logic in your application pays off. Thorough error handling is sometimes overlooked as a necessary part of an application, yet it is extremely important: more often than not, an error occurs at a critical juncture and must be diagnosed and corrected quickly. You should be able to anticipate most error conditions in your application and handle them gracefully, with a polite message gently telling the user how they (and not your robust application) have somehow made an error.

Do not be naive enough, however, to think that the errors you code for will be the only ones that occur. You must also have "catch-all" logic to handle unexpected errors. For example, you can easily code error logic to inform users that no records matched the search criteria they entered. But what happens if the database goes down just as the user clicks the "Search" button? What if there is a power outage while your program is in the middle of saving records? How about when the user presses Ctrl-Shift-F8 while creating a new record, inserting a disc, and playing Solitaire in another window? You can bet that nearly every conceivable keystroke and concurrent program combination will eventually be attempted by your users. Your error-handling and commit logic must work together not only to capture information about the error conditions encountered, but also to preserve the integrity of your data when such events occur.
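
The difference between anticipated errors, catch-all errors and commit logic can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumed names: the customers, orders and order_lines tables, and the functions search_customers, run_search and save_order, are all hypothetical. The anticipated "no records found" case gets a friendly message, anything else falls through to a catch-all, and the save runs inside a transaction so that a failure partway through rolls back every row instead of leaving a half-saved order.

```python
import sqlite3
from typing import List, Optional, Tuple


class NoRecordsFound(Exception):
    """Anticipated condition: handled with a friendly message, not treated as a crash."""


def search_customers(conn: sqlite3.Connection, name: str) -> List[tuple]:
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE name LIKE ?", (f"%{name}%",)
    ).fetchall()
    if not rows:
        raise NoRecordsFound(name)
    return rows


def run_search(conn: sqlite3.Connection, name: str) -> Tuple[List[tuple], Optional[str]]:
    """Return (rows, message_for_user); the message is None on success."""
    try:
        return search_customers(conn, name), None
    except NoRecordsFound:
        # Anticipated error: explain it in the user's terms.
        return [], "No customers match that name. Please check the spelling and try again."
    except Exception:
        # Catch-all for the unexpected (database down, closed connection, ...).
        # A real application would also record the details in an error log here.
        return [], ("An error has occurred while processing this request. "
                    "Please contact the help desk.")


def save_order(conn: sqlite3.Connection, customer_id: int,
               items: List[Tuple[str, int]]) -> int:
    """Insert an order and all its lines atomically; return the new order id."""
    # As a context manager, the connection commits if the block succeeds
    # and rolls back if any exception escapes it, so no half-saved orders.
    with conn:
        cur = conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        order_id = cur.lastrowid
        for sku, qty in items:
            if qty <= 0:
                raise ValueError(f"invalid quantity {qty} for item {sku}")
            conn.execute(
                "INSERT INTO order_lines (order_id, sku, qty) VALUES (?, ?, ?)",
                (order_id, sku, qty),
            )
    return order_id
```

Using the connection as a context manager is what ties the commit logic to the error handling here: if any insert fails, none of that order's rows are kept, and the exception still propagates so the catch-all logic can report it.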

It is an excellent idea to have a common error-handling routine that writes to an error log. Your catch-all error logic can call this routine whenever an unexpected error occurs. In the error log you can record the exact date and time, the name of the offending program and any subprograms, any pertinent record names or IDs, and any error codes and text generated by your application programs or by the database. Bottom line: give the user an error message that means something to them (e.g., "An error has occurred while processing this record. Please contact the help desk immediately."), and give the support person, via the error log, information that means something to them.

All of this information is necessary because your initial goal is to recreate the error condition in your development or test environment. In general, an error condition must be reproducible in order to be correctable. Often, errors that cannot be reproduced result from a user who has forgotten some of the steps taken or other circumstances that were present. These are the dreaded one-time problems that mysteriously go away by themselves. Guess what? Just as mysteriously, they tend to reappear later, often with the same user.

If you can speak directly to the person who received the error, do so. Go over all the steps that led to the error. If possible, visit the location to examine the software, hardware and data. If a site visit is not feasible, an export of the user's data can be very helpful in resolving the error. Ask some additional questions. What other processes were running when the error occurred? Were any other unusual circumstances present? Were multiple error messages received? It is sometimes difficult to get the entire story, especially if the user feels they have somehow made a mistake. By being sensitive to this, and communicating your sincere desire to help, you can usually get all the details of the event.
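
A common error-handling routine of the kind described above might look like the following Python sketch, built on the standard logging and traceback modules. The function names, the "app.errors" logger name and the errors.log file name are assumptions for illustration. The routine writes support-level detail (timestamp, program and subprogram, record ID, error type and text, an optional database error code, and the full traceback) to the log and hands back the plain-language message for the user.

```python
import logging
import traceback
from datetime import datetime
from typing import Optional

# Message shown to the user: meaningful to them, no internals exposed.
USER_MESSAGE = ("An error has occurred while processing this record. "
                "Please contact the help desk immediately.")

# Hypothetical logger name for the application's error log.
error_log = logging.getLogger("app.errors")


def configure_error_log(path: str = "errors.log") -> None:
    """Send error-log entries to a file (the file name is an assumption)."""
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    error_log.addHandler(handler)
    error_log.setLevel(logging.ERROR)


def log_unexpected_error(program: str,
                         exc: BaseException,
                         subprogram: Optional[str] = None,
                         record_id: Optional[str] = None,
                         db_error_code: Optional[str] = None) -> str:
    """Write support-level details to the error log; return the user-level message."""
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "program": program,
        "subprogram": subprogram,
        "record_id": record_id,
        "error_type": type(exc).__name__,
        "error_text": str(exc),
        "db_error_code": db_error_code,
    }
    # The full traceback shows which code ran, and in what order, up to the failure.
    details = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    error_log.error("%s\n%s", entry, details)
    return USER_MESSAGE
```

A catch-all handler would call log_unexpected_error(...) inside its except clause and display the returned string, so the user sees one short, calm message while the log holds everything a developer needs to try to recreate the error.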
