
20 of the Most Painful Lessons IT Pros Had to Learn the Hard Way

By David Spark


You can’t become a great IT pro without breaking a few metaphoric networking eggs. It would be great if all these mistakes happened in a sandbox environment. Unfortunately, they often happen during business operations.

While the lessons from these mistakes are invaluable, making too many of them does not bode well for the future of your job or business. If, instead, we could pool the knowledge gained from these costly mistakes, we'd all be wiser. The only cost to you is the time it takes to read this article.

1: Complex systems are expensive to operate

“If systems don't have to be complex, don't make them complex, or you could risk not having the necessary staff to fix things,” said Nestor Rincon (@RinconDynamic), Founder of Rincon Dynamic, who became the single point of failure for a custom network he designed.

Ian Rae (@ianrae), CEO for CloudOps, concurs, “Avoiding network complexity has a huge operational payoff.”

“If you are the only one in the world to use a particular combination, it means that your headaches are also very unique,” said Michael Bushong (@mbushong), blogger and VP of Marketing for Plexxi. “While it might seem expensive to modify your practices to suit an architecture, think about the risk of being a snowflake when everything is on fire.”

“In order to meet the industry needs of today, enterprises need to move from traditional legacy networking that is manual, complex, and requires device-by-device configuration, to a network that is simpler, more agile, and automated,” advised Kash Shaikh (@KashShaikh), Global Marketing Leader for HP Networking.

2: Complex systems will be replaced

“The temptation for a lot of people is to tune infrastructure for very specific behaviors,” said Plexxi’s Bushong. “Every time you add helpful but not quite necessary things into your network, you add complexity… At some point, the whole thing comes crashing down.”

“Custom server hardware is eventually replaced by commoditized consumer hardware re-purposed as servers, which in turn is eventually replaced by virtual servers,” said Dwight Koop (@dwightkoop), Cofounder and COO of CohesiveFT.

3: Lack of pre-planning and testing will bite you

“It is just so easy to skip the planning, testing, and architectural alignment work to make quick changes or upgrades, particularly when you managed to get away with it maybe 75 percent of the time, or at least appeared to get away with it,” said Rich Schofield (@DidataInsights), Business Development Director, Networking for Dimension Data. “However, when things go bad, they go really bad.”

Schofield acknowledges that most of us think the disciplines around change and release management are overdone, right up until something breaks.

“A disciplined approach, followed with every change and release, is key to a well-run and cost-efficient network,” said Schofield. “Next time you are planning for changes or releases and you find yourself thinking, ‘this is such a pain,’ remember the pain when things go wrong.” 

4: Don’t underestimate the complexity of running a network remotely

For years, North Coast Security Group had managed networks remotely through secure UTM appliances. With little to no on-site visits to resolve issues, they felt they were ready to handle everything remotely, so they started bidding out-of-state contracts, explained Hassan Abdul-Zahir (@northcoast_sg), North Coast Security Group’s Cofounder and CTO.

While Abdul-Zahir thought he could just ship UTM appliances and have his clients install them on site, the clients ended up moving equipment and changing ISPs without telling the North Coast team. Not being there resulted in far more reengineering, development, and partnerships to make the remote operations possible.

“We vastly underestimated the human component in the contingency plan, and that was the hardest lesson we have learned thus far,” said Abdul-Zahir.

5: The current design isn’t going to last

“My most painful lesson is succumbing to the temptation to believe that current design will suffice for a period of time longer than what is realistic,” admitted Tom Fountain (@TomFountain9), CTO of Pneuron. “The quicker, cheaper path is to assume that current requirements will suffice rather than ensuring multiple years of support for what the business throws at IT via sound architecture, capacity planning, and built-in flexibility.”

“Design needs to take into account redundancy, which will allow an upgrade to happen without taking the environment offline,” said Isaac Conway (@Latisys), Director, Network Engineering for Latisys. “Actual usage on the new platform should be under 25 percent starting at day one. This will allow you to at least double in size before upgrade conversations need to happen.”
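Conway's 25 percent rule can be turned into a quick back-of-the-envelope capacity check. A minimal sketch, not any vendor's tool; the utilization and growth figures are hypothetical:

```python
def months_until_upgrade(current_util: float, monthly_growth: float,
                         threshold: float = 0.50) -> int:
    """Months before utilization crosses the upgrade-planning threshold.

    Following the rule of thumb above: start at or below 25% utilization
    so the platform can double (to ~50%) before upgrade conversations
    need to happen.
    """
    months = 0
    util = current_util
    while util < threshold:
        util *= 1 + monthly_growth
        months += 1
    return months

# Example: a platform at 25% utilization, growing 5% per month.
print(months_until_upgrade(0.25, 0.05))  # → 15
```

Fifteen months of runway sounds comfortable; start the day-one utilization at 40 percent instead and the same growth rate leaves you only a few months before the upgrade conversation begins.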

“New paradigms like Big Data and cloud applications are forcing the entire IT industry to rethink the importance of infrastructure flexibility and scalability,” said Tim McIntire (@StackIQ), CEO of StackIQ.

“Avoid network management systems that are so rigid that they restrict the enterprise from simple development, and fast time to market, for new applications and services,” said Bob Rodio (@ciena), CTO for Network Transformation Solutions at Ciena. “You don’t want to forsake available functionality because of management system limitations.”

6: Don’t assume

“Assuming can be the difference between success and failure,” said Adam Haines (@Adam_Haines), Director, Systems for Federated Sample.

Early in Haines’ career he encountered a networking issue that caused production servers to disconnect intermittently. The team went through a battery of tests, network traces, and wild theories as to what was going on, until Haines just looked at the switch and discovered a cable was creating a loop.

“I learned that day that troubleshooting is a delicate art form and assumptions cannot be made,” said Haines. “Each and every aspect of the problem has to be considered.”

Rincon Dynamic's Rincon agrees: “Always try the basics first and see if that fixes it.”

Rincon once faced a database that appeared to have failed because its traffic couldn't cross the network. After four hours of troubleshooting, he finally looked at the firewall. Sure enough, it was blocking the database's traffic.

“That was a bruise to my ego but it definitely made me a better IT professional,” admitted Rincon.

7: Backups are worthless if you don’t test them

“I learned the painful lesson that the false peace of mind that comes from running regular backups is worthless without testing that the backups are actually working as expected,” said David Reischer (@LegalAdvice), Information Systems Manager for LegalAdvice.com.

“It’s never enough to trust anything critical to a process that has a single point of failure,” warned Dean Wiech (@dwiech), Managing Director of Tools4ever.

Both Reischer and Wiech learned the hard way: only after suffering data disasters from which their backups couldn't be restored did they start verifying backups.
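Restore testing can itself be automated. A minimal sketch, assuming tar-based backups and a known-good checksum manifest (the archive layout and manifest format are hypothetical): restore to a scratch directory and compare digests rather than trusting that the backup job exited cleanly.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def verify_backup(archive: str, manifest: dict[str, str]) -> bool:
    """Restore the archive to a scratch directory and compare each file's
    SHA-256 digest against the expected manifest.

    A backup that cannot be restored and verified is not a backup.
    """
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        for relpath, expected in manifest.items():
            restored = Path(scratch) / relpath
            if not restored.is_file():
                return False  # file missing from the restore
            digest = hashlib.sha256(restored.read_bytes()).hexdigest()
            if digest != expected:
                return False  # file restored but corrupted
    return True
```

Run on a schedule, a check like this converts “the backup job succeeded” into the far stronger claim “the backup can actually be restored.”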

This advice also holds true for verifying backup support from vendors.

“Just because something is written in a contract, doesn’t always make it so,” warned Mike Vitale (@TalkPointDotCom), CTO for TalkPoint. “If your network vendors promise N+1 or 2N redundancy, make sure it’s being tested on a regular basis.  Don’t wait for a real emergency to find out your colocation facility or content delivery network has been cutting corners.” 

8: Vet consultants for interoperability and business alignment

“The wrong consultant can ruin your relationships and leave you with a smoking wreck to manage once they have completed their assignment,” said Ben Trowbridge, Chairman for the Outsourcing Center. “Make sure the consultant understands you and your company and can navigate both the cost savings/transformation you seek and also improve your working relationship with your carrier.”

“Despite vendors adhering to a standard, it's quite likely their products will not gracefully interoperate,” said Bernard Golden (@BernardGolden), VP of Strategy for ActiveState. “There's really no way around this except by diligent testing. The alternative approach is to require the vendors to demonstrate interoperability and then write it into the contract.”

“Look at the OEM vendor closely and directly before committing to an enterprise-wide solution,” said Eric Ingram, Adjunct Instructor at APT College, LLC. “Don't rely solely on the information provided by the distributors.”

Ingram admits to being burned a couple of times when OEM products lost support soon after purchase. Even a little research into those vendors' backgrounds would have shown they could not be relied upon for future networking needs.

9: Always be monitoring your performance efficiently

“You need to take proactive steps to automate the alert processing. Or else you'll find that all your best resources get sucked into doing the day-to-day firefighting and noise control, and that is counterproductive for the organization,” warned Raju Chekuri (@rajunetenrich), CEO of NetEnrich.

With the understanding that networking is a collection of various systems and network components that depend on one another, Bruno Scap (@MaseratiGTSport), President of Galeas Consulting, advises administrators to “configure your network monitoring in a way that follows these dependencies. This will decrease the number of alerts and increase their accuracy. For example, when a network device controlling a remote link fails, you will get notified that the remote network is unavailable, without getting additional alerts for all the other systems that are connected to that particular network device.”
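Scap's dependency idea can be sketched in a few lines. This is a simplified illustration, not any monitoring product's actual logic; the device names and parent-child map are hypothetical. An alert for a node is suppressed when anything upstream of it is already down, so only the root cause pages anyone.

```python
# Map each node to the upstream device it depends on (None = no parent).
DEPENDS_ON = {
    "remote-router": None,
    "remote-switch": "remote-router",
    "app-server-1": "remote-switch",
    "app-server-2": "remote-switch",
}


def alerts_to_send(down_nodes: set[str]) -> set[str]:
    """Return only root-cause alerts: a down node is suppressed when any
    of its upstream dependencies is also down."""
    def upstream_is_down(node: str) -> bool:
        parent = DEPENDS_ON.get(node)
        while parent is not None:
            if parent in down_nodes:
                return True
            parent = DEPENDS_ON.get(parent)
        return False

    return {n for n in down_nodes if not upstream_is_down(n)}


# A router failure takes the whole remote site with it; one alert fires.
print(alerts_to_send({"remote-router", "remote-switch",
                      "app-server-1", "app-server-2"}))  # → {'remote-router'}
```

Four outages collapse into one actionable alert, which is exactly the noise reduction Scap describes.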

“Having information [about network performance] makes the conversation with your provider much different, and much more productive,” said Matt Larson (@matthewhlarson), CTO of Dyn. “With the right tools in hand, it’s harder for a provider to deflect responsibility or resist taking action in the face of hard data.”

10: Just because you can’t see it, doesn’t mean it’s not out there

“We deploy monitoring and management systems, and we put faith in the results they provide,” noted Jay Botelho (@jaybotelho), Director, Product Management for WildPackets. “Because these systems are mostly accurate we are lulled into a false sense of security.”

What happens when NetFlow data drops to a sampling mode because the router it depends on needs the processing power to route packets? The monitoring system stops collecting the necessary data, and we have no idea it's missing.

“Take the advice of a woodworker,” said Botelho, “Measure twice, cut once. Or in other words, for critical functions, monitor the same data in multiple ways, reducing to the greatest extent possible assumptions, both known and unknown, that are built into our management systems, and ourselves.” 

11: Your redundant systems will be broken

“Continuity planning, including regular failover testing of all systems, is as vital to your business' health as knowing your fastest exit out of a building in a fire,” said Jay Winters (@brinkmatdotcom), Director, Technology for delivery.com.

“Despite all your efforts to build in redundancy, a new situation will arise that is not handled by your redundant systems,” said Jason Lamb (@jasonclamb), IT Systems Operations Manager for Eliassen Group, who points to situations of degradation that don’t trigger redundant systems, yet weaken your office’s ability to operate.

“Engineers need to account for many types of problems, capacity and bandwidth requirements need to be reviewed proactively, and the act of rerouting traffic must be operationalized as much as possible,” explained Pat Harper (@OpenText), CIO for OpenText.

One common issue, an unexpected power outage, can wreak havoc on business operations given the amount of live data.

“Billable hour docketing and unsaved Microsoft Office documents instantly become corrupted files,” said Steve Prentice (@steveprentice), Senior Writer for CloudTweaks. “It can easily turn into a nightmare of downtime and person-by-person recovery.”

12: Crisis management training shouldn’t just be about putting out fires

“An overworked technologist who is completely focused on fighting the day-to-day fires of desktop support ends up neglecting patches and security updates from pure lack of time,” said Anthony Butler (@Anthonylbutler), CEO of Precision IT.

“The worst mistake for a technology executive is to build crisis management skills and focus mostly on extinguishing fires, when best practices dictate that prevention and thoughtful planning go a longer way and justify pre-emptive investments,” said Max Dufour (@maxdufour), Partner at Harmeda. “The cost of downtime and the cost of fire drills can be greatly diminished by efficient IT management and strategy.”

13: Remember to think about the future

“The most painful lesson I’ve learned with managing a network is a failure to think towards the future.  I’ve always felt that I was missing pieces of information when I had issues in the past,” said Michael Spratt (@milkmanstl), Sr. Customer Operations Support Specialist IPS for MasterCard Worldwide.

“I now take time from my projects to be innovative and think towards the future. Discovering ways to make parts of my job automated helped me the most,” said Spratt.
