Expert Reference Series of White Papers

Troubleshooting Cisco Switches

1-800-COURSES www.globalknowledge.com

Joe Rinehart, MBA, CCIE #14256, CCNP/DP/VP

Introduction

Figure 1: Anatomy of a Network Issue

In ancient societies, many rituals marked an individual’s passage from youth to recognized membership in the adult community. These ceremonies are often called a “rite of passage” and mark an important milestone in the life of that person and of the group to which they belong. Most western cultures no longer observe such rituals directly, but certain experiences play a similar role. For almost every network engineer, the “rite of passage” is a network outage or problem that was unpleasantly memorable, particularly difficult, and retold in stories for years afterward. Such things can and do happen, but they should remain rare events rather than frequent occurrences. Keeping them rare is essentially what network troubleshooting and problem resolution is all about. This white paper examines five steps for addressing issues and provides some tools for dealing with issues when they arise.

Copyright ©2013 Global Knowledge Training LLC. All rights reserved.

Step 1: Prevent as Much as Possible

Issues and problems with networks of any shape and size are fundamentally inevitable, almost entirely because of human imperfection. On the one hand, imperfect people created the networking technology in use today, which guarantees that the technology itself will contain flaws. Algorithms will malfunction, hardware will fail, and software will have bugs that create issues of various kinds. On the other hand, network engineers almost always have to deal with a crowd of end-users, managers, and company leadership, both when the network is in steady state and when it is having problems. This underscores a point made in other white papers and sources of information: problems can and will arise. The key, then, is to prevent as many issues as possible before they arise, through activities such as proactive maintenance and device monitoring. The best solution to a problem is to keep it from occurring in the first place. Prevention limits what remains to the true problems that could not have been foreseen, and it builds end-users’ confidence that matters are well in hand.

Figure 2: Preventing Problems before They Start http://magazin.woxikon.de

Step 2: Diagnose the Issue Effectively

Figure 3: Cisco Troubleshooting Methodology Flowchart http://www.cisco.com


One reason for pointing out the nature of the human condition at the outset is strategic: some reported issues may not be issues at all. A real case involved Internet access at a large port authority on the west coast, where the customer asked its Internet Service Provider to investigate a problem. Examining the usage reports generated internally by the provider, the customer’s support engineer noted that the spikes consuming all of the bandwidth were taking place at approximately 2:00 AM PST, when the offices were closed. When the engineer arrived at the customer site to report the findings, the customer reluctantly admitted that a janitor (who was dismissed shortly after) had been illegally downloading movies of questionable content. The port authority began with the assumption that the service provider was having a service issue, but investigation revealed the real source of the problem.

The investigation and diagnosis phase is the most critical part of the process, because it sets the stage for rallying resources and helps narrow the scope of the actual issue. Without meaning to sound condescending, understand that end-users will probably not understand networking technology at even a fundamental level, and that a complaint may have no technical foundation at all. For example, a user may report network slowness, perhaps impatiently, but further questioning may reveal that they are downloading large files or streaming videos from the Internet. That individual may sincerely have believed that the network was at fault when the problem was of their own making. The skill required when interacting with end-users is to ask the right questions and get the information without giving offense.
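The usage-report review in the port authority example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data, the `(timestamp, utilization percent)` format, the 8:00 to 18:00 business hours, and the 90 percent threshold are all hypothetical and do not come from the provider's actual reports.

```python
from datetime import datetime

# Hypothetical usage samples: (timestamp, link utilization in percent).
SAMPLES = [
    (datetime(2013, 3, 4, 10, 0), 35.0),
    (datetime(2013, 3, 4, 14, 0), 48.0),
    (datetime(2013, 3, 5, 2, 0), 99.0),
    (datetime(2013, 3, 5, 2, 5), 98.5),
    (datetime(2013, 3, 5, 9, 0), 41.0),
]


def off_hours_spikes(samples, open_hour=8, close_hour=18, threshold=90.0):
    """Return samples at or above `threshold` percent taken outside business hours.

    Business hours are assumed to run from open_hour (inclusive) to
    close_hour (exclusive) every day; weekends are not treated specially.
    """
    flagged = []
    for timestamp, utilization in samples:
        in_business_hours = open_hour <= timestamp.hour < close_hour
        if not in_business_hours and utilization >= threshold:
            flagged.append((timestamp, utilization))
    return flagged


if __name__ == "__main__":
    for timestamp, utilization in off_hours_spikes(SAMPLES):
        print(f"{timestamp:%Y-%m-%d %H:%M} utilization {utilization:.1f}%")
```

Heavy traffic while the offices are closed does not prove misuse, since backups and scheduled jobs also run at night, but it tells the engineer which questions to ask first.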

Step 3: Identify the Root Cause(s)

In the healthcare field, patients visit a doctor with a set of symptoms that they need addressed and treated. In some cases, such as the common cold, treating the symptoms themselves is advised, mostly because nothing else can be done. In other situations, the physician may order a variety of tests to find the true root of the problem. Once the actual root cause is discovered, the healthcare practitioner can set about a treatment plan to address and resolve it.

Applying the analogy directly, as a network engineer you play the role of care provider for the network. Your task is to take a collection of reported symptoms, understand the potential causes (sometimes more than one), and discover the root problem. While this sounds straightforward, it can present real challenges, particularly in a complex network. Having competent and experienced peers to rely on (whether on staff with you or in other organizations) can help close the gap in understanding the core issue. The acronym RCA (Root Cause Analysis) is important to know in this regard. An RCA may be performed after a particular issue is resolved, in order to prevent it from recurring.

Step 4: Apply the Most Effective Solution

While this might appear remarkably self-evident, in actual practice the pressures of time, urgency, and management may encourage a “quick fix” rather than an actual solution. In many networks, temporary fixes for specific issues are implemented and, if undocumented, can remain in place for years while other parts of the network come to depend on them. With the turnover of network staff, which happens often, knowledge of these phantom fixes is lost. If a later cleanup then removes one of them, it may create immediate problems, with no documentation to guide the staff in addressing the new issues.

As a side note, one of the more neglected parts of many networking environments is end-user education regarding the solutions that are in place. The more the user community knows and understands, the easier the network is to support as a whole. For example, users may complain that they strongly dislike having to change their password every 90 days (often required by Microsoft products). Explaining the business impact of a security breach in a creative way may not make them like the requirement any more, but it can give them a better handle on why it is necessary.

Step 5: Document the Issue and Resolution

As mentioned before, staff turnover is an inevitable part of life for the networking team at any organization. Some staff may find other positions within the same organization, others may go elsewhere, and yet others may be terminated for cause. When any of these things happens, information is often lost, which underscores the importance of a documentation repository. As a network professional, I have often been amazed at the lack of accurate network information (diagrams, addressing schemes, configurations, etc.) and documentation in place. Preserving critical information ensures that future staff will have an easier time dealing with the infrastructure they inherit, and it can potentially help address liability and compliance issues should they arise.
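One way to make such a repository easy to keep up is to record every incident in the same structured form. The sketch below is a minimal illustration, not a prescribed format: the `IncidentRecord` fields, the helper names, and the sample incident are all hypothetical. It stores the symptoms, root cause, and resolution, and it marks temporary fixes so they can be found again later.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IncidentRecord:
    """Hypothetical structure for one documented network incident."""
    title: str
    reported_symptoms: list
    root_cause: str
    resolution: str
    temporary_fix: bool = False  # True when the resolution is a stopgap
    affected_devices: list = field(default_factory=list)


def to_json(record):
    """Serialize a record so it can be stored in a documentation repository."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def temporary_fixes(records):
    """Return the titles of incidents that were resolved with a stopgap."""
    return [record.title for record in records if record.temporary_fix]


if __name__ == "__main__":
    # Illustrative incident; the details are invented for this example.
    incident = IncidentRecord(
        title="Slow Internet access overnight",
        reported_symptoms=["Link saturated at about 2:00 AM"],
        root_cause="Large after-hours downloads from one workstation",
        resolution="Access for the workstation restricted pending policy review",
        temporary_fix=True,
        affected_devices=["edge-router-1"],
    )
    print(to_json(incident))
    print(temporary_fixes([incident]))
```

Listing the records flagged as temporary gives incoming staff a starting point before any cleanup work, which addresses the phantom-fix problem described in Step 4.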

Conclusion

When an outage or network incident takes place, it often places an intense burden on the engineers called upon to assist. As with many emergencies, the demand for action and results may crowd out more effective ways of resolving the issue(s) at hand. Having a solid strategy for approaching incidents can give you the calm confidence to resolve the problem effectively, even in the midst of the proverbial storm.

Learn More

To learn more about how you can improve productivity, enhance efficiency, and sharpen your competitive edge, Global Knowledge suggests the following courses:

TSHOOT - Troubleshooting and Maintaining Cisco IP Networks v1.0
ICND1 - Interconnecting Cisco Network Devices 1

Visit www.globalknowledge.com or call 1-800-COURSES (1-800-268-7737) to speak with a Global Knowledge training advisor.

About the Author

Joe Rinehart, MBA, CCIE #14256, CCNP/DP/VP is a professional trainer specializing in technology, business, and social media. He is also a successful speaker and published author, as well as a columnist for the Federal Way Mirror. He is active in the social media space, managing one of the largest groups on LinkedIn, as well as serving on the national steering committee of the Cisco Collaboration Users Group. Joe also serves as president of the Seattle Cisco Users Group, serving technology professionals throughout the Puget Sound region.

Joe Rinehart, MBA, CCIE #14256, CCNP/DP/VP
President and Chief Edutainment Officer
Gracestone Professionals, LLC
[email protected]
Twitter: jjrinehart