What is fault management? Describe the five-step process in fault management. (2024)

written 7.9 years ago by teamques10 ★ 64k

Fault Management:

A fault in a network is normally associated with the failure of a network component and the subsequent loss of connectivity. Fault management involves a five-step process:

(1) Fault detection, (2) Fault location, (3) Restoration of service, (4) Identification of the root cause of the problem, and (5) Problem resolution.

i. The fault should be detected by the centralized management system as quickly as possible, preferably before, or at about the same time as, the users notice it.

ii. Fault location involves identifying where the problem is located. We distinguish this from problem isolation, although in practice the two could be the same.

iii. The reason for making this distinction is that it is important to restore service to the users as quickly as possible, using alternative means, once the fault has been located.

iv. The restoration of service takes priority over diagnosing the problem and fixing it.

v. Identifying the root cause of the problem can be a complex process; it is discussed in more depth below.

vi. After identifying the source of the problem, a trouble ticket can be generated to resolve the problem.

vii. In an automated network operations center (NOC), the trouble ticket could be generated automatically by the network management system (NMS).
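The ordering described above, with service restored before the root cause is diagnosed and a trouble ticket opened automatically at the end, can be sketched in Python. This is a minimal illustration, not an NMS implementation: the restore and find_root_cause callbacks, the TroubleTicket fields, and the ticket numbering are hypothetical stand-ins for whatever procedures and ticketing system a real NOC uses.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import count
from typing import Callable, List


class Step(IntEnum):
    """The five fault-management steps, numbered in the order they are carried out."""
    DETECTION = 1
    LOCATION = 2
    RESTORATION = 3
    ROOT_CAUSE = 4
    RESOLUTION = 5


@dataclass
class TroubleTicket:
    # Hypothetical ticket fields; real ticketing systems define their own.
    ticket_id: int
    component: str
    root_cause: str
    status: str = "open"


_ticket_ids = count(1)  # hypothetical numbering scheme for tickets


def handle_located_fault(
    component: str,
    restore: Callable[[str], None],
    find_root_cause: Callable[[str], str],
    log: List[Step],
) -> TroubleTicket:
    """Carry a fault that has already been detected and located through steps 3 to 5."""
    log.append(Step.RESTORATION)
    restore(component)                  # users come first: restore by alternative means
    log.append(Step.ROOT_CAUSE)
    cause = find_root_cause(component)  # diagnosis happens only after service is back
    log.append(Step.RESOLUTION)
    # Resolution starts with a trouble ticket that the NMS opens without operator action.
    return TroubleTicket(next(_ticket_ids), component, cause)
```

The callbacks are passed in rather than hard-coded so that the ordering (restore, then diagnose, then ticket) is the only thing the sketch commits to.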

Fault Detection:

i. Fault detection is accomplished either by polling (the NMS periodically polls the management agents for status) or by traps (management agents, based on information from the network elements, send unsolicited alarms to the NMS).

ii. In a polling scheme, an application program in the NMS issues the ping command periodically and waits for a response. Connectivity is declared broken when a preset number of consecutive responses are not received.

iii. The ping frequency and the preset number of missed responses can be tuned to balance traffic overhead against how quickly a failure is detected.

iv. The alternative detection scheme is to use traps. One of the advantages of traps is that failure detection is accomplished faster with less traffic overhead.
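The polling scheme in items ii and iii, and its overhead compared with traps in item iv, can be sketched in Python. This is a hedged illustration under stated assumptions: ping is a hypothetical probe supplied by the caller, each poll is counted as one request plus one reply, and each fault event is assumed to produce exactly one trap.

```python
from typing import Callable, Dict


class PingPoller:
    """Declare a node unreachable after `threshold` consecutive missed replies.

    `ping` is a hypothetical callable that returns True when a reply arrives;
    in practice it would wrap an ICMP echo request or an SNMP get.
    """

    def __init__(self, ping: Callable[[str], bool], threshold: int = 3):
        self.ping = ping
        self.threshold = threshold
        self.misses: Dict[str, int] = {}

    def poll(self, node: str) -> bool:
        """Poll the node once and return True if it is now declared down."""
        if self.ping(node):
            self.misses[node] = 0  # any reply resets the count
        else:
            self.misses[node] = self.misses.get(node, 0) + 1
        return self.misses[node] >= self.threshold


def worst_case_detection_s(interval_s: float, threshold: int) -> float:
    """Longest delay between a failure and its detection, ignoring ping timeouts.

    The worst case is a failure just after a successful poll: `threshold`
    more polls, one per interval, must then go unanswered.
    """
    return interval_s * threshold


def polling_messages_per_hour(devices: int, interval_s: float) -> float:
    """Every poll is a request plus a reply, sent whether or not anything failed."""
    return devices * (3600 / interval_s) * 2


def trap_messages_per_hour(fault_events_per_hour: float) -> float:
    """Assumes one unsolicited trap per fault event and no traffic otherwise."""
    return fault_events_per_hour
```

For example, polling 100 devices once a minute costs 12,000 messages an hour even when nothing is wrong, while two fault events produce two traps. Polling every 10 seconds instead of every 60 cuts the worst-case detection delay (with a threshold of 3) from 180 to 30 seconds, at six times the traffic.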

Fault Location and Isolation Techniques:

i. A simple approach to fault location would be to detect all the network components that have failed. The origin of the problem could then be traced by walking down the topology tree to the point where the problem starts.

ii. For example, if an interface card on a router has failed, all managed components connected through that interface would indicate failure.

iii. After having located where the fault is, the next step is to isolate the fault (i.e. determine the source of the problem).

iv. First, we should distinguish between failure of the component and failure of the physical link. In the example above, the interface card may be functioning well, but the link to the interface may be down. We need to use various diagnostic tools to isolate the cause.

v. Let us assume for the moment that the link is not the problem but that the interface card is. We then proceed to isolate the problem to the layer that is causing it. It is possible that excessive packet loss is causing disconnection.

vi. We can measure packet loss by pinging, if pinging can be used. We can query the various Management Information Base (MIB) parameters on the node itself or other related nodes to further localize the cause of the problem.

vii. For example, error rates calculated from the interface group parameters ifInDiscards, ifInErrors, ifOutDiscards, and ifOutErrors, relative to the input and output packet rates, could help us isolate the problem to the interface card.
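Two parts of the location and isolation procedure above can be sketched in Python: walking down the topology tree to the first failed component on each branch (items i and ii), and turning the interface-group counters into error and discard rates (item vii). This is a minimal illustration under stated assumptions: the topology is a hypothetical dict of child components per node, the counters are MIB-II (RFC 1213) ifTable values already read by some SNMP library, and each rate is a fraction of the packets counted between two polls, which is one reasonable definition among several.

```python
from typing import Dict, List, Set

COUNTER32_MOD = 2 ** 32  # MIB-II ifTable counters are Counter32 and wrap at 2^32


def find_fault_origins(children: Dict[str, List[str]], root: str,
                       failed: Set[str]) -> Set[str]:
    """Walk down the topology tree and stop at the first failed node on each path.

    Components below a failed node are treated as symptoms of that failure,
    so only the topmost failure on each branch is reported as an origin.
    """
    origins: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in failed:
            origins.add(node)
            continue  # do not descend: everything below is a symptom
        stack.extend(children.get(node, []))
    return origins


def delta(new: int, old: int) -> int:
    """Increase of a Counter32 between two samples, tolerating a single wrap."""
    return (new - old) % COUNTER32_MOD


def interface_error_rates(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, float]:
    """Fractions of inbound and outbound packets lost to errors or discards.

    `old` and `new` map MIB-II ifTable object names to the values read at
    two successive polls of the same interface.
    """
    d = {name: delta(new[name], old[name]) for name in new}
    # ifInUcastPkts/ifInNUcastPkts count only packets delivered upward, so the
    # dropped ones are added back (ifInUnknownProtos is ignored here).
    in_total = (d["ifInUcastPkts"] + d["ifInNUcastPkts"]
                + d["ifInDiscards"] + d["ifInErrors"])
    # ifOutUcastPkts/ifOutNUcastPkts already include packets discarded or not sent.
    out_total = d["ifOutUcastPkts"] + d["ifOutNUcastPkts"]

    def ratio(part: int, total: int) -> float:
        return part / total if total else 0.0

    return {
        "in_error_rate": ratio(d["ifInErrors"], in_total),
        "in_discard_rate": ratio(d["ifInDiscards"], in_total),
        "out_error_rate": ratio(d["ifOutErrors"], out_total),
        "out_discard_rate": ratio(d["ifOutDiscards"], out_total),
    }
```

Stopping the walk at the first failed node is what keeps the hosts behind a failed interface card from being reported as separate faults.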

Service Restoration:

i. Whenever there is a service failure, it is the NOC's responsibility to restore service as soon as possible. This involves detecting and isolating the problem that caused the failure, and then restoring service.

ii. In several failure situations, the network restores service automatically; this network feature is called self-healing. In other situations, the NMS can detect the failure of components and raise appropriate alarms.

iii. Restoration of service does not include fixing the cause of the problem. That responsibility usually rests with the installation and maintenance (I&M) group.

iv. A trouble ticket is generated and followed up by the I&M group until the problem is resolved.
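Restoring service by alternative means, whether done automatically by a self-healing network or directed from the NOC, comes down to finding another route around the failed component. The Python sketch below illustrates only that idea, using plain breadth-first search over a hypothetical adjacency-list topology; it is not the algorithm of any particular routing protocol or protection-switching scheme.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set


def alternate_path(links: Dict[str, List[str]], src: str, dst: str,
                   failed: Set[FrozenSet[str]]) -> Optional[List[str]]:
    """Breadth-first search for the fewest-hop route that avoids failed links.

    `links` is a hypothetical adjacency list of an undirected topology; a
    failed link is given as frozenset({a, b}). Returns None if no route exists.
    """
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:  # follow predecessors back to src
                path.append(step)
                step = prev[step]
            return path[::-1]
        for neighbor in links.get(node, []):
            if neighbor not in prev and frozenset((node, neighbor)) not in failed:
                prev[neighbor] = node
                queue.append(neighbor)
    return None
```

On a four-node ring A-B-C-D, a failed A-B link is bypassed by the route A, D, C, B; if C-D has also failed, no alternative exists and the function returns None, which is the case where restoration has to wait for repair.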

Root Cause Analysis (RCA):

Root Cause Analysis (RCA) is a popular and often-used technique that helps people answer the question of why the problem occurred in the first place.

It seeks to identify the origin of a problem using a specific set of steps, with associated tools, to find the primary cause of the problem, so that you can:

  1. Determine what happened.
  2. Determine why it happened.
  3. Figure out what to do to reduce the likelihood that it will happen again.

Problem Resolution:

Problem resolution means correcting the problem with hardware and software techniques: the faulty managed objects are repaired or replaced, and operations are returned to normal. Completing this step indicates that the problem has been solved.

FAQs

What is fault management, and what are the steps in fault management?

Fault management is a discipline of IT operations management focused on detecting, isolating, and resolving problems. Faults occur any time a configuration item (CI) malfunctions or whenever an event interferes with or prevents proper operation or service delivery.

What is the fault management life cycle?

Fault management follows a sequence of actions: error detection, error diagnosis, and error recovery. Error detection monitors events such as alarm signals from network devices (when thresholds are exceeded or in the event of hardware failure), deterioration of performance, or application failures.

What are the steps in fault detection?

Fault diagnosis consists of three stages: detection, isolation, and estimation. Fault detection checks whether a fault has occurred. Fault isolation locates the system component in which a fault has occurred. Fault estimation determines the magnitude or severity of the fault.

What are the four basic steps of fault management?

Network fault management
  • Detect: Finding performance anomalies or interruptions in service delivery.
  • Isolate: Locating and isolating the event to present actionable faults.
  • Alert: Notifying network admins through alarms or notifications.
  • Resolve: Fixing faults through automation or manual intervention.

What is a fault management system?

Fault management is proactively identifying and correcting system problems before they cause system outages or performance degradation. The four basic steps of fault management begin with problem detection, which involves identifying system problems before they cause system outages or performance degradation.

What are the six steps when investigating faults?

What are the six key steps to approach electrical fault finding?
  • Collect the Evidence. All the evidence collected must be relevant to the problem at hand. ...
  • Analyse the Evidence. ...
  • Locate the Fault. ...
  • Determination and Removal of the Cause. ...
  • Rectification of the Fault. ...
  • Check the System.
Jun 17, 2023

What is fault and performance management?

Proactive fault and performance management means that problems can be addressed before the customer is impacted.

What is process fault detection and diagnosis?

Fault detection and diagnosis (FDD) focuses on abnormal situations rather than univariate alarms; it is essential for maintaining favorable operating conditions and predicting the risks of chemical processes.

What is the goal of fault management?

The primary goal of fault management is to ensure the smooth and uninterrupted operation of network services by identifying and rectifying problems as they arise.

What are the different fault management options?

There are two primary ways to perform fault management: active and passive. Passive fault management is done by collecting alarms from devices (normally via SNMP traps) when something happens in the devices.

Article information

Author: Jamar Nader
