What is fault management? Describe the five-step process in fault management. (2024)

written 7.9 years ago by

teamques10 ★ 64k

Fault Management:

A fault in a network is normally associated with the failure of a network component and the subsequent loss of connectivity. Fault management involves a five-step process:

(1) Fault detection, (2) Fault location, (3) Restoration of service, (4) Identification of root cause of the problem, and (5) Problem resolution.

i. The fault should be detected as quickly as possible by the centralized management system, preferably before, or at about the same time as, the users notice it.

Fault Detection:

i. Fault detection is accomplished either by polling (the NMS periodically polls management agents for status) or by traps (management agents, acting on information from the network elements, send unsolicited alarms to the NMS).

ii. An application program in the NMS issues the ping command periodically and waits for a response. Connectivity is declared broken when a preset number of consecutive responses are not received.
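The consecutive-miss rule in point ii can be sketched as follows. This is a minimal illustration, not a real NMS implementation: the function name `first_failure_index`, the default threshold of 3, and the idea of feeding in already-collected ping results as booleans are all assumptions made for the example.

```python
from typing import Iterable, Optional


def first_failure_index(responses: Iterable[bool], max_missed: int = 3) -> Optional[int]:
    """Return the poll index at which connectivity is declared broken.

    `responses` holds one boolean per polling cycle: True if the ping was
    answered, False if it timed out. Returns None if the threshold of
    `max_missed` consecutive misses (an assumed value) is never reached.
    """
    missed = 0
    for i, answered in enumerate(responses):
        if answered:
            missed = 0  # any reply resets the run of consecutive misses
        else:
            missed += 1
            if missed >= max_missed:
                return i  # threshold reached: declare the link broken here
    return None


if __name__ == "__main__":
    polls = [True, False, False, True, False, False, False]
    print(first_failure_index(polls))
```

Resetting the counter on every reply is what makes the rule "consecutive": isolated lost pings do not raise an alarm, only an unbroken run of them does.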

Fault Location and Isolation Techniques:

i. A simple approach to fault location is to detect all the network components that have failed. The origin of the problem can then be traced by walking down the topology tree to the point where the failures start.

ii. Thus, if an interface card on a router has failed, all managed components connected to that interface would indicate failure.

iii. After having located where the fault is, the next step is to isolate the fault (i.e. determine the source of the problem).

iv. First, we should distinguish between failure of the component and failure of the physical link. Thus, in the above example, the interface card may be functioning well, but the link to the interface may be down. We need to use various diagnostic tools to isolate the cause.

v. Let us assume for the moment that the link is not the problem but that the interface card is. We then proceed to isolate the problem to the layer that is causing it. It is possible that excessive packet loss is causing disconnection.

vi. We can measure packet loss by pinging, if pinging can be used. We can query the various Management Information Base (MIB) parameters on the node itself or other related nodes to further localize the cause of the problem.

vii. For example, error rates calculated from the interface group parameters ifInDiscards, ifInErrors, ifOutDiscards, and ifOutErrors, taken with respect to the input and output packet rates, could help us isolate the problem in the interface card.
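Two of the steps above lend themselves to a short sketch: walking down the topology tree to find where failures start (points i and ii), and computing an input error rate from interface counters (point vii). Everything here is illustrative. The function names, the dictionary-based topology, and the sample values are hypothetical, and the rate formula (errors plus discards over all received packets, using the MIB-II counter ifInUcastPkts as the delivered-packet count) is one assumed reading of "with respect to the input packet rate". Counter wrap-around is ignored.

```python
from typing import Dict, List, Set


def fault_origins(children: Dict[str, List[str]], root: str, failed: Set[str]) -> List[str]:
    """Walk down the topology tree and return the topmost failed components."""
    origins: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in failed:
            # Do not descend further: failures reported below this node
            # are treated as symptoms of this one.
            origins.append(node)
            continue
        stack.extend(children.get(node, []))
    return sorted(origins)


def input_error_rate(prev: Dict[str, int], curr: Dict[str, int]) -> float:
    """Fraction of received packets that were errored or discarded
    between two polls of the same interface (assumed formula)."""
    delta = {k: curr[k] - prev[k] for k in ("ifInErrors", "ifInDiscards", "ifInUcastPkts")}
    bad = delta["ifInErrors"] + delta["ifInDiscards"]
    total = bad + delta["ifInUcastPkts"]
    return bad / total if total else 0.0


if __name__ == "__main__":
    topology = {
        "nms": ["router1"],
        "router1": ["if1", "if2"],
        "if1": ["hostA", "hostB"],
        "if2": ["hostC"],
    }
    print(fault_origins(topology, "nms", {"if1", "hostA", "hostB"}))

    before = {"ifInErrors": 10, "ifInDiscards": 5, "ifInUcastPkts": 1000}
    after = {"ifInErrors": 20, "ifInDiscards": 10, "ifInUcastPkts": 1985}
    print(input_error_rate(before, after))
```

In the sample topology, the hosts behind `if1` also report failure, but the walk stops at `if1` itself, which matches the interface-card example in point ii. The same counter arithmetic applies to the output direction with ifOutErrors and ifOutDiscards.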

Service Restoration:

i. Whenever there is a service failure, it is the responsibility of the network operations center (NOC) to restore service as soon as possible. This involves detecting and isolating the problem causing the failure, and then restoring service.

ii. In several failure situations, the network will do this automatically. This network feature is called self-healing. In other situations, the NMS can detect the failure of components and indicate it with appropriate alarms.

iii. Restoration of service does not include fixing the cause of the problem. That responsibility usually rests with the installation and maintenance (I&M) group.

iv. A trouble ticket is generated and followed up until the I&M group resolves the problem.

Root Cause Analysis (RCA):

Root Cause Analysis (RCA) is a popular and often-used technique that helps people answer the question of why the problem occurred in the first place.

It seeks to identify the origin of a problem using a specific set of steps, with associated tools, to find the primary cause of the problem, so that you can:

Determine what happened.
Determine why it happened.
Figure out what to do to reduce the likelihood that it will happen again.

Problem Resolution:

Problem resolution means correcting the underlying problem using hardware and software techniques: the affected managed objects are repaired or replaced, and operations return to normal. At this point the problem is considered solved.


FAQs

What is fault management and describe the steps in fault management? ›

Fault management is a discipline of IT operations management focused on detecting, isolating, and resolving problems. Faults occur any time a configuration item (CI) malfunctions or whenever an event interferes or prevents proper operation or service delivery.


What is the fault management life cycle? ›

Fault management follows a sequence of actions: error detection, error diagnosis, and error recovery. Error detection monitors events such as alarm signals from network devices (when thresholds are exceeded or in the event of hardware failure), deterioration of performance, or application failures.

What are the steps in fault detection? ›

Fault diagnosis consists of three stages: detection, isolation, and estimation. Fault detection checks whether a fault has occurred. Fault isolation locates the system component in which the fault has occurred. Fault estimation determines the magnitude or severity of the fault.

What are the four basic steps of fault management? ›

Network fault management

Detect: Finding performance anomalies or interruptions in service delivery.
Isolate: Locating and isolating the event to present actionable faults.
Alert: Notifying network admins through alarms or notifications.
Resolve: Fixing faults through automation or manual intervention.


What is fault management system? ›

Fault management is proactively identifying and correcting system problems before they cause system outages or performance degradation. The four basic steps of fault management are: Problem detection: This step involves identifying system problems before they cause system outages or performance degradation.



What are the six steps when investigating faults? ›

What are the six key steps to approach electrical fault finding?

Collect the Evidence. All the evidence collected must be relevant to the problem at hand. ...
Analyse the Evidence. ...
Locate the Fault. ...
Determination and Removal of the Cause. ...
Rectification of the Fault. ...
Check the System.



What is fault and performance management? ›

Proactive fault & performance management means problems can be addressed before the customer is impacted. Drive Revenues and Reduce Costs with Proactive Fault & Performance Management.


What is the process fault detection and diagnosis? ›

Fault detection and diagnosis (FDD) focuses on abnormal situations rather than univariate alarms, and is essential for maintaining favorable operating conditions and predicting risks in chemical processes.

What is the goal of fault management? ›

The primary goal of fault management is to ensure the smooth and uninterrupted operation of network services by identifying and rectifying problems as they arise.

What are the different fault management options? ›

There are two primary ways to perform fault management - these are active and passive. Passive fault management is done by collecting alarms from devices (normally via SNMP traps) when something happens in the devices.
