Fault management is the component of network management concerned with detecting, isolating and resolving problems. Properly implemented, network fault management can keep connectivity, applications and services running at an optimum level, provide fault tolerance and minimize downtime. Platforms or tools designed specifically for this purpose are called fault management systems.
Faults result from malfunctions or events that interfere with, degrade or obstruct service delivery. Examples of faults include hardware failure, connectivity loss or a port status change. Once a fault is detected, the management platform notifies the administrator (and any additional authorized or designated parties) using an alarm or alert. These notifications can be viewed in the fault management system's GUI, and many platforms can now forward these alerts via email, SMS and/or a mobile app.
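The detect-then-notify flow above can be sketched as a small dispatcher that fans one alarm out to several channels. This is a minimal Python sketch, not any product's API: the `Alarm` fields, the `AlarmDispatcher` class and the channel names are hypothetical, and a real email or SMS channel would wrap an SMTP client or a messaging gateway.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alarm:
    # Hypothetical alarm record; real platforms carry many more fields.
    device: str
    fault: str      # e.g. "port down"
    severity: str   # e.g. "critical", "major", "minor"


# A notifier delivers one alarm to one channel (GUI, email, SMS, mobile app).
Notifier = Callable[[Alarm], None]


@dataclass
class AlarmDispatcher:
    notifiers: Dict[str, Notifier] = field(default_factory=dict)

    def register(self, channel: str, notifier: Notifier) -> None:
        self.notifiers[channel] = notifier

    def raise_alarm(self, alarm: Alarm) -> List[str]:
        """Send the alarm to every registered channel; return the channels that failed."""
        failed = []
        for channel, notify in self.notifiers.items():
            try:
                notify(alarm)
            except Exception:
                # A broken channel (say, an unreachable SMS gateway) must not
                # stop the alarm from reaching the remaining channels.
                failed.append(channel)
        return failed


if __name__ == "__main__":
    dispatcher = AlarmDispatcher()
    dispatcher.register("console", lambda a: print(f"[{a.severity}] {a.device}: {a.fault}"))
    print(dispatcher.raise_alarm(Alarm("switch-01", "port Gi0/1 down", "major")))
```

Returning the list of failed channels lets the caller log or retry them instead of silently losing the notification.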
In addition, network fault management systems may be configured to automatically resolve or even prevent certain events using programs and scripts.
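As a minimal sketch of scripted self-healing, the function below looks up a fix for a fault type in a playbook, retries it a bounded number of times, and escalates to a person when nothing works. The playbook structure, the fault names and the retry limit are hypothetical illustrations, not taken from any particular platform.

```python
from typing import Callable, Dict

# A remediation takes a device name and returns True if the fault is gone
# afterwards. In practice it would typically run a script or call a device API.
Remediation = Callable[[str], bool]


def auto_remediate(device: str, fault: str,
                   playbook: Dict[str, Remediation],
                   max_attempts: int = 2) -> str:
    """Run the configured fix for a fault, retrying a bounded number of times."""
    action = playbook.get(fault)
    if action is None:
        # Nothing automated is configured for this fault: hand it to a person.
        return "escalate: no automated fix configured"
    for attempt in range(1, max_attempts + 1):
        if action(device):
            return f"resolved after {attempt} attempt(s)"
    return "escalate: automated fix failed"


if __name__ == "__main__":
    outcomes = iter([False, True])  # stand-in: first restart fails, second works
    playbook = {"service down": lambda dev: next(outcomes)}
    print(auto_remediate("web-01", "service down", playbook))
    print(auto_remediate("web-01", "fan failure", playbook))
```

Bounding the retries matters: an automated fix that loops forever can hide a fault that actually needs manual intervention.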
Fault management is one component of FCAPS (fault management, configuration, accounting, performance and security), which is a network management framework established by the International Organization for Standardization (ISO).
Important functions of fault management
As a whole, network fault management comprises a variety of functions. Here are some examples of actions and services performed by fault management systems to keep the network operational:
- definition of thresholds for potential failure conditions;
- constant monitoring of system status and usage levels;
- continuous scanning for threats, such as viruses and Trojans;
- general diagnostics;
- remote control of system elements, including workstations and servers, from a single location;
- alarms that notify administrators and users of impending and actual malfunctions;
- tracing the locations of potential and actual malfunctions;
- automatic correction of potential problem-causing conditions;
- automatic resolution of actual malfunctions; and
- detailed logging of system status and actions taken.
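Several functions in this list (thresholds, monitoring, and alarms for impending versus actual malfunctions) can be illustrated with a two-level threshold check. This Python sketch assumes each metric has a warning level and a critical level; the metric names and numbers are hypothetical examples, not recommended values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Threshold:
    metric: str
    warning: float   # at or above this: an impending malfunction
    critical: float  # at or above this: treated as an actual fault


def evaluate(thresholds: List[Threshold],
             sample: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """Return (metric, severity, value) for each metric that breaches a threshold."""
    alarms = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            continue  # metric missing from this monitoring sample
        if value >= t.critical:
            alarms.append((t.metric, "critical", value))
        elif value >= t.warning:
            alarms.append((t.metric, "warning", value))
    return alarms


if __name__ == "__main__":
    rules = [Threshold("cpu_percent", 80, 95),
             Threshold("interface_errors_per_min", 10, 100)]
    print(evaluate(rules, {"cpu_percent": 85.0, "interface_errors_per_min": 150.0}))
```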
Types of fault management
There are two types of network fault management: active and passive.
Active fault management uses various tools, such as ping or TCP/UDP port checks, to continually query devices and determine their status. It's akin to a person asking everyone in the room at repeated intervals, "How are you?" This allows the fault management system to proactively identify and rectify potential issues in real time -- sometimes before they even become problems -- but the tradeoff is more network chatter.
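A minimal sketch of active polling in Python: it repeatedly tries to open a TCP connection to each target and records whether the service answered. A TCP connect check is used here instead of ping because sending ICMP echo requests from a program often requires elevated privileges; the target list, interval and timeout are assumptions for illustration.

```python
import socket
import time
from typing import Dict, List, Optional, Tuple


def check_tcp_port(host: str, port: int,
                   timeout: float = 2.0) -> Tuple[bool, Optional[float]]:
    """Active check: try to open a TCP connection and time how long it takes."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Refused, unreachable or timed out: treat the service as down.
        return False, None


def poll(targets: List[Tuple[str, int]], interval: float,
         rounds: int) -> Dict[Tuple[str, int], List[bool]]:
    """Query every target at a fixed interval, recording up/down per round."""
    history: Dict[Tuple[str, int], List[bool]] = {t: [] for t in targets}
    for r in range(rounds):
        for host, port in targets:
            up, _ = check_tcp_port(host, port)
            history[(host, port)].append(up)
        if r < rounds - 1:
            time.sleep(interval)
    return history


if __name__ == "__main__":
    print(poll([("127.0.0.1", 22), ("127.0.0.1", 443)], interval=1.0, rounds=2))
```

The tradeoff described above shows up directly in `poll`: every target costs one connection attempt per round, so a shorter interval detects failures sooner but generates more traffic.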
Passive fault management systems, on the other hand, monitor their network environments for events that indicate a fault or failure has occurred. This information may come from error logs or SNMP traps, among other sources. It's comparable to a person who quietly listens until someone calls out for help. While passive fault management is more conservative in its resource usage, its drawback is that it may not discover a fault until after service has already been affected, and a device that fails outright (for example, by losing power) may never send the message that would reveal the fault.
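On the passive side, a collector simply receives what devices send, such as syslog messages (commonly on UDP port 514) or SNMP traps (UDP port 162), and filters them for fault conditions. The sketch below covers only the filtering step for syslog: each message starts with a `<PRI>` value equal to facility * 8 + severity, where severity runs from 0 (emergency) to 7 (debug). Treating severity 3 (error) and worse as a fault is an assumption, and the sample messages are illustrative.

```python
import re
from typing import Iterable, List, Optional

# Severity names in numeric order, as defined for syslog (RFC 5424).
SEVERITY_NAMES = ["emergency", "alert", "critical", "error",
                  "warning", "notice", "informational", "debug"]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$")


def parse_syslog(line: str) -> Optional[dict]:
    """Split a raw syslog line into facility, severity and message text."""
    match = PRI_PATTERN.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None
    facility, severity = divmod(pri, 8)
    return {"facility": facility, "severity": severity,
            "severity_name": SEVERITY_NAMES[severity],
            "message": match.group(2)}


def faults_from_log(lines: Iterable[str], max_severity: int = 3) -> List[dict]:
    """Keep events at 'error' or worse (a lower number means more severe)."""
    events = (parse_syslog(line) for line in lines)
    return [e for e in events if e is not None and e["severity"] <= max_severity]


if __name__ == "__main__":
    received = ["<187>Interface Gi0/1 changed state to down",
                "<190>User admin logged in"]
    for event in faults_from_log(received):
        print(event["severity_name"], event["message"])
```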
Fault management process
Although the fault management process used in commercial platforms may vary slightly among vendors, most follow this general lifecycle when issuing an alarm:
- Fault detection: The system discovers that service delivery has been interrupted or its performance has degraded.
- Fault diagnosis and isolation: The source of the fault, such as a component failure or power outage, and its location in the network topology are identified.
- Event correlation and aggregation: Because a single fault can cause multiple alarms, fault management systems often group related events for administrators and provide a root cause analysis.
- Restoration of service: The network management system automatically executes any preconfigured scripts or programs to get services up and running as soon as possible.
- Problem resolution: The source of the fault is corrected, repaired or replaced. Depending on the cause, manual intervention may be required.
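Vendors use different correlation methods; one simple and widely described approach is topology-based: if a device and the devices behind it all raise alarms, the upstream device is the probable root cause and the rest are symptoms. The sketch below assumes the topology is available as a hypothetical child-to-parent map and groups alarms into the aggregated view described in the correlation step. It does not model timing windows or devices with multiple uplinks.

```python
from typing import Dict, List, Optional, Set


def correlate(alarms: Set[str],
              parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group alarmed devices under their highest alarmed upstream device.

    parent maps each device to the device it connects through (None at the core).
    """
    def probable_root(device: str) -> str:
        root = device
        node = parent.get(device)
        seen = {device}
        while node is not None and node not in seen:  # 'seen' guards against loops
            seen.add(node)
            if node in alarms:
                root = node  # an alarmed upstream device explains this alarm
            node = parent.get(node)
        return root

    groups: Dict[str, List[str]] = {}
    for device in sorted(alarms):
        groups.setdefault(probable_root(device), []).append(device)
    return groups


if __name__ == "__main__":
    topology = {"core": None, "dist-1": "core", "acc-1": "dist-1",
                "acc-2": "dist-1", "dist-2": "core", "acc-3": "dist-2"}
    print(correlate({"dist-1", "acc-1", "acc-2", "acc-3"}, topology))
```

In the example, the alarms from `acc-1` and `acc-2` are grouped under `dist-1`, so an administrator sees one probable cause instead of three separate alarms, while `acc-3` remains its own group.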
This was last updated in February 2018