An alarm management framework for automated network fault identification
Authors
Abstract
Many distributed systems with timing constraints (i.e. real-time distributed systems), such as real-time database systems, are now used in safety-critical applications. However, they are subject to system failures caused by malfunctioning underlying network components. Without help from network experts or sophisticated management tools, most users cannot resolve these network problems by themselves. Worse, the use of such management tools, e.g. the ‘ping’ command, is sometimes prohibited for security reasons. Accordingly, we develop a management system that automates network fault identification based solely on the analysis of abnormal events from the monitored timing-constraint distributed system. In this system, a fault identification framework is designed to automatically identify faulty network elements using a two-level fault propagation model that combines Timing Constraint Petri nets with an alarm clustering mechanism. In addition, the concepts of redundant/ringleader alarms and innocent network elements are introduced into the framework to obtain an effective diagnosis. Finally, the management system is implemented according to the framework to demonstrate the performance of our fault identification. © 2004 Elsevier B.V. All rights reserved.
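The abstract does not spell out how alarm clustering and innocent network elements interact, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' two-level Timing Constraint Petri net model (which it does not attempt to reproduce). It assumes that each alarm implicates a set of network elements (for example, the path of a connection that missed its deadline), that elements on the paths of connections that behaved normally are "innocent", and that alarms sharing elements belong to one cluster. Within each cluster, a greedy set-cover heuristic (a swapped-in technique, not taken from the paper) picks non-innocent elements that explain the most alarms. The function names `cluster_alarms` and `suspect_elements` are illustrative only.

```python
from collections import defaultdict


def cluster_alarms(alarm_paths):
    """Group alarm indices whose implicated element sets overlap (union-find)."""
    parent = list(range(len(alarm_paths)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # element -> index of the first alarm that implicated it
    for i, path in enumerate(alarm_paths):
        for elem in path:
            if elem in owner:
                parent[find(i)] = find(owner[elem])  # merge the two clusters
            else:
                owner[elem] = i

    clusters = defaultdict(list)
    for i in range(len(alarm_paths)):
        clusters[find(i)].append(i)
    return list(clusters.values())


def suspect_elements(alarm_paths, healthy_paths):
    """Return, per alarm cluster, a small set of non-innocent suspect elements."""
    # Hypothetical rule: any element on a normally working path is innocent.
    innocent = set().union(*healthy_paths)
    result = []
    for cluster in cluster_alarms(alarm_paths):
        uncovered = set(cluster)
        chosen = []
        while uncovered:
            # Count how many still-unexplained alarms each suspect would explain.
            counts = defaultdict(int)
            for i in uncovered:
                for elem in alarm_paths[i] - innocent:
                    counts[elem] += 1
            if not counts:
                break  # the remaining alarms implicate only innocent elements
            # Greedy choice; ties are broken by the smallest element name.
            best = max(sorted(counts), key=counts.__getitem__)
            chosen.append(best)
            uncovered = {i for i in uncovered if best not in alarm_paths[i]}
        result.append(sorted(chosen))
    return result


alarms = [{"A", "R1", "R2", "B"}, {"C", "R1", "R2", "D"}, {"E", "R3", "F"}]
healthy = [{"A", "R1", "G"}, {"E", "F"}]
print(suspect_elements(alarms, healthy))
```

Here the first two alarms share router `R1` and form one cluster; `R1` is innocent because a healthy connection crosses it, so `R2` is the single element explaining both alarms, while the third alarm points to `R3`. Eliminating innocent elements before the cover step is what keeps shared-but-healthy components from being blamed.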
Similar resources
Automatic Alarm Correlation for Fault Identification
In communication networks, a large number of alarms exist to signal any abnormal behavior of the network. As network faults typically result in a number of alarms, correlating these different alarms and identifying their source is a major problem in fault management. The alarm correlation problem is of major practical significance. Alarms that have not been correlated may not only lead to signi...
Discovering Rules for Fault Management
At the heart of the Internet revolution are global telecommunication systems. These systems, initially designed for voice traffic, provide the vast backbone bandwidth capabilities necessary for Internet traffic. They have built-in redundancy and complexity to ensure robustness and quality of service. This in turn requires complex fault identification and management systems. Fault i...
FDMG: Fault detection method by using genetic algorithm in clustered wireless sensor networks
Wireless sensor networks (WSNs) consist of a large number of sensor nodes which are capable of sensing different environmental phenomena and sending the collected data to the base station or Sink. Since sensor nodes are made of cheap components and are deployed in remote and uncontrolled environments, they are prone to failure; thus, maintaining a network with its proper functions even when und...
Data Mining Meets Network Management: The NEMESIS Project
Modern communication networks generate large amounts of operational data, including traffic and utilization statistics and alarm/fault data at various levels of detail. These massive collections of network-management data can grow in the order of several Terabytes per year, and typically hide “knowledge” that is crucial to some of the key tasks involved in effectively managing a communication n...
Efficient Alarm Management in Optical Networks
As the capacity of optical transport networks increases, rapid fault identification and localization become increasingly important. These problems are more challenging than in traditional electronic networks because of optical transparency. In a transparent optical network which does not regenerate optical signals, a fault may propagate to various parts of the network from the origin, and multi...
Journal title:
- Computer Communications
Volume 27, Issue
Pages -
Publication date: 2004