Event Analytics

Category: Architecture Component

Event Analytics is the practice of gathering, processing, analyzing and interpreting data from various systems to predict potential issues and trends, and guide business decisions. It has gained momentum in IT operations, device control, customer service and other domains.


Component Overview


Making Sense of Event Volume

Event Analytics has evolved as the technical process of triaging and resolving issues in IT environments, so that managed systems meet the availability and performance levels set out in the applicable Service Level Agreements (SLAs).

Operational information is gathered from the running infrastructure components, as well as from previously logged application and platform data. Other data sources include software agents that run in operating systems or containers and collect data on I/O, transactions and resource usage. The results of scripted event analysis are delivered to Event Monitoring dashboards and alert notification targets.

Event analysis and correlation is a multi-step process that employs a number of techniques:

  • Event Filtering — process event streams and discard events that are deemed irrelevant.
  • Event Aggregation — combine multiple similar events into a single aggregate of the underlying event data.
  • Topological Masking — ignore events pertaining to systems that are downstream of a system already known to have failed.
  • Root-cause Analysis — analyze dependencies, to detect whether some events can be explained by others.
  • Action Triggering — trigger notifications or automatic corrective actions based on the event severity.
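The techniques listed above can be chained into a single pass over a batch of events. The following is a minimal sketch, not a reference implementation: the `Event` fields, the `UPSTREAM` topology map, the severity scale and the action names are all hypothetical, and root-cause analysis is reduced to the simplifying assumption that any system reporting a "down" event explains the events of every system behind it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str    # name of the emitting system (hypothetical)
    kind: str      # e.g. "down", "unreachable", "heartbeat"
    severity: int  # assumed scale: 0 = informational ... 3 = critical


# Hypothetical topology: each system maps to the system directly upstream of it.
UPSTREAM = {"switch-1": "router-1", "server-1": "switch-1"}


def filter_events(events, min_severity=1):
    """Event Filtering: discard events below the severity of interest."""
    return [e for e in events if e.severity >= min_severity]


def aggregate_events(events):
    """Event Aggregation: collapse identical events into (event, count) pairs."""
    return list(Counter(events).items())


def is_masked(source, failed):
    """Return True if any system upstream of `source` is in the failed set."""
    parent = UPSTREAM.get(source)
    while parent is not None:
        if parent in failed:
            return True
        parent = UPSTREAM.get(parent)
    return False


def mask_downstream(aggregated, failed):
    """Topological Masking: drop events from systems behind a failed system."""
    return [(e, n) for e, n in aggregated if not is_masked(e.source, failed)]


def trigger_actions(aggregated, page_at=3):
    """Action Triggering: choose an action based on the event severity."""
    return [
        ("page" if e.severity >= page_at else "notify", e.source, e.kind, n)
        for e, n in aggregated
    ]


def analyze(events):
    relevant = filter_events(events)
    aggregated = aggregate_events(relevant)
    # Simplified root-cause step: a "down" event explains downstream symptoms.
    failed = {e.source for e, _ in aggregated if e.kind == "down"}
    return trigger_actions(mask_downstream(aggregated, failed))
```

With a failed router, the "unreachable" events from the switch and server behind it are suppressed, while an unrelated event from a system outside that dependency chain still produces a notification.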

Following are examples of technology domains where event management and alerting deliver tangible benefits:

  • Network Management — event analysis is performed on a Network Management System (NMS), for example, to notify that a device has just rebooted or that a network link is currently down.
  • Systems Management — an event may report that the CPU utilization of an application server has stayed above a predefined threshold for longer than the acceptable period of time.
  • Service Management — an event may notify that a Service-Level Objective (SLO) is not met for a given client, and initiate measures to remediate the detected system quality degradation.
  • Security Management — a Security Information and Event Management (SIEM) platform may read records from the security audit log, and report suspicious activity or block it until final resolution.
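The Systems Management example above depends on a condition that must hold for a period of time, not on a single reading. A minimal sketch of such a check follows; the function name, the `(timestamp, value)` sample format and the specific numbers are assumptions made for illustration.

```python
def sustained_breach(samples, threshold, max_duration):
    """Return the timestamp at which the value has stayed above `threshold`
    for longer than `max_duration` seconds, or None if that never happens.

    `samples` is an iterable of (timestamp_seconds, value) in time order.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start > max_duration:
                return ts
        else:
            breach_start = None  # back under the threshold: reset the timer
    return None


# Hypothetical CPU utilization readings (percent), one per minute.
cpu = [(0, 50), (60, 95), (120, 96), (180, 40),
       (240, 92), (300, 93), (360, 97), (420, 99)]
alert_at = sustained_breach(cpu, threshold=90, max_duration=120)
```

Resetting the timer whenever a reading drops below the threshold keeps short spikes from raising an event, which is the point of pairing the threshold with an acceptable period.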

The Event Analytics discipline has also proven effective in business activity monitoring, operational device control and customer experience management, all areas that generate large volumes of data. Modern Stream Processing and Machine Learning techniques are employed to turn the accumulated event data into well-founded business decisions and repeatable performance results.

Some examples of Event Analytics tools are:

  • Query engines — Apache Spark, Apache Impala and Presto.
  • Full-text search engines — Elasticsearch and Apache Solr.
  • Log managers — Logstash and Graylog.
  • Time-series databases — InfluxDB and TimescaleDB.
  • Non-relational databases — Apache HBase, Apache Cassandra and MongoDB.

The goal of Event Analytics is to detect activities of interest, analyze them and determine the required control action.

Event notifications keep different parts of a system synchronized with each other, without tight coupling and performance degradation.
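The decoupling described above is commonly achieved with a publish/subscribe pattern. The sketch below is a deliberately small in-process illustration; the `EventBus` class, the topic name and the payload shape are hypothetical. It delivers events synchronously, whereas a production system would more likely hand delivery to a message broker so that slow subscribers do not hold up the publisher.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # The publisher does not know who, if anyone, is listening.
        for handler in self._handlers.get(topic, []):
            handler(payload)


bus = EventBus()
cache = {}      # one consumer keeps a lookup table in sync
audit_log = []  # another consumer records every change

bus.subscribe("order.updated", lambda o: cache.update({o["id"]: o["status"]}))
bus.subscribe("order.updated", audit_log.append)

bus.publish("order.updated", {"id": 42, "status": "shipped"})
```

Both consumers end up reflecting the new order status, yet the code that publishes the event refers to neither of them.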

Event Analytics is focused on extracting meaning out of continuous event volume, to help tune the system for higher efficiency.

Real-time event analysis and self-learning algorithms can reveal trends and metrics that would otherwise be lost in the mass of information.

Artificial Intelligence methods applied to Event Analytics can trigger alerts when deviations from the norm are detected, so that system administrators are informed before problems escalate and impact the user experience.
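As a rough illustration of alerting on deviations from the norm, the sketch below flags values that lie far from the mean of a recent window. It is a simple statistical stand-in, not a machine-learning model, and the function name, window size and threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, z_threshold=3.0):
    """Yield (index, value) for points that deviate from the recent norm.

    A point is flagged when it lies more than `z_threshold` sample standard
    deviations away from the mean of the previous `window` values.
    """
    history = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                yield i, value
        history.append(value)


# Hypothetical response times in milliseconds.
latencies = [100, 102, 98, 101, 99, 100, 250, 101]
alerts = list(detect_anomalies(latencies))
```

Note that a flagged value still enters the window and inflates the baseline for the next few points; a more careful detector might exclude flagged values from the history or use a robust statistic such as the median.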