
Configuration Guide

my2ndhead edited this page Jan 3, 2015 · 29 revisions

Introduction

The Alert Manager-App's main purpose is to extend Splunk's core alerting functionality with sophisticated incident workflows and reporting.

Alert Manager can also be used to replace existing workflow solutions (e.g. Incident Review in Enterprise Security).

Alert Manager core concepts

Alert Manager is built on top of Splunk's core alerting functionality. Instead of just doing a "fire and forget" action on the alert, Alert Manager stores the state of an alert as an incident in a KV store.

Alert Manager was designed to integrate easily into existing environments: add an Alert Script to the alerts that should be managed, and add the alert_manager role to the users that use the app or send alerts to the app. Existing Alert Scripts can be integrated using Alert Manager's pass-through capability.

Alerts & Incidents

It is important to distinguish between the terms alert and incident.

The term alert is used for alerts triggered by a Splunk scheduled search. Alert metadata is indexed by default into an index named alerts.

The term incident is used for enriched metadata around the alert. The data is stored in a KV store and some metadata is enriched using lookup tables (for dynamic customizations).

Incidents are stored with metadata such as alert_time, job_id, owner, status, priority, ttl, etc.

Incident Settings

To define which alerts should create incidents within Alert Manager, select the item Incident Settings under the Settings menu.

Categorization

Categorization is used to group incidents. It can be used to filter incidents on the Incident Posture dashboard and to run category statistics. Two attributes can be used: category and subcategory.

Tags

For more complex environments, incidents can be tagged with an arbitrary number of tags. Incidents can then be filtered by tag on the Incident Posture dashboard.

Severity, Priority and Urgency

The incident's urgency is calculated using the alert's severity and the incident's priority setting. This is based on a lookup table named alert_urgencies. A sample lookup table has been provided.

$APP_HOME/lookup/alert_urgencies.csv.sample:

severity,priority,urgency
unknown,unknown,low
unknown,low,low
unknown,medium,low
...
informational,high,informational
informational,critical,informational
...
low,high,medium
low,critical,medium
...
fatal,high,critical
fatal,critical,critical

To adjust the urgencies, create a new lookup table $APP_HOME/lookup/alert_urgencies.csv and edit $APP_HOME/local/transforms.conf to point to this new lookup table.
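The severity/priority lookup described above can be illustrated with a minimal sketch. It is not the app's own code: the function names and the "unknown" fallback are assumptions made for illustration, and the table contains only the rows visible in the sample, not the elided ones.

```python
import csv
import io

# Only the rows shown in the sample above; the elided rows are not reproduced.
SAMPLE_CSV = """severity,priority,urgency
unknown,unknown,low
unknown,low,low
unknown,medium,low
informational,high,informational
informational,critical,informational
low,high,medium
low,critical,medium
fatal,high,critical
fatal,critical,critical
"""


def load_urgencies(csv_text):
    """Build a {(severity, priority): urgency} mapping from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["severity"], row["priority"]): row["urgency"] for row in reader}


def lookup_urgency(table, severity, priority, default="unknown"):
    """Return the urgency for a severity/priority pair.

    The fallback value is a hypothetical choice for pairs missing from the table.
    """
    return table.get((severity, priority), default)


URGENCIES = load_urgencies(SAMPLE_CSV)
print(lookup_urgency(URGENCIES, "fatal", "high"))
print(lookup_urgency(URGENCIES, "low", "critical"))
```

Editing a row in a custom alert_urgencies.csv changes the urgency that results for that severity/priority combination.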

Alert Scripts

Alert Manager uses Splunk's built-in alert script facility. To still allow additional or existing alert scripts to run, Alert Manager passes all shell options through to an optional alert script.

Auto Assignment

Alert Manager allows incidents to be automatically assigned to owners. If no owner is selected, a default owner (defined under Global Settings) is assigned. Owners can be selected from the users defined under User Settings.

Auto Resolve

Splunk's alerting facility triggers on search results. Sometimes an incident can be considered resolved if no further search results are found. In this case, the Auto TTL Resolve function can be used.

In another scenario, an alert keeps recurring many times before an incident owner can find the root cause and fix the problem. This may leave a lot of incidents in the "new" state. To close these previously opened incidents, the Auto Previous Resolve function can be used.

Auto TTL Resolve

To use the Auto TTL Resolve feature, the expiration time of the triggered alert should be set. For example, if an alert search runs every 15 minutes, the expiration time should also be set to 15 minutes.

E.g. the first alert fires at 1:00am and creates an incident. The next scheduled alert runs at 1:15am without results. The first alert from 1:00am will expire at 1:15am and the incident will be automatically resolved with status auto_ttl_resolve.
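The 1:00am / 1:15am example can be traced with a small sketch. This is not the app's implementation: the incident is modelled as a plain dict using the alert_time, ttl and status fields named earlier, ttl is assumed to be in seconds, and restricting the check to incidents still in status "new" is an assumption.

```python
from datetime import datetime, timedelta


def auto_ttl_resolve(incident, now):
    """Return the incident, resolved if its alert has expired.

    Assumptions: 'ttl' is in seconds, and only incidents still in
    status "new" are resolved automatically.
    """
    expires_at = incident["alert_time"] + timedelta(seconds=incident["ttl"])
    if incident["status"] == "new" and now >= expires_at:
        return dict(incident, status="auto_ttl_resolve")
    return incident


# First alert fires at 1:00am with a 15 minute expiration time.
first = {"alert_time": datetime(2015, 1, 3, 1, 0), "ttl": 15 * 60, "status": "new"}

# At 1:10am the alert has not expired yet, so nothing changes.
print(auto_ttl_resolve(first, datetime(2015, 1, 3, 1, 10))["status"])

# At 1:15am the alert has expired and the incident is resolved.
print(auto_ttl_resolve(first, datetime(2015, 1, 3, 1, 15))["status"])
```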

Auto Previous Resolve

The Auto Previous Resolve feature closes previous incidents that are still in status "new".

E.g. the first alert fires at 1:00am and creates an incident. The next scheduled alert fires at 1:15am and opens a new incident. If the first incident from 1:00am is still in status "new", it will be automatically resolved with status auto_previous_resolve. If the first incident's status was changed in the meantime, it will not be resolved and its status will be preserved.
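As a rough illustration of this behaviour (not the app's code), incidents can be modelled as dicts. The field name "alert" for the alert name and the example alert names are hypothetical; matching on the same alert name follows the description of the auto_previous_resolve setting.

```python
def auto_previous_resolve(incidents, new_incident):
    """Resolve earlier "new" incidents of the same alert, then add the new one.

    'alert' is a hypothetical field holding the alert name.
    Incidents whose status was already changed are left untouched.
    """
    updated = []
    for incident in incidents:
        same_alert = incident["alert"] == new_incident["alert"]
        if same_alert and incident["status"] == "new":
            incident = dict(incident, status="auto_previous_resolve")
        updated.append(incident)
    updated.append(new_incident)
    return updated


existing = [
    {"alert": "Disk full", "alert_time": "01:00", "status": "new"},
    {"alert": "Disk full", "alert_time": "00:45", "status": "assigned"},
    {"alert": "Login failures", "alert_time": "01:00", "status": "new"},
]
incoming = {"alert": "Disk full", "alert_time": "01:15", "status": "new"}

for incident in auto_previous_resolve(existing, incoming):
    print(incident["alert"], incident["alert_time"], incident["status"])
```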

Configure Alerts

For alerts to be managed by Alert Manager, a few prerequisites have to be fulfilled.

The scheduled alert has to run the alert script "alert_handler.py". Enable "Run a script" on the Splunk Saved Searches configuration page and enter the script name in the text field.

The alert has to be run by a user with the alert_manager role. This is needed for the alert_handler.py script to be able to ingest alert metadata into the index "alerts".

Configure Incident Settings

By default, the table shows all alerts that are managed by Alert Manager (indicated by the _key column). Depending on the App context drop-down selection, the alerts that are readable by the logged-in user's role are displayed. Unmanaged alerts do not yet have a _key set.

To configure an unmanaged alert to be managed, select the App context in which the alert resides. All alerts in that app context will be displayed in the table. Superfluous alerts can be deleted by right-clicking on the table and selecting Remove row.

To store the new incident configuration, select Save settings. Further customization of the incident can be applied before or after saving.

Category and Subcategory

A category and a subcategory can be defined for every incident.

Tags

Tags have to be entered as a space-separated list.

Priority

A priority can be assigned to an alert. The following values are available: unknown, low, medium, high, critical.

Alert Scripts

The name of an alert script placed under $SPLUNK_HOME/bin/scripts/. Alert Manager runs it through its pass-through capability.
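The pass-through idea can be sketched as follows. This is only an illustration under assumptions, not the logic of alert_handler.py: the function names are hypothetical, and it is assumed that the handler forwards every argument it received, unchanged, to the configured script.

```python
import os
import subprocess


def build_passthrough_command(script_name, handler_argv, scripts_dir):
    """Return the argument list used to start the optional alert script.

    handler_argv is the handler's own argument vector (as in sys.argv);
    everything after the handler's script name is forwarded unchanged.
    """
    script_path = os.path.join(scripts_dir, script_name)
    return [script_path] + list(handler_argv[1:])


def run_passthrough(script_name, handler_argv, scripts_dir):
    """Run the optional alert script and return its exit code."""
    command = build_passthrough_command(script_name, handler_argv, scripts_dir)
    return subprocess.call(command)


# Example with made-up arguments; nothing is executed here.
scripts_dir = os.path.join("/opt/splunk", "bin", "scripts")
print(build_passthrough_command("notify.sh", ["alert_handler.py", "5", "error"], scripts_dir))
```

Keeping the command construction separate from the call that runs it makes the forwarding rule easy to check without starting a process.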

Auto Assignment

Checkbox to enable or disable auto-assignment. If auto-assignment is enabled in the auto_assign column, the incident will be assigned to the user given in the auto_assign_owner column.
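One possible reading of the assignment rule, written as a sketch: the column names auto_assign and auto_assign_owner come from the table, while the function name, the dict representation and the fallback to the default owner from Global Settings whenever no owner results are assumptions.

```python
def resolve_owner(incident_settings, default_owner):
    """Pick the owner for a new incident.

    Assumption: when auto-assignment is disabled or no owner is set,
    the default owner from Global Settings is used.
    """
    if incident_settings.get("auto_assign") and incident_settings.get("auto_assign_owner"):
        return incident_settings["auto_assign_owner"]
    return default_owner


# 'alice' and 'unassigned' are made-up values for illustration.
print(resolve_owner({"auto_assign": True, "auto_assign_owner": "alice"}, "unassigned"))
print(resolve_owner({"auto_assign": False, "auto_assign_owner": "alice"}, "unassigned"))
```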

Auto Resolve

If auto_ttl_resolve is selected, the incident will be closed after the alert has expired.

If auto_previous_resolve is selected, previously opened incidents with the same alert name and status new will be automatically resolved.
