When a problem occurs with a datacenter, Application, or SLA, the Monitoring Station can send alerts to users. Alerts are notifications that inform the users who are configured to receive them of the problem. The notification message contains the following information:
Whenever the status of an Element changes (for example, from Critical to Warning), Uptime Infrastructure Monitor sends an alert.
You can also configure alert escalations that occur if a warning is sent and is not acted upon. For example, if an alert is sent to a system administrator, and the administrator does not attend to the problem within a specified amount of time, then the alert is sent to the administrator’s manager.
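The escalation rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Uptime Infrastructure Monitor implements escalations: the `EscalationStep` class, the `recipients_to_notify` function, the recipient names, and the 30-minute delay are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    """One rung of a hypothetical escalation ladder."""
    recipient: str
    after_minutes: int  # minutes the alert has gone unacknowledged


def recipients_to_notify(steps, minutes_unacknowledged):
    """Return every recipient whose escalation delay has elapsed."""
    return [s.recipient for s in steps if minutes_unacknowledged >= s.after_minutes]


# Hypothetical ladder: names and delays are assumptions for illustration.
ladder = [
    EscalationStep("sysadmin", 0),   # notified as soon as the alert is sent
    EscalationStep("manager", 30),   # notified if nobody acts within 30 minutes
]

print(recipients_to_notify(ladder, 10))
print(recipients_to_notify(ladder, 45))
```

In this sketch the original recipient stays on the list after escalation. That is a design choice of the example, not a statement about the product.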
Uptime Infrastructure Monitor can send alerts to a phone, pager, or one or more email addresses.
The following is a sample email alert:
Notification type: Problem 1/12/2008 10:52
Host: filter
Host State: N/A
Service: FS Capacity - Filter
Service State: WARN/
Output: /var is 92% full
The following is a sample pager alert:
subject: CRIT Alert
content: 5/7/2005 13:22
Type: Problem
Service: FTP (CRIT)
Host: filter (CRIT)
Alerts in Uptime Infrastructure Monitor follow a specific flow. When Uptime Infrastructure Monitor detects a problem with a host, it issues an alert. Uptime Infrastructure Monitor then continues to check the host at specific intervals and reports on the status of the host.
Consider the following example:
Uptime Infrastructure Monitor encounters a critical error on a host. Uptime Infrastructure Monitor performs three rechecks at one-minute intervals, all of which return a critical error, and then sends an alert after the third recheck.
Uptime Infrastructure Monitor then checks the host every two hours. Although the next two checks also return critical errors, it does not send another alert, because the status has not changed. Then, the status of the host changes from Critical to Warning. When this change is detected, Uptime Infrastructure Monitor sends an alert informing recipients of the change in status. When the status of the host changes to OK, Uptime Infrastructure Monitor issues an alert informing recipients that the host has recovered.
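The example above can be replayed with a small simulation. The following Python sketch is a simplification written for this example only: the function name `alerts_for` and the status strings are assumptions, and it models rechecks only for the transition from OK to a problem state, as in the example.

```python
def alerts_for(statuses, rechecks=3):
    """Return (check_index, status) pairs at which an alert would be sent.

    statuses: the result of each consecutive check ("OK", "WARN", "CRIT").
    A new problem is confirmed by `rechecks` rechecks before the first alert;
    after that, an alert is sent only when the status changes.
    """
    alerts = []
    alerted = "OK"        # status most recently reported to recipients
    problem_streak = 0    # consecutive problem checks not yet alerted on
    for i, status in enumerate(statuses):
        if status == alerted:
            problem_streak = 0
            continue
        if alerted == "OK":
            problem_streak += 1
            if problem_streak <= rechecks:
                continue  # initial detection or a recheck still pending
        alerts.append((i, status))
        alerted = status
        problem_streak = 0
    return alerts


# Initial critical check, three rechecks, two more critical checks,
# then a change to Warning and a recovery to OK.
history = ["CRIT"] * 6 + ["WARN", "OK"]
print(alerts_for(history))
```

For this history the sketch reports three alerts: one after the third recheck, one on the change to Warning, and one on recovery.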
This alert flow is illustrated in the following diagram:
All service monitors have a common set of Monitor Alert Settings that configure aspects of the alert flow.
Alert Profiles are templates that tell Uptime Infrastructure Monitor how to react to various alerts that are generated by service checks. Alert Profiles enable Uptime Infrastructure Monitor to execute a series of actions in response to the failure of a service check or when a threshold is exceeded. The following diagram illustrates how an Alert Profile works:
An Alert Profile can send an alert via email, or to a pager or a cell phone. You can configure any or all of these actions to occur simultaneously by associating the Alert Profile with multiple Notification Groups. For example, if a Web server process stops responding, both the system administrator and Web server administrator can be notified.
Alert Profiles include standard message templates for emails and pagers, which are well suited for most alerting needs. However, you can customize the format of the alert using predefined variables. When you create or configure an Alert Profile, selecting the Custom Format option gives you a template that you can modify to override the message template for the selected alert type:
See Custom Alert Message Variables for more information.
In addition to sending alert messages, Uptime Infrastructure Monitor can also execute an alert script. When an outage occurs, the script is run on the Monitoring Station, once for each user who receives notification. Like custom alert messages, alert scripts use predefined variables to represent outage-specific information; these variables are passed to the script at the time of the outage.
For information on alert script variables, see Script Alert Variables. For more information on alert scripts, see the IDERA Knowledge Base article, Creating Custom Alert Scripts in Uptime Infrastructure Monitor Alert Profiles.
To create Alert Profiles, do the following:
Email Alert
Sends the alert to the email addresses of the members of a Notification Group.
Pager Alert
Sends the alert to the pagers of the members of a Notification Group.
Script Alert
Executes an alert script on the Monitoring Station, once for each user who receives notification of the alert.
Because this alert option relies on a script or batch file, enter its name and path in the Script Path field (for example, on Linux, /usr/local/uptime/scripts/scriptAlert.sh).
To view Alert Profiles, do the following:
Notification type: Problem 27/4/2006 09:19
Host: Test Host (OK)
Service: Test Monitor
Service State: OK
Output: This is a test notification; please ignore.
Alert Profile Tested appears in the popup window. If an error message appears in the popup window, edit the profile and test it again.
To edit Alert Profiles, do the following:
You can associate an Alert Profile with any Service Monitor, Application, or SLA so that it is triggered when the state of that item changes from OK to Warning or Critical. Alert Profiles are normally associated with these monitored items at the time of their configuration; you can also modify Alert Profile associations when editing existing service monitor definitions.
See Using Service Monitors, Working with Applications, and Adding and Editing SLA Definitions for more information about configuring Service Monitors, Applications, and SLAs, respectively.
Action Profiles are templates that direct how Uptime Infrastructure Monitor responds when it encounters a problem on a monitored system. You can associate an Action Profile with any Service Monitor, Application, or SLA so that it is triggered when the state of that item changes from OK to Warning or Critical. Action Profiles are normally associated with these monitored Elements at the time of their configuration; you can also change Action Profile associations when modifying existing service monitor definitions.
See Using Service Monitors, Working with Applications, and Adding and Editing SLA Definitions for more information about configuring Service Monitors, Applications, and SLAs, respectively.
Actions include one of the following tasks:
As templates, Action Profiles can be reused for any number of Service Monitor configurations. This means you can create a series of them as standard actions used to respond to typical types of problems you may encounter, depending on what role a Service Monitor is playing (for example, availability or performance).
If an administrator has integrated Uptime Infrastructure Monitor with VMware vCenter Orchestrator (see VMware vCenter Orchestrator Integration), you can configure Action Profiles to initiate Orchestrator workflows.
Orchestrator is a VMware vCenter Server add-on that allows its administrators to create workflows that automate vCenter management tasks. These Orchestrator workflows are open ended: all vCenter actions are available for automation through the processing of parameters and runtime arguments. Uptime Infrastructure Monitor Action Profiles can be configured to provide input parameters to specific workflows, thus integrating vCenter management with Uptime Infrastructure Monitor’s monitoring and alerting capabilities.
For example, if Uptime Infrastructure Monitor is monitoring memory, CPU, and hard disk use for a virtualized server, the passing of performance thresholds can trigger an Action Profile that, in turn, triggers an Orchestrator workflow that creates a new virtual machine to alleviate resource strain. In a converse example, if Uptime Infrastructure Monitor is monitoring a virtualized server for long periods of inactivity, a triggered Action Profile can initiate an Orchestrator workflow that shuts down the instance to free up resources.
By tightly integrating Uptime Infrastructure Monitor’s monitoring and alerting with VMware vCenter Orchestrator’s automated virtual environment administration, you can accelerate your organization’s reaction time with virtual systems management, and map established policies to automated actions.
When you configure an Action Profile, Uptime Infrastructure Monitor communicates with Orchestrator and dynamically produces a list of all available workflows. (This includes any third-party workflow packages that are installed on the Orchestrator server, including the Uptime Infrastructure Monitor Orchestrator package.)
When a workflow is selected, and the Get Parameters button is clicked, the corresponding input parameter fields are dynamically displayed, allowing you to specify parameter values required to completely configure the workflow for execution should an Uptime Infrastructure Monitor alert initiate it.
When configuring a VMware vCenter Orchestrator workflow, you have at your disposal a set of Uptime Infrastructure Monitor-specific variables that can be entered as parameter variables, and whose ensuing runtime values are passed to the Orchestrator workflow during execution. The variables available to you are those that are used when creating a custom alert format. See Custom Alert Message Variables for information.
You can also configure an Action Profile to send an SNMP trap to a particular host. An SNMP trap is a notification issued by a system that is running SNMP when a problem occurs. The host to which the SNMP trap is sent must be running an SNMP trap listener, such as snmptrapd by Net-SNMP. You can use any SNMP trap receiver as long as it can execute scripts when traps are received. When an SNMP trap is sent:
Below is a diagram depicting the process:
If you use SNMP traps, the trap message is sent in the format specified by the Uptime Infrastructure Monitor MIB. This MIB is found in the /scripts directory. The Uptime Infrastructure Monitor enterprise OID is .1.3.6.1.4.1.24216.
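A trap receiver may need to decide whether an incoming OID belongs to the Uptime Infrastructure Monitor enterprise subtree. The following Python sketch shows one way to compare OIDs component by component. The function name `is_under` and the example sub-identifiers are hypothetical; only the enterprise OID itself comes from this document.

```python
UPTIME_ENTERPRISE_OID = ".1.3.6.1.4.1.24216"


def is_under(oid, prefix=UPTIME_ENTERPRISE_OID):
    """True if `oid` equals `prefix` or lies beneath it in the OID tree."""
    oid_parts = oid.strip(".").split(".")
    prefix_parts = prefix.strip(".").split(".")
    # Compare whole components so that 242160 is not mistaken for 24216.
    return oid_parts[:len(prefix_parts)] == prefix_parts


# The trailing ".0.1" is a made-up sub-identifier used only for illustration.
print(is_under(".1.3.6.1.4.1.24216.0.1"))
print(is_under(".1.3.6.1.4.1.242160.1"))
```

Comparing split components rather than raw string prefixes avoids false matches on enterprise numbers that merely start with the same digits.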
Create a new file called snmptrapd.conf in the C:\usr\etc\snmp directory with contents similar to the following ("public" on the first line is the SNMP community string, which you can change as needed; change the path to Uptime Infrastructure Monitor on the second line to match your installation):
authCommunity log,execute,net public
traphandle default C:\Strawberry\perl\bin\perl.exe "C:\Program Files\uptime software\uptime\scripts\snmp-trap-script\trap_to_ext_event.pl"
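For orientation, Net-SNMP's snmptrapd passes each trap to a traphandle program on standard input: the sender's hostname on the first line, its transport address on the second, and then one "OID value" pair per line. The Python sketch below parses that input. It is not the trap_to_ext_event.pl script shipped with Uptime Infrastructure Monitor; the function name `parse_trap` and the sample hostname, addresses, and trap OID are hypothetical.

```python
import io


def parse_trap(stream):
    """Parse the text that snmptrapd writes to a traphandle program's stdin."""
    lines = [line.rstrip("\n") for line in stream]
    if len(lines) < 2:
        raise ValueError("expected at least a hostname line and an address line")
    varbinds = []
    for line in lines[2:]:
        if not line.strip():
            continue
        # Each remaining line is "<OID> <value>"; the value may contain spaces.
        oid, _, value = line.partition(" ")
        varbinds.append((oid, value))
    return {"hostname": lines[0], "address": lines[1], "varbinds": varbinds}


# Made-up sample input for illustration only.
SAMPLE = (
    "filter.example.com\n"
    "UDP: [192.0.2.10]:161->[192.0.2.1]:162\n"
    ".1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.4.1.24216.0.1\n"
)

trap = parse_trap(io.StringIO(SAMPLE))
print(trap["hostname"], trap["varbinds"])
```

A real handler would call `parse_trap(sys.stdin)` instead of reading from the sample string.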
This approach assumes that your SNMP trap service monitors all have the same name, "SNMP Trap (member)", and are managed via a Service Group.
This solution creates two logs: one written when the Net-SNMP Trap Handler service receives a trap, and another written when the Perl script sends the trap into Uptime Infrastructure Monitor. Comparing the two helps to narrow down an issue if one arises.
Create a new monitor called "SNMP Trap" and assign it to a Service Group, so that it can be applied to numerous Elements in Uptime Infrastructure Monitor.
Now when traps are received by the Net-SNMP snmptrapd service, the appropriate "SNMP Trap (member)" service monitor will be set to CRIT and an alert will be sent out. When the status is acknowledged in the Uptime Infrastructure Monitor UI (click the ACK button), the status will be set back to OK.
To create Action Profiles, do the following:
"/usr/local/uptime/recover.sh" "24/12/2014 5:01:05" "Problem" "printserver" "null" "WinSrv-Print Spooler" "CRIT/threshold error" "servicestatus: Not Running does not match Running (Service 'Print Spooler' found, status: Not Running, took 12ms)"
For information on predefined variables that can be used in Action Profile scripts, see Recovery Script Variables.
You can also use the recovery script to file trouble tickets with a system like Remedy, or to interact with third-party software packages.
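The sample invocation above passes seven positional arguments after the script path. The following Python sketch maps them to named fields so that a recovery script can act on them. The field names are assumptions inferred from the sample command and the sample alert messages, not documented names; see Recovery Script Variables for the authoritative list.

```python
from typing import NamedTuple


class Outage(NamedTuple):
    """Field names are inferred from the sample invocation, not documented."""
    timestamp: str
    notification_type: str
    host: str
    host_state: str
    service: str
    service_state: str
    output: str


def parse_outage(argv):
    """Map the seven arguments that follow the script path to named fields."""
    if len(argv) != 8:
        raise ValueError(
            f"expected 7 arguments after the script path, got {len(argv) - 1}"
        )
    return Outage(*argv[1:])


# The argument values from the sample invocation shown earlier.
SAMPLE_ARGV = [
    "/usr/local/uptime/recover.sh",
    "24/12/2014 5:01:05",
    "Problem",
    "printserver",
    "null",
    "WinSrv-Print Spooler",
    "CRIT/threshold error",
    "servicestatus: Not Running does not match Running "
    "(Service 'Print Spooler' found, status: Not Running, took 12ms)",
]

outage = parse_outage(SAMPLE_ARGV)
print(outage.host, outage.service, outage.service_state)
```

In a real script, `parse_outage(sys.argv)` would replace the sample list.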
Windows Host
The name of the host on which the service is running.
Enter You can use this dynamic hostname in conjunction with service groups, where an issue can originate from one of many hosts.
Windows Service
The display name of the specific Windows service to which the Action Profile applies. The display name of a service appears in the Name column of the Services Control Panel, or in the Description column of the Windows Task Manager Services tab.
The service display name must be entered verbatim, including spaces; otherwise, it is not processed correctly. Double-clicking a service name in the Services Control Panel opens a properties window where you can highlight and copy the service Display name.
.1.3.6.1.2.1.34.4.1.7
Monitoring Periods are the times during which a service monitor actively monitors a host. Monitoring Periods also apply to the times when Uptime Infrastructure Monitor sends alerts.
Uptime Infrastructure Monitor comes with the following Monitoring Periods:
You can add Monitoring Periods that suit your needs. For example, you can create a Monitoring Period called "Weekends" that only monitors a host from 12:00 a.m. on Saturday to 11:59 p.m. on Sunday.
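To make the "Weekends" example concrete, the following Python sketch models a Monitoring Period as a set of weekdays plus a daily time window and checks whether a given moment falls inside it. The `MonitoringPeriod` class and its fields are hypothetical and do not reflect how Uptime Infrastructure Monitor stores period definitions.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class MonitoringPeriod:
    """Hypothetical model: active weekdays plus a daily time window."""
    name: str
    weekdays: frozenset            # 0 = Monday ... 6 = Sunday
    start: time = time(0, 0)       # 12:00 a.m.
    end: time = time(23, 59)       # 11:59 p.m.

    def covers(self, moment):
        """True if `moment` falls on an active day within the time window."""
        minute = moment.time().replace(second=0, microsecond=0)
        return moment.weekday() in self.weekdays and self.start <= minute <= self.end


# Saturday (5) and Sunday (6), all day.
WEEKENDS = MonitoringPeriod("Weekends", frozenset({5, 6}))

print(WEEKENDS.covers(datetime(2024, 1, 6, 0, 0)))    # a Saturday
print(WEEKENDS.covers(datetime(2024, 1, 8, 0, 0)))    # a Monday
```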
To add Monitoring Periods, do the following:
On the Uptime Infrastructure Monitor tool bar, click Services.