...

Viewing All SLAs

Service level agreements on the SLAs dashboard indicate whether performance targets are being met. Although the main summary displays the status of the SLA definition as a whole, you can also expand the view to verify how well component service level objectives (SLOs) are meeting targets. (SLOs are made up of monitored services that, as a group, are used to measure a specific performance goal.)

In the Service Level Agreements subpanel (accessed by clicking the SLAs tab), the following SLA information is provided in the default view:

  • the list of SLAs, and whether any are in a critical or warning-level state
  • headway into the time period during which compliance is measured
  • the percentage of allowable downtime used, after which the SLA’s status becomes critical

SLA Status Indicators

The color coding used in the Service Level Agreements subpanel indicates, at a glance, whether each SLA’s limit is in danger of being exceeded or has already been exceeded.

The Downtime progress bar allows you to gauge how close the SLA is to reaching a critical state:

  • an SLA that has used more than 100% of its allowable downtime reaches a critical state, is highlighted in red, and is accompanied by the critical state icon
  • an SLA whose allowable downtime, at the current rate of use, will be depleted before the compliance period has ended enters a warning-level state, is highlighted in yellow, and is accompanied by the warning state icon
  • an SLA whose graphed allowable downtime does not exceed the graphed progress through the compliance period is in a compliant state

Note that once an SLA reaches a critical state, it remains in that state until the compliance period restarts the following week or month. An SLA that enters a warning-level state can return to a normal state if the rate at which allowable downtime is used decreases to a “safer” value.
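The status rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not up.time's implementation: the `SlaSnapshot` type, the `sla_status` function, and the `already_critical` flag are hypothetical names, and both inputs are assumed to be percentages.

```python
from dataclasses import dataclass


@dataclass
class SlaSnapshot:
    """Hypothetical view of one SLA row on the dashboard."""
    period_elapsed_pct: float  # progress through the compliance period
    downtime_used_pct: float   # share of allowable downtime already used


def sla_status(snapshot: SlaSnapshot, already_critical: bool = False) -> str:
    # A critical SLA stays critical until the compliance period restarts.
    if already_critical or snapshot.downtime_used_pct > 100:
        return "critical"
    # Downtime is being used faster than the period is elapsing, so at the
    # current rate it would be depleted before the period ends.
    if snapshot.downtime_used_pct > snapshot.period_elapsed_pct:
        return "warning"
    return "compliant"


print(sla_status(SlaSnapshot(period_elapsed_pct=50, downtime_used_pct=60)))
```

Comparing the two percentages directly mirrors how the two progress bars are read against each other in the subpanel.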

Generating an SLA Detailed Report

Clicking an SLA’s corresponding Detailed Report button instantly generates an SLA Detailed report for the last 24 hours.

See Reports for Service Level Agreements for more information.

SLA View Types

The Service Level Agreements subpanel provides two types of views: Condensed View and Detailed View. The latter view is suitable if you have one or two defined SLAs.

Condensed View

The Condensed View is the default view of this subpanel and displays the following information:

  • the name of the SLA
  • a status breakdown of the SLA for the current time period:
      • time period elapsed
      • available downtime used for the current time period
      • how close the SLA is to its performance target
      • status message

Detailed View

Click the Show Detailed View button to expand each SLA to include SLOs.

An SLA’s compliance is based on the downtime of its component SLOs: when one or more of the SLOs experience downtime, it counts towards overall SLA non-compliance.

Clicking an SLO name displays the status of the SLO, and all of the services that make up the SLO:

Using the Detailed View allows you to pinpoint which SLO is causing SLA non-compliance, and in turn which monitors are causing the SLO to experience downtime.

For more information about viewing SLA details, and defining SLOs that help you accurately gauge the performance of your IT infrastructure, see Working with Service Level Agreements.

Viewing All Applications

The Applications dashboard displays the status of all Applications that the currently logged-in user is permitted to view. Applications are groups of service monitors that together represent the status of a greater business or IT function within your organization. For example, an Application can group ping checks with checks for the status of the up.time agents that are installed on a system. Because an Application can contain many services, it enables you to better analyze component outages versus true Application outages.

An Application consists of any number of two types of monitors:

  • master service monitors: one or more monitors that are used to determine the status of the Application as a whole
  • regular service monitors: other service monitors that are associated with a master service monitor, but are not used to determine the status of the Application as a whole; they help provide a complete picture by representing a secondary service, or a direct dependency of a master service
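The distinction between the two monitor types can be illustrated with a short Python sketch. The text only says that master service monitors determine Application status; how their states are combined is not specified, so the "worst state wins" rule below is an assumption, as are the `application_status` function and the tuple layout. Only the OK, WARN, and CRIT states are modeled.

```python
# Assumed severity ordering; higher means worse.
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


def application_status(monitors):
    """monitors: list of (name, is_master, status) tuples (hypothetical layout)."""
    # Regular service monitors are ignored when deriving Application status.
    master_statuses = [status for _, is_master, status in monitors if is_master]
    if not master_statuses:
        return None  # nothing to base a status on
    # Assumption: the Application takes the worst status among its masters.
    return max(master_statuses, key=SEVERITY.__getitem__)


monitors = [
    ("ping", True, "OK"),
    ("agent", True, "WARN"),
    ("FSCap", False, "CRIT"),  # regular monitor: does not affect the result
]
print(application_status(monitors))
```

In this example the critical regular monitor does not change the outcome, which is the behavior that lets you separate component outages from true Application outages.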

On the dashboard, the overall status of each Application is color coded:

  • Applications functioning normally are highlighted in green and marked with a check box icon
  • Applications undergoing scheduled maintenance are grayed out and marked with a scheduled maintenance icon; the message System is in scheduled maintenance is displayed in the Applications subpanel
  • Applications in a warning state are highlighted in yellow and marked with a warning icon
  • Applications in a critical state, or that are offline, are highlighted in red and marked with a critical icon; if an Application is offline, a message indicating this appears in the Applications subpanel

The Applications subpanel displays the status of each Application that you have added to up.time.

The number and status of an Application's service monitors are also displayed through a series of bars: green indicates the service is functioning normally; yellow means the service is in a warning state; and red indicates the service is in a critical state. This information can be displayed in a default Condensed View, or a Detailed View.

Condensed View

The Condensed View is the default view for the dashboard and displays the following information:

  • the name of the Application
  • a description of the Application, if one was added when the Application was defined
  • the status of each service in the Application

The status of each service is denoted by a colored bar in the Status of Master Services and Status of Regular Services columns. For example, if there are three services associated with the Application and their status is OK, then three green bars appear in the corresponding column.

Detailed View

Click the Show Detailed View button to change to a detailed view of the dashboard.

In this view, more information is provided for each of an Application's component service monitors, including the monitor's most current status message. Service monitor names can be clicked, taking you to the service status page for the host to which it is attached.

The name of the master Application group is listed (for example, AIX) along with the names of the individual service monitors (e.g., FSCap). Master service monitors in an Application are marked with an asterisk (*). The status of a service is denoted by a colored bar beside the name of the service: green indicates the service is functioning normally; yellow means the service is in a warning state; and red indicates the service is in a critical state. The name of each Application is a hyperlink. Click a link to view detailed information about an Application.

Viewing the Network Dashboard

The network dashboard is a summary of network device performance, and network-based service monitor outages. It is automatically updated every 30 seconds. You can view this dashboard by clicking the Network tab.

The network dashboard provides you with a single view of your network environment, and keeps you abreast of any network-related issues:

  • instantly spot network capacity issues, and compare trends over the past day
  • pinpoint top resource consumers to help resolve performance bottlenecks before they cause an outage
  • immediately spot network devices that are currently failing, and click through to investigate the root cause

The following metrics are together used to report network performance:

  • In Usage: global inbound bandwidth usage of all monitored network devices’ ports
  • Out Usage: global outbound bandwidth usage of all monitored network devices’ ports
  • Latency: network device latency values collected through each monitored Element’s ping monitor; does not include network devices without an assigned ping monitor
  • Errors: the average number of errors per second through all monitored network device ports
  • Discards: the average number of packets discarded per second through all monitored network device ports

For each category, there is a performance gauge that displays the average for all monitored ports based on the most recent sample. Maximum and minimum values over the last 24 hours are also shown. Note that the first two metrics are percentage-based, and their gauges are static; the remaining three metrics' gauges may scale if the recorded value exceeds the default upper limit. The thresholds for the service status indicators are typically 70% for a warning state, and 90% for a critical state.
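As a rough illustration of the typical thresholds, the following Python sketch maps a percentage-based reading (such as In Usage or Out Usage) to a state. The `usage_state` function and its parameter names are hypothetical, and treating a value exactly at a threshold as having reached that state is an assumption.

```python
def usage_state(percent, warn_at=70.0, crit_at=90.0):
    """Classify a percentage-based gauge reading (hypothetical helper)."""
    # Assumption: a reading exactly at a threshold already counts as that state.
    if percent >= crit_at:
        return "critical"
    if percent >= warn_at:
        return "warning"
    return "ok"


for reading in (42.0, 75.5, 93.0):
    print(reading, usage_state(reading))
```

Keeping the thresholds as parameters reflects that 70% and 90% are described only as typical values.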

Additionally, for each category, there are top-10 lists displaying the individual Elements that are using the most bandwidth, have the highest latency, or are seeing the most errors or discarded packets. Clicking any Element name displays its Quick Snapshot page, where you can further investigate bottlenecks.

Devices with Service Outages

Any network device Element whose attached service monitors are experiencing outages is displayed in this section. For convenience, you can sort the results by number of current outages, or outage occurrences.

Info

If the dashboard is viewed by an up.time user who does not have permission to view all Elements, that user may not be able to see network device Elements. In that case, the list is empty, and up.time reports that there are no available network devices.

Viewing All Elements

Elements are the systems, network devices, Applications, and SLAs that up.time is currently monitoring. You can view the status of all monitored Elements on the All Elements dashboard. This can be accessed by clicking the All Elements tab.

The Elements subpanel lists the following information:

  • the names of the Elements in your environment
  • the status of the services that are assigned to each Element
  • the number of outages over the last hour, 12 hours, and 24 hours
  • the percentage of CPU resources being consumed by users, the system, and by disk I/O
  • the percentage of the system disk that is being used and the percentage that is busy
  • the amount of memory swap space that is being used

For convenience, you can sort the Elements table by any outage type.

If up.time cannot contact an Element, the reason will be displayed:

  • The availability check has failed.
  • This element is in maintenance.
  • This VM is currently powered off.

The values in each column are hyperlinks. Click one of the links to display the following information in the system information or graphing subpanels:

  • Click any value in the OK, WARN, CRIT, MAINT, or UNKNOWN columns to open the Element's Status subpanel.
  • Click any value in the Outages column to open the Element's Outages subpanel.
  • Click any value in the USR, SYS, WIO, or TOT columns to open the Element's Usage% Busy report subpanel. See Usage (% busy) for more information.
  • Click any value in the % Used column to open the Element's File System Capacity report subpanel. See File System Capacity Graph for more information.
  • Click any value in the % Busy column to open the Element's Disk Performance Statistics report subpanel. See Disk Performance Statistics Graph for more information.

Info

By default, Elements in WARN or CRIT states, in maintenance, or whose status is undetermined are displayed. To display all Elements regardless of status, unselect the Hide OK Elements check box.
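The effect of the Hide OK Elements check box can be pictured as a simple filter. The Python sketch below is illustrative only: the `visible_elements` function and the dictionary layout are hypothetical, and each Element is assumed to carry a single overall status drawn from the column names used above.

```python
def visible_elements(elements, hide_ok=True):
    """Return the Elements that would be listed (hypothetical helper)."""
    if not hide_ok:
        # Check box cleared: every Element is shown regardless of status.
        return list(elements)
    # Check box selected (the default): drop Elements whose status is OK.
    return [e for e in elements if e["status"] != "OK"]


elements = [
    {"name": "web01", "status": "OK"},
    {"name": "db01", "status": "CRIT"},
    {"name": "sw01", "status": "MAINT"},
    {"name": "vm07", "status": "UNKNOWN"},
]
print([e["name"] for e in visible_elements(elements)])
```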

 

...