
Related Documentation

This article is part of a series:

Part 1 - Creating Custom Service Monitors in up.time
Part 2 - Creating Custom Service Monitors with Retained Data Collection
Part 3 - Creating Plug-in Service Monitors in up.time

Version of up.time affected

All

Affected Platforms

All

Article Contents

Formatting your monitoring station script for retained data tracking
Changing the check_temp script for retained data
Adding Custom Service Monitor with Retained Data to up.time

Overview

Custom service monitors with retained data tracking expand on the basic Custom service monitor by allowing you to retain and graph historical trending information returned from your custom script. This enables you to store up to 10 custom application or business metrics per monitor within up.time, just like the system performance metrics returned by the up.time agent. This article builds on the scripts and knowledge that were developed in a previous article. Take some time to review the previous article before continuing.

[Image: Example graph produced using a custom service monitor with retained data.]

Formatting your monitoring station script for retained data tracking

To use a custom script with retained data in up.time, you must change the output format produced by your script. All of the rules for a regular custom service monitor still apply, with a few slight modifications, as detailed below:

The monitoring station script must return a single number (decimal or integer) per line, one line for each metric that you want up.time to retain. You can have a maximum of 10 lines retained. The example script output shown below retains the values '10.5' and '99' as two distinct graphable trends within up.time:

> check_temp.sh
10.5
99

The output must be in numeric format; you cannot have text output. This enables the service monitor to perform all threshold checking within up.time instead of having your script determine thresholds and return an outage message in text format.

The script must exit with a success status (0), unless there has been a problem when the script is run or you want to force a status for the service monitor.

The script must accept the hostname of the agent system as the first argument. up.time will automatically add this argument to the arguments passed to your script.

In general, you do not need to change your existing agent side script or configuration in order for your service monitor to retain performance metrics.
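The output rules above can be checked before a script is wired into a monitor. The following is a minimal sketch in POSIX sh, not part of up.time: the function name validate_retained_output is hypothetical, and accepting a leading minus sign is an assumption, since the article only says "decimal or integer".

```shell
# validate_retained_output: hypothetical helper, not part of up.time.
# Reads script output on stdin. Succeeds only if there are 1 to 10 lines
# and every line is one integer or decimal number (optional leading minus).
validate_retained_output() {
    awk '
        !/^-?[0-9]+(\.[0-9]+)?$/ { bad = 1 }
        END { if (bad || NR < 1 || NR > 10) exit 1 }
    '
}

# Usage: check the two-line sample output shown earlier in this section.
if printf '10.5\n99\n' | validate_retained_output; then
    echo "output format looks valid"
else
    echo "output format is invalid"
fi
```

Piping the real script into the helper, for example ./check_temp.sh test-agent 9998 | validate_retained_output, would flag text such as "WARNING - ..." before up.time ever sees it.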

Changing the check_temp script for retained data

Using the check_temp.sh script as a basis for this example, you can easily change the script to fit within the context of a custom service monitor with retained data. To do this, you must make the following changes to the script:

Remove the logic included within the script to check the temperature and humidity thresholds.

Change the output produced by the script so that it prints the current temperature level on a single line, followed by the current humidity on the next line. Here is an example of how the output and command format will change.

Previous Format - For a regular custom service monitor.

> ./check_temp.sh test-agent 9998 temp 60 80
WARNING - temperature is 64.5 on test-agent
> ./check_temp.sh test-agent 9998 rh 25 30
CRITICAL - humidity is 32.8 on test-agent

New Format - For a custom service monitor with retained performance data.

> ./check_temp.sh test-agent 9998
64.5
32.8

To produce the output listed above, you must edit the script so that it looks like the following example:

#!/bin/sh

# This script takes the following arguments:
# check_temp.sh hostname port
# Example execution:
# ./check_temp.sh my-agent 9998

# This script can be placed anywhere on the monitoring station system as long as it is
# executable by the uptime user.

# First, collect our arguments
AGENT=$1
PORT=$2

TMPFILE=/tmp/$$.temp

# now use the info above to contact our agent, store the output in a file for parsing
echo -n rexec secretpassword /opt/uptime-agent/my-scripts/show_temp.sh my-arguments | /usr/local/uptime/bin/netcat "$AGENT" "$PORT" > "$TMPFILE"

Note: If you are using agentcmd instead of netcat, replace netcat with agentcmd in the command above. For example:

echo -n /opt/uptime-agent/my-scripts/show_temp.sh my-arguments | /usr/local/uptime/scripts/agentcmd -p "$PORT" "$AGENT" rexec secretpassword > "$TMPFILE"

For more information on the syntax used with agentcmd, see this Knowledge Base article.

# we have the output from the agent. If it is ERR that means there was a problem running the script on the agent
# grep -q prints nothing; its exit status tells us whether ERR was found
if grep -q ERR "$TMPFILE"
then
    echo "Could not execute agent side script!"
    # by exiting with a 2 we are forcing a CRIT service outage
    exit 2
fi

# in this script we don't need to check thresholds or determine which information to check
# we just need to reformat the agent side script output slightly so that only numerical info is displayed
# we do this by trimming off the first word returned on each line from the agent, leaving just the numbers
# and printing that to screen, up.time will handle the rest

awk '{print $2}' "$TMPFILE"

exit 0
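The two processing steps at the end of the script (the ERR check and the awk reformat) can be tried without an agent. This is a small sketch under stated assumptions: the function names are hypothetical, and the sample agent output with the labels "temp" and "rh" is invented for illustration, since the article does not show what show_temp.sh actually prints.

```shell
# Hypothetical sample of agent-side output: a label, then a number.
# The labels "temp" and "rh" are assumptions for illustration only.
sample_agent_output() {
    printf 'temp 64.5\nrh 32.8\n'
}

# Succeeds if the text on stdin contains ERR, as the script's check does.
agent_reported_error() {
    grep -q ERR
}

# Keeps only the second whitespace-separated field of each line,
# the same transformation as awk '{print $2}' in the script.
extract_values() {
    awk '{ print $2 }'
}

# Prints 64.5 and 32.8 on separate lines.
sample_agent_output | extract_values
```

If the real agent output places the number in a different column, only the field reference in extract_values would need to change.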

Adding Custom Service Monitor with Retained Data to up.time

Next, add your custom service monitor with retained performance metrics to the up.time Web interface using the same process that you would use to add a standard custom service monitor to up.time. The Custom with Retained Data monitor option is found in the List Other Monitors section of the Add New Service Instance page.

The Custom with Retained Data service monitor template has the following monitor-specific settings:

Option Name: Script Name
Description: The script name is the path to your monitoring station script; this is the script that up.time will execute when running this service monitor. Be sure to use the complete path wherever possible, and make sure that the path is to a locally mounted volume. For Windows script paths you must use UNIX style directory separators (/ instead of \) and also place double quotes around the entire script name.
Example (UNIX/Linux): /usr/local/uptime/check_temp.sh
Example (Windows): "C:/my scripts/check_temp.bat"

Option Name: Arguments
Description: These are the arguments that you would like up.time to pass into your monitoring station script. No arguments are required, but please be aware that up.time will automatically include the selected hostname as the first argument to your script.
Example: temp 60 80

Option Name: Variable 1-10 Warning
Description: This is the warning threshold used against the output returned from your monitoring station script. This is a numeric comparison. You must select both a comparison method and a threshold value to enable the warning level threshold.
Example: Output contains: "warning"

Option Name: Variable 1-10 Critical
Description: This is the critical threshold used against the output returned from your monitoring station script. This is a numeric comparison. You must select both a comparison method and a threshold value to enable the critical level threshold.
Example: Output contains: "critical"

Option Name: Retained Data Tracking
Description: This check box determines whether up.time will not only check the variable for threshold violations but will also retain the returned values for graphing at a later time. In most cases you should check this box; without it, data for the indicated variable will not be retained for graphing.
Example: N/A
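The Warning and Critical settings are described as numeric comparisons that up.time performs on each returned value. The sketch below only illustrates that idea outside up.time and is not its implementation: the function names are hypothetical, and "greater than" is assumed as the comparison method. The thresholds 60/80 and 25/30 are taken from the earlier check_temp.sh examples.

```shell
# exceeds_threshold VALUE THRESHOLD
# Succeeds when VALUE is numerically greater than THRESHOLD.
# awk is used because the shell's test command only compares integers.
exceeds_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# classify VALUE WARN_THRESHOLD CRIT_THRESHOLD
# Prints OK, WARN, or CRIT (hypothetical labels for this sketch).
classify() {
    if exceeds_threshold "$1" "$3"; then
        echo CRIT
    elif exceeds_threshold "$1" "$2"; then
        echo WARN
    else
        echo OK
    fi
}

classify 64.5 60 80   # temperature sample; prints WARN
classify 32.8 25 30   # humidity sample; prints CRIT
```

These results match the WARNING and CRITICAL messages that the earlier, threshold-checking version of the script produced for the same readings.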

Based on the settings used in the example monitoring station script, configure the monitor with the following settings:

Enter a name and description for the monitor.

Select a host from the dropdown menu. Be sure to select the same host that your agent side script is on.

In the Script Name field, enter the path to the custom script on your monitoring station. On Windows systems, be sure to use UNIX style / instead of \ and put quotation marks around your path. For example: "C:/my files/check_temp.bat"

In the Arguments field, enter the arguments for the script. up.time adds the agent name as the first argument automatically, so do not include it.

Select contains from the Warning dropdown and enter WARNING as the search text.

Select contains from the Critical dropdown and enter CRITICAL as the search text.

Complete the remainder of the monitor template as you would for a normal service monitor.
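Because up.time prepends the hostname, a script sees one more argument than the Arguments field contains. A minimal sketch of that ordering, assuming the field holds only the agent port 9998 as in the check_temp.sh example; describe_args is a hypothetical helper, not an up.time command.

```shell
# describe_args: hypothetical helper showing how a monitoring station
# script sees its arguments when up.time prepends the hostname.
describe_args() {
    [ "$#" -ge 1 ] || return 1   # the hostname is always expected first
    agent=$1                     # added automatically by up.time
    shift                        # what remains came from the Arguments field
    echo "agent=$agent args=$*"
}

# Host "test-agent" selected, "9998" entered in the Arguments field.
# Prints: agent=test-agent args=9998
describe_args test-agent 9998
```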

Example monitor configuration

The image below illustrates a sample monitor configuration. This service monitor will indicate a WARN or CRIT whenever the monitoring station custom script returns WARNING or CRITICAL in its output.

[Image: sample monitor configuration]
