Still receiving alerts when topological dependency has failed

When the host check for a topological host enters a failure state (i.e., WARN or CRIT), the service monitors on hosts that depend on that element have their alerts and actions suppressed and enter an UNKN state until the host check on the topological host recovers. However, in a few scenarios you may still see alerts being sent even though the topological host appears to have failed:

The host check for the topological host has not yet failed all of its re-checks. While the host check is still in its re-checking loop and has not started alerting, other monitors will still register outages and may send alerts.
When a non-host check monitor runs, it does not force a run of the host check before sending an alert. For example, if your host check runs once every 15 minutes but you have a monitor set up to run once per minute, the once-per-minute monitor may fail and alert well before the host check has registered the outage.

As a general rule, your topological parent's host check should run at least as often as the most frequently checked service on any of its child elements, and its re-check interval / max re-checks values should be set so that re-checking finishes in less time than that most frequently checked service's interval.
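The rule of thumb above can be expressed as a small configuration check. This Python sketch is illustrative only: `CheckSchedule` and `rule_violations` are hypothetical names, all times are in seconds, and it assumes that "shorter than the most checked service" means the host's total re-check window (re-check interval × max re-checks) must be shorter than the shortest child service interval.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CheckSchedule:
    """Hypothetical model of a monitor's timing settings, in seconds."""
    interval: int              # how often the check runs
    recheck_interval: int = 0  # delay between re-checks after a failure
    max_rechecks: int = 0      # re-checks performed before the failure is confirmed

    @property
    def recheck_window(self) -> int:
        # Assumption: total time spent re-checking before the outage is confirmed.
        return self.recheck_interval * self.max_rechecks


def rule_violations(parent: CheckSchedule,
                    child_services: Iterable[CheckSchedule]) -> List[str]:
    """Return a description of each way the parent host check breaks the rule of thumb."""
    intervals = [s.interval for s in child_services]
    if not intervals:
        return []
    fastest = min(intervals)  # the most frequently checked child service
    problems = []
    if parent.interval > fastest:
        problems.append(
            f"host check runs every {parent.interval}s but a child service runs every {fastest}s"
        )
    if parent.recheck_window >= fastest:
        problems.append(
            f"host re-check window of {parent.recheck_window}s is not shorter than {fastest}s"
        )
    return problems


if __name__ == "__main__":
    children = [CheckSchedule(interval=60), CheckSchedule(interval=300)]
    print(rule_violations(CheckSchedule(900, 60, 3), children))  # two problems
    print(rule_violations(CheckSchedule(60, 15, 3), children))   # no problems
```

In this sketch, a 15-minute host check with three 60-second re-checks fails both conditions against a once-per-minute child service, while a once-per-minute host check with three 15-second re-checks (a 45-second window) satisfies both.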
