The My Infrastructure panel is your starting point for monitoring the systems in your environment. From the My Infrastructure panel, you can add:
Elements are the systems or network devices that you will monitor using up.time. You can add the following types of Elements:
Element Type | Description | |
---|---|---|
Agent | A system that has an up.time agent installed on it. | |
Net-SNMP v2 or Net-SNMP v3 | These are servers that use version 2 or 3 of the Net-SNMP protocol to monitor and manage systems in a TCP/IP-based network. Net-SNMP version 3 adds security features that are lacking in Net-SNMP version 2. All of the data gathered from Net-SNMP is based on the following MIB implementations:
| |
Network Device | An agentless, SNMP-based switch or router whose performance and configuration data is retrieved by focusing on specific OID values. | |
Novell NRM | A system that is running version 6.5 of Novell Remote Manager (NRM), a Web-based interface to newer Novell NetWare servers. Novell NRM saves server statistics in an XML file. up.time can retrieve the XML file, parse it, and then store the information in the DataStore. | |
pSeries LPAR Server (VIO) | A pSeries server that is hosting multiple logical partitions (LPARs). The VIO (virtual input/output) handles the physical I/O requests from the LPARs that are on the server. In this configuration, up.time directly polls the agents installed on the VIO and LPARs on a pSeries server for workload and other data. | |
pSeries LPAR Server (HMC) | A pSeries server that is hosting multiple LPARs, and is a managed server under the supervision of an HMC (Hardware Management Console). We recommend adding pSeries servers that are managed by an HMC using the Auto Discovery process. See Auto Discovery for HMC-Managed pSeries Servers for more information. | |
Virtual Node | An agentless device that up.time can communicate with using an IP address. | |
VMware ESX | A system that is running the VMware ESX server software, which enables a single host to run multiple virtual servers and their applications. ESX includes features such as the ability to balance the computing loads of a group of virtual servers, back up data, and better manage clusters. You do not need to install an agent on an ESX server.
| |
VMware vCenter Server | A central control point for a VMware vSphere datacenter that includes ESX hosts, VMs, as well as groupings such as clusters, datacenters, vApps, and resource pools. A VMware vCenter server's inventory, system configurations, storage profiles, and performance data can be represented in up.time alongside physical systems and network devices. When a VMware vCenter is added, its resources are detected and automatically imported. | |
WMI Agentless | A Windows-based system whose metrics collection is managed by WMI (Windows Management Instrumentation), and does not have an up.time Agent installed on it.
|
You can add multiple systems to up.time in a batch operation using a text file and a command line utility. See Adding Multiple Systems for more information.
To add systems or network devices, do the following:
In the My Infrastructure panel, click Add System/Network Device.
For example, you can assign the display name Toronto Mail Server to a system with the host name 10.1.1.6. This way, IP addresses are stored in up.time, but a more descriptive or meaningful name is displayed in the up.time Web interface.
Agent
Configure the following:
The up.time Agent’s information can be globally configured in the Global Element Settings page on the Config tab. If this has been done, and the Use up.time Agent Global Configuration check box is selected, the agent port and SSL options will not appear. |
You can set both an authentication and password type, only one of them, or neither. |
SNMP details can be globally configured in the Global Element Settings page on the Config tab. If this has been done, and the Use Global SNMP Connection Configuration check box is selected, none of these options will appear or need to be configured. |
You can set both an authentication and password type, only one of them, or neither. |
Is Node Pingable?
This option specifies whether up.time can contact the network device using the ping utility.
There are scenarios in which you might not want the network device to be pingable (e.g., you have a firewall in place). Before selecting this check box, you should try to contact the network device using the ping utility. If you cannot ping it, ensure the check box is left cleared. Then, change the default host check for the network device. See Changing Host Checks for more information.
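The reachability check described above can be scripted before you decide whether to select the check box. The following is a minimal Python sketch, not part of up.time: the helper names `build_ping_command` and `is_pingable` are hypothetical, and it assumes the operating system's `ping` utility is on the PATH of the host you run it from (ideally the Monitoring Station).

```python
import platform
import subprocess


def build_ping_command(host, count=1):
    """Return the argument list for pinging `host` (hypothetical helper)."""
    # Windows ping uses -n for the echo count; Unix-like systems use -c.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_pingable(host, count=1, timeout_seconds=10):
    """Return True if the system ping utility reports success for `host`."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing ping utility is treated as "not pingable".
        return False
    return result.returncode == 0
```

If `is_pingable` returns `False` for the device, leave the check box cleared and change the default host check as described above.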
SNMP details can be globally configured in the Global Element Settings page on the Config tab. If this has been done, and the Use Global SNMP Connection Configuration check box is selected, none of these options will appear or need to be configured. |
pSeries LPAR Server (HMC)
If you are adding a pSeries server that is managed by a Hardware Management Console, complete the following fields:
Although you can manually add HMC-managed pSeries servers, we recommend using the Auto Discovery process as this will add all the pSeries servers managed by the HMC, and automatically populate their respective managed server names. See Auto Discovery for HMC-Managed pSeries Servers for more information. |
Managed Server
The HMC's unique identifier for the pSeries server. This information can be retrieved from the HMC itself (e.g., by running lssyscfg -r sys -F name).
For HMC-managed pSeries servers, the above two fields are used in conjunction with the |
Information for the up.time Agent on the Virtual I/O server can be globally configured in the Global Element Settings page on the Config tab. If this has been done, and the Use up.time Agent Global Configuration check box is selected, the agent port and SSL options will not appear. |
Password
The password for the account with access to WMI on the Windows domain.
WMI information can be globally configured in the Global Element Settings page on the Config tab. If this has been done, and the Use WMI Global Credentials check box is selected, none of these options will appear or need to be configured. |
Ideally, VMware instance monitoring is performed by adding an entire VMware vCenter server and allowing up.time's auto-discovery process to add all of its inventory (including ESX servers and VMware instances). However, as a legacy option, you can manually add an ESX server to up.time's monitored inventory, and then manually add VMware instances. |
To add VMware instances to up.time from an ESX Server that was manually added as an up.time Element, do the following:
Click the Add to up.time button for the instance you wish to add.
The Add System window appears.
The Add to up.time button will not be visible if the VMware instance is not powered on. |
Simple Network Management Protocol (SNMP) is a widely used protocol that monitors the health of computer and network equipment. The SNMP Poller enables you to query SNMP devices or systems for a given object identifier (OID) of an SNMP Management Information Base (MIB). You can use the monitor to translate or clean up the returned response, then set thresholds for it.
SNMP works on the basis that network management systems send out a request, and managed devices send a response. SNMP messages consist of a header and a PDU (protocol data unit). The header consists of the SNMP version number and the community name; the community name is used as a form of security. Requests and responses between network management systems and devices are implemented using one of four operations: Get, GetNext, Set, and Trap.
A MIB is a collection of hierarchically organized definitions, accessed using SNMP. All of the manageable features of all managed devices from different vendors are arranged in this tree. MIB definitions describe the properties of objects within a managed device, and OIDs uniquely identify managed objects in a MIB hierarchy.
Managed objects can exist in either scalar or tabular form. Scalar objects define a single object instance, identified by its “.0” suffix; tabular objects define multiple related object instances grouped in MIB tables, and are identified by their index values.
The MIB hierarchy can be depicted as a tree. Each vendor of SNMP equipment has an exclusive section of the MIB tree structure under its control. Vendors define private branches including managed objects for their own products. Each branch of the MIB tree has a number and name, and a point on the tree is named according to its complete path from the top of the tree (for example, .1.3.6.1.2.1.1.1.0). Nodes near the top of the tree are very general, whereas each ending node represents a particular feature on a specific device.
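To make the path-based naming concrete, the following Python sketch treats an OID as a tuple of integers. It is illustrative only and does not use any up.time or Net-SNMP API; the function names are hypothetical. The `.0` test is a heuristic, since a tabular instance whose index happens to be 0 would also match.

```python
def parse_oid(text):
    """Convert a dotted OID string such as '.1.3.6.1.2.1.1.1.0' to a tuple of ints."""
    parts = text.strip().strip(".").split(".")
    if any(not part.isdigit() for part in parts):
        raise ValueError(f"not a numeric OID: {text!r}")
    return tuple(int(part) for part in parts)


def is_scalar_instance(oid):
    """Heuristic: a scalar object instance is addressed by a trailing .0."""
    return len(oid) > 1 and oid[-1] == 0


def is_under(oid, subtree):
    """Return True if `oid` lies in the branch rooted at `subtree`."""
    return oid[: len(subtree)] == subtree


# 1.3.6.1.4.1 is the standard "enterprises" branch that holds vendor-private subtrees.
ENTERPRISES = parse_oid("1.3.6.1.4.1")
```

For example, `is_under(parse_oid(".1.3.6.1.2.1.1.1.0"), ENTERPRISES)` is `False`, because that OID sits in the general MIB-2 system branch rather than in a vendor's private branch.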
The up.time SNMP monitor also supports Net-SNMP, which is a suite of command line and graphical applications that do the following:
To take advantage of the Net-SNMP features, you must:
The up.time SNMP monitor works with the following versions of SNMP:
The second implementation of the SNMP protocol, which contains additional protocol operations as well as improved security and data authentication.
The latest implementation of the SNMP protocol, which adds security and privacy features that are missing in versions 1 and 2 of the protocol.
See SNMP Poller and Network Device Port Monitor for more information.
After you have added pSeries servers to up.time, whether managed by an HMC or not, you can add individual LPARs from those systems to up.time. While up.time collects workload data from all LPARs on a pSeries server (whether they have been added to up.time or not), adding LPARs can help you keep track of any specific LPAR.
To add an LPAR to up.time, do the following:
It can take up to 15 minutes for the Monitoring Station to retrieve enough samples to provide historical graphing data. |
If the Windows-based component of your infrastructure already makes use of WMI (Windows Management Instrumentation), Windows Elements can be configured to use it for data collection as an alternative to the up.time Agent. Using WMI allows you to avoid the overhead associated with managing and updating all of the systems on which an up.time Agent has been installed.
WMI-based monitoring can only be performed if the Monitoring Station itself is running on Windows. |
An Element can be set to use WMI through the following methods:
Globally defined WMI credentials can be used for the second and third methods. For the third method, configuring these credentials is mandatory. Refer to Configuring Global WMI Credentials for more information.
Regardless of which method is used, when changing a Windows Element’s data collection method, all historical data is retained.
In order to monitor agentless systems through WMI in a secure environment (e.g., through a firewall), you need to create an exception for WMI on the host end. Consult the Microsoft documentation or developer resources for information on connecting to WMI on a remote computer.
To add an agentless WMI system to up.time, do the following:
To change the data collection source for an individual Windows Element from the up.time Agent to WMI, do the following:
To change the data collection source for an individual Windows Element from WMI to the up.time Agent, do the following:
To change multiple agent-based Elements to use WMI for data collection, do the following:
To change multiple WMI Elements to use the up.time Agent for data collection, do the following:
For bulk WMI-to-agent conversions, the port used by all of the converted up.time Agents must match the port specified in the global agent configuration. |
up.time collects performance metrics and availability information from version 6.5 of the Novell Remote Manager (NRM) using HTTP or HTTPS. up.time extracts performance information from the NRM by reading and parsing XML files.
To add a Novell NRM version 6.5 system to up.time, do the following:
Password
The NRM administrator password. This field is mandatory.
The password is encrypted and stored in the up.time DataStore. |
up.time captures the following Novell NRM system (version 6.5) statistics:
Each statistic returns one of the following statuses:
Good
The statistic is well within the threshold suspect value.
Suspect
The statistic is between the threshold good and critical values.
Bad
The statistic is greater than the threshold critical value.
This statistic enables you to view how processes share the CPU. The response time is the amount of time that a Work To Do process requires to run.
If this statistic returns a value of Suspect, you can check the running threads to determine why there is a delay in the Work To Do threads. If the value is Bad, a thread is probably running longer than it should, or it is hung. You should identify the parent NetWare Loadable Module and then unload and reload it if possible.
This statistic enables you to view, as a graph, how the service processes are allocated on your server.
If the service processes are approaching the maximum, increase the value of the Maximum Server Processes Set parameter. If you have only a few available server processes, increase the Minimum Server Processes Set parameter.
If the status is Bad, examine your server by doing the following:
In Novell NRM, click Profiling / Debugging.
This statistic enables you to view the number of available processes on your server as a graph. The graph charts the processes that are available every five seconds over a 50-second period.
If the status is Suspect or Bad, you should increase the Maximum Server Processes and Minimum Server Processes Set parameters. If the number of available server processes has not reached the maximum and is not increasing, you should add memory to your server.
This statistic enables you to view the threads that have ended abnormally (abended) and are suspended. This statistic returns the following statuses:
If the status is Suspect or Bad, your server has abended and has recovered automatically by suspending the offending thread while leaving the rest of the server processes running. As a result, some of the server's functions were compromised. You must determine which module, driver, or hardware the abended threads belong to, and then take the appropriate action.
This statistic enables you to view, as a graph, how busy any given CPU is. up.time tracks usage on a per-CPU basis, collecting data every 30 seconds. The graph displays a 10-second history.
If the status is Suspect or Bad, determine which thread or module is using the most CPU cycles and take appropriate action, including the following:
To determine which thread or module is using the most CPU cycles, do the following:
up.time monitors connections on a per-server basis. NRM displays only the following metrics:
This statistic enables you to view the amount of memory that is not allocated to any service. Most, if not all, of this memory is used by the file system cache. When available memory gets too low, modules might not be able to load or file system access might become sluggish.
This statistic enables you to view the number of server threads that Novell eDirectory uses. The server thread limit ensures that threads are available for other functions as needed, for example, when a large number of users log in at the same time.
eDirectory uses multiple server threads. However, its thread requirements should not cause poor performance because eDirectory cannot use more than its allocated maximum number of threads.
If this statistic returns a Good status, eDirectory is using less than 25% of the available server threads. If it returns a Suspect status, eDirectory is using between 25% and 50% of the available server threads. If the status is Bad, eDirectory is using more than 50% of the available server threads.
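The thresholds above can be expressed as a small classification function. This Python sketch is illustrative and is not up.time or NRM code; the function name is hypothetical, and treating exactly 25% and exactly 50% as Suspect is an assumption, since the text only says "between 25% and 50%".

```python
def edirectory_thread_status(threads_in_use, threads_available):
    """Classify eDirectory thread usage as Good, Suspect, or Bad."""
    if threads_available <= 0:
        raise ValueError("threads_available must be positive")
    percent_used = 100.0 * threads_in_use / threads_available
    if percent_used < 25:
        return "Good"      # less than 25% of available server threads
    if percent_used <= 50:
        return "Suspect"   # between 25% and 50% (boundaries assumed inclusive)
    return "Bad"           # more than 50%
```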
This statistic enables you to view the status of Packet Receive Buffers for the server. Packet Receive Buffers transmit and receive packets. You can set the maximum or minimum number of buffers to allocate using the Maximum Packet Receive Buffers or Minimum Packet Receive Buffers SET parameters. The minimum number of buffers is the number of buffers that are allocated when the system is initialized.
If the number of Packet Receive Buffers is increasing, the system will be sluggish. If the number of Packet Receive Buffers reaches the maximum, and no Event Control Blocks (ECBs) are available, the server will become very sluggish and will not recover.
This statistic enables you to view the status of available Event Control Blocks (ECBs). Available ECBs are Packet Receive Buffers that have been created but which are not currently being used.
If the available ECB count is zero, the server will become sluggish until enough ECBs are created to fill the demand. The server will recover as long as the number of Packet Receive Buffers does not increase to the maximum that can be allocated.
This statistic shows whether or not your server can transmit and receive packets. If this statistic returns a Good status, the server is able to accept or transmit packets through the network board. If the status is Bad, the network board is not transmitting or receiving packets.
All servers should be able to transmit or receive packets. If your server is not transmitting, your LAN is not functioning properly. Check the drivers and protocol bindings for the network board on the server. If the drivers and protocol bindings are functioning properly, then the network board is probably faulty. If the network board is functioning, you should perform a diagnostic on your LAN.
This statistic enables you to view the status of the available disk space on all mounted volumes on a server. This statistic returns the following statuses:
This statistic enables you to view the status of the amount of data that is being read from and written to the storage media on this server.
If this statistic returns a Good status, then the storage system is experiencing reads or writes, and there are no pending disk I/Os. If the status is Suspect, the storage system has disk I/Os pending, no reads or writes have occurred, and fewer than four samples have been taken. If the status is Bad, the storage system has disk I/Os pending, no reads or writes have occurred, and four or more samples have been taken.
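The three conditions can be read as a decision rule. The Python sketch below is a hedged illustration with a hypothetical function name and parameters, not the logic NRM actually runs; it returns `None` for combinations that the description does not cover (for example, pending I/Os while reads or writes are still occurring).

```python
from typing import Optional


def disk_throughput_status(
    reads_or_writes_seen: bool, pending_io: bool, samples_taken: int
) -> Optional[str]:
    """Map the described disk throughput conditions to a status."""
    if reads_or_writes_seen and not pending_io:
        return "Good"
    if pending_io and not reads_or_writes_seen:
        # Fewer than four samples: Suspect; four or more: Bad.
        return "Suspect" if samples_taken < 4 else "Bad"
    # Combination not described in the text; left undecided here.
    return None
```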
It can be time consuming to add large numbers of systems and network devices to up.time individually through the Web interface. You can, however, add multiple systems to up.time using a text file and the addsystem command line tool.
A text-based "hosts file" can contain entries that mirror the fields in up.time's Add System window; these fields provide profile and connection information about the system or network device. The hosts file format is as follows:
%%), and is also on its own separate line.
The following table explains the properties you can include in a hosts file to describe Elements. The properties required to add a system or network device depend on the type of Element it will be. For example, to add an Agent-based system, you only need to provide information for Host Name, Type, and Port. (For more information, see Working with Elements for a summary of Element types, and Adding Systems or Network Devices for configuration information by Element type.)
Element Property | Description | Required / Optional |
---|---|---|
| The name or IP address of the Element (i.e., system or network device) that you are adding to up.time. | required |
| The Element name as it will appear in the up.time Web interface. There are some views, such as My Infrastructure, that will show the host name alongside the display name. | required, but can be identical to the Host Name value |
| A short description of the Element. This field is optional. | optional |
| The type of Element, which can be one of the following:
| required |
| The name of the up.time service group to which you want to add the Element. Service groups allow you to group multiple service monitors and simultaneously apply them to multiple Elements. (See Service Groups for more information.) | optional |
| The port on which up.time will connect to the Element. When a port is required, up.time uses a default whose value depends on the type of Element (e.g., network devices will default to an SNMP port of 161). | required for these Element types:
|
| If the Element is a network device or a server using version 2 of the Net-SNMP protocol, you will need to specify the read community, which acts like a user ID or password, in order to access the system or device. Use one of the following options:
| required for network devices or servers using version 2 of the Net-SNMP protocol |
| The name or IP address of the Hardware Management Console (HMC) that is being used to manage one or more pSeries servers in your infrastructure. For pSeries servers, this field is used in conjunction with the | required for pSeries servers managed by an HMC |
| The unique identifier for a pSeries server that is managed by an HMC. This managed server name can be retrieved from the HMC itself (e.g., by running lssyscfg -r sys -F name). For pSeries servers, this field is used in conjunction with the | required for pSeries servers managed by an HMC |
| The username required to access the Element. | required for the following Element types:
|
| The password required to access the Element. | required for the following Element types:
|
| The name of the up.time infrastructure group to which you want to add the Element. Infrastructure groups help you organize all of your monitored systems and network devices. (See Working with Groups for more information.) | optional |
| For some types of servers, you can specify whether up.time will securely communicate with an installed Agent using SSL. Valid options are | optional for the following Element types:
|
| If the Element is a network device or server using version 3 of the Net-SNMP protocol, you will need to specify an authentication method to determine how encrypted information traveling between the Net-SNMP instance and up.time will be authenticated. Use one of the following options:
| required for network devices or servers using version 3 of the Net-SNMP protocol |
| If the Element is a network device or server using version 3 of the Net-SNMP protocol, you will need to specify the password that will be used to encrypt information traveling between the Net-SNMP instance and up.time. | required for network devices or servers using version 3 of the Net-SNMP protocol |
| If the Element is a network device or server using version 3 of the Net-SNMP protocol, you will need to specify how information traveling between up.time and the Net-SNMP instance is encrypted. Use one of the following options:
| required for network devices or servers using version 3 of the Net-SNMP protocol |
| For network devices and nodes, use this field to specify whether or not up.time can contact it using the ping utility. Valid options are | optional for network devices and nodes |
| For Windows-based Elements using WMI for data collection, the Windows domain in which WMI has been implemented. | required for WMI Agentless |
| For Windows-based Elements using WMI for data collection, the name of the account with access to WMI on the Windows domain. | required for WMI Agentless |
| For Windows-based Elements using WMI for data collection, the password for the account with access to WMI on the Windows domain. | required for WMI Agentless |
The following table contains sample hosts file entries for different Element types that you can add to up.time:
Host Type | Sample Hosts File Entry |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The simplest way to create a hosts file is to use a text editor to type the entries in a file.
If you have a large number of systems to add, you can keep a list of all systems and network devices in a spreadsheet. You can then save the list as a text file or a comma-separated values file, then use a script to manipulate these files into the proper format.
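A conversion script of this kind might look like the following Python sketch. Only two details come from this section: each entry ends with `%%` on its own line, and properties such as Host Name, Type, and Port describe an Element. The `name: value` layout of each property line is an assumption made for illustration (it is exposed as the `separator` parameter), and the function names are hypothetical, so check the output against the hosts file format before passing it to addsystem.

```python
import csv
import io

ENTRY_TERMINATOR = "%%"      # from the text: ends each entry, on its own line
PROPERTY_SEPARATOR = ": "    # ASSUMPTION: property lines are written "name: value"


def rows_to_hosts_file(rows, separator=PROPERTY_SEPARATOR):
    """Render an iterable of {property name: value} dicts as hosts file text."""
    lines = []
    for row in rows:
        for name, value in row.items():
            # Skip empty cells so that optional properties are simply omitted.
            if name is not None and value:
                lines.append(f"{name}{separator}{value}")
        lines.append(ENTRY_TERMINATOR)
    return "".join(line + "\n" for line in lines)


def csv_to_hosts_file(csv_text, separator=PROPERTY_SEPARATOR):
    """Convert CSV text whose header row holds the property names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return rows_to_hosts_file(reader, separator)
```

Using the spreadsheet's header row as the property names keeps the script independent of which Element types the list contains.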
To add multiple systems to up.time, do the following:
scripts folder.
C:\Program Files\uptime software\uptime\scripts\
addsystem <path_and_filename>
<path_and_filename> is the name of, and full path to, the text file that contains the list of systems that you want to add to up.time.
If you have deployed up.time UI instances, ensure you always run command-line scripts such as |
After you have added a system to up.time, you might need to change some of the basic information about that system. You can do this by editing the system profile.
To edit a system profile, do the following:
An Application consists of one or more service monitors, and is an effective way to monitor and report on business functions that are most accurately represented by multiple services. For example, you can create an Application that monitors a server's Web services, database, and file system capacity.
An Application definition can include as many service monitors as required to fully represent the business function. As part of an Application definition, service monitors can be one of two types:
All master service monitors affect Application status equally, using their respective configured thresholds. You can configure an Application to reach a warning- or critical-level status as a whole using one of the following conditions:
Note that Applications are meant to report as OK, WARN, or CRIT; Application status is not affected by component service monitors that are in an UNKNOWN or MAINT state. (Note, however, that an Application as a whole can be put into temporary maintenance via My Infrastructure.)
Being able to control the total number of service monitors, as well as the number that need to violate thresholds before affecting the Application's collective status, allows you to give different Applications varying levels of robustness. As a result, each of your Applications will provide the most accurate status possible, with fewer false positives. For example, a Web server cluster of 10 servers might only cause alerts when three or more of them are down, whereas a mission-critical application will cause an alert when any of its master service monitors fails.
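The count-based idea in this example can be sketched as follows. This Python snippet is only an illustration of "N monitors must violate a threshold before the Application changes status"; the function name, parameters, and the exact way WARN and CRIT counts combine are assumptions, not up.time's implementation.

```python
def application_status(monitor_states, warn_count, crit_count):
    """Derive an Application status from its master service monitor states.

    monitor_states: states such as "OK", "WARN", "CRIT", "UNKNOWN", "MAINT".
    warn_count / crit_count: how many monitors must be in violation before
    the Application as a whole reports WARN or CRIT (assumed semantics).
    """
    states = list(monitor_states)
    # UNKNOWN and MAINT monitors do not affect Application status.
    critical = sum(1 for state in states if state == "CRIT")
    violating = sum(1 for state in states if state in ("WARN", "CRIT"))
    if critical >= crit_count:
        return "CRIT"
    if violating >= warn_count:
        return "WARN"
    return "OK"
```

With `crit_count=3`, a 10-server cluster with two servers down still reports OK, whereas a mission-critical Application would use `crit_count=1` so that a single failure changes its status.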
After you have added an Application to up.time, the name of the Application appears in the My Infrastructure panel. The name of the Application is a hyperlink.
You can view detailed information about that Application by clicking the name of the Application, which opens the Application General Information subpanel.
The Application Profile section of the subpanel displays the following information about the Application:
The Application Member Services section of the subpanel contains the following information about the service monitors that are part of the Application:
The Alert Profiles and Action Profiles sections of the subpanel display which Alert Profiles and Action Profiles have been associated with the Application.
To add an Application, do the following:
To edit an Application, do the following:
In up.time, a service level agreement (SLA) measures your organization’s ability to meet pre-defined performance goals. These goals focus on various aspects of your IT infrastructure, and each can include any number of monitored systems.
From the My Infrastructure panel, you can view your existing SLA details by clicking the SLA name (see Viewing SLA Details for more information).
For information about creating and using SLAs, see Adding and Editing SLA Definitions.
At sites with multiple systems to monitor, searching through a large list of systems is time consuming. To avoid this problem, you can define groups of systems. Groups are sets of systems that have been combined in a meaningful way.
You can group systems by their geographical location or by their function. The name of the group should describe the servers or the way in which they have been grouped. For example, you can create a group called Database Servers that contains all of the database servers in your environment.
You can assign the following to groups:
If you plan to group your systems, you should first map out what groups you need and which systems will be part of those groups. |
To add a group, do the following:
To make this group a subgroup, select the name of the existing group to which it will be subordinate in the Parent Groups list, then click Add.
If this is the first group that you have defined, only My Infrastructure will appear in the dropdown list. |
You can also create nested groups. Nested groups enable you to further group your systems. For example, you can create a parent group called Datacenters, and then add two nested groups called Production and Disaster Recovery.
You can assign the following to nested groups:
Note that you cannot assign a parent group to a subgroup or to any other ancestor.
Before you begin, ensure that you have at least one parent group defined. |
To add a nested group, do the following:
To edit groups, do the following:
To delete a group, click its gear icon, then click Delete, but note that only empty groups can be deleted from the My Infrastructure panel.
Not every user who accesses the Monitoring Station needs to view all Elements that are part of your infrastructure. Some users may, for example, only be interested in five to 10 of the available servers. You can limit the servers that one or more users will see by creating specific views, which are subsets of the servers in your environment. Views make it easier for users not only to monitor systems, but also to browse and compare historical data.
Views appear in the Views section on the Infrastructure panel, as well as the Global Scan dashboard.
To add a view, do the following:
To make this view a child of an existing one, select it from the Parent View dropdown list.
If this is the first view that you have defined, this option will not appear. |
You can also create nested views in order to categorize and better manage a larger set of existing views. The following can be assigned to nested views:
You cannot assign a parent view to a child view or to any other ancestor.
Before you begin, ensure that you have at least one parent view defined. |
To add a nested view, do the following:
To view and edit views, do the following:
If you have administrator privileges, you can delete an Element, Application, or view in the Infrastructure panel.
To remove an Element, Application, or View from up.time, do the following:
You can only delete Elements that were created in up.time. You cannot manually remove Elements that represent VMware vSphere components imported into up.time via vSync. |
When a problem occurs on a system that up.time is monitoring, the Monitoring Station sends alerts: these are notifications about the problem, sent to users who are qualified to receive them. If the user role to which they belong is configured to do so, they can also acknowledge an alert.
When you acknowledge an alert, up.time:
To acknowledge alerts, do the following:
Port
The port on which the NRM is listening. By default, the non-SSL port is 8008, and when SSL is used, the port is 8009.