The My Infrastructure panel is your starting point for monitoring the systems in your environment. From the My Infrastructure panel, you can add:
Systems are the network devices that you will monitor using up.time. You can add the following types of systems:
A system that has an up.time agent installed on it. In the Global Scan and My Infrastructure panels, agent systems are denoted by this icon:
These are servers that use version 2 or 3 of the Net-SNMP protocol to monitor and manage systems in a TCP/IP-based network. Net-SNMP version 3 adds security features that are lacking in Net-SNMP version 2.
All of the data gathered from Net-SNMP is based on the following MIB implementations:
Presents network interface information.
Presents general system state information.
Presents system performance data.
Note - For information on Net-SNMP, see Understanding the up.time DataStore.
A device without an agent, but with which up.time can communicate using an IP address.
A system that is running version 6.5 of Novell Remote Manager (NRM), a Web-based interface to newer Novell NetWare servers. Novell NRM saves server statistics in an XML file. up.time can retrieve the XML file, parse it, and then store the information in the DataStore.
A pSeries server that is hosting multiple logical partitions (LPARs). The VIO (virtual input/output) handles the physical I/O requests from the LPARs that are on the server. In this configuration, up.time directly polls the agents installed on the VIO and the LPARs on a pSeries server for workload and other data.
You can add multiple systems to up.time in a batch operation using a text file and a command line utility. See Adding Multiple Systems for more information.
In a clustered environment, a device with which up.time can communicate using a floating IP address.
A system that is running version 3 or 4 of the VMware ESX server software, which enables a single host to run multiple virtual servers and their applications. ESX includes features such as the ability to balance the computing loads of a group of virtual servers, back up data, and better manage clusters.
You do not need to install an agent on an ESX server.
A central control point for a VMware vSphere datacenter that includes ESX hosts, VMs, as well as groupings such as clusters, datacenters, vApps, and resource pools. A VMware vCenter server’s inventory, system configurations, storage profiles, and performance data can be represented in up.time alongside physical systems and network devices. When a VMware vCenter is added, its resources are detected and automatically imported.
A Windows-based system whose metrics collection is managed by WMI (Windows Management Instrumentation), and does not have an up.time agent installed on it. Note - WMI-based monitoring only works if the Monitoring Station is running on Windows.
To add systems or network devices, do the following:
In the My Infrastructure panel, click Add System/Network Device.
Note - You can set both the authentication and password types, only one of them, or neither.
Password
The password for the account with access to WMI on the Windows domain.
Note that if global WMI credentials have been defined in the Config section, none of these options will appear, or need to be configured.
VMware ESX server software enables a single host to run multiple virtual servers and their applications. up.time can monitor both the server that is running VMware ESX, and VMware instances, which are the virtual servers that are running on the VMware server.
To add VMware instances to up.time, do the following:
A new window containing information about the system appears.
The Add System window appears. Note - The Add to up.time button is not visible if a VMware instance is not on.
Simple Network Management Protocol (SNMP) is a widely-used protocol that monitors the health of computer and network equipment. The SNMP Poller enables you to query SNMP devices or systems for a given object identifier (OID) of an SNMP Management Information Base (MIB). You can use the monitor to translate or clean up the returned response, then set thresholds for them.
SNMP works on the basis that network management systems send out a request, and managed devices send a response. SNMP messages consist of a header and a protocol data unit (PDU). The header consists of the SNMP version number and the community name; the community name is used as a form of security. Requests and responses between network management systems and devices are implemented using one of four operations: Get, GetNext, Set, and Trap.
A MIB is a collection of hierarchically organized definitions, accessed using SNMP. All of the manageable features of all managed devices from different vendors are arranged in this tree. MIB definitions describe the properties of objects within a managed device, and OIDs uniquely identify managed objects in a MIB hierarchy.
Managed objects can exist in either scalar or tabular form. Scalar objects define a single object instance, identified by its “.0” suffix; tabular objects define multiple related object instances grouped in MIB tables, each identified by its index value.
The MIB hierarchy can be depicted as a tree. Each vendor of SNMP equipment has an exclusive section of the MIB tree structure under their control. Vendors define private branches including managed objects for their own products. Each branch of the MIB tree has a number and name, and a point on the tree is named according to its complete path from the top of the tree (for example, .1.3.6.1.2.1.1.1.0). Nodes near the top of the tree are very general, whereas each ending node represents a particular feature on a specific device.
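The OID conventions described above can be illustrated with a short Python sketch. It is not part of up.time; the function names are hypothetical, and the enterprises prefix (1.3.6.1.4.1), the standard location of vendor-private branches, is supplied here rather than taken from this guide.

```python
from typing import Tuple

# Standard prefix of the vendor-controlled ("enterprises") subtree.
# This value is an addition for illustration; the guide does not state it.
ENTERPRISES_PREFIX: Tuple[int, ...] = (1, 3, 6, 1, 4, 1)


def parse_oid(oid: str) -> Tuple[int, ...]:
    """Turn a dotted OID such as '.1.3.6.1.2.1.1.1.0' into a tuple of ints."""
    parts = oid.strip().strip(".").split(".")
    return tuple(int(part) for part in parts)


def is_scalar_instance(oid: str) -> bool:
    """Check for the trailing '.0' that addresses a scalar object instance.

    Heuristic only: a table row whose index happens to be 0 would also match.
    """
    return parse_oid(oid)[-1] == 0


def is_under(oid: str, prefix: Tuple[int, ...]) -> bool:
    """Return True if the OID lies on the branch named by `prefix`."""
    return parse_oid(oid)[: len(prefix)] == prefix


def is_vendor_branch(oid: str) -> bool:
    """Return True if the OID sits in a vendor-private section of the tree."""
    return is_under(oid, ENTERPRISES_PREFIX)
```

For example, `is_scalar_instance(".1.3.6.1.2.1.1.1.0")` is true because the path ends in 0, while `is_vendor_branch` is false for the same OID because it lies outside the enterprises subtree.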
The up.time SNMP monitor also supports Net-SNMP, which is a suite of command line and graphical applications that do the following:
To take advantage of the Net-SNMP features, you must:
The up.time SNMP monitor works with the following versions of SNMP:
The second implementation of the SNMP protocol, which contains additional protocol operations as well as improved security and data authentication.
The latest implementation of the SNMP protocol, which adds security and privacy features that are missing in versions 1 and 2 of the protocol.
See SNMP Poller and Network Device Port Monitor for more information.
After you have added pSeries servers - whether managed by an HMC or not - to up.time, you can add individual LPARs from those systems to up.time. While up.time collects workload data from all LPARs on a pSeries server (whether they have been added to up.time or not), adding LPARs can help you keep track of any specific LPAR.
To add an LPAR to up.time, do the following:
Note - It can take up to 15 minutes for the Monitoring Station to retrieve enough samples to provide historical graphing data.
If the Windows-based component of your infrastructure already makes use of WMI (Windows Management Instrumentation), Windows Elements can be configured to use it for data collection as an alternative to the up.time Agent. Using WMI allows you to avoid the overhead associated with managing and updating all of the systems on which an up.time Agent has been installed.
Note - WMI-based monitoring can only be performed if the Monitoring Station itself is running on Windows.
An Element can be set to use WMI through the following methods:
Globally defined WMI credentials can be used for the second and third method. In the latter’s case, configuring these is mandatory. Refer to Configuring Global WMI Credentials for more information.
Regardless of which method is used, when changing a Windows Element’s data collection method, all historical data is retained.
In order to monitor agentless systems through WMI in a secure environment (e.g., through a firewall), you need to create an exception for WMI on the host end. For example, to allow WMI access through Windows Firewall, refer to the following MSDN articles:
To add an agentless WMI system to up.time, do the following:
To change the data collection source for an individual Windows Element from the up.time Agent to WMI, do the following:
To change the data collection source for an individual Windows Element from WMI to the up.time Agent, do the following:
To change multiple agent-based Elements to use WMI for data collection, do the following:
To change multiple WMI Elements to use the up.time Agent for data collection, do the following:
Note - For bulk WMI-to-agent conversions, the port used by all of the converted up.time Agents must match the port specified in the global agent configuration.
up.time collects performance metrics and availability information from version 6.5 of the Novell Remote Manager (NRM) using HTTP or HTTPS. up.time extracts performance information from the NRM by reading and parsing XML files.
To add a Novell NRM version 6.5 system to up.time, do the following:
up.time captures the following Novell NRM system (version 6.5) statistics:
Each statistic returns one of the following statuses:
The statistic is well within the threshold suspect value.
The statistic is between the threshold good and critical values.
The statistic is greater than the threshold critical value.
This statistic enables you to view how processes share the CPU. The response time is the amount of time that a Work To Do process requires to run.
If this statistic returns a value of Suspect, you can check the running threads to determine why there is a delay in the Work To Do threads. If the value is Bad, a thread is probably running longer than it should or is hung. You should identify the parent NetWare Loadable Module and then unload and reload it if possible.
This statistic enables you to view, as a graph, how the service processes are allocated on your server.
If the service processes are approaching the maximum, increase the value of the Maximum Server Processes Set parameter. If you have only a few available server processes, increase the Minimum Server Processes Set parameter.
If the status is Bad, examine your server by doing the following:
In Novell NRM, click Profiling / Debugging.
This statistic enables you to view the number of available processes on your server as a graph. The graph charts the processes that are available every five seconds over a 50-second period.
If the status is Suspect or Bad, you should increase the Set parameters for Maximum Server Processes and the Minimum Server Processes settings. If the number of available server processes has not reached the maximum and is not increasing, you should add memory to your server.
This statistic enables you to view the threads that have ended abnormally (abended) and are suspended. This statistic returns the following statuses:
If the status is Suspect or Bad, your server has abended and has recovered automatically by suspending the offending thread while leaving the rest of the server processes running. As a result, some of the server's functions were compromised. You must determine which module, driver, or hardware the abended threads belong to, and then take the appropriate action.
This statistic enables you to view, as a graph, how busy any given CPU is. up.time tracks usage on a per-CPU basis, collecting data every 30 seconds. The graph displays a 10-second history.
If the status is Suspect or Bad, determine which thread or module is using the most CPU cycles and take appropriate action, including the following:
To determine which thread or module is using the most CPU cycles, do the following:
up.time monitors connections on a per-server basis. NRM displays only the following metrics:
This statistic enables you to view the amount of memory that is not allocated to any service. Most, if not all, of this memory is used by the file system cache. When available memory gets too low, modules might not be able to load or file system access might become sluggish.
This statistic enables you to view the number of server threads that Novell eDirectory uses. The server thread limit ensures that threads are available for other functions as needed - for example, when a large number of users log in at the same time.
eDirectory uses multiple server threads. However, its thread requirements should not cause poor performance because eDirectory cannot use more than its allocated maximum number of threads.
If this statistic returns a Good status, eDirectory is using less than 25% of the available server threads. If it returns a Suspect status, eDirectory is using between 25% and 50% of the available server threads. If the status is Bad, eDirectory is using more than 50% of the available server threads.
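The thresholds above can be written as a small Python function. This is only a sketch of the rule as described, not up.time or NRM code; the function and parameter names are hypothetical, and treating exactly 25% and exactly 50% as Suspect is one reading of "between 25% and 50%".

```python
def edirectory_thread_status(threads_in_use: int, threads_available: int) -> str:
    """Classify eDirectory thread usage as Good, Suspect, or Bad."""
    if threads_available <= 0:
        raise ValueError("threads_available must be positive")
    percent_used = 100.0 * threads_in_use / threads_available
    if percent_used < 25:
        return "Good"      # less than 25% of the available server threads
    if percent_used <= 50:
        return "Suspect"   # between 25% and 50% (boundaries assumed inclusive)
    return "Bad"           # more than 50%
```

For instance, 30 threads in use out of 100 available gives Suspect, and 60 out of 100 gives Bad.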
This statistic enables you to view the status of Packet Receive Buffers for the server. Packet Receive Buffers transmit and receive packets. You can set the maximum or minimum number of buffers to allocate using the Maximum Packet Receive Buffers or Minimum Packet Receive Buffers SET parameters. The minimum number of buffers is the number that is allocated when the system is initialized.
If the number of Packet Receive Buffers is increasing, the system will be sluggish. If the number of Packet Receive Buffers reaches the maximum, and no Event Control Blocks (ECBs) are available, the server will become very sluggish and will not recover.
This statistic enables you to view the status of available Event Control Blocks (ECBs). Available ECBs are Packet Receive Buffers that have been created but which are not currently being used.
If the available ECB count is zero, the server will become sluggish until enough ECBs are created to fill the demand. The server will recover as long as the number of Packet Receive Buffers does not increase to the maximum that can be allocated.
This statistic shows whether or not your server can transmit and receive packets. If this statistic returns a Good status, the server is able to accept or transmit packets through the network board. If the status is Bad, the network board is not transmitting or receiving packets.
All servers should be able to transmit or receive packets. If your server is not transmitting, your LAN is not functioning properly. Check the drivers and protocol bindings for the network board on the server. If the drivers and protocol bindings are functioning properly, then the network board is probably faulty. If the network board is functioning, you should perform a diagnostic on your LAN.
This statistic enables you to view the status of the available disk space on all mounted volumes on a server. This statistic returns the following statuses:
This statistic enables you to view the amount of data that is being read from and written to the storage media on this server.
If this statistic returns a Good status, then the storage system is experiencing reads or writes, and there are no pending disk I/Os. If the status is Suspect, the storage system has disk I/Os pending, no reads or writes have occurred, and fewer than four samples have been taken. If the status is Bad, the storage system has disk I/Os pending, no reads or writes have occurred, and four or more samples have been taken.
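The three status conditions can be expressed as a decision function. The Python sketch below follows the wording above; the function and parameter names are hypothetical, and combinations that the text does not describe (for example, pending I/Os together with completed reads or writes) return None rather than a guessed status.

```python
from typing import Optional


def disk_throughput_status(pending_ios: int, reads_writes: int, samples: int) -> Optional[str]:
    """Map the described conditions onto Good, Suspect, or Bad.

    Returns None for combinations that the description does not cover.
    """
    if pending_ios == 0 and reads_writes > 0:
        return "Good"
    if pending_ios > 0 and reads_writes == 0:
        # The only difference between Suspect and Bad is how many samples
        # have been taken while the I/Os stayed pending.
        return "Suspect" if samples < 4 else "Bad"
    return None  # not covered by the description above
```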
It can be time consuming to add large numbers of systems to up.time using the Web interface. You can, however, add multiple systems to up.time using the addsystem command line tool and a text file.
A text file, called a hosts file, contains entries that mirror the fields in the Add System window of the up.time Web interface. These fields contain information about the systems that you want to add.
You can find examples of entries in a hosts file in the section Examples of Hosts File Entries.
There are a number of ways in which you can create a hosts file. The simplest way is to use a text editor to type the entries in a file. If you have a large number of systems to add, you can copy and paste an entry, and modify the fields as needed.
If you keep a list of all the systems in your environment in a spreadsheet, you can save the list as a text file or a comma separated values (.csv) file. Then, you can write a script that can manipulate the text or .csv file into the proper format.
The following table explains the fields that you can include in the hosts file. The fields that are needed to add a system will vary depending on the type of system that you want to add. For example, to add an agent system you only need to include the Host Name, Type, and Port fields. See Working with Systems for more information.
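As a rough illustration of such a script, the following Python sketch converts .csv rows into one block of "Field: value" lines per system. The output layout is an assumption based on the "Host Name: ..." form used in the sample entries in this guide; the function name is hypothetical, and the result should be compared with a known-good hosts file before it is passed to addsystem.

```python
import csv
import io


def csv_to_hosts_entries(csv_text: str) -> str:
    """Convert CSV rows into 'Field: value' blocks, one block per system.

    Assumptions (not confirmed by this guide): each entry is a run of
    'Field: value' lines, and entries are separated by a blank line.
    The CSV header row must use the hosts file field names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    for row in reader:
        lines = [
            f"{field}: {value.strip()}"
            for field, value in row.items()
            # Skip surplus columns (field is None) and empty cells.
            if field is not None and value and value.strip()
        ]
        if lines:
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Empty cells are skipped, so optional fields can simply be left blank in the spreadsheet.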
Field | Description |
---|---|
Host Name | The name or the IP address of the system that you want to add to up.time. |
Display Name | The name for the system that will appear in the up.time Web interface. |
Description | A short description of the system. This field is optional. |
Type | The type of system, which can be one of the following: Agent, Node, Novell NRM, Net-SNMP v2, Net-SNMP v3, pSeries LPAR Server (HMC), Virtual Node, WMI Agentless. |
Service Group | The name of the up.time service group - which enables you to simultaneously apply common service checks to hosts that you are monitoring - to which you want to add the system. This field is optional. |
Port | The number of the port on which you will be connecting to the system. Leave this field blank to use the default port for the type of system that you are adding. |
Community | If you are adding a Net-SNMP system to up.time, specify the read community (which acts like a user ID or password) that gives you access to the system. Valid options are: public, which enables you to retrieve read-only information; private, which enables you to access all information. |
HMC Hostname | The name or the IP address of the Hardware Management Console (HMC) that is being used to manage one or more pSeries LPAR servers in your environment. |
Managed Server | The unique identifier of a pSeries LPAR server that is managed by an HMC. |
Username | If you are adding a Net-SNMP or Novell NRM system to up.time, specify the user name required to access the system. |
Password | If you are adding a Net-SNMP or Novell NRM system to up.time, specify the password required to access the system. |
Group | The name of the Element group - a set of systems that have been combined in a meaningful way - to which you want to add this system. This field is optional. |
SSL | For agent systems, use this field to determine whether or not up.time will securely communicate with an agent installed on the system using SSL. Valid options are true and false. This field is optional. |
Authentication Method | For Net-SNMP systems, use this field to determine how encrypted information travelling between the Net-SNMP instance and up.time will be authenticated. Valid options are: MD5, a widely-used method for creating digital signatures; SHA, a secure method of creating digital signatures. |
Privacy Password | For Net-SNMP systems, the password that will be used to encrypt information travelling between the Net-SNMP instance and up.time. |
Privacy Type | For Net-SNMP systems, how information travelling between up.time and the Net-SNMP instance is encrypted. Valid options are: DES, an older method used to encrypt information; AES, the successor to DES, which is used with a variety of software including SSL servers. |
Pingable | For nodes, use this field to specify whether or not up.time can contact the node using the ping utility. Valid options are true and false. |
WMI Domain | The Windows domain in which WMI has been implemented. |
WMI Username | The name of the account with access to WMI on the Windows domain. |
WMI Password | The password for the account with access to WMI on the Windows domain. |
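Before running a batch import, it can help to check each entry against the valid options listed in the table. The Python sketch below covers only the fields whose options the table enumerates (SSL, Pingable, Authentication Method, and Privacy Type). The function name and the entry-as-dictionary shape are hypothetical, and exact, case-sensitive matching is an assumption.

```python
from typing import Dict, List

# Valid options taken from the field table above.
ALLOWED_VALUES = {
    "SSL": {"true", "false"},
    "Pingable": {"true", "false"},
    "Authentication Method": {"MD5", "SHA"},
    "Privacy Type": {"DES", "AES"},
}


def check_entry(entry: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one hosts file entry.

    Only fields with an enumerated set of options are checked; a field
    that is absent from the entry is not treated as an error.
    """
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        value = entry.get(field)
        if value is not None and value.strip() not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    return problems
```

An entry such as `{"Host Name": "SNMP-1", "Privacy Type": "3DES"}` would be reported, because 3DES is not one of the listed Privacy Type options.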
To add multiple systems to up.time, do the following:
The following table contains sample hosts file entries for each type of system that you can add to up.time:
Host Type | Sample Hosts File Entry |
---|---|
Agent | Host Name: prod-mainSystem |
Node | Host Name: www.myDomain.ca |
Novell NRM | Host Name: novell01 |
Net-SNMP v2 | Host Name: gateway.mydomain.com |
Net-SNMP v3 | Host Name: SNMP-1 |
pSeries LPAR | Host Name: 10.1.2.42 |
Virtual Node | Host Name: router-Toronto |
WMI Agentless | Host Name: Win7-Production |
After you have added a system to up.time, you might need to change some of the basic information about that system. You can do this by editing the system profile.
To edit a system profile, do the following:
An Application provides the overall status for one or more services. You can, for example, add an Application that checks the status of a system’s Web services, database, and file system capacity.
When creating an Application, you must specify the following:
One or more monitors can be used to determine the status of the Application as a whole.
Other service monitors that are associated with a master service monitor, but are not used to determine the status of the Application as a whole.
You can configure an Application to reach a warning- or critical-level state when a specific number, percentage, or all master service monitors enter those states.
This allows you to give different Applications different levels of robustness by assigning more or less “weight” to their respective groups of master service monitors. As a result, each of your Applications will provide the most accurate status possible, and fewer false positives. For example, a web server cluster of 10 servers might only cause alerts when three of them are down, whereas a mission-critical application will cause an alert when all of its master service monitors fail.
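The three threshold styles (a specific number, a percentage, or all master service monitors) can be sketched as a single Python check. The function name and mode labels are hypothetical, and the "at least" comparison for the number and percentage modes is an assumption; this guide does not state how up.time evaluates the boundary.

```python
def threshold_reached(failing: int, total: int, mode: str, value: float = 0) -> bool:
    """Decide whether enough master service monitors are in a given state.

    mode is "count", "percentage", or "all" (labels chosen for this sketch).
    """
    if total <= 0:
        raise ValueError("an Application needs at least one master service monitor")
    if mode == "count":
        return failing >= value                    # e.g. 3 of 10 web servers down
    if mode == "percentage":
        return 100.0 * failing / total >= value
    if mode == "all":
        return failing == total                    # every master monitor affected
    raise ValueError(f"unknown mode: {mode!r}")
```

With the web server cluster from the example above, `threshold_reached(3, 10, "count", 3)` is true, while `threshold_reached(9, 10, "all")` stays false until the tenth monitor fails.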
For more information on services, see Using Service Monitors.
To add an Application, do the following:
After you have added an Application to up.time, the name of the Application appears in the My Infrastructure panel. The name of the Application is a hyperlink.
You can view detailed information about that Application by clicking the name of the Application, which opens the Application General Information subpanel.
The Application Profile section of the subpanel displays the following information about the Application:
The Application Member Services section of the subpanel contains the following information about the service monitors that are part of the Application:
The Alert Profiles section of the subpanel displays which Alert Profiles have been associated with the Application.
For information about viewing more details about Applications, see .
To edit an Application, do the following:
In up.time, a service level agreement (SLA) measures your organization’s ability to meet pre-defined performance goals. These goals focus on various aspects of your IT infrastructure, and each can include any number of monitored systems.
From the My Infrastructure panel, you can view your existing SLA details by clicking the SLA name (see Viewing SLA Details for more information).
For information about creating and using SLAs, see Adding and Editing SLA Definitions.
At sites with multiple systems to monitor, searching through a large list of systems is time consuming. To avoid this problem, you can define groups of systems. Groups are sets of systems that have been combined in a meaningful way.
You can group systems by their geographical location or by their function. The name of the group should describe the servers or the way in which they have been grouped. For example, you can create a group called Database Servers that contains all of the database servers in your environment.
You can assign the following to groups:
Note - If you plan to group your systems, you should first map out what groups you need and which systems will be part of those groups.
To add a group, do the following:
You can also create nested groups. Nested groups enable you to further group your systems. For example, you can create a parent group called Datacenters, and then add two nested groups called Production and Disaster Recovery.
You can assign the following to nested groups:
Note that you cannot assign a parent group to a subgroup or to any other ancestor.
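One way to read this restriction is that a group can never end up beneath itself in the hierarchy. The Python sketch below checks for that by walking up from the proposed parent; it illustrates the idea and is not how up.time implements it. The function name and the parent map are hypothetical.

```python
from typing import Dict, Optional


def would_create_cycle(parents: Dict[str, Optional[str]], group: str, new_parent: str) -> bool:
    """Return True if placing `group` under `new_parent` makes it its own ancestor.

    `parents` maps each group name to its parent group (None for a top-level group).
    """
    seen = set()
    current: Optional[str] = new_parent
    # Walk upward from the proposed parent; `seen` guards against a map
    # that already contains a loop.
    while current is not None and current not in seen:
        if current == group:
            return True
        seen.add(current)
        current = parents.get(current)
    return False
```

With Production and Disaster Recovery nested under Datacenters, moving Datacenters under Production is rejected, while moving Production under Disaster Recovery is allowed.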
Note - Before you begin, ensure that you have at least one parent group defined.
Adding a Nested Group
To add a nested group, do the following:
To edit groups, do the following:
To delete a group, click its gear icon, then click Delete. Only empty groups can be deleted from the My Infrastructure panel.
Not every user who accesses the Monitoring Station needs to view all Elements that are part of your infrastructure. Some users may, for example, only be interested in five to 10 of the available servers. You can limit the servers that one or more users will see by creating specific views, which are subsets of the servers in your environment. Views make it easier for users not only to monitor systems, but also to browse and compare historical data. Views appear in the Views section on the Infrastructure panel, as well as the Global Scan panel.
To add a view, do the following:
You can also create nested views in order to categorize and better manage a larger set of existing views. The following can be assigned to nested views:
You cannot assign a parent view to a child view or to any other ancestor.
Note - Before you begin, ensure that you have at least one parent view defined.
To add a nested view, do the following:
To view and edit views, do the following:
If you have administrator privileges, you can delete an Element or view in the Infrastructure panel.
To remove an Element, Application, or View from up.time, do the following:
Note - You can only delete Elements that were created in up.time. You cannot manually remove Elements that represent VMware vSphere components imported into up.time via vSync.
When a problem occurs on a system that up.time is monitoring, the Monitoring Station sends alerts: these are notifications about the problem, sent to users who are qualified to receive them. If the user role to which they belong is configured to do so, they can also acknowledge an alert.
When you acknowledge an alert, up.time:
To acknowledge alerts, do the following: