Graphing Processes

Uptime Infrastructure Monitor uses the following graphs to chart the activity of processes on a system.

Number of Processes

This graph charts the number of processes that are currently running on a system. The process count is taken from the system kernel, and can be used to determine process usage trends.

Process Running, Blocked, Waiting

This graph indicates whether there is enough CPU capacity for the processes that are run on a system. If the size of the blocked or waiting queue is disproportionate to the running queue, then either the system does not have enough CPUs or is too I/O bound.

A blocked process signals a disk bottleneck. If the number of blocked processes approaches or exceeds the number of processes in the run queue, you should tune the disk subsystem. Whenever there are any blocked processes, all CPU idle time is treated as wait for I/O time. If database batch jobs are running on the system that is monitored, there are always some blocked processes. However, you can increase the throughput of batch jobs by removing disk bottlenecks.
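The comparison between blocked processes and the run queue can be sketched in a few lines of Python. This is only an illustration, not how Uptime Infrastructure Monitor evaluates the graph: the function name is hypothetical, and treating "approaches" as 80% of the run queue is an assumption.

```python
def disk_bottleneck_suspected(blocked: int, runnable: int) -> bool:
    """Return True when blocked processes approach or exceed the run queue.

    'Approaches' is interpreted as 80% of the run queue size; this
    threshold is an assumption made for the example.
    """
    if blocked == 0:
        return False
    return blocked >= 0.8 * runnable


# Example samples of (blocked, runnable) process counts.
for blocked, runnable in [(1, 10), (8, 10), (5, 4)]:
    print(blocked, runnable, disk_bottleneck_suspected(blocked, runnable))
```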

Process Creation Rate

This graph determines whether there are runaway processes on a system or if a forking-based process (like a Web server) is spawning too many processes over a specified period of time.

Generating a Process Graph

To generate a process graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click one of the following options:
    • Number of Processes
    • Process Running, Blocked, Waiting
    • Process Creation Rate
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Click Generate Graph.

Graphing TCP Retransmits

The TCP Retransmits graph indicates whether data is transmitted over a network. Using TCP, information is transmitted in pieces called packets. A packet consists of:

A header

Contains transmission information, such as the IP addresses of the sender and receiver, the protocol used, and the packet number.

A payload

Contains the sent data.

A trailer

Contains data that denotes the end of the packet, as well as error correction information.

TCP retransmits indicate that certain network services may not be completing properly because of a high load on a network or a system. A lost packet can indicate network congestion, and requires the sender to reduce the transmission rate and to retransmit the packet. A slower transmission rate combined with retransmitted packets reduces network performance.
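One way to reason about retransmits is as a percentage of all segments sent between two samples. The sketch below assumes that you have two samples of cumulative counters (segments sent, segments retransmitted). The function name and the sample values are hypothetical and do not describe how Uptime Infrastructure Monitor collects this data.

```python
def retransmit_percent(prev: tuple, curr: tuple) -> float:
    """Percentage of sent segments that were retransmitted between two samples.

    Each sample is a (segments_sent, segments_retransmitted) pair of
    cumulative counters; this layout is an assumption for the example.
    """
    sent = curr[0] - prev[0]
    retransmitted = curr[1] - prev[1]
    if sent <= 0:
        # No traffic (or a counter reset) between the samples.
        return 0.0
    return 100.0 * retransmitted / sent


# 2000 segments sent and 50 retransmitted between the two samples.
print(retransmit_percent((1000, 10), (3000, 60)))
```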

Generating a TCP Retransmits Graph

To generate a TCP retransmits graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click TCP Retransmits.
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Click Generate Graph.


Workload Graphs

The three workload graphs determine the demand that network and local services are putting on a system. The graphs chart an aggregate amount of performance information for a given user, group, or process.

You can generate the following workload graphs:

Workload - User

The demand that network and local services are putting on a system, based on the IDs of the users who are logged into the system.

Workload - Group

The demand that network and local services are putting on a system, based on the IDs of the user groups that are logged into the system.

Workload - Process Name

The demand that network and local services are putting on a system, based on the processes that are running on the system.

These graphs use the same input criteria, but they return different data.

Each workload graph captures the following metrics:

CPU %

The percentage of CPU time that is taken up by a user, group, or process.

Memory Size

The amount of the page file and virtual memory that is taken up by a user, group, or process.

On Windows systems, Memory Size is called Virtual Bytes.

RSS

The Resident Set Size, which is the amount of physical memory used by a user, group, or process. On Windows systems, RSS is called Working Set.

Info

Graphs generated for SNMP agents only chart the memory metric.

 

Generating a Workload Graph

To generate a workload graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click Workload.
  4. Select and apply the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Use the available Quick Graphs or click one of the following options:
  6. Select whether you want the Top 10 or Specific items included in the graph.
  7. Click one of the following metrics:
    • CPU %
    • Memory Size
    • RSS
  8. Select one or more of the available users, groups, or processes from the list.
    If you are generating a workload graph by process (that is, a Workload - Process Name graph), you can enter a regular expression in the Process Selection Regex field to automatically add matching process names for graphing, instead of working through a long list of system processes.

    Info
    The list of available processes varies by server and by operating system.
  9. Click Add.
  10. Click Generate Graph.
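To get a feel for what a process selection regular expression matches before you enter it, you can try the pattern against a list of process names. The sketch below uses Python's re module. The process names are invented, and the matching rule (a search anywhere in the name) is an assumption; the Process Selection Regex field may apply patterns differently.

```python
import re


def select_processes(names, pattern):
    """Return the process names that match the regular expression."""
    regex = re.compile(pattern)
    # search() matches anywhere in the name; anchor with ^ and $ to be strict.
    return [name for name in names if regex.search(name)]


# Hypothetical process names for illustration only.
names = ["httpd", "httpd-worker", "sshd", "java", "oracle_pmon"]

print(select_processes(names, r"^httpd"))
print(select_processes(names, r"^(java|oracle_.*)$"))
```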

Workload Top 10 Graphs

The three Workload Top 10 graphs chart the 10 processes that are consuming the most CPU resources. Consumption of CPU resources is tracked by one of the following: a user ID, a group ID, or the name of a process. Workload Top 10 graphs enable you to quickly determine which processes are consuming the most CPU resources over a specified time period.

Each graph uses the same input criteria, but they return different data.

Generating a Workload Top 10 Graph

To generate a Workload Top 10 graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click one of the following options: 
    • Workload Top 10 - User
    • Workload Top 10 - Group
    • Workload Top 10 - Process Name
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Click one of the following options:
    • CPU %
    • Memory Size
    • RSS
      Graphs generated for SNMP agents only chart the memory size metric.
  6. Click Generate Graph.

LPAR Workload Graphs

Uptime Infrastructure Monitor can collect workload information from logical partitions (LPARs) that are running on pSeries servers. The following graphs visualize the workload information for all LPARs on a server:

Workload - CPU

The amount of CPU time used by the LPAR.

Workload - Memory

The total amount of memory used by an LPAR.

Workload - Disk

The amount of data that is transferred to and from the disk.

Workload - Network

The amount of data that is transferred over the network interface used by the LPAR.

You can also graph the CPU entitlement of individual LPARs using the CPU Utilization graph. See LPAR CPU Utilization Graphs for more information.

Generating an LPAR Workload Graph

To generate an LPAR Workload graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the pSeries server which is hosting the LPARs whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click one of the following options:
    • Workload - CPU
    • Workload - Memory
    • Workload - Disk
    • Workload - Network
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Click Generate Graph.

LPAR CPU Utilization Graphs

Using the CPU Utilization graph, you can better determine the CPU entitlements of the LPARs on a system. The entitlements indicate the amount of CPU power that is assigned to an individual LPAR. For example, an entitlement of 0.5 indicates that an LPAR is assigned half of the processing power of a CPU.

You can use the graphs to give you a clearer view of how much you may need to increase an LPAR’s entitlement. Instead of using trial and error to determine optimum entitlements, you can use actual data to determine accurate entitlements.
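As an illustration of how measured usage relates to an entitlement, the sketch below expresses CPU consumption as a percentage of the entitlement. It assumes that consumption is measured in the same units as the entitlement (fractions of a physical CPU). The function name and the values are hypothetical.

```python
def entitlement_usage_percent(cpu_used: float, entitlement: float) -> float:
    """CPU consumption as a percentage of the LPAR's entitlement.

    Both arguments are in units of physical CPUs; for example, an
    entitlement of 0.5 means half of the processing power of one CPU.
    """
    if entitlement <= 0:
        raise ValueError("entitlement must be greater than zero")
    return 100.0 * cpu_used / entitlement


# An LPAR entitled to 0.5 CPU that consumes 0.25 CPU uses half of its entitlement.
print(entitlement_usage_percent(0.25, 0.5))
# Sustained values above 100 suggest that the entitlement may be too low.
print(entitlement_usage_percent(0.75, 0.5))
```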

To generate an LPAR CPU Utilization graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the pSeries server which is hosting the LPAR whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Under the LPAR Workload heading, click Workload - CPU Utilization.
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Select the name of the LPAR whose information you want to graph.
    If the message There are no LPARs for this date range is displayed, do one of the following:
    • Click the Update List button.
    • Change the date range.
  6. Click Generate Graph.

Network Graphs

Network graphs track the performance and reliability of your computing network. You can generate I/O and Errors graphs. These graphs use the same input criteria, but return different data. 

I/O

The I/O graph charts the average amount of data that is moving in and out of a network interface over a specified time period. Uptime Infrastructure Monitor also identifies bursts of network traffic.

The I/O graph captures the following statistics:

  • In bytes: the number of bytes received over the network interface each second
  • Out bytes: the number of bytes sent by the network interface each second
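A per-second figure such as In bytes is typically derived from two samples of a cumulative byte counter. The sketch below shows that arithmetic under that assumption. The function name is hypothetical, and the handling of a counter reset is a simplification.

```python
def bytes_per_second(prev_bytes: int, curr_bytes: int, interval_seconds: float):
    """Average bytes per second between two cumulative counter samples.

    Returns None when the counter went backwards, which usually means
    that the counter wrapped or the interface was reset.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be greater than zero")
    if curr_bytes < prev_bytes:
        return None
    return (curr_bytes - prev_bytes) / interval_seconds


# 3,000,000 bytes received over a 60-second sample interval.
print(bytes_per_second(1_000_000, 4_000_000, 60))
```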

Errors

The Errors graph charts the number of network interface errors that occur each second. The most common types of errors include collisions in a hubbed environment or the presence of full-duplex handshake errors between a system and a switch. The following communication line problems can also cause network errors:

  • excessive noise
  • cabling problems
  • problems with backbone connections

The Errors graph captures the following statistics:

  • In Errors: A data packet was received but could not be decoded because either the packet's header or trailer was not available.
  • Out Errors: A data packet could not be sent due to problems transmitting the packet or formatting the packet for transmission.
  • Collisions: The simultaneous presence of signals from two nodes on a network. A collision can occur when two nodes start transmitting over a network at the same time. Packets that are involved in a collision are broken into fragments and must be retransmitted.

NetFlow

The NetFlow graphing function transfers you to your Scrutinizer instance.

For network device Elements that are monitored by Scrutinizer, a graph that covers a specified time frame is generated. It shows the monitored node’s bi-directional throughput rates through known ports, which are determined based on use by all known applications.

Generating a Network Graph

To generate network graphs, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click one of the following options:
    • I/O
    • Errors
    • NetFlow
  4. For I/O and Errors graphs, select the start and end dates and times for which the graph charts data. For NetFlow, select one of the set time frames. (For more information, see Understanding Dates and Times.)
  5. For I/O and Errors graphs, select one or more network interfaces from the Available Interfaces list, and then click Add.
  6. Click Generate Graph.


Graphing Network Device Performance

Uptime Infrastructure Monitor allows you to generate graphs to display the performance of the following:

Network Device Port I/O

The I/O graph displays the average amount of data moving in and out of a network device’s ports over a specified time period. This can help you confirm bursts in network traffic, and identify ports that are receiving and transmitting large amounts of data in relation to their maximum throughput.

You can generate top-10-port graphs based on a specific criterion, or focus on a specific port on your network device, and create a graph that includes multiple metrics.

Network I/O Metrics

The following metrics can be used when generating a Network I/O graph for a network device Element:

Total Rate

the combined incoming and outgoing data rates, in Mbps, for the port during the time period

Usage

the percentage of the port’s maximum throughput that was used by inbound and outbound packets, during the time interval

In Rate

the average throughput of inbound packets, in Mbps, during the time interval

In Usage

the percentage of the port’s maximum throughput that was used by inbound packets during the time interval

Out Rate

the average throughput of outbound packets, in Mbps, during the time interval

Out Usage

the percentage of the port’s maximum throughput that was used by outbound packets, during the time interval
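The relationship between the rate and usage metrics above can be sketched as follows. This is an illustration only: the function name is hypothetical, and dividing by the port's maximum throughput once (rather than once per direction) is an assumption about how usage is computed.

```python
def port_io_metrics(in_rate_mbps: float, out_rate_mbps: float, max_mbps: float) -> dict:
    """Derive the rate and usage metrics for one port over one interval."""
    if max_mbps <= 0:
        raise ValueError("max_mbps must be greater than zero")
    total = in_rate_mbps + out_rate_mbps
    return {
        "Total Rate": total,
        "Usage": 100.0 * total / max_mbps,
        "In Rate": in_rate_mbps,
        "In Usage": 100.0 * in_rate_mbps / max_mbps,
        "Out Rate": out_rate_mbps,
        "Out Usage": 100.0 * out_rate_mbps / max_mbps,
    }


# A 1000 Mbps port receiving 200 Mbps and sending 100 Mbps.
metrics = port_io_metrics(200, 100, 1000)
print(metrics)
```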

 

Graphing Network I/O Rates for a Network Device

To generate a Network I/O graph for a network device, do the following:
  1. Go to the Element’s Quick Snapshot page.
    For example, in the Infrastructure panel, find the network device Element whose network rates you want to graph, click its corresponding gear icon, then click Graph Performance.
  2. In the Network section of the Tree panel, click I/O.
  3. Select the start and end dates and times for which the graph charts data, and click Apply Date and Time.
    For more information, see Understanding Dates and Times.
  4. Click one of the Quick Graphs options to display a pre-configured graph in a pop-up window, or skip this step to manually configure a graph.
  5. Select whether to generate a Top 10 ports graph or a graph for a Specific port.
    If you select Specific, an Element selection dialog appears, requiring you to select a specific port on the network device.
  6. Select the network metric to include in the graph.
    If you are graphing I/O for a Specific port, you can include multiple metrics in the graph.
  7. Click Generate Graph.
    A pop-up window appears, displaying the network I/O rate graph you have configured.

Network Device Port Errors

The network device Errors graph displays the number of errors or discards that occur each second. The following communication line problems can cause network errors:

  • excessive noise
  • cabling problems
  • problems with backbone connections

Network Error Metrics

The following metrics can be used when generating a Network Error graph:

Errors

the total number of errors per second during the time period

In Errors

the number of packets received, but unable to be decoded, per second, due to a missing header or trailer

Out Errors

the number of packets that were not sent, per second, due to problems transmitting the packet or formatting the packet for transmission

Discards

the total number of packets dropped per second, through the port, during the time period

In Discards

the number of packets inbound through the port that were dropped per second, during the time period

Out Discards

the number of packets outbound through the port that were dropped per second, during the time period

 

Graphing Network Error Rates for a Network Device

To generate a network error graph, do the following:
  1. Go to the Element’s Quick Snapshot page.
    For example, in the Infrastructure panel, find the network device Element whose network rates you want to graph, click its corresponding gear icon, then click Graph Performance.
  2. In the Network section of the Tree panel, click Errors.
  3. Select the start and end dates and times for which the graph charts data, and click Apply Date and Time.
    For more information, see Understanding Dates and Times.
  4. Click one of the Quick Graphs options to display a pre-configured graph in a pop-up window, or skip this step to manually configure a graph.
  5. Select whether to generate a Top 10 ports graph or a graph for a Specific port.
    If you select Specific, an Element selection dialog appears, requiring you to select a specific port on the network device.
  6. Select the network metric to include in the graph.
    If you are graphing network errors for a Specific port, you can include multiple metrics in the graph.
  7. Click Generate Graph.
    A pop-up window appears, displaying the network error graph you have configured.

Disk Performance Statistics Graph

The Disk Performance Statistics graph charts a set of disk performance metrics returned by utilities that are running on a system, such as perfmon on Windows, and iostat or sar on Solaris.

Requests can experience delays proportional to the length of the request queue minus the number of spindles on the disks. For optimal performance, this difference should be less than two on average.
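The rule of thumb above is simple arithmetic: subtract the number of spindles from the average queue length and compare the result to two. A minimal sketch, with hypothetical function names and sample values:

```python
def excess_queue(avg_queue_length: float, spindles: int) -> float:
    """Average request queue length minus the number of spindles."""
    return avg_queue_length - spindles


def queue_is_healthy(avg_queue_length: float, spindles: int) -> bool:
    """True when the difference stays below two, as the guideline suggests."""
    return excess_queue(avg_queue_length, spindles) < 2


# Four spindles with an average queue of 5 requests: difference of 1.
print(excess_queue(5.0, 4), queue_is_healthy(5.0, 4))
# Four spindles with an average queue of 8 requests: difference of 4.
print(excess_queue(8.0, 4), queue_is_healthy(8.0, 4))
```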

Generating a Disk Performance Statistics Graph

To generate a Disk Performance Statistics graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click Disk Performance Statistics.
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Select one of the following options:
    • Percent Busy
      The percentage of the disk capacity used.

      Info
      For NFS systems, 100% busy does not indicate that the server itself is saturated, but that the client always has outstanding requests to that server.
    • Average Queue
      The average number of processes that are waiting to access the disk.
      The length of the queue is affected by how busy the system is and the amount of time that each transaction requires to perform a disk operation. A complete transaction must occur before the next transaction can start. Longer disk operations per transaction increase the average length of the queue.
    • Read/Writes
      The number of read/write requests, per second, from or to a disk.
    • Throughput (blks/s)
      The amount of disk traffic, in blocks of 512 bytes, that is flowing to and from a disk each second.
    • Average Wait Time
      The average time, in milliseconds, that a transaction is waiting in a queue. The wait time is directly proportional to the length of the queue.
    • Average Serve Time
      The average time, in milliseconds, required to perform a task.
    • All of the above for one disk
      Uptime Infrastructure Monitor graphs all of the metrics listed above for a single disk.
  6. Select the disks for which you want to collect information from the list.
    If you select multiple disks and selected All of the above for one disk in step 5, then Uptime Infrastructure Monitor only graphs information for the first disk that you selected.
  7. Click Generate Graph.
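Because the Throughput metric in the list above is reported in 512-byte blocks per second, it can help to convert a reading into a more familiar unit. The sketch below converts to mebibytes per second; the function name is hypothetical.

```python
BLOCK_SIZE_BYTES = 512  # block size stated for the Throughput (blks/s) metric


def blocks_to_mib_per_second(blocks_per_second: float) -> float:
    """Convert a throughput in 512-byte blocks per second to MiB per second."""
    return blocks_per_second * BLOCK_SIZE_BYTES / (1024 * 1024)


# 2048 blocks per second is exactly 1 MiB per second.
print(blocks_to_mib_per_second(2048))
```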

Top 10 Disks Graph

The Top 10 Disks graph displays the ten busiest disks in your environment as of the last sample that Uptime Infrastructure Monitor has taken. If there are fewer than ten disks on the system, then all of the disks on the system are charted in the graph.

Generating a Top 10 Disks Graph

To generate a Top 10 Disks graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click Top 10 Disks.
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Select one of the following options:
    • Percent Busy
      The percentage of the disk capacity used.

      Info
      For NFS systems, 100% busy does not indicate that the server itself is saturated, but that the client has outstanding requests to that server.
    • Average Queue
      The average number of processes that are waiting to access the disk.
      The length of the queue is affected by the amount of time that each transaction requires to perform a disk operation. For both sequential and random disk transactions, a complete transaction must occur before the next transaction can begin. Longer disk operations per transaction increase the average length of the queue.
    • Read/Writes
      The number of read/write requests per second from or to a disk.
    • Throughput (blks/s)
      The amount of traffic, in 512 byte blocks, that is flowing to and from a disk.
    • Average Wait Time
      The average time, in milliseconds, that a transaction is waiting in a queue. The wait time is directly proportional to the length of the queue.
    • Average Serve Time
      The average time, in milliseconds, required to perform a task.
  6. Click Generate Graph.

File System Capacity Graph

A File System Capacity graph charts the amount of total and used space, in kilobytes, on a server’s disk. On Windows servers, Uptime Infrastructure Monitor looks at the capacity of the main partition (usually the C:\ drive). On UNIX and Linux servers, Uptime Infrastructure Monitor looks at the individual file systems (for example, /var, /export, /usr) on all the disks on the server.

 

Info
If a single disk system has no partitions, then the file system capacity is the same as the disk capacity.

The File System Capacity graph visualizes the following statistics:

  • Total Size
    The total amount of space available on the system.
  • Space Used
    The amount of space on the file system that is used.
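The two statistics above are enough to derive free space and a percentage used. The sketch below shows that arithmetic, assuming both values are in kilobytes as charted; the function name and the sample values are hypothetical.

```python
def capacity_summary(total_kb: int, used_kb: int) -> dict:
    """Derive free space and percentage used from Total Size and Space Used."""
    if total_kb <= 0:
        raise ValueError("total_kb must be greater than zero")
    return {
        "free_kb": total_kb - used_kb,
        "percent_used": 100.0 * used_kb / total_kb,
    }


# A 1,000,000 KB file system with 250,000 KB in use.
print(capacity_summary(1_000_000, 250_000))
```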

Generating a File System Capacity Graph

To generate a File System Capacity graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click File System Capacity.
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. Select one or more file systems from the list.
    If you are generating a graph for a Windows system, you are only able to generate a graph for the C:\ drive.
  6. Click Generate Graph.

VXVM Stats Graph

The VXVM Stats graph charts the amount of data written to or read from a Solaris volume that is managed by the Veritas Volume Manager. Veritas Volume Manager is a storage management system that operates between a host’s operating system and its file systems or database management systems. Veritas Volume Manager enables you to manage disk drives on a system as if they were volumes (logical devices that appear to be physical partitions on a disk).

Depending on the options that you specify, this graph contains the following information:

  • the number of read and write operations to and from the volume
  • the number of blocks that were read and written to and from the volume
  • the amount of time that is required to read data from and write data to the volume

If Veritas Volume Manager is not running on a host, or if Uptime Infrastructure Monitor cannot connect to the volume, an error message informing you that Uptime Infrastructure Monitor cannot detect the Veritas Volume Manager appears in the Graphing subpanel.

In the Info & Rescan panel, verify that the entry Has a Logical Volume Manager? is set to Yes. If it is, then ensure that you can connect to the host from the Monitoring Station.

Generating a VXVM Stats Graph

To generate a VXVM Stats graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click VXVM Stats.
  4. Select the start and end dates and times for which the graph charts data. For more information, see Understanding Dates and Times.
  5. In the Available Disk Groups and Volumes area, select one or more volumes on which to report.
    The disk groups or volumes that appear in this area vary from system to system. You must select at least one disk group or volume.
  6. Select one of the following options:
    • I/O Operations
      The number of times, per second, that data is written to and read from the volume.
    • Block Throughput
      The amount of disk traffic, in blocks of 512 bytes, that is flowing to and from the volume.
    • Average Service Times
      The average amount of time, in milliseconds, that is required for a request to be carried out.
  7. If necessary, uncheck either of the Read or Write checkboxes.
    Depending on the option you chose in step 6, the Read and Write options chart the following information in the graph:
    1. If you selected I/O Operations in step 6, the number of read and write operations to and from the volume.
    2. If you selected Block Throughput in step 6, the number of blocks that were read and written to and from the volume.
    3. If you selected Average Service Times in step 6, the amount of time required to read and write data to and from the volume.

      Info
      Select only one option if you are comparing more than one volume.
  8. Click Generate Graph.

Novell NRM Graphs

Uptime Infrastructure Monitor can collect data from systems that are running version 6.5 of the Novell Remote Manager (NRM). Uptime Infrastructure Monitor retrieves NRM service metrics and then stores this information in the DataStore. Using the data that is collected from NRM, you can generate graphs for the following metrics:

  • Available Memory
    The amount of memory that is not allocated to any service.
  • DS Thread Usage
    The number of server threads that Novell eDirectory uses. The server thread limit ensures that server threads are available for other functions as needed.
  • Work To Do Response Time
    The amount of time that a Work To Do process requires to run from the time a process is scheduled.
  • Allocated Server Processes
    How the service processes are allocated on the NRM system.
  • Available Server Processes
    The number of available processes on the NRM system.
  • Abended Thread Count
    The number of threads that have abended (ended abnormally) and that are suspended because of abended recovery.
  • Packet Receive Buffers
    The status of Packet Receive Buffers (which transmit and receive packets) for the NRM system.
  • Available ECBs
    The status of available Event Control Blocks (ECBs), which are Packet Receive Buffers that were created but which are not currently used.
  • LAN Traffic
    Whether the NRM system can transmit and receive packets.
  • Available Disk Space
    The status of the available disk space on a server.
  • Disk Throughput
    The amount of data read from and written to the storage media on the server.
  • Connection Usage
    The number of connections used, and the peak number of connections used on this server.

For more information about Novell NRM systems, see Novell NRM Systems.

Generating a Novell NRM Graph

To generate a Novell NRM graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the Novell NRM system whose information you want to graph.
  2. In the tree panel, click the Graphing tab, then click one of the metrics on the list.
  3. Select the start and end dates and times for which the graph charts data.  For more information, see Understanding Dates and Times.
  4. Click Generate Graph.

Instance Motion Graphs

The VMware VMotion tool enables you to move ESX instances from one server to another without any downtime or loss of data. You would use VMotion to, for example, move an instance to newer and faster hardware, or to temporarily relocate the instance while performing a hardware upgrade.

The Instance Motion graph enables you to keep track of a moving VMware instance. For a given ESX instance, the graph charts which systems it is running on over a given time range.

Generating an Instance Motion Graph

To generate an Instance Motion graph, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the ESX instance whose motion you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click Instance Motion.
  4. Select the start and end dates and times for which the graph charts data.  For more information, see Understanding Dates and Times.
  5. Click Generate Graph.

Displaying Detailed Process Information

Detailed process information provides insight into how various user and system processes are consuming system resources. The information is not presented in a graph; instead, it appears in a table that contains the following information:

  • Process
    The name of the process, which is taken from its executed path name.
  • PID
    The number that identifies the process.
  • PPID
    The number that identifies the parent process. The PPID can help identify possible relationships between processes.
    On Windows systems, the PPID is called the Creating Process ID.
  • UID
    The ID of the user or account that is consuming CPU time.
    On Windows systems, the UID is called the Owner.
  • GID
    The ID of the group that is consuming CPU time.
    On Windows systems, the GID is called the Group Name.
  • Memory Used
    The amount of memory, expressed as a percentage of total available memory, consumed by a process.
    On Windows systems, Memory Used is called Virtual Bytes.
    The Memory Used value can be misleading because shared memory between processes is counted multiple times. For example, if five Oracle processes are using 10% of available memory, this does not indicate that Oracle is consuming 50% of system memory.
  • RSS
    Resident Set Size: the amount of physical memory used.
    On Windows systems, RSS is called the Working Set.
  • CPU %
    The percentage of the CPU time used by the process, calculated by dividing total used CPU Time by the process’ running time; if applicable, the result is further divided by the number of CPUs for the Element on which the process is running.
    On Windows systems, the CPU % is called % Processor Time.
  • User Time
    The amount of time (in seconds) that a particular user, group, or account is using the CPU.
    This value is not displayed for Windows systems.
  • User System Time
    The amount of time (in seconds) that a process is consuming system time on the CPU.
    This value is not displayed for Windows systems.

    Info

    You can get a better indication of the amount of work a process has done by dividing this amount by a sample of time - for example, five minutes.

  • Start Time
    The time at which the process started. This can be used to determine the lifetime of a process.

    Info

    The process information for the current date and time is displayed in the Graphing subpanel.
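The CPU % calculation described in the list above (total used CPU time divided by the process's running time, then divided by the number of CPUs where applicable) can be sketched as follows. The function name and the sample values are hypothetical, and the sketch assumes both times are in seconds.

```python
def cpu_percent(cpu_time_seconds: float, running_time_seconds: float, cpu_count: int = 1) -> float:
    """CPU % for a process: CPU time over running time, divided by the CPU count."""
    if running_time_seconds <= 0:
        raise ValueError("running_time_seconds must be greater than zero")
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return 100.0 * cpu_time_seconds / running_time_seconds / cpu_count


# A process that used 30 seconds of CPU time over 120 seconds of running time.
print(cpu_percent(30, 120))               # single CPU
print(cpu_percent(30, 120, cpu_count=2))  # the same process on a two-CPU Element
```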

Generating Detailed Process Information

To display detailed process information, do the following:
  1. On the Global Scan dashboard or Infrastructure panel, click the name of the system whose information you want to graph.
  2. In the tree panel, click the Graphing tab.
  3. Click Detailed Process Information.
  4. Select the start and end dates and times for which the graph charts data.  For more information, see Understanding Dates and Times.
  5. Click Display Process Information.
    A window containing a chart that lists the process information for the time period that you specified appears.
  6. From the dropdown list, select the date and time for which you want to view process information.

The percentage of time that the CPU spends executing Windows kernel commands. If this metric is consistently high you should consider using a faster or more efficient disk subsystem.
