The up.time Solaris agent uses a number of utilities resident on a Solaris server to collect performance metrics, including sar, awk, fgrep, and ps. Note that the sadc utility (which collects system activity information and saves it as a binary file) is used by sar and must be readable by the up.time agent user account on the system.
The up.time Solaris agent collects the following performance metrics from the systems on which it is installed:

Whenever the sar command uses the -f option to specify a file, that file is generated using the sadc 1 1 command. The sadc command polls the system counters at a one-second interval and writes the information that it receives to a file. The sar command then reads this file.
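
The sadc-then-sar sequence described above can be sketched as follows. This is an illustration only, not the agent's actual implementation: the function names and the snapshot path are hypothetical, and /usr/lib/sa/sadc is assumed to be the location of sadc on the host.

```python
import subprocess

SADC = "/usr/lib/sa/sadc"  # assumed location of sadc on a Solaris host


def build_commands(sar_option, snapshot="/tmp/uptime_sa.bin"):
    """Return the two commands: sadc writes the file, sar reads it.

    The default snapshot path is a hypothetical placeholder.
    """
    collect = [SADC, "1", "1", snapshot]          # sadc 1 1 <file>
    report = ["sar", sar_option, "-f", snapshot]  # e.g. sar -u -f <file>
    return collect, report


def run(sar_option):
    """Run both commands in order and return the text printed by sar."""
    collect, report = build_commands(sar_option)
    subprocess.run(collect, check=True)
    result = subprocess.run(report, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling build_commands("-u") returns the command pair without executing anything, which makes it easy to inspect the sequence on a machine that does not have sar installed.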
Each set of performance metrics is averaged over the interval at which the up.time monitoring station polls the agent (e.g. every 10 minutes).

CPU

The up.time agent uses the sar -u -f command to collect CPU metrics from a Solaris system. The sar command compares the system counters during a one-second interval. On multi-CPU systems, the statistics returned by the agent are averaged across all of the CPUs on the server.
Metric | Explanation
% USR | The percentage of time that the processor spends in user mode (a processing mode for applications and subsystems).
% SYS | The percentage of time that the kernel spends processing system calls.
% WIO | The percentage of time that a runnable process must wait for a device to perform an I/O operation.
% Total | The sum of User %, System %, and Wait I/O %.
Run Queue Length | The average number of services or processes that are waiting to be served by the CPU.
Run Queue Occupancy | The percentage of time that one or more services or processes are waiting to be served by the CPU.
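
As an illustration of how the % Total value relates to the other columns, the sketch below parses sar -u style output and adds up the user, system, and wait I/O percentages. The sample text is made up for this example and only mimics the usual column layout; the function names are hypothetical.

```python
SAMPLE = """\
SunOS host1 5.10 Generic sun4u    01/01/2010

10:00:01    %usr    %sys    %wio   %idle
10:00:02      12       5       3      80
"""


def parse_sar_u(text):
    """Return the last data row of sar -u style output as a dict of floats."""
    header, row = None, None
    for line in text.splitlines():
        fields = line.split()
        if "%usr" in fields:
            header = fields[1:]  # drop the timestamp column
        elif header and len(fields) == len(header) + 1:
            row = [float(value) for value in fields[1:]]
    if header is None or row is None:
        raise ValueError("no sar -u data found")
    return dict(zip(header, row))


def cpu_total(metrics):
    """% Total = user + system + wait I/O."""
    return metrics["%usr"] + metrics["%sys"] + metrics["%wio"]


metrics = parse_sar_u(SAMPLE)
print(metrics, cpu_total(metrics))
```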
Multi-CPU

The up.time agent outputs statistics for the entire Solaris system, per CPU. The mpstat 1 2 command averages the statistics for each CPU and compares the system counters during a one-second interval.
Metric | Explanation
User % | The percentage of CPU time that is spent running user processes.
System % | The percentage of CPU time that is spent running kernel processes.
Wait I/O % | The percentage of time that a process which can be run must wait for a device to perform an I/O operation.
SMTX | The number of times that a thread was not able to acquire a mutex lock on the first attempt, as reported by the mpstat command.
XCAL | The number of interprocessor cross-calls. In a multi-processor environment, one processor sends cross-calls to another processor to get that processor to do work. Cross-calls can also be used to ensure consistency in virtual memory. Heavy file system activity (such as NFS) can result in a high number of cross-calls.
Interrupts | The number of CPU interrupts.
Total % | The sum of User %, System %, and Wait I/O %.
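
The per-CPU values in the table above map onto mpstat columns. With mpstat 1 2, the first report summarizes activity since boot, so the sketch below keeps only the last report. The sample figures are invented, and the function names and output keys are hypothetical rather than anything the agent defines.

```python
SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   10   0   50   300  200  400   10   20   15    0   900    5   3   1  91
  1   12   0   40   100   50  350    8   18   10    0   800    7   2   0  91
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    5   0   20   310  205  380    9   15    4    0   700   10   5   2  83
  1    6   0   25   105   52  360    7   12    6    0   650   20   4   1  75
"""


def parse_mpstat(text):
    """Return {cpu_id: row} for the last report found in mpstat output."""
    reports, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":  # each header line starts a new report
            header = fields
            reports.append({})
        elif header and len(fields) == len(header):
            row = dict(zip(header, (int(value) for value in fields)))
            reports[-1][row["CPU"]] = row
    if not reports:
        raise ValueError("no mpstat report found")
    return reports[-1]


def summarize(cpus):
    """Map mpstat columns to the metric names used in the table above."""
    return {
        cpu: {
            "user": row["usr"],
            "system": row["sys"],
            "wait_io": row["wt"],
            "total": row["usr"] + row["sys"] + row["wt"],
            "smtx": row["smtx"],
            "xcal": row["xcal"],
            "interrupts": row["intr"],
        }
        for cpu, row in cpus.items()
    }


print(summarize(parse_mpstat(SAMPLE)))
```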
Memory

The up.time agent uses the vmstat 1 2 command to average the metrics that are collected for the entire system. The agent also compares system counters during a one-second interval using the sar command and the following options:

- -b -f (cache metrics)
- -g -f and -p -f (paging activity)
- -q -f (the average queue length while it is occupied, and the percentage of time that the queue is occupied)
- -c -f (system calls)

The free swap metric is collected from the "available" value in the output of the swap -s command.
Metric | Explanation
Free Memory | The amount of physical memory available to the operating system, system library files, and applications.
Cache Hit Rate | How often read and write requests are satisfied from the system buffer cache rather than from disk.
Page-outs/s | The rate at which pages were written to disk.
Page-ins/s | The rate at which pages were read from disk.
Page Free/s | The number of pages that are freed from memory each second.
Attaches/s | The number of pages that get attached to memory each second.
Page-out Requests/s | The number of requests to perform a write operation that occur each second.
Page-in reqs/s | The number of requests to perform a read operation that occur each second.
PageScans/s | The number of pages that are scanned each second.
PageFaults/s | The number of page faults that occur each second.
Software Locks/s | The number of software locks that are issued each second.
Virtual Faults/s | The number of virtual memory faults that occur each second.
Free Swap | The amount of available free swap space, as a percentage of total swap space.
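
To make the Free Swap calculation concrete, the sketch below reads the "available" value from a swap -s style summary line and expresses it as a percentage. Treating total swap as "used + available" is an assumption made for this example, not something the source states, and the sample figures and function names are invented.

```python
import re

SAMPLE = ("total: 300000k bytes allocated + 100000k reserved = "
          "400000k used, 1600000k available")


def parse_swap_s(line):
    """Return (used_kb, available_kb) from a swap -s style summary line."""
    match = re.search(r"=\s*(\d+)k used,\s*(\d+)k available", line)
    if match is None:
        raise ValueError("unrecognized swap -s output")
    return int(match.group(1)), int(match.group(2))


def free_swap_percent(line):
    """Available swap as a percentage of (used + available); an assumption."""
    used, available = parse_swap_s(line)
    total = used + available
    return 100.0 * available / total if total else 0.0


print(free_swap_percent(SAMPLE))
```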
Disk

The up.time agent uses the following commands to collect disk statistics:

- df -lk to gather capacity statistics for each file system.
- sar -d -f to output disk statistics (e.g. %busy, Read/Write/s) per disk, and compare those statistics between polling intervals.

Metric | Explanation
Disk (Spindle) Name | The name of each disk on the system.
Usage (% Busy) | The percentage of time during which the disk drive is handling read or write requests.
Throughput (Blk/s) | The number of blocks that are transferred to or from the disk each second during read or write operations.
Read/Writes/s | The number of read and write operations on the disk that occur each second.
Average Queue Length | The average number of requests that are waiting to be serviced by the disk.
Average Service Time | The average amount of time, in milliseconds, that is required for a request to be carried out.
Average Wait Time | The average time, in milliseconds, that a transaction is waiting in a queue. The wait time is directly proportional to the length of the queue.
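
The file system capacity figures gathered with df -lk can be read with a few lines of parsing. The sketch below is a minimal example under stated assumptions: the sample output is invented, the function name and dictionary keys are hypothetical, and entries that df wraps onto two lines (long device names) are simply skipped rather than rejoined.

```python
SAMPLE = """\
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    10000000 2500000 7500000    25%    /
/dev/dsk/c0t0d0s7    20000000 15000000 5000000    75%    /export/home
"""


def parse_df(text):
    """Return {mount_point: capacity figures} from df -lk style output."""
    result = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # wrapped or malformed entry; not handled in this sketch
        device, kbytes, used, avail, capacity, mount = fields[:6]
        result[mount] = {
            "device": device,
            "total_kb": int(kbytes),
            "used_kb": int(used),
            "avail_kb": int(avail),
            "capacity_pct": int(capacity.rstrip("%")),
        }
    return result


for mount, stats in parse_df(SAMPLE).items():
    print(mount, stats["capacity_pct"])
```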
Network

The up.time agent uses the netstat -s command to collect network metrics from a Solaris server. Except for TCP retransmits, the agent averages all statistics per interface. Other statistics (e.g. kbps, errors and collisions) are collected per interface by the kstat command.
Metric | Explanation
In Kbps | The rate, in kilobytes per second, at which data is received over a specific network adapter.
Out Kbps | The rate, in kilobytes per second, at which data is sent over a specific network adapter.
In Errors | The number of inbound packets that contained errors, which prevented those packets from being delivered to a higher-layer protocol.
Out Errors | The number of outbound packets that could not be transmitted because of errors.
Collisions | The number of signals from two separate nodes on the network that have collided.
TCP Retransmits | The number of packets that have been re-sent over a network interface.
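
A per-interface rate such as In Kbps can be derived from two readings of a byte counter taken a known number of seconds apart. The sketch below assumes kstat -p style "module:instance:name:statistic value" lines and the rbytes64 and obytes64 counters; the interface name, the sample values, the 1024-byte kilobyte, and the function names are all assumptions made for illustration. Counter resets between samples are not handled.

```python
BEFORE = "hme:0:hme0:rbytes64\t1000000\nhme:0:hme0:obytes64\t500000\n"
AFTER = "hme:0:hme0:rbytes64\t1614400\nhme:0:hme0:obytes64\t807200\n"


def parse_kstat(text):
    """Return {(interface, statistic): value} from kstat -p style lines."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        module, instance, name, statistic = key.split(":")
        counters[(name, statistic)] = int(value)
    return counters


def kb_per_second(before, after, interface, statistic, seconds):
    """Rate in kilobytes per second between two counter readings."""
    delta = after[(interface, statistic)] - before[(interface, statistic)]
    return delta / 1024.0 / seconds


first, second = parse_kstat(BEFORE), parse_kstat(AFTER)
print(kb_per_second(first, second, "hme0", "rbytes64", 60))
print(kb_per_second(first, second, "hme0", "obytes64", 60))
```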
Process

The up.time agent gathers process information directly from the /proc file system (procfs).
Metric | Explanation
PID | The unique identifier of a specific process.
PPID | The identifier of the parent process that started the process.
UID | A value that identifies the user who owns the process.
GID | A value that identifies a group of users.
Memory Consumed | The amount of memory that is being used by a process.
RSS | The amount of physical memory that is being used by a process.
CPU % Utilization by Process | The percentage of CPU time that is being used by individual processes.
Memory % Utilization by Process | The percentage of physical memory that is being used by individual processes.
Process Start Time | The time at which the process started.
Process Run Time | The length of time for which the process has been running.
Number of Processes Running | The total number of processes that are currently running on the system.
Number of Blocked Processes | The total number of processes that are blocked while waiting for resources.
Number of Waiting Processes | The total number of processes that are waiting to be executed by the CPU.
Execs per Second | The total number of exec system calls that are executed each second.
Process Creation Rate | The total number of processes that are being spawned over a specified time period.
Workload

Workload statistics are sorted within up.time's core, but they cover the same 20 processes that were gathered by the Process method (see above). The workload data gathered by the agent includes the user, group, and process name for each process, along with its individual statistics. The up.time core then sorts the data based on the selected graph (e.g. user, group or process name).

Metric | Explanation
Workload by Process | The demand that network and local services are putting on a system, based on the processes that are running.
Workload by User | The demand that network and local services are putting on the system, based on the IDs of the users who are logged into a system.
Workload by Group | The demand that network and local services are putting on the system, based on the IDs of the user groups that are logged into a system.
Workload Top 10 by Process | The 10 processes that are consuming the most CPU resources.
Workload Top 10 by User | The 10 processes that are consuming the most CPU resources, based on user ID.
Workload Top 10 by Group | The 10 processes that are consuming the most CPU resources, based on group ID.
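
The sorting step that the up.time core performs can be pictured as a group-and-rank operation over the process list. The sketch below is a guess at the general shape of that step, not the product's code: summing CPU per user, group, or process name is an assumption, and the sample records, field names, and function name are hypothetical.

```python
from collections import defaultdict

PROCESSES = [
    {"name": "java", "user": "app", "group": "staff", "cpu": 30.0},
    {"name": "httpd", "user": "web", "group": "www", "cpu": 12.5},
    {"name": "httpd", "user": "web", "group": "www", "cpu": 7.5},
    {"name": "sshd", "user": "root", "group": "root", "cpu": 1.0},
]


def workload_by(processes, key, top=10):
    """Sum CPU usage per value of `key` and return the highest `top` entries."""
    totals = defaultdict(float)
    for process in processes:
        totals[process[key]] += process["cpu"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


print(workload_by(PROCESSES, "name"))
print(workload_by(PROCESSES, "user"))
print(workload_by(PROCESSES, "group", top=2))
```

Passing "name", "user", or "group" as the key corresponds to the three graph types mentioned above.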
Veritas Volume Manager

The up.time agent uses the vxdg list command to collect statistics from disk volumes that are managed by the Veritas Volume Manager. These statistics are gathered for each volume, first by retrieving the contents of the disk groups using the vxdisk list command, and then by collecting statistics using the vxstat -g <diskgroup> command.
Metric | Explanation
DG/Volume/Subdisk | The name of the disk group, volume, or subdisk.
I/O Operations | The number of times, per second, that data is written to and read from the volume being managed by Veritas Volume Manager.
Block Throughput | The amount of disk traffic, in blocks of 512 bytes, that is flowing to and from the volume being managed by Veritas Volume Manager.
Average Service Time | The average amount of time, in milliseconds, that is required for a request to be carried out.
User

The up.time agent uses the following commands to collect user statistics from a Solaris system:

- ps -eo
- last | head -10 (login history for the last 10 users on the system)
- who (lists who is currently logged into the system)

Metric | Explanation
Login History | The number of times or frequency at which a user has logged into a system during any 30 minute time interval.
Sessions | The number of sessions or number of distinct users who are logged into a system during any 30 minute time interval.
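
The difference between "sessions" and "distinct users" is easy to see on a snapshot of who output. The sketch below counts both from sample text; the sample lines and the function name are invented for illustration, and the 30 minute windowing that up.time applies is not modeled here.

```python
SAMPLE = """\
root       console      Jan  5 09:12
alice      pts/1        Jan  5 10:01    (10.0.0.5)
alice      pts/2        Jan  5 10:30    (10.0.0.5)
bob        pts/3        Jan  5 10:45    (10.0.0.9)
"""


def summarize_who(text):
    """Count login sessions and distinct users in who style output."""
    users = [line.split()[0] for line in text.splitlines() if line.strip()]
    return {"sessions": len(users), "distinct_users": len(set(users))}


print(summarize_who(SAMPLE))
```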