CPU, memory, and buffer monitoring

When configuring CPU monitoring, specify a limit in the range 1 through 100 percent. When CPU usage exceeds the limit, a threshold-monitor alert is triggered. The default CPU limit is 75 percent. For memory, the limit specifies a usage threshold as a percentage of available resources.

When used to configure memory or CPU threshold monitoring, the limit value must be greater than the low limit and smaller than the high limit.

Monitoring involves automatic data gathering for low-memory, low-buffer, and high-CPU conditions. Threshold monitoring tracks the buffer thresholds for each BM buffer queue and the buffer usage at a periodic interval, and takes the defined actions whenever a threshold is exceeded.

Memory status data collection is invoked by SRMd and runs every hour. Data collection is also triggered when the limit is reached. The data is available at /var/log/mstatdir and includes historical data.

The histogram feature includes the functionality to collect detailed CPU, memory and buffer utilization by system tasks. This is used to troubleshoot resource allocation and utilization problems. It also includes functionality to monitor line module memory errors. Error messages are logged via Syslog and SNMP traps.

As part of CPU threshold monitoring, some of the packets received by the CPU are captured and stored in non-volatile RAM when CPU usage reaches an abnormal level. This serves as historical reference data for support engineers troubleshooting a network outage.

The alert provided is a RASLog message, with the following options configurable under the raslog option of the threshold-monitor cpu, threshold-monitor buffer, or threshold-monitor memory commands:

Limit specifies the baseline memory usage limit as a percentage of available resources. When this value is exceeded, a RASLog WARNING message is sent. When the usage returns below the value set by limit, a RASLog INFO message is sent. Valid values range from 0 through 80 percent.

High-limit specifies an upper limit for memory usage as a percentage of available memory. This value must be greater than the value set by limit. When memory usage exceeds this limit, a RASLog CRITICAL message is sent. Valid values range from 0 through 80 percent.

The show process cpu top command collects the CPU usage readings that cross the threshold value. This data is logged to a text file so that it can be read offline.

Low-limit specifies a lower limit for memory usage as a percentage of available memory. This value must be smaller than the value set by limit.

The low memory condition is not prevented. When memory usage exceeds or falls below this limit, the threshold-monitor command reports the event in a RASLog informational message.

Poll specifies the polling interval in seconds. Valid values range from 0 through 3600.

Retry specifies the number of polling retries before the desired action is taken. Valid values range from 1 through 100.
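The option descriptions above can be illustrated with a minimal Python sketch. It is not the device implementation. The function names raslog_severity and poll_index_of_action are hypothetical, "exceeds" is assumed to mean strictly greater than, and the retry behavior is assumed to mean that the action is taken once that many consecutive polls exceed the limit. Transitions that the text does not describe (for example, usage dropping from above high-limit to between limit and high-limit) return no message here.

```python
from typing import Optional, Sequence


def raslog_severity(prev: float, curr: float,
                    low: float, limit: float, high: float) -> Optional[str]:
    """Return the assumed RASLog severity when usage moves from prev to curr."""
    if curr > high and prev <= high:
        return "CRITICAL"   # usage rose above high-limit
    if curr > limit and prev <= limit:
        return "WARNING"    # usage rose above limit
    if curr <= limit and prev > limit:
        return "INFO"       # usage returned below limit
    if (prev <= low) != (curr <= low):
        return "INFO"       # usage crossed low-limit in either direction
    return None             # no transition described in the text


def poll_index_of_action(samples: Sequence[float], limit: float,
                         retry: int) -> Optional[int]:
    """Return the index of the poll at which the action would be taken.

    Assumption: the action is taken after `retry` consecutive polls
    exceed `limit`; a poll at or below the limit resets the count.
    """
    consecutive = 0
    for index, usage in enumerate(samples):
        consecutive = consecutive + 1 if usage > limit else 0
        if consecutive >= retry:
            return index
    return None


if __name__ == "__main__":
    # Memory thresholds of 40/60/70 percent, usage rising from 50 to 65.
    print(raslog_severity(50, 65, low=40, limit=60, high=70))
    # CPU limit of 75 percent with retry 3, one sample per poll.
    print(poll_index_of_action([50, 80, 80, 50, 80, 80, 80], limit=75, retry=3))
```

With one sample per poll interval, the second function shows why a short spike above the limit does not by itself trigger the action under this assumption.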

Note

For CPU and memory thresholds, the low limit must be the lowest value and the high limit must be the highest value.
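The ordering rule in this note, together with the value ranges stated for the memory options, can be checked before a configuration is applied. The following Python sketch is illustrative only. The function name validate_memory_thresholds and its error strings are hypothetical, and no range check is applied to low-limit because the text does not state one.

```python
from typing import List


def validate_memory_thresholds(low: int, limit: int, high: int,
                               poll: int = 120, retry: int = 3) -> List[str]:
    """Return a list of rule violations; an empty list means the values pass."""
    errors = []
    # Ranges stated for the memory options: 0 through 80 percent.
    if not 0 <= limit <= 80:
        errors.append("limit must be in 0..80")
    if not 0 <= high <= 80:
        errors.append("high-limit must be in 0..80")
    # Ordering rule: low-limit < limit < high-limit.
    if not low < limit:
        errors.append("low-limit must be smaller than limit")
    if not limit < high:
        errors.append("high-limit must be greater than limit")
    # Polling interval in seconds and number of retries.
    if not 0 <= poll <= 3600:
        errors.append("poll must be in 0..3600")
    if not 1 <= retry <= 100:
        errors.append("retry must be in 1..100")
    return errors


if __name__ == "__main__":
    for problem in validate_memory_thresholds(low=40, limit=60, high=50):
        print(problem)
```

Returning every violation at once, rather than stopping at the first, makes it easier to correct a configuration in a single pass.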

The following actions are configurable when the set threshold is violated:

Note

The loginfo action collects the 'show process cpu top' and iostat information into a file.

The table below lists the factory defaults for CPU, memory, and buffer thresholds.

Table 1. Default values for CPU, memory, and buffer threshold monitoring

Operand      Memory        CPU           Buffer
low-limit    40%           N/A           N/A
limit        60%           75%           70%
high-limit   70%           N/A           N/A
poll         120 seconds   120 seconds   120 seconds
retry        3             3             N/A
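For scripting or auditing purposes, the defaults in Table 1 can be held in a small lookup structure. The Python sketch below is one possible encoding, not part of the product. The names DEFAULTS and effective_settings are hypothetical, percentages and seconds are stored as plain integers, and None stands for the N/A entries in the table.

```python
from typing import Dict, Optional

# Factory defaults from Table 1; None represents "N/A".
DEFAULTS: Dict[str, Dict[str, Optional[int]]] = {
    "memory": {"low-limit": 40, "limit": 60, "high-limit": 70,
               "poll": 120, "retry": 3},
    "cpu": {"low-limit": None, "limit": 75, "high-limit": None,
            "poll": 120, "retry": 3},
    "buffer": {"low-limit": None, "limit": 70, "high-limit": None,
               "poll": 120, "retry": None},
}


def effective_settings(resource: str,
                       overrides: Optional[Dict[str, int]] = None
                       ) -> Dict[str, Optional[int]]:
    """Merge user overrides onto a copy of the defaults for one resource."""
    settings = dict(DEFAULTS[resource])  # copy so DEFAULTS stays unchanged
    for operand, value in (overrides or {}).items():
        if operand not in settings:
            raise KeyError(f"unknown operand: {operand}")
        settings[operand] = value
    return settings


if __name__ == "__main__":
    print(effective_settings("cpu"))
    print(effective_settings("memory", {"limit": 65}))
```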