But what if the system stats show that CPU utilization (% util and run queue) is well within thresholds, with plenty of available capacity, but Oracle continues to report CPU time as one of the Top 5 timed events?
We are also seeing a high degree of involuntary context switching (icsw in mpstat on Solaris) or context switches (cs in vmstat on Linux). Obviously, something is not computing right.
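On Linux, the cs column of vmstat comes from the cumulative ctxt counter in /proc/stat. That counter covers all context switches, voluntary and involuntary alike. Here is a minimal Python sketch that samples it over an interval, assuming a Linux host. The function names are hypothetical.

```python
import time

def read_ctxt(stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat content."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line in /proc/stat")

def context_switch_rate(interval: float = 5.0) -> float:
    """Sample /proc/stat twice and return context switches per second (Linux only)."""
    with open("/proc/stat") as f:
        first = read_ctxt(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = read_ctxt(f.read())
    return (second - first) / interval

if __name__ == "__main__":
    print(f"{context_switch_rate():.0f} context switches/sec")
```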
CPU Time could mean that the process is either
- on a CPU run queue, waiting to be scheduled, or
- currently running on a CPU.
So, to reduce it, we need to focus on two things:
- Minimizing the wait time on the run queue so that the session can run on a CPU as soon as possible. This is determined by the priority of the process.
- Once the process is running on the CPU, allowing it to stay there long enough to complete its task. The amount of time a process is allowed on the CPU before being switched out is defined as the time quanta.
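On some Linux kernels, these two components can be measured separately through /proc/<pid>/schedstat. Whether that file is present depends on the kernel version and build, so treat its availability as an assumption. The first field is the time spent running on a CPU, and the second is the time spent waiting on a run queue, both in nanoseconds. Here is a small sketch. The helper names are hypothetical.

```python
from typing import NamedTuple

class SchedStat(NamedTuple):
    on_cpu_ns: int     # field 1: time spent running on a CPU
    runqueue_ns: int   # field 2: time spent waiting on a run queue
    timeslices: int    # field 3: number of timeslices run

def parse_schedstat(text: str) -> SchedStat:
    """Parse the three fields of /proc/<pid>/schedstat."""
    on_cpu, waiting, slices = (int(x) for x in text.split()[:3])
    return SchedStat(on_cpu, waiting, slices)

def read_schedstat(pid: int) -> SchedStat:
    with open(f"/proc/{pid}/schedstat") as f:
        return parse_schedstat(f.read())

def runqueue_share(s: SchedStat) -> float:
    """Fraction of on-CPU plus run-queue time that was spent waiting."""
    total = s.on_cpu_ns + s.runqueue_ns
    return s.runqueue_ns / total if total else 0.0
```

The counters are cumulative since the process started, so for a live problem compare two readings taken some time apart.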
"Scheduling is a key concept in computer multitasking, multiprocessing operating system and real-time operating system designs. In modern operating systems, there are typically many more processes running than there are CPUs available to run them. Scheduling refers to the way processes are assigned to run on the available CPUs. This assignment is carried out by software known as a scheduler or is sometimes referred to as a dispatcher"
Understanding how the scheduler shares CPU resources is key to understanding and influencing the wait event "CPU Time".
On any Unix platform, some processes take priority over others. A process can be given a higher priority either through its scheduling class or with the nice command, and the two can have different effects on the process.
An easy way to identify the scheduling class and current priority of a process is the ps command. Used with the "-flycae" arguments, it shows both. However, it does not show the CPU time quanta associated with a process.
dbrac{714}$ ps -flycae |egrep "ora_" |more
S UID PID PPID CLS PRI CMD
S oracle 931 1 TS 24 ora_p000_DW
S oracle 933 1 TS 24 ora_p001_DW
In the above example, you would be interested in the CLS and PRI columns. It shows the Oracle background processes running under the TS scheduling class with a priority of 24. With the -c flag used here, the higher the number reported in the PRI column, the higher the priority.
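To pull those two columns out for many Oracle processes at once, a sketch like the one below could parse the ps output. It assumes that every column except CMD is a single token with no spaces. That holds for the columns shown here but may not hold for every ps layout. parse_ps is a hypothetical helper name.

```python
import subprocess

def parse_ps(output: str, prefix: str = "ora_") -> list:
    """Return (pid, class, pri, cmd) for processes whose CMD starts with prefix."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    pid_i, cls_i, pri_i = header.index("PID"), header.index("CLS"), header.index("PRI")
    rows = []
    for line in lines[1:]:
        # CMD is the last column and may contain spaces, so cap the split.
        fields = line.split(None, len(header) - 1)
        if fields == header:  # skip any repeated header line
            continue
        if fields[-1].startswith(prefix):
            rows.append((int(fields[pid_i]), fields[cls_i], int(fields[pri_i]), fields[-1]))
    return rows

if __name__ == "__main__":
    out = subprocess.run(["ps", "-flycae"], capture_output=True, text=True, check=True).stdout
    for row in parse_ps(out):
        print(row)
```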
The default scheduling class for user processes is TS (Time Sharing), common to both Solaris and Linux. The TS scheduler adjusts the priority and CPU time quanta of each process based on its recent processor usage.
Since we appear to have plenty of CPU resources, we could conclude that the default (TS) scheduling class is not good enough for us. Either the scheduler is not allocating sufficient CPU time quanta (resulting in involuntary context switching), or it is not giving the process a high enough priority to be scheduled ahead of other processes.
So how do we change it? Obviously we would want to
- set a fixed priority for Oracle processes so that they are able to run on the CPU ahead of other competing processes.
- set a fixed time quanta for Oracle processes so that they can run to completion on the CPU.
The scheduling class that provides this is RR (real-time round robin) on Linux and FX (fixed priority) on Solaris. The simplest way to change the scheduling class is the priocntl tool. It is a native binary on Solaris and is available on Linux through the Heirloom Project.
On Linux, you would use the renice command to change the CPU time quanta. On Solaris, priocntl does both: it sets the scheduling class and the time quanta.
Let us look at a few examples -
On Linux, let us try to change the scheduling class and time quanta for the log writer (LGWR).
[root@dbrac root]# ./priocntl -l
CONFIGURED CLASSES
==================
TS (Time Sharing)
Configured TS User Priority Range: -19 through 20
RT (Real Time Round Robin)
Maximum Configured RT Priority: 99
FF (Real Time First In-First Out)
Maximum Configured FI Priority: 99
We see that there are three scheduling classes available for use.
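The same ranges can be checked without Heirloom by asking the kernel directly. Below is a sketch using Python's os module (Linux only). The class-name mapping is my own assumption, chosen to mirror the names that ps and priocntl print. TS reports 0 through 0 here because its user-visible priority is the nice value, not a static priority.

```python
import os

# Assumed mapping from the class names used above to the POSIX policies.
CLASSES = {
    "TS": os.SCHED_OTHER,  # time sharing; tuned through nice, not a static priority
    "RR": os.SCHED_RR,     # real-time round robin
    "FF": os.SCHED_FIFO,   # real-time first in, first out
}

def scheduler_ranges() -> dict:
    """Return the static priority range the kernel accepts for each class."""
    return {
        name: (os.sched_get_priority_min(policy), os.sched_get_priority_max(policy))
        for name, policy in CLASSES.items()
    }

if __name__ == "__main__":
    for name, (low, high) in scheduler_ranges().items():
        print(f"{name}: {low} through {high}")
```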
[root@dbrac root]# ps -flycae |grep ora_lgwr
S UID PID PPID CLS PRI CMD
S oracle 30318 1 TS 23 ora_lgwr_DWRAC
It shows that LGWR is running in TS class with a Priority of 23.
[root@dbrac root]# ./priocntl -d 30318
TIME SHARING PROCESSES
PID TSUPRI
30318 0
Let us move LGWR to the RT class with an RT priority of 50.
[root@dbrac root]# ./priocntl -s -c RT -p 50 -i pid 30318
[root@dbrac root]# ./priocntl -d 30318
REAL TIME PROCESSES
PID RTPRI TQNTM
30318 50 99
It shows that the RT priority is 50 and the Time Quanta is 99.
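For reference, the priocntl -s and priocntl -d steps can also be done with the POSIX scheduling calls that Python exposes. This sketch assumes Linux, and the set call needs root or CAP_SYS_NICE. set_round_robin and describe are hypothetical helper names.

```python
import os

NAMES = {os.SCHED_OTHER: "TS", os.SCHED_RR: "RR", os.SCHED_FIFO: "FF"}

def set_round_robin(pid: int, rt_priority: int) -> None:
    """Move pid into SCHED_RR at the given real-time priority (needs privilege)."""
    os.sched_setscheduler(pid, os.SCHED_RR, os.sched_param(rt_priority))

def describe(pid: int) -> dict:
    """Roughly what `priocntl -d` shows: class, RT priority, quantum in ms for RR."""
    policy = os.sched_getscheduler(pid)
    info = {"class": NAMES.get(policy, str(policy)),
            "rtpri": os.sched_getparam(pid).sched_priority}
    if policy == os.SCHED_RR:
        # sched_rr_get_interval returns the round-robin quantum in seconds.
        info["tqntm_ms"] = round(os.sched_rr_get_interval(pid) * 1000)
    return info
```

For example, running set_round_robin(30318, 50) as root and then describe(30318) should report class RR with RT priority 50 and the process's current quantum.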
[root@dbrac root]# ps -flycae |grep ora_lgwr
S UID PID PPID CLS PRI CMD
S oracle 30318 1 RR 90 ora_lgwr_DWRAC
Note that even though the RT priority is 50, ps shows the PRI as 90.
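Here is my reading of where 90 comes from, assuming the procps implementation of ps. With -c, PRI is 39 minus the priority field in /proc/<pid>/stat. For a real-time task, the kernel reports that field as -1 minus the RT priority. That gives 39 + 51 = 90 for RT priority 50. By the same arithmetic, the PRI of 41 shown for the LMS processes later corresponds to RT priority 1. A TS process at nice 0 would show 19. The 23 and 24 seen earlier presumably include a dynamic adjustment made by the TS scheduler. A sketch of the arithmetic, with hypothetical function names:

```python
def ps_c_pri(stat_priority: int) -> int:
    """PRI as printed by `ps -c`, assuming procps computes 39 - /proc/<pid>/stat priority."""
    return 39 - stat_priority

def rt_stat_priority(rt_priority: int) -> int:
    """Priority field the kernel reports for a SCHED_RR/SCHED_FIFO task."""
    return -1 - rt_priority

def ts_stat_priority(nice: int) -> int:
    """Priority field for a time-sharing task, ignoring any dynamic adjustment."""
    return 20 + nice

print(ps_c_pri(rt_stat_priority(50)))  # 90: LGWR at RT priority 50
print(ps_c_pri(rt_stat_priority(1)))   # 41: RT priority 1
print(ps_c_pri(ts_stat_priority(0)))   # 19: TS at nice 0
```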
Let us change the time quanta for the Log writer.
[root@dbrac root]# renice +2 30318
30318: old priority 0, new priority 2
[root@dbrac root]# ps -flycae |grep ora_lgwr
S UID PID PPID CLS PRI CMD
S oracle 30318 1 RR 90 ora_lgwr_DWRAC
As expected, renicing an RT process does not change its PRI.
[root@dbrac root]# ./priocntl -d 30318
REAL TIME PROCESSES
PID RTPRI TQNTM
30318 50 89
But when checking with priocntl, we see that the time quanta is now 89 (previously 99).
Let us see if we can increase the time quanta.
[root@dbrac root]# renice -3 30318
30318: old priority 2, new priority -3
[root@dbrac root]# ./priocntl -d 30318
REAL TIME PROCESSES
PID RTPRI TQNTM
30318 50 459
Now the time quanta is 459. The higher the time quanta, the more time the process can spend on the CPU before being context switched out.
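These numbers line up with the timeslice formula of the O(1) scheduler in 2.6-era kernels, where the SCHED_RR quantum is scaled from the nice value. Assuming that formula and HZ=1000, the sketch below gives 100, 90 and 460 ms for nice 0, +2 and -3. Each of these is one more than the corresponding TQNTM value above, and I have not confirmed where the off-by-one comes from. Newer kernels with the CFS scheduler use a fixed round-robin quantum instead, so renice may not have this effect there.

```python
# Constants assumed from the O(1) scheduler (2.6-era kernel/sched.c),
# with HZ=1000 so that one jiffy is one millisecond.
MAX_PRIO = 140
MAX_USER_PRIO = 40
DEF_TIMESLICE_MS = 100
MIN_TIMESLICE_MS = 5

def nice_to_prio(nice: int) -> int:
    return 120 + nice  # nice -20..19 maps to static priority 100..139

def timeslice_ms(nice: int) -> int:
    """Scale the slice by static priority; negative nice starts from 4x the default."""
    prio = nice_to_prio(nice)
    base = DEF_TIMESLICE_MS * 4 if prio < nice_to_prio(0) else DEF_TIMESLICE_MS
    return max(base * (MAX_PRIO - prio) // (MAX_USER_PRIO // 2), MIN_TIMESLICE_MS)

for nice in (0, 2, -3):
    print(nice, timeslice_ms(nice))  # 100, 90 and 460 ms respectively
```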
On Solaris, priocntl can be used to set the scheduling class and the time quanta simultaneously. I am not going to show any examples here, as they would be the same as above.
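For a starting point on Solaris, here is a sketch that builds the priocntl command line for the FX class. The option letters follow my understanding of the FX section of the priocntl(1) man page:
- -m sets the user priority limit.
- -p sets the user priority.
- -t sets the time quantum, with -r setting its resolution (1000 meaning milliseconds).

The function name, PID and values are hypothetical, so check them against your release before running anything.

```python
def fx_priocntl_cmd(pid: int, upri: int, tqntm_ms: int) -> list:
    """Build a Solaris priocntl command that puts pid in the FX class."""
    if not 0 <= upri <= 60:
        raise ValueError("FX user priorities range from 0 to 60")
    return ["priocntl", "-s", "-c", "FX",
            "-m", str(upri),       # user priority limit
            "-p", str(upri),       # user priority (must not exceed the limit)
            "-t", str(tqntm_ms),   # time quantum...
            "-r", "1000",          # ...expressed in units of 1/1000 second
            "-i", "pid", str(pid)]

if __name__ == "__main__":
    print(" ".join(fx_priocntl_cmd(12345, 60, 200)))  # hypothetical PID and values
```

Once the printed command looks right, the list can be passed to subprocess.run as root.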
Now, as to which processes (background or shadow) need a higher priority than others, that is a decision that requires a significant amount of testing. I have seen 30% improvements in load timings when changing scheduling properties; however, it has the potential to completely break the environment if not done correctly.
Interestingly enough, when running Oracle RAC on Linux, you will notice that the LMS processes are already running under the RR scheduling class.
dbrac{720}$ ps -flycae |grep ora_lms
S UID PID PPID CLS PRI CMD
S oracle 9074 1 RR 41 ora_lms0_DWRAC
S oracle 9078 1 RR 41 ora_lms1_DWRAC
[root@dbrac root]# ./priocntl -d 30306
REAL TIME PROCESSES
PID RTPRI TQNTM
30306 50 99
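To get the same view without running ps once per process name, here is a sketch that walks /proc. It lists the Oracle processes in a real-time class along with their RT priority. It works on Linux only, and the function name and ora_ prefix filter are assumptions. The process name is taken from the first argument in /proc/<pid>/cmdline, which is what ps -f shows in its CMD column.

```python
import os

RT_POLICIES = {os.SCHED_RR: "RR", os.SCHED_FIFO: "FF"}

def realtime_processes(prefix: str = "ora_") -> list:
    """Return (pid, name, class, rt_priority) for real-time processes named prefix*."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                name = f.read().split(b"\0")[0].decode(errors="replace")
            policy = os.sched_getscheduler(pid)
            rt_priority = os.sched_getparam(pid).sched_priority
        except OSError:
            continue  # the process exited or is not visible to us
        if policy in RT_POLICIES and name.startswith(prefix):
            found.append((pid, name, RT_POLICIES[policy], rt_priority))
    return sorted(found)

if __name__ == "__main__":
    for row in realtime_processes():
        print(row)
```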
5 comments:
"it has the potential to completely break the environment if not done correctly." is very significant.
Besides the fixed time quanta, another reason a process would be switched out of a processor is when it makes an I/O call. The process would then relinquish the CPU. Would changing the "fixed quanta" affect this behaviour?
(Also, I need to recheck how a priority is listed. I thought a higher absolute value for PRI meant a lower relative priority.)
"Besides the fixed time quanta, another reason a process would be switched out of a processor is when it makes an I/O call. The process would then relinquish the CPU. Would changing the "fixed quanta" affect this behaviour?"
I do not believe so. The time quanta would apply only when the process is running on the CPU actively.
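On Linux, this distinction can be seen per process. /proc/<pid>/status keeps separate counters for two kinds of context switch:
- voluntary: the process blocked, for example on I/O, and gave up the CPU
- nonvoluntary: the process was preempted, for example when its time quanta ran out

Here is a sketch for reading them. The helper names are hypothetical.

```python
def ctxt_switches(status_text: str) -> dict:
    """Pull the context-switch counters out of /proc/<pid>/status content."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value)
    return counts

def read_ctxt_switches(pid: int) -> dict:
    with open(f"/proc/{pid}/status") as f:
        return ctxt_switches(f.read())
```

Both counters are cumulative, so compare two readings taken some time apart.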
The correlation of priorities with the numbers as displayed by the "ps" command depends on the argument passed to "ps".
With the -c flag, higher numbers in the PRI column mean higher priority. Otherwise, it is the opposite: lower numbers mean higher priority.
Excellent topic, will use some of the tips.
Krishna,
Thanks for the article, nice.
As a DBA, I want to limit myself to the database side and let the SA figure out what is wrong with the CPU once I confirm that nothing is wrong from the Oracle Database perspective. What, in your view, is the best way to monitor CPU usage within the Oracle database?
As you said, CPU as a top event (time) is an indication that something is consuming more resources. I would like to quickly run some queries against the v$ views to get information about this.
Do you think V$OSSTAT or V$SYSSTAT are good starting points? What should the chronological steps be to arrive at a conclusion about the bottleneck?
For monitoring CPU usage, I guess you could use v$osstat. As always, you would need to correlate using OS tools such as vmstat or mpstat.
The v$sys_time_model may be a better view to see if you have a CPU bottleneck. As it is cumulative, you would need to take multiple snapshots over a period of time.
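Here is a sketch of the snapshot arithmetic. It assumes you have already fetched two sets of values into dictionaries, at the start and end of an interval:
- BUSY_TIME and IDLE_TIME from v$osstat (hundredths of a second, summed across CPUs)
- DB CPU from v$sys_time_model (microseconds)

The query code is left out, and the function names are hypothetical.

```python
def delta(before: dict, after: dict) -> dict:
    """Difference of cumulative counters present in both snapshots."""
    return {k: after[k] - before[k] for k in after if k in before}

def cpu_summary(os_before: dict, os_after: dict,
                tm_before: dict, tm_after: dict, elapsed_s: float) -> dict:
    d_os = delta(os_before, os_after)
    d_tm = delta(tm_before, tm_after)
    busy, idle = d_os["BUSY_TIME"], d_os["IDLE_TIME"]  # centiseconds, all CPUs
    host_busy_pct = 100.0 * busy / (busy + idle) if busy + idle else 0.0
    db_cpu_s = d_tm["DB CPU"] / 1_000_000              # microseconds -> seconds
    # DB CPU seconds per elapsed second is roughly how many CPUs Oracle kept busy.
    return {"host_busy_pct": host_busy_pct, "db_cpu_per_sec": db_cpu_s / elapsed_s}
```

Comparing the two numbers shows whether the host's busy time is mostly Oracle or mostly something else, which is where the OS tools take over.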