Friday, May 9, 2008

Simple Userland tools on Unix to help analyze application impact as a non-root user - CPU subsystem

I recently gave a presentation to my team on how to analyze the impact of the application on the CPU subsystem.

Read this document on Scribd: Analyze application impact - CPU Subsystem

Analyzing Application impact using simple userland tools – CPU subsystem Krishna Manoharan krishmanoh@gmail.com 1 Introduction Every application impacts the host Operating system and connected sub-systems in a unique way. In order to profile an application and understand its impact on the environment, there are a number of userland tools provided within the OS. Many of these tools do not require super-user privileges thus enabling ordinary users such as dba’s or application developers, the ability to see and gauge the impact of the application on the system. 2 Subsystems in an environment One needs to analyze the impact of an application on all the major subsystems in an environment.  CPU  Memory  Storage  Network 3 CPU Subsystem - Metrics The key metrics to look for when measuring CPU utilization is  Run queue – The number of running threads or threads waiting to run on a CPU.  Utilization – CPU utilization  Context switches (Voluntary and Involuntary) - Context switch is when a kernel thread is suspended from execution on the CPU with its current state captured and another thread is scheduled.  Thread Migrations – Shows kernel threads which re-awake after a context switch and are running on a different CPU than previous run.  Interrupt generation – Interrupts are signals from hardware devices such as network, disk etc or from software which force the CPU to process them immediately.  System Calls - System calls allow an application to request the kernel to perform privileged operations such as IO, new threads etc.  Translation Lookaside Buffer Miss – TLB is a table maintained in the CPU which contains a subset of mappings of virtual memory addess to actual physical address. When a CPU need to convert a virtual memory address to a physical address, it searches the TLB first and if not found, it is a miss. TLB misses are expensive. 4 CPU Subsystem - Tools To measure the impact of an application on the CPU subsystem, the tools available are  sar - Gathers and reports on a variety of statistics. Make sure it is set to 5 minute intervals rather than the default.  vmstat – Reports CPU related statics  mpstat - Reports CPU related statistics  prstat - Reports process related statistics  truss - Used in debugging  ps – Reports process related statistics  trapstat - Reports TLB misses. Of all these, only trapstat needs to be run as root. 5 CPU Subsystem – Run Queue There are many schools of thought regarding the size of the run queue. Normally one would say that a run queue of 2 or more shows a CPU bottleneck. On a SMP system, each CPU has its own run queue and so for a 8 CPU system, would a run queue of 16 would be okay? It all depends on what the user perception is. You may have a run queue of 100 and still be okay with it if you are meeting your SLA. I personally have seen Tibco Business Works running on Solaris and racking up a sustained run queue of +80 on a 4 CPU V440 and still be averagely responsive. Similarly a Oracle 10g RAC installation running on a Dell 2950 with 8 CPU's, a sustained run queue of +12 would leave the system pretty much unresponsive and unusable. I would feel that a sustained run queue of 10 or more coupled with a lot of involuntary context switches means you definitely have a CPU bottleneck. All this assuming that your average CPU utilization is well within limits (average 30-50%). The important thing to note about the run queue when analyzing application impact is that none of these tools discussed above will tell you if the run queue is due to the application or other process. In order to find out what exactly is causing it, you would need to use prstat. 6 CPU Subsystem – Run Queue contd. The run queue can be obtained either using vmstat, w or sar. kthr rbw memory swap free re page disk in faults sy cpu cs us sy id mf pi po fr de sr s9 s3 s3 s3 0 13 0 84634792 97274944 103 573 96 26 24 0 0 0 0 0 0 4377 31853 14630 23 3 74 The run queue can also be obtained using the sar command. This gives a historic perspective. mkrishna@tus1dwhdbspex01:> sar –q SunOS tus1dwhdbspex01 5.9 Generic_122300-16 sun4u 00:00:00 runq-sz %runocc swpq-sz %swpocc 00:05:00 1.0 2 0.0 0 06/04/2008 The w command also gives the load average for the last 1, 5, 15 minutes. The load average is the run queue. mkrishna@tus1dwhdbspex01:> w 8:30pm User root up 24 day(s), 22:12, tty console login@ 4 users, idle load average: 7.71, 8.87, 8.86 JCPU PCPU what format 10May0825days 7 CPU Subsystem – Utilization Measure of how busy the CPU is. CPU can be busy either processing high complexity (compression, encyption, Math) threads resulting in high cpu utilization or busy processing medium to low complex threads showing lower utilization. You can have a high run queue and low CPU utilization. Overall CPU utilization can be obtained from  vmstat  sar Process wise CPU utilization can be obtained from  prstat  ps Individual CPU utilization can be obtained from  mpstat 8 CPU Subsystem – Utilization – Contd. Overall CPU utilization - Measure using vmstat and sar  vmstat – In an ideal scenario, % system time will be less than 5%. % user time will be the real measure of CPU utilization. % wait is to be ignored completely. kthr rbw memory swap free re page disk faults in sy cs cpu us sy id 23 3 27 2 74 72 mf pi po fr de sr s9 s3 s3 s3 0 13 0 84634792 97274944 103 573 96 26 24 0 0 0 0 0 0 4377 31853 14630 0 19 0 77838000 93835392 0 21 0 0 0 0 0 0 0 0 0 5997 31834 9154  sar - Gives historical reporting. More details on how to get sar reports for previous days. mkrishna@tus1dwhdbspex01:> sar –u SunOS tus1dwhdbspex01 5.9 Generic_122300-16 sun4u 00:00:00 00:05:00 %usr 31 %sys 3 %wio 38 %idle 29 06/04/2008 9 CPU Subsystem – Utilization – Contd. Process CPU utilization  prstat – Solaris equivalent to top. Gives a considerable amount of detailed information. In order to see individual process cpu utilization and overall user cpu utilization, run as prstat –a 1.  prstat also can give a break up of the run-queue and which thread/process is currently executing. mkrishna@tus1dwhdbspex01:> prstat -a 1 PID USERNAME 29192 oraodsup NPROC USERNAME SIZE 20G SIZE RSS STATE 11G cpu522 RSS MEMORY 77% PRI NICE 10 0 TIME 57:08:28 TIME CPU PROCESS/NLWP 0:12:59 3.1% oracle/1 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ CPU 26% 185 oraodsup 3579G 1969G  The STATE column gives the currently running processes and the CPU column gives the % CPU consumed by that process. The Output at the bottom gives overall CPU consumed by a specific user account. 10 CPU Subsystem – Utilization – Contd. Process CPU utilization over time • The ps command will give the overall CPU time consumed by a process. The time column shows the amount of CPU execution time. This will give an idea as to what process has consumed the most CPU time since system boot. mkrishna@tus1dwhdbspex01:> ps -e -o time,args |more TIME COMMAND 0:12 sched 4:38 /etc/init 0:00 pageout 1-23:03:57 fsflush 0:00 -sh 0:10 /usr/lib/inet/in.mpathd –a 11 CPU Subsystem – Utilization – Contd. Individual CPU Utilization The mpstat command will give the individual CPU utilization on a SMP system. Checking for CPU utilization using mpstat allows you to see if all the CPU’s are being used and if SMP is really happening. mkrishna@tus1dwhdbspex01:> mpstat CPU minf mjf xcal 0 1 36 16 0 0 926 914 intr ithr 313 18 201 1 csw icsw migr smtx 475 480 14 13 34 34 81 71 srw syscl 0 0 1125 929 usr sys 28 21 2 2 wt idl 30 34 40 44 12 CPU Subsystem – Context switching and Thread Migrations Context switch (CS) is when a kernel thread is suspended from execution on the CPU with its current state captured and another thread is scheduled. Thread migrations happen when a thread was running on a particular CPU and was CS out. Then when it is ready to run, it is scheduled on a different CPU. Thread migrations are not efficient because the current CPU Cache is not warm with the thread information. Context switches happen as a result of the below: » Time quanta expired for the running thread – Causes Involuntary CS » Higher priority thread needs to run - Causes Involuntary CS » Interrupt is being serviced (not 100% context switch) - Causes Involuntary CS » Thread has completed it’s activity or issued a blocking system call – Causes Voluntary CS An Involuntary CS means that a thread did not get an opportunity to complete it’s work. A system can have 100,000 context switches and still meet SLA’s if most of the CS are voluntary in nature. A high degree of involuntary context switches coupled with thread migrations usually means there is a CPU related bottleneck. 13 CPU Subsystem – Context switching and Thread Migrations – Contd. The tools to measure context switches and thread migrations are • • • • vmstat mpstat prstat sar vmstat mkrishna@tus1dwhdbspex01:> vmstat 1 kthr rbw memory swap free re page disk in faults sy cpu mf pi po fr de sr s9 s3 s3 s3 cs us sy id 0 13 0 84568400 97239720 102 573 95 26 24 0 0 0 0 0 0 4398 31925 14622 23 3 74 0 18 0 76750792 91878616 6 146 0 63 55 0 0 0 0 0 0 6712 45464 11107 33 3 64 mpstat mkrishna@tus1dwhdbspex01:> mpstat CPU minf mjf xcal 0 36 0 925 intr ithr 313 201 csw icsw migr smtx 474 15 34 81 srw syscl 0 1125 usr sys 28 2 wt idl 30 40 14 CPU Subsystem – Context switching and Thread Migrations – Contd. prstat – using the –m switch which shows specific process related statistics. It allows you to drill down as to which process is being context switched. mkrishna@tus1dwhdbspex01:> prstat –n 1000 -m 1 PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP 5221 oraodsup 100 0.0 - 0.0 0 11 2 0 oracle/1 5400 oraodsup 29 2.4 69 - 235 28 1K 0 oracle/1 sar -w – shows history of context switches mkrishna@inftpd1:> sar -w SunOS inftpd1 5.10 Generic_118833-36 sun4u 06/06/2008 00:00:00 swpin/s bswin/s swpot/s bswot/s pswch/s 00:05:00 0.00 0.0 0.00 0.0 1420 15 CPU Subsystem – Interrupts Interrupts are mechanisms by which a hardware device (HBA, Network adaptor) or application (signals) can get CPU attention. Interrupts are generated to specific CPU’s. When an interrupt is generated, the current running thread is pinned and the interrupt serviced. The pinned thread is then again resumed. Too many interrupts can hurt application performance. 16 CPU Subsystem – Interrupts contd.   Interrupts can be monitored using mpstat and vmstat vmstat mkrishna@tus1dwhdbspex01:> vmstat 1 kthr rbw memory swap free re page disk faults cpu cs us sy id mf pi po fr de sr s9 s3 s3 s3 in sy 0 13 0 84568400 97239720 102 573 95 26 24 0 0 0 0 0 0 4398 31925 14622 23 3 74 0 18 0 76750792 91878616 6 146 0 63 55 0 0 0 0 0 0 6712 45464 11107 33 3 64 mpstat mkrishna@tus1dwhdbspex01:> mpstat CPU minf mjf xcal 0 36 0 925 intr ithr 313 201 csw icsw migr smtx 474 15 34 81 srw syscl 0 1125 usr sys 28 2 wt idl 30 40 17 CPU Subsystem – System Calls System calls are API which allow a process to request privileged operations such as IO, memory allocation or new thread creation etc. It is important to identify the time spent servicing system calls. Too many system calls (such as the timed_statistics in oracle) can swamp the system. 18 CPU Subsystem – System Calls contd. System calls can be monitored using vmstat, mpstat, prstat and sar (historical trending) vmstat mkrishna@tus1dwhdbspex01:> vmstat 1 kthr rbw memory swap free re page disk in faults cpu cs us sy id mf pi po fr de sr s9 s3 s3 s3 sy 0 13 0 84568400 97239720 102 573 95 26 24 0 0 0 0 0 0 4398 31925 14622 23 3 74 0 18 0 76750792 91878616 6 146 0 63 55 0 0 0 0 0 0 6712 45464 11107 33 3 64 mpstat mkrishna@tus1dwhdbspex01:> mpstat CPU minf mjf xcal 0 36 0 925 intr ithr 313 201 csw icsw migr smtx 474 15 34 81 srw syscl 0 usr sys 28 2 wt idl 30 40 1125 19 CPU Subsystem – System Calls contd. sar -c mkrishna@inftpd1:> sar -c |more SunOS inftpd1 5.10 Generic_118833-36 sun4u 00:00:00 scall/s sread/s swrit/s 00:05:00 fork/s 0.07 06/06/2008 exec/s rchar/s wchar/s 0.07 56915 41169 1400 103 21 20 CPU Subsystem – System Calls contd. The truss utility executes the specified command and produces a trace of the system calls it performs, the signals it receives, and the machine faults it incurs. truss is very useful to trace system calls and the time spent on a particular call. 21 CPU Subsystem – System Calls contd. In 126 seconds, 50 seconds were spent on reading data from the cache folder. root@ustu-infapppdin21:> truss -c -p 23050 syscall seconds calls errors read 40.740 639326 write 1.313 29562 time .079 4286 lseek 9.807 608657 fcntl .000 35 lwp_park .197 5170 1 lwp_unpark .142 5168 poll .046 731 yield .001 29 ------------- ---sys totals: 52.328 1292964 1 usr time: 25.691 elapsed: 126.210 22 Historic sar data How do I read historical sar data? By default sar data is saved in /var/adm/sa for 7 days. BI systems have been customized to save data for 30 days. To read sar data from previous days, sar –f /var/adm/sa/sa<01-29> -q|u|w|etc 23 So much data and how do I correlate? If you have kept up with me so far, you will begin to wonder as to how do I take all this data in and make it meaningful? It depends on what you wish to accomplish. – Short Term Reactive – Check on a possible CPU bottleneck right away – Long Term – Do trending and see how usage has changed over a period of time. For long term Sar data is invaluable, however it is not complete. Sar captures run queue, utilization, system calls and context switches. However sar does not capture interrupts, icsw or thread migrations to measure CPU load, so custom scripts needs to be written to capture this information. Long term – it is difficult to capture specific process/user related information unless process accounting is installed. A better option for the above would be tools such as sitescope, 24 Short Term Reactive approach Suspect CPU Bottleneck ? Application confirms (AWR/ASH) or poor performance. System related processes are not the cause. Check using w (load average), vmstat, sar (run queue and utilization) Debug process? e.g. Do a truss –c –p and see where time is being spent by the process on which system call. From AWR/ASH reports, identify and analyze the sql. Check using prstat –m –a 1 for processes using up CPU. Check for cpu, cs, icsw and system calls for specific process(s). Identify process either from application or through ps Further action (application/sql analysis, add more cpu’s, breakup/stagger load etc) 25 Long Term Pro-Active approach Setup sar for running every 5 minutes. Either setup custom scripts to capture other data such as icsw, interrupts etc or install tools such as BMC Patrol/Sitescope/Openv iew Prepare trending reports for CPU run queue, utilization, cs, icsw, thread migrations and interrupts over a period of time (say 3 months). This gives an overall picture of CPU related metrics. Further action 26 From an oracle perspective All this is fine, how do I know oracle sees a CPU bottleneck? The answer is in AWR/ASH reports – AWR/ASH reports will show a CPU wait in the top 5 waits Table dba_hist_sysmetric_summary (Oracle 10G) The “Current OS Load: column will show the run queue as seen by oracle for every AWR snapshot. 27