October 27, 2010
Finding which processes are swapped / MDB tutorial
Today I encountered a server that had previously experienced some serious memory pressure: there was evidence of the page scanner running, a large amount of anonymous paging, and lightweight processes being swapped out of memory. I’m sure some of you have had this experience before and wondered just which processes are sitting in the vmstat ‘w’ column.
The vmstat man page simply says:
w the number of swapped out lightweight processes (LWPs)
that are waiting for processing resources to finish.
Which is vague, at best. Perhaps it’s just me. Anyway, this is what I saw when I ran vmstat:
bash-3.00# vmstat 1
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr vc vc -- -- in sy cs us sy id
0 0 4 12857400 5644584 13 15 166 144 148 0 285 14 0 0 0 1619 1762 1574 2 1 98
1 0 94 14134296 7000240 7 14 0 0 0 0 0 0 0 0 0 308 779 83 0 0 100
0 0 94 14134040 6999984 4 5 0 0 0 0 0 0 0 0 0 306 766 79 0 0 100
1 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 342 755 107 0 0 100
1 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 310 776 83 0 0 100
1 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 310 763 89 0 0 100
1 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 301 757 75 0 0 100
0 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 296 751 69 0 0 100
1 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 352 766 111 0 0 100
0 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 310 787 93 0 0 100
1 0 94 14134040 6999984 4 4 0 0 0 0 0 0 0 0 0 316 772 87 0 0 100
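If you want to watch that column from a script, here is a minimal Python sketch that pulls the ‘w’ value out of vmstat output. The function name swapped_lwps is made up for illustration, and it assumes ‘w’ is the third whitespace-separated field, as in the output above. Remember that the first data line of vmstat is a summary since boot, so the later samples are the ones to look at.

```python
def swapped_lwps(vmstat_line):
    """Return the 'w' column (third field) of a vmstat data line, or None for header lines."""
    fields = vmstat_line.split()
    if len(fields) < 3 or not fields[2].isdigit():
        return None  # header line or something unexpected
    return int(fields[2])


# Lines copied from the vmstat output above
sample = [
    " kthr memory page disk faults cpu",
    " r b w swap free re mf pi po fr de sr vc vc -- -- in sy cs us sy id",
    " 0 0 4 12857400 5644584 13 15 166 144 148 0 285 14 0 0 0 1619 1762 1574 2 1 98",
    " 1 0 94 14134296 7000240 7 14 0 0 0 0 0 0 0 0 0 308 779 83 0 0 100",
]

w_values = [w for w in map(swapped_lwps, sample) if w is not None]
print(w_values)
```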
Wow, 94 LWPs swapped out. That’s quite a lot. This generally won’t cause any issues on your system, because as soon as an LWP is asked to do work it will be swapped back in on demand. In fact, you could force them back in by tracing the process (with p-commands such as pfiles) or by sending the process a signal, but to do that you would need to know which threads are swapped out.
There are a number of ways to do this, and one of them is with mdb. So, I set myself the task of finding which processes are swapped out.
First, access to the kernel threads in the live system is possible through the ‘thread’ walker in mdb. To access the live running kernel, you need to fire up mdb with the ‘-k’ option.
The ::help dcmd in mdb shows a brief description of each dcmd. Below, the help output for ::walk is shown as an example.
# mdb -k
Loading modules: [ unix genunix specfs dtrace zfs sd pcisch ip hook neti sctp arp usba fcp fctl nca lofs md cpc random crypto wrsmd fcip logindmux ptm ufs sppp nfs ipc ]
> ::help walk
NAME
walk - walk data structure
SYNOPSIS
[ addr ] ::walk name [variable]
ATTRIBUTES
Target: kvm
Module: mdb
Interface Stability: Evolving
>
So, let’s start with the kthreads:
> ::walk thread thr
180e000
2a10001fca0
2a100017ca0
2a10000fca0
2a100007ca0
2a10003fca0
What we get back is a list of memory addresses, one per thread. Pipes are powerful constructs in mdb: much like unix commands, they let us feed the output of one dcmd into the input of another. In the first instance, we can pipe the output of the walker above straight into ::print to print the kthread_t t_schedflag member.
I should mention that each address listed by the walker points to a data structure of type kthread_t, which you can look up in /usr/include/sys/thread.h. This is important because it tells you how to read the structure and which members we may be interested in. Take a look and read up on t_schedflag.
> ::walk thread thr |::print kthread_t t_schedflag
t_schedflag = 0x3e03
t_schedflag = 0x3
t_schedflag = 0x3
t_schedflag = 0x3
t_schedflag = 0x3
t_schedflag = 0x3
....
OK, so I’ve truncated the output, but we can see that this gives us the schedflag of every thread on the system. The list is long, so we need to work out which schedflag values we are interested in and ‘grep’ for them in the output. Fortunately, mdb provides just what we need in the ::grep dcmd. Note that ::grep uses the . (dot) notation, which here refers to each value piped in from the previous command. The mdb documentation is suitably confusing on this one. I don’t know about you, but I had to just ‘play’ with the dcmd to work out what it was trying to tell me:
::grep command
Evaluate the specified command string, and then print
the old value of dot if the new value of dot is non-
zero. If the command contains whitespace or metacharac-
ters, it must be quoted. The ::grep dcmd can be used in
pipelines to filter a list of addresses.
Again, output below truncated.
> ::walk thread thr |::print -x kthread_t t_schedflag|::grep .==0x03
3
3
3
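To make the man page wording a little less confusing, here is a rough Python model of what ::grep does in a pipeline. It is only a sketch of the documented behaviour, not mdb’s implementation: the function name mdb_grep is invented, and the "command" is modelled as a Python callable that receives dot and returns the new value of dot.

```python
def mdb_grep(dots, command):
    """Model of ::grep: keep each input value for which `command` evaluates to non-zero."""
    kept = []
    for old_dot in dots:
        new_dot = command(old_dot)  # the command is evaluated with dot set to the input value
        if new_dot != 0:
            kept.append(old_dot)    # ::grep prints the *old* value of dot
    return kept


# t_schedflag values taken from the truncated ::print output earlier in the post
schedflags = [0x3E03, 0x3, 0x3, 0x3, 0x3, 0x3]

# Rough equivalent of `::grep .==0x03`: a comparison evaluates to 1 (true) or 0 (false)
matches = mdb_grep(schedflags, lambda dot: int(dot == 0x03))
print([hex(value) for value in matches])
```

The point to take away is that the expression only decides whether a value passes through. What comes out the other end is the original input, here the schedflag itself.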
As for my quest, I initially expected to be able to match/search for the value 0x0008, due to the following in thread.h:
#define TS_ON_SWAPQ 0x0008 /* thread is on the swap queue */
I will jump ahead here and show you that you can run mdb non-interactively and combine it with standard unix commands.
To my dismay, there are no threads with the 0x0008 schedflag value on my system:
bash-3.00# echo "::walk thread thr |::print kthread_t t_schedflag|::grep .==0008" | mdb -k |wc -l
0
Yet there clearly are still swapped-out threads, 91 in fact:
bash-3.00# vmstat 1 3
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr vc vc -- -- in sy cs us sy id
0 0 6 12892280 5681600 13 15 162 140 144 0 278 14 0 0 0 1584 1718 1534 2 1 98
1 0 91 14172696 7055272 10 17 0 0 0 0 0 0 0 0 0 318 645 87 0 0 100
I came across this by trial and error, but it can’t be a coincidence that exactly 91 threads on my system have a schedflag equal to TS_RUNQMATCH. I need to read through the code to work out why, but it would appear that setbackdq() has changed the value since the threads were swapped out. If anyone cares to explain in the comments section, I’d be very grateful :)
bash-3.00# grep RUNQ /usr/include/sys/thread.h
#define TS_RUNQMATCH 0x4000 /* exact run queue balancing by setbackdq() */
bash-3.00# echo "::walk thread thr |::print kthread_t t_schedflag|::grep .==4000" | mdb -k |wc -l
91
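A note on the numbers in those ::grep expressions: they are written without a 0x prefix because mdb reads unprefixed integers in its default radix, which is 16. The small Python sketch below only illustrates that arithmetic. The helper name mdb_default_radix is made up, and it assumes the default radix has not been changed in the session.

```python
TS_ON_SWAPQ = 0x0008   # from thread.h, as quoted in the post
TS_RUNQMATCH = 0x4000  # from thread.h, as quoted in the post


def mdb_default_radix(text):
    """Interpret an unprefixed literal as hexadecimal (assumed default radix of 16)."""
    return int(text, 16)


print(mdb_default_radix("0008") == TS_ON_SWAPQ)
print(mdb_default_radix("4000") == TS_RUNQMATCH)
# Read as decimal instead, 4000 would be a different bit pattern entirely
print(hex(4000))
```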
So, going with a rather large assumption for now, let’s look at the threads with that value set. First, print out the addresses of the threads in question. Recall that after the ::grep, dot holds the schedflag value (where it was equal to TS_RUNQMATCH), not the thread address. The walker stored each thread address in the variable thr, so we can use ::eval to print that variable and get back to an address.
> ::walk thread thr |::print kthread_t t_schedflag|::grep .==4000 |::eval <thr=K
30007f36760
30007f21a80
30007f36aa0
30007f514e0
3000803af20
30007f2f780
30007f1d3e0
300076366a0
Then (finally), pipe each address back to ::print in order to interrogate the proc structure and, in turn, its p_user user structure (see sys/proc.h and sys/user.h) for the value in p_user.u_comm. Output truncated.
> ::walk thread thr |::print kthread_t t_schedflag|::grep .==4000 |::eval <thr=K |::print -d kthread_t t_procp->p_user.u_comm
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "nscd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.startd" ]
t_procp->p_user.u_comm = [ "svc.startd" ]
...
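With one line per thread, the same command name repeats a lot. As a sketch of how the output could be summarised, the Python below counts threads per command. It assumes the lines look exactly like the ::print output above, and the names U_COMM_RE and count_commands are invented for this example.

```python
import re
from collections import Counter

# Matches lines such as: t_procp->p_user.u_comm = [ "svc.configd" ]
U_COMM_RE = re.compile(r'u_comm = \[ "([^"]*)" \]')


def count_commands(lines):
    """Count how many threads belong to each command name."""
    counts = Counter()
    for line in lines:
        match = U_COMM_RE.search(line)
        if match:  # skip anything that is not a u_comm line, e.g. "..."
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the output above
sample = [
    't_procp->p_user.u_comm = [ "svc.configd" ]',
    't_procp->p_user.u_comm = [ "svc.configd" ]',
    't_procp->p_user.u_comm = [ "nscd" ]',
    't_procp->p_user.u_comm = [ "svc.startd" ]',
    "...",
]

for name, threads in count_commands(sample).most_common():
    print(f"{threads:4d} {name}")
```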
Alternatively, the proc structure contains a pointer to a pid structure, p_pidp which gives us access to the process id via pid_id:
> ::walk thread thr |::print kthread_t t_schedflag|::grep .==4000 |::eval <thr=K |::print -d kthread_t t_procp->p_pidp->pid_id
t_procp->p_pidp->pid_id = 0t10199
t_procp->p_pidp->pid_id = 0t10213
t_procp->p_pidp->pid_id = 0t10661
t_procp->p_pidp->pid_id = 0t10661
t_procp->p_pidp->pid_id = 0t10661
t_procp->p_pidp->pid_id = 0t10661
t_procp->p_pidp->pid_id = 0t10661
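The 0t prefix in that output is mdb’s way of marking a decimal value, so 0t10661 is simply pid 10661. If you want a plain list of distinct pids to hand to something like pfiles, a sketch along these lines would do. The names PID_RE and unique_pids are made up for the example, and the line format is assumed to match the output above.

```python
import re

# Matches lines such as: t_procp->p_pidp->pid_id = 0t10661
PID_RE = re.compile(r"pid_id = 0t(\d+)")


def unique_pids(lines):
    """Return the distinct process ids found in ::print pid_id output, sorted."""
    pids = set()
    for line in lines:
        match = PID_RE.search(line)
        if match:
            pids.add(int(match.group(1)))  # digits after 0t are decimal
    return sorted(pids)


# Lines copied from the output above
sample = [
    "t_procp->p_pidp->pid_id = 0t10199",
    "t_procp->p_pidp->pid_id = 0t10213",
    "t_procp->p_pidp->pid_id = 0t10661",
    "t_procp->p_pidp->pid_id = 0t10661",
    "t_procp->p_pidp->pid_id = 0t10661",
]

print(unique_pids(sample))
```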
OK, so we got there: command names and process ids for the swapped out processes, if we go with the assumption around TS_RUNQMATCH. However, since drafting my notes for this blog entry I have found that a cleaner and possibly more reliable way of identifying the processes/threads that are swapped out is to look for threads that are NOT in memory. The schedflag value TS_LOAD is most helpful here, so the best option is to search for threads whose schedflag does NOT have the TS_LOAD bit set.
#define TS_LOAD 0x0001 /* thread is in memory */
How? By using the bitwise AND operator in mdb (&) and ANDing the schedflag with 1. If the result is zero, then the lowest bit of schedflag is clear, which means TS_LOAD is not set, whatever the other bits happen to be.
> ::walk thread thr |::print -d kthread_t t_schedflag | ::grep '(.&1)==0' |::eval <thr=K |::print -d kthread_t t_procp->p_user.u_comm
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.configd" ]
t_procp->p_user.u_comm = [ "svc.startd" ]
t_procp->p_user.u_comm = [ "drd" ]
t_procp->p_user.u_comm = [ "syseventd" ]
t_procp->p_user.u_comm = [ "syseventd" ]
t_procp->p_user.u_comm = [ "syseventd" ]
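The bit test is easy to try outside mdb. Here is a minimal Python sketch of the same check, using the TS_LOAD value quoted from thread.h and flag values that appear earlier in this post. The function name is_swapped_out is invented, and it encodes the working assumption that "TS_LOAD not set" means "swapped out".

```python
TS_LOAD = 0x0001  # thread is in memory (from thread.h, as quoted above)


def is_swapped_out(schedflag):
    """True when the TS_LOAD bit is clear, i.e. the thread is not in memory."""
    return (schedflag & TS_LOAD) == 0


# 0x3e03 and 0x3 come from the ::print output earlier; 0x4000 is TS_RUNQMATCH on its own
for flag in (0x3E03, 0x3, 0x4000):
    print(hex(flag), is_swapped_out(flag))
```

Note that 0x4000 passes the test even though the flag word is not zero. Only bit 0 matters.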
This appears to give the same answer as my previous attempt, and gives the same number of swapped out threads (91):
bash-3.00# echo "::walk thread thr |::print -d kthread_t t_schedflag | ::grep '(.&1)==0' |::eval <thr=K |::print -d kthread_t t_procp->p_user.u_comm" | mdb -k |wc -l
91
bash-3.00# echo "::walk thread thr |::print kthread_t t_schedflag|::grep .==4000" | mdb -k |wc -l
91
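One arithmetic observation may explain why the two counts agree. A schedflag that is exactly 0x4000 necessarily has bit 0 clear, so every thread matched by the .==4000 filter also passes the (.&1)==0 filter. The reverse need not hold in general. The sketch below shows this with a made-up list of flag values. The value 0x4008 is hypothetical and was not seen on the system in this post.

```python
TS_LOAD = 0x0001
TS_RUNQMATCH = 0x4000


def exact_runqmatch(flags):
    """Model of `::grep .==4000`: the flag word is exactly TS_RUNQMATCH."""
    return [f for f in flags if f == TS_RUNQMATCH]


def load_bit_clear(flags):
    """Model of `::grep '(.&1)==0'`: TS_LOAD is not set."""
    return [f for f in flags if (f & TS_LOAD) == 0]


# Hypothetical flag values for illustration only
flags = [0x3E03, 0x3, 0x4000, 0x4000, 0x4008]

exact = exact_runqmatch(flags)
clear = load_bit_clear(flags)

# Every exact match also has TS_LOAD clear, so `exact` is a subset of `clear`
print(all((f & TS_LOAD) == 0 for f in exact))
print(len(exact), len(clear))
```

On the system above both filters returned 91 lines, which would mean that every thread without TS_LOAD happened to carry exactly 0x4000 at that moment. That fits the output, but it is not something to rely on elsewhere.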
I really should write a wrapper around this, but for now the blog entry is enough. Hopefully this has been useful, if somewhat verbose. It has been as much about my education as anything else. I need to apologise too, this is even more dull than my last post.
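For what it is worth, a wrapper could start out as small as the Python sketch below. It is untested against a real kernel and makes some assumptions: that mdb is on the PATH, that the script runs with enough privilege for mdb -k, and that feeding dcmds on stdin behaves the same as the echo pipelines above. The names SWAPPED_COMMANDS and run_mdb are invented, and the runner parameter exists only so the function can be exercised without mdb.

```python
import subprocess

# The same pipeline as used above, selecting threads without TS_LOAD set
SWAPPED_COMMANDS = (
    "::walk thread thr |::print -d kthread_t t_schedflag | ::grep '(.&1)==0' "
    "|::eval <thr=K |::print -d kthread_t t_procp->p_user.u_comm"
)


def run_mdb(dcmds, runner=subprocess.run):
    """Feed dcmds to `mdb -k` on stdin and return the non-empty output lines."""
    result = runner(
        ["mdb", "-k"],
        input=dcmds + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if mdb exits with an error
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

On a Solaris host, len(run_mdb(SWAPPED_COMMANDS)) should then correspond to the wc -l counts shown earlier.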
Paul.