Discussion:
SCO spending lots of time in spin locks
Brad Guillory
2005-07-13 14:31:47 UTC
If I should go elsewhere I apologize in advance; please just point me
in the right direction.

There exists a SCO box:
# uname -X

System = SCO_SV
Node = *HIDDEN*
Release = 3.2v5.0.6
KernelID = 2000-07-27
Machine = Xeon
BusType = ISA
Serial = *HIDDEN*
Users = 255-user
OEM# = 0
Origin# = 1
NumCPU = 1

It runs a single curses based application (that my company maintains)
for about 250 users. After upgrading the application this weekend the
average "load" during peak hours jumped from 1.2 to 15+ (yes more than
10 fold).

I went through the sar -A report and found the following differences
between Friday (before the upgrade) and Tuesday (after the upgrade):

(forgive me if this is too verbose)

00:00:00      scall/s sread/s swrit/s  fork/s exec/s rchar/s wchar/s  (-c)
FRI 10:00:00    58500    5777    2442 1305.29  11.28 1243788   35388
TUE 10:00:00   275583    3405     540   13.51  12.07 1641830   33363

As you can see, scall/s is much higher, but the number of characters
read per call to read() is higher (better), and the same goes for the
writes. I figure that this may be caused by longer times between calls
to read() and write(). I am trying to account for the spike in scall/s.
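
The per-call arithmetic behind that observation can be sketched as follows. This is only an illustration built from the two sar -c samples quoted above; the names SAMPLES and derive are made up for the example, and "other_scall" is simply whatever is left after subtracting the calls that sar itemizes (read, write, fork, exec).

```python
# sar -c figures copied from the post above (10:00:00 samples).
SAMPLES = {
    "FRI": {"scall": 58500, "sread": 5777, "swrit": 2442,
            "fork": 1305.29, "exec": 11.28,
            "rchar": 1243788, "wchar": 35388},
    "TUE": {"scall": 275583, "sread": 3405, "swrit": 540,
            "fork": 13.51, "exec": 12.07,
            "rchar": 1641830, "wchar": 33363},
}


def derive(s):
    """Return per-call transfer sizes and the syscalls sar does not itemize."""
    itemized = s["sread"] + s["swrit"] + s["fork"] + s["exec"]
    return {
        "bytes_per_read": s["rchar"] / s["sread"],
        "bytes_per_write": s["wchar"] / s["swrit"],
        "other_scall": s["scall"] - itemized,
    }


for day, s in SAMPLES.items():
    d = derive(s)
    print(f"{day}: {d['bytes_per_read']:.1f} B/read, "
          f"{d['bytes_per_write']:.1f} B/write, "
          f"{d['other_scall']:.0f} other syscalls/s")

# How much the total system call rate grew between the two samples.
ratio = SAMPLES["TUE"]["scall"] / SAMPLES["FRI"]["scall"]
print(f"scall/s ratio TUE/FRI: {ratio:.2f}")
```

On these figures the itemized calls make up roughly 16% of Friday's system calls and under 2% of Tuesday's, so almost all of the increase is in calls that sar -c does not break out.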

scall/s   total number of system calls per second

sread/s   number of read(S) calls per second

swrit/s   number of write(S) calls per second

fork/s    number of fork(S) calls per second

exec/s    number of exec(S) calls per second

rchar/s   number of characters read per second (by read)

wchar/s   number of characters written per second (by write)


00:00:00      vflt/s   pflt/s pgfil/s rclm/s  (-p)
FRI 10:00:00 4070.80 19542.89    0.00   0.00
TUE 10:00:00  114.18   387.32    0.03   0.00

This seems interesting but I have no idea what would cause it.

vflt/s    address translation page faults (valid page not in memory)
          per second

pflt/s    page faults per second caused by attempts to write to a page
          marked ``copy-on-write'' (COW), or by protection errors
          (illegal access to page)

pgfil/s   address translation faults per second satisfied by paging in
          from filesystem

rclm/s    pages added to the free list per second


I relinked the kernel and enabled prfld to obtain the following stats:

spltty 30.00
splx 35.16
sys_call 5.73
user 14.18

Can anyone point me to the next step? How can I find what library
calls in turn call spltty?

TIA, Brad Guillory
Robert Lipe
2005-07-15 05:48:26 UTC
Post by Brad Guillory
It runs a single curses based application (that my company maintains)
for about 250 users. After upgrading the application this weekend the
average "load" during peak hours jumped from 1.2 to 15+ (yes more than
To chase this you need to look at what the application is doing as it's
seemingly doing something unpleasant such as a zero byte write or a read
of zero bytes with a short vmin/vtime delay or something.

Use scotruss to figure out what the app is doing.
Post by Brad Guillory
00:00:00      scall/s sread/s swrit/s  fork/s exec/s rchar/s wchar/s  (-c)
FRI 10:00:00    58500    5777    2442 1305.29  11.28 1243788   35388
TUE 10:00:00   275583    3405     540   13.51  12.07 1641830   33363
More than 5x as many system calls per second? Oh, dear. The first is
surprisingly high for a 250 user system that's doing curses stuff and
therefore presumably spending most of its life waiting on humans. The
second is, well, 5x as high.

What's up with 1300 forks a second? That's pretty atypical for the kind
of workload you're describing.

RJL
Bill Campbell
2005-07-15 15:32:25 UTC
Post by Robert Lipe
Post by Brad Guillory
It runs a single curses based application (that my company maintains)
for about 250 users. After upgrading the application this weekend the
average "load" during peak hours jumped from 1.2 to 15+ (yes more than
To chase this you need to look at what the application is doing as it's
seemingly doing something unpleasant such as a zero byte write or a read
of zero bytes with a short vmin/vtime delay or something.
Use scotruss to figure out what the app is doing.
Post by Brad Guillory
00:00:00      scall/s sread/s swrit/s  fork/s exec/s rchar/s wchar/s  (-c)
FRI 10:00:00    58500    5777    2442 1305.29  11.28 1243788   35388
TUE 10:00:00   275583    3405     540   13.51  12.07 1641830   33363
More than 5x as many system calls per second? Oh, dear. The first is
surprisingly high for a 250 user system that's doing curses stuff and
therefore presumably spending most of its life waiting on humans. The
second is, well, 5x as high.
What's up with 1300 forks a second? That's pretty atypical for the kind
of workload you're describing.
I've seen this type of behaviour when running software written in
brain-dead languages like BASIC where they're doing some kind of
polling for character input.

Bill
--
INTERNET: ***@Celestial.COM Bill Campbell; Celestial Software LLC
UUCP: camco!bill PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186 Mercer Island, WA 98040-0820; (206) 236-1676
URL: http://www.celestial.com/

``The end move in politics is always to pick up a gun.''
-- Buckminster Fuller
m***@poista.helsinki.fi.invalid
2005-07-19 18:59:16 UTC
Post by Brad Guillory
If I should go elsewhere I apologize in advance; please just point me
in the right direction.
# uname -X
System = SCO_SV
Node = *HIDDEN*
Release = 3.2v5.0.6
KernelID = 2000-07-27
Machine = Xeon
BusType = ISA
Serial = *HIDDEN*
Users = 255-user
OEM# = 0
Origin# = 1
NumCPU = 1
It runs a single curses based application (that my company maintains)
for about 250 users. After upgrading the application this weekend the
average "load" during peak hours jumped from 1.2 to 15+ (yes more than
10 fold).
I have had several situations like this:
1. The application's main process interacts with the user and uses
curses.
2. The main process fork()s another process (which should run in the
background).
3. The main process dies, but the child process keeps running.
4. The child process thinks it still has a user interface and simply
goes into an endless loop of calling getch().

That is really the effect of improper coding with curses. This may or
may not be your situation.
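
The difference between a loop that ignores the "no input" result and one that checks it can be sketched like this. It is only an analogy, not the application's code: a pipe whose write end is closed stands in for a terminal that has gone away, os.read() returning an empty result stands in for getch() returning an error, and the names poll_input, read_input and max_calls are invented for the example.

```python
import os


def poll_input(fd, max_calls=1000):
    """Careless loop: keeps calling read() even at EOF.

    A real orphaned process would spin forever; the loop is capped at
    max_calls here so that the demonstration terminates.
    """
    calls = 0
    while calls < max_calls:
        calls += 1
        data = os.read(fd, 1)
        if data:  # only reacts to real input, never notices EOF
            return calls
    return calls


def read_input(fd):
    """Careful loop: an empty read means the input is gone, so stop."""
    calls = 0
    while True:
        calls += 1
        data = os.read(fd, 1)
        if not data:
            return calls
        # real code would handle the received byte here


# Simulate lost input: a pipe with its write end closed is at EOF.
r, w = os.pipe()
os.close(w)
careless_calls = poll_input(r)
careful_calls = read_input(r)
os.close(r)
print(careless_calls, careful_calls)
```

Every pass through the careless loop is one more system call that returns immediately, which is the kind of pattern that inflates scall/s without moving any data.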
Post by Brad Guillory
I went through the sar -A report and found the following differences
(forgive me if this is too verbose)
00:00:00      scall/s sread/s swrit/s  fork/s exec/s rchar/s wchar/s  (-c)
FRI 10:00:00    58500    5777    2442 1305.29  11.28 1243788   35388
TUE 10:00:00   275583    3405     540   13.51  12.07 1641830   33363
I have seen the same effect on scall/s in the situation I described
above (a curses program tries to read stdin after it has lost it, and
it ignores the error code returned from getch()).
Post by Brad Guillory
As you can see the scall/s is much higher, but the characters read per
call to read is higher (better) same for the writes. I figure that
this may be caused by longer times between calls to read() and write().
I am trying to account for the spike in scall/s.
scall/s
total number of system calls per second
sread/s
number of read(S) calls per second
swrit/s
number of write(S) calls per second
fork/s number of fork(S) calls per second
exec/s number of exec(S) calls per second
rchar/s
number of characters read per second (by read)
wchar/s
number of characters written per second (by write)
00:00:00      vflt/s   pflt/s pgfil/s rclm/s  (-p)
FRI 10:00:00 4070.80 19542.89    0.00   0.00
TUE 10:00:00  114.18   387.32    0.03   0.00
These numbers should show that the application is much more efficient,
but that may just be an illusion. My explanations are below, next to
the man page text you quoted.
Post by Brad Guillory
This seems interesting but I have no idea what would cause it.
vflt/s    address translation page faults (valid page not in memory)
          per second
The application's memory pages have been paged out to disk (swap area)
and now the application wants them back in main memory.
Post by Brad Guillory
pflt/s    page faults per second caused by attempts to write to a page
          marked ``copy-on-write'' (COW), or by protection errors
          (illegal access to page)
Fork uses COW: when an application calls fork(), the data pages are not
actually duplicated. They are duplicated only when the child (or the
parent too?) changes the data. (Historical explanation: in most cases
fork() is followed by exec(), so duplicating the data areas would be
needless overhead for the system; with exec() they would be released
and replaced anyway...)

This was only a quick explanation and may be partially wrong, but you
should get the idea (especially if you are a programmer ;) )
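
The behaviour visible to the program can be sketched as below. This shows only the semantics of fork() (a write in the child does not reach the parent), not the page-level mechanism or the fault counters themselves; it assumes a Unix-like system where os.fork() is available, and the name fork_and_modify is made up for the example.

```python
import os


def fork_and_modify():
    """Fork, change data in the child, and report what each side sees."""
    data = [0] * 4            # shared with the child until one side writes
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the write below is what forces a private copy.
        try:
            os.close(r)
            data[0] = 99
            os.write(w, str(data[0]).encode())
        finally:
            os._exit(0)       # never fall through into the parent's code
    # Parent: read the child's value, then check our own copy.
    os.close(w)
    child_value = int(os.read(r, 16).decode())
    os.close(r)
    os.waitpid(pid, 0)
    return child_value, data[0]


print(fork_and_modify())
```

With 1300 forks a second on Friday, each child writing to inherited pages would produce this kind of fault, which would fit the pflt/s figure dropping together with fork/s on Tuesday.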
Post by Brad Guillory
pgfil/s   address translation faults per second satisfied by paging in
          from filesystem
rclm/s    pages added to the free list per second
spltty 30.00
splx 35.16
sys_call 5.73
user 14.18
Sorry, I have no experience with prfld.
Post by Brad Guillory
Can anyone point me to the next step? How can I find what library
calls in turn call spltty?
I hope somebody can ;)
--
Money, it's a crime
-------------------------------------------------------------------------------
Mika Reunanen | Homepage:
Mika . Reunanen @ helsinki . fi | http://www.helsinki.fi/~mreunane/
-------------------------------------------------------------------------------