Sometimes I ask myself “how did that work again?”, so I decided to document it every time I get that feeling, with some links to the documentation, easy commands,… you get the picture.
First one today: new customer, new environment. To get a feeling for the cells, I used cellsrvstat.
Documentation reference (here). cellsrvstat is also part of ExaWatcher on the cells.
A basic overview of the command. If you log on to the cells as root, it is in your $PATH. But in case you’re looking for it, it’s stored in /opt/oracle/cell<version>/cellsrv/bin/
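If you want to script against it and aren’t sure it’s on the $PATH, a small lookup could look like the sketch below. It’s only an assumption-based helper: the function name find_cellsrvstat is mine, and `cell*` is simply a glob standing in for cell<version>. It checks the $PATH first, then falls back to the directory mentioned above.

```python
import glob
import shutil
from typing import List


def find_cellsrvstat() -> List[str]:
    """Return candidate paths to cellsrvstat (an empty list if none are found).

    Hypothetical helper: "cell*" is a glob standing in for cell<version>.
    """
    # First choice: whatever is on the $PATH (e.g. when logged on as root).
    on_path = shutil.which("cellsrvstat")
    if on_path:
        return [on_path]
    # Fallback: the install directory on the cell.
    return sorted(glob.glob("/opt/oracle/cell*/cellsrv/bin/cellsrvstat"))


if __name__ == "__main__":
    print(find_cellsrvstat() or "cellsrvstat not found")
```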
So, basics first: what can it do?
# cellsrvstat -h
LRM-00101: Message 101 not found; No message file for product=ORACORE, facility=LRM
cellsrvstat [-stat_group=<group name>,<group name>,]
    [-stat=<stat name>,<stat name>,] [-interval=<interval>]
    [-count=<count>] [-table] [-short] [-list]
stat                  A comma separated list of short strings representing
                      the stats. Default is all. (unless -stat is specified).
                      The -list option displays all stats.
stat_group            A comma separated list of short strings representing
                      stat groups. Default: all except database
                      (unless -stat_group is specified).
                      The -list option displays all stat groups.
                      The valid groups are: io, mem, exec, net,
                      smartio, flashcache, offload, database.
offload_group_name    A comma separated list of short strings representing
                      offload group names.
                      Default: cellsrvstat -stat_group=offload
                      (all offload groups unless -offload_group_name is specified).
database_name         A comma separated list of short strings representing
                      database group names.
                      Default: cellsrvstat -stat_group=database
                      (all databases unless -database_name is specified).
interval              At what interval the stats should be obtained and
                      printed (in seconds). Default is 1 second.
count                 How many times the stats should be printed.
                      Default is once.
list                  List all metric abbreviations and their descriptions.
                      All other options are ignored.
table                 Use a tabular format for output. This option will be
                      ignored if all metrics specified are not integer
short                 Use abbreviated metric name instead of
error_out             An output file to print error messages to, mostly for
In non-tabular mode, The output has three columns. The first column
is the name of the metric, the second one is the difference between the
last and the current value(delta), and the third column is the absolute value.
In Tabular mode absolute values are printed as is without delta.
cellsrvstat -list command points out the statistics that are absolute values
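To make the three-column layout concrete, here’s a minimal Python sketch that splits a non-tabular line into metric name, delta and absolute value, and turns a delta into a per-second rate. The helper names are my own, and it assumes each metric line ends with two integer columns and that a delta covers exactly one -interval.

```python
import re
from typing import NamedTuple, Optional

# Assumption: a non-tabular metric line is "<metric name> <delta> <absolute>",
# where the name may contain spaces and the last two fields are integers.
_LINE = re.compile(r"^(?P<name>.*\S)\s+(?P<delta>-?\d+)\s+(?P<absolute>-?\d+)\s*$")


class Sample(NamedTuple):
    name: str
    delta: int     # difference between the last and the current value
    absolute: int  # current absolute value


def parse_line(line: str) -> Optional[Sample]:
    """Return a Sample, or None for headers, timestamps and other non-metric lines."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    return Sample(m["name"], int(m["delta"]), int(m["absolute"]))


def rate_per_second(sample: Sample, interval: float) -> float:
    """Convert a delta into a per-second rate (assumes the delta spans `interval` seconds)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return sample.delta / interval


if __name__ == "__main__":
    print(parse_line("Number of hard disk block IO read requests 0 2226820445"))
    print(rate_per_second(Sample("reads", 500, 1000), interval=5))
```

For example, with -interval=5, a delta of 500 read requests would mean about 100 requests per second over that interval.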
So it can display all kinds of information about the status of your cells, which helps to see what’s going on. Let’s do the list (warning: an awful lot of info! I’ve cut out some of the rows, but if you run it yourself, be prepared for a long list):
[root@dm06celadm01 ~]# cellsrvstat -list
io Input/Output related stats
mem Memory related stats
exec Execution related stats
net Network related stats
smartio SmartIO related stats
flashcache FlashCache related stats
health Cellsrv health/events related stats
offload Offload server related stats
database Database related stats
ffi FFI related stats
lio LinuxBlockIO related stats
mpp Reverse Offload related stats
Sparse Sparse stats
[ * - Absolute values. Indicates no delta computation in tabular format]
io_nbiorr_hdd Number of hard disk block IO read requests
io_nbiowr_hdd Number of hard disk block IO write requests
io_nbiorb_hdd Hard disk block IO reads (KB)
io_nbiowb_hdd Hard disk block IO writes (KB)
io_nbiorr_flash Number of flash disk block IO read requests
io_nbiowr_flash Number of flash disk block IO write requests
io_nbiorb_flash Flash disk block IO reads (KB)
io_nbiowb_flash Flash disk block IO writes (KB)
io_ndioerr Number of disk IO errors
io_ltow Number of latency threshold warnings during job
io_ltcw Number of latency threshold warnings by checker
io_ltsiow Number of latency threshold warnings for smart IO
io_ltrlw Number of latency threshold warnings for redolog writes
mpp_nr_blcc Num of reqs not pushed due to low cell cpu (C)
mpp_nr_bhcon Num of reqs not pushed due to high cell outnet (C)
mpp_nr_bhrnin Num of reqs not pushed due to high db node innet (C)
mpp_nincr_mb Num rate increase by reverse offload info from db (C)
mpp_ndecr_mb Num rate decrease by reverse offload info from db (C)
mpp_nincr_rn Num rate increases from db node cpu information (C)
mpp_ndecr_rn Num rate decreases from db node cpu information (C)
mpp_ndecr_ccpu Num rate decreases from low cell cpu utilization (C)
mpp_ndecr_con Num rate decreases from high cell outnet util (C)
mpp_ndecr_rn_in Num rate decreases from high db node innet util (C)
sparse_ncb num buckets compacted by sparse HT background scan
sparse_ios num IOs with sparse regions
sparse_ios_kb Total sparse IOs (KB)
sparse_smartio Total redirected smart ios (KB)
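If you only want a handful of these, the help output says you can pass abbreviations with -stat=<stat name>,<stat name>. Here’s a small sketch (hypothetical helper names, assuming each row is an abbreviation, a space, and its description) that turns the list into a lookup table and builds such an argument for one prefix:

```python
from typing import Dict, Iterable


def parse_list_output(lines: Iterable[str]) -> Dict[str, str]:
    """Map each abbreviation (first token) to its description (rest of the line)."""
    table: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, the shell prompt and the "[ * - Absolute values ...]" legend.
        if not line or line.startswith(("[", "#")):
            continue
        abbrev, _, description = line.partition(" ")
        if description:
            table[abbrev] = description.strip()
    return table


def stat_argument(table: Dict[str, str], prefix: str) -> str:
    """Build a -stat=... argument for every abbreviation starting with `prefix`."""
    names = sorted(name for name in table if name.startswith(prefix))
    return "-stat=" + ",".join(names)
```

Something like stat_argument(table, "io_") gives a string you can paste after cellsrvstat, optionally combined with -interval and -count. Group names such as "io" also land in the table, but a prefix like "io_" skips them.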
Let’s say you’re only interested in the IO-related stats; then you can use a stat group:
[root@dm06celadm01 ~]# cellsrvstat -stat_group io
===Current Time=== Tue Feb 21 11:29:39 2017
== Input/Output related stats ==
Number of hard disk block IO read requests 0 2226820445
Number of hard disk block IO write requests 0 1033312850
Hard disk block IO reads (KB) 0 1909110664882
Hard disk block IO writes (KB) 0 199121447989
Number of flash disk block IO read requests 0 14301322886
Number of flash disk block IO write requests 0 1008668696
Flash disk block IO reads (KB) 0 789129901568
Flash disk block IO writes (KB) 0 52097067586
Number of disk IO errors 0 0
Number of latency threshold warnings during job 0 1081
Number of latency threshold warnings by checker 0 0
Number of latency threshold warnings for smart IO 0 0
Number of latency threshold warnings for redolog writes 0 0
Current read block IO to be issued (KB) 0 0
Total read block IO to be issued (KB) 0 599867955384
Current write block IO to be issued (KB) 0 0
Total write block IO to be issued (KB) 0 197822797002
Current read blocks in IO (KB) 0 0
Total read block IO issued (KB) 0 599867955384
Current write blocks in IO (KB) 0 0
Total write block IO issued (KB) 0 197822797002
Current read block IO in network send (KB) 0 0
Total read block IO in network send (KB) 0 599867955384
Current write block IO in network send (KB) 0 0
Total write block IO in network send (KB) 0 197822797002
Current block IO being populated in flash (KB) 0 2765920
Total block IO KB populated in flash (KB) 0 32844047616
I/Os queued in IORM for hard disks 0 0
I/Os queued in IORM for flash disks 0 0
The last 2 lines are also very interesting: they tell you whether IORM is kicking in or not. Might be useful in some cases. Just saying.
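Because the third column holds absolute values, you can do some quick arithmetic on it, for instance the average read size per request, and check whether anything is sitting in the IORM queues. A rough sketch with my own helper names, assuming the absolute values are cumulative since cellsrv started (so these are long-run averages, not what’s happening right now):

```python
import re
from typing import Dict, Iterable

# Assumption: metric lines look like "<metric name> <delta> <absolute>".
_LINE = re.compile(r"^(.*\S)\s+(-?\d+)\s+(-?\d+)\s*$")


def absolute_values(lines: Iterable[str]) -> Dict[str, int]:
    """Map metric name -> absolute value (third column) for every metric line."""
    values: Dict[str, int] = {}
    for line in lines:
        m = _LINE.match(line.strip())
        if m:
            values[m.group(1)] = int(m.group(3))
    return values


def avg_kb_per_read(values: Dict[str, int], kind: str) -> float:
    """Average KB per block IO read request; `kind` is "hard disk" or "flash disk"."""
    requests = values[f"Number of {kind} block IO read requests"]
    kb = values[f"{kind.capitalize()} block IO reads (KB)"]
    return kb / requests if requests else 0.0


def iorm_queued(values: Dict[str, int]) -> Dict[str, int]:
    """Return the "I/Os queued in IORM ..." metrics that are non-zero."""
    return {name: v for name, v in values.items()
            if name.startswith("I/Os queued in IORM") and v > 0}
```

Fed with the io output above, this works out to roughly 857 KB per hard disk read request versus about 55 KB per flash read request, and iorm_queued() returns an empty dict since both IORM lines show 0.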
The exec group is also nice. Once again I will cut out some rows, but the last lines are very interesting as well:
[root@dm06celadm01 ~]# cellsrvstat -stat_group exec
===Current Time=== Tue Feb 21 11:30:17 2017
== Execution related stats ==
Incarnation number 0 3
Number of module version failures 0 0
Number of threads working 0 2
Number of threads waiting for network 0 23
Number of threads waiting for resource 0 9
Number of threads waiting for a mutex 0 112
Number of Jobs executed for each job type
CacheGet 0 3123536972
CachePut 0 1031998876
CloseDisk 0 15376502
OpenDisk 0 20379160
ProcessIoctl 0 304858117
PredicateDiskRead 0 7462707
PredicateDiskWrite 0 36539
PredicateFilter 0 24054836
PredicateCacheGet 0 140219901
PredicateCachePut 0 16917010
FlashCacheMetadataWrite 0 0
RemoteListenerJob 0 0
CacheBackground 0 0
RemoteCellMgrService 0 0
CopyFromRemote 0 30925
sparse_bootstrap 0 0
sparse_free_region 0 0
DelegateIO 0 62678
NetworkPoll 0 0
CopySIFromRemote 0 550
SIGetJob 0 720
NetworkDirectoryGC 0 0
SQL ids consuming the most CPU
INT99 dxpwsgys5za27 3
END SQL ids consuming the most CPU
This tells me which database and which SQL ID are consuming the most CPU on the cell. Might be useful in some cases. Remember… in an idle environment, if you run something yourself, you’re automatically the “top” one. But if you suspect something, it’s worth having a look; it might help.
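If you want to keep an eye on that section over several samples (with -interval and -count), a small parser helps. This sketch assumes each row is "<database> <sql_id> <number>", going by the single INT99 row above; what exactly the number measures isn’t explained in the output, so I just sort on it as a relative indicator. The names are hypothetical.

```python
from typing import Iterable, List, NamedTuple

START = "SQL ids consuming the most CPU"
END = "END SQL ids consuming the most CPU"


class TopSql(NamedTuple):
    database: str
    sql_id: str
    value: int  # assumption: a relative CPU indicator; its unit is not documented here


def top_sql(lines: Iterable[str]) -> List[TopSql]:
    """Collect rows between the START and END markers, highest value first."""
    rows: List[TopSql] = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == START:
            inside = True
        elif line == END:
            inside = False
        elif inside:
            parts = line.split()
            # Assumed row layout: database name, SQL ID, integer value.
            if len(parts) == 3 and parts[2].isdigit():
                rows.append(TopSql(parts[0], parts[1], int(parts[2])))
    return sorted(rows, key=lambda r: r.value, reverse=True)
```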
As always, questions or remarks? Find me on Twitter: @vanpupi