
Memo to Self: Recap cellsrvstat


Sometimes I ask myself “how did that work again?”, so I decided to document it every time I get that feeling: some links to the documentation, some easy commands… you get the picture.

First one today: a new customer and a new environment. To get some feeling for the cells, I used cellsrvstat.

Documentation reference (here). cellsrvstat is also part of ExaWatcher on the cells.

A basic overview of the command. If you log on to the cells as root, it is in your $PATH. But in case you’re looking for it, it’s stored in /opt/oracle/cell<version>/cellsrv/bin/

So basics first, what can it do:

# cellsrvstat -h
LRM-00101: Message 101 not found; No message file for product=ORACORE, facility=LRM
Usage:
cellsrvstat [-stat_group=<group name>,<group name>,]
[-offload_group_name=<offload_group_name>,]
[-database_name=<database_name>,]
[-stat=<stat name>,<stat name>,] [-interval=<interval>]
[-count=<count>] [-table] [-short] [-list]

stat A comma separated list of short strings representing
the stats. Default is all. (unless -stat is specified).
The -list option displays all stats.
Example: -stat=io_nbiorr_hdd,io_nbiowr_hdd
stat_group A comma separated list of short strings representing
stat groups. Default: all except database
(unless -stat_group is specified).
The -list option displays all stat groups.
The valid groups are: io, mem, exec, net,
smartio, flashcache, offload, database.
Example: -stat_group=io,mem
offload_group_name
A comma separated list of short strings representing
offload group names.
Default: cellsrvstat -stat_group=offload
(all offload groups unless -offload_group_name is specified).
Example: -offload_group_name=SYS_121111_130502
database_name A comma separated list of short strings representing
database group names.
Default: cellsrvstat -stat_group=database
(all databases unless -database_name is specified).
Example: -database_name=testdb,proddb
interval At what interval the stats should be obtained and
printed (in seconds). Default is 1 second.
count How many times the stats should be printed.
Default is once.
list List all metric abbreviations and their descriptions.
All other options are ignored.
table Use a tabular format for output. This option will be
ignored if all metrics specified are not integer
based metrics.
short Use abbreviated metric name instead of
descriptive ones.
error_out An output file to print error messages to, mostly for
debugging.

In non-tabular mode, The output has three columns. The first column
is the name of the metric, the second one is the difference between the
last and the current value(delta), and the third column is the absolute value.
In Tabular mode absolute values are printed as is without delta.
cellsrvstat -list command points out the statistics that are absolute values


[root@dm06celadm01 ~]#

So it can display all kinds of information about your cell status, which can be helpful to see what’s going on. So let’s do the list. Warning: it is an awful lot of info! I’ll cut out some of the rows here, but if you execute it yourself, be prepared for a long list.

[root@dm06celadm01 ~]# cellsrvstat -list
Statistic Groups:
io Input/Output related stats
mem Memory related stats
exec Execution related stats
net Network related stats
smartio SmartIO related stats
flashcache FlashCache related stats
health Cellsrv health/events related stats
offload Offload server related stats
database Database related stats
ffi FFI related stats
lio LinuxBlockIO related stats
mpp Reverse Offload related stats
Sparse Sparse stats

Statistics:
[ * - Absolute values. Indicates no delta computation in tabular format]

io_nbiorr_hdd Number of hard disk block IO read requests
io_nbiowr_hdd Number of hard disk block IO write requests
io_nbiorb_hdd Hard disk block IO reads (KB)
io_nbiowb_hdd Hard disk block IO writes (KB)
io_nbiorr_flash Number of flash disk block IO read requests
io_nbiowr_flash Number of flash disk block IO write requests
io_nbiorb_flash Flash disk block IO reads (KB)
io_nbiowb_flash Flash disk block IO writes (KB)
io_ndioerr Number of disk IO errors
io_ltow Number of latency threshold warnings during job
io_ltcw Number of latency threshold warnings by checker
io_ltsiow Number of latency threshold warnings for smart IO
io_ltrlw Number of latency threshold warnings for redolog writes
...
mpp_nr_blcc Num of reqs not pushed due to low cell cpu (C)
mpp_nr_bhcon Num of reqs not pushed due to high cell outnet (C)
mpp_nr_bhrnin Num of reqs not pushed due to high db node innet (C)
mpp_nincr_mb Num rate increase by reverse offload info from db (C)
mpp_ndecr_mb Num rate decrease by reverse offload info from db (C)
mpp_nincr_rn Num rate increases from db node cpu information (C)
mpp_ndecr_rn Num rate decreases from db node cpu information (C)
mpp_ndecr_ccpu Num rate decreases from low cell cpu utilization (C)
mpp_ndecr_con Num rate decreases from high cell outnet util (C)
mpp_ndecr_rn_in Num rate decreases from high db node innet util (C)
sparse_ncb num buckets compacted by sparse HT background scan
sparse_ios num IOs with sparse regions
sparse_ios_kb Total sparse IOs (KB)
sparse_smartio Total redirected smart ios (KB)
[root@dm06celadm01 ~]#

Let’s say you’re only interested in the I/O-related things; then you could use a stat_group:

[root@dm06celadm01 ~]# cellsrvstat -stat_group io
===Current Time=== Tue Feb 21 11:29:39 2017

== Input/Output related stats ==
Number of hard disk block IO read requests 0 2226820445
Number of hard disk block IO write requests 0 1033312850
Hard disk block IO reads (KB) 0 1909110664882
Hard disk block IO writes (KB) 0 199121447989
Number of flash disk block IO read requests 0 14301322886
Number of flash disk block IO write requests 0 1008668696
Flash disk block IO reads (KB) 0 789129901568
Flash disk block IO writes (KB) 0 52097067586
Number of disk IO errors 0 0
Number of latency threshold warnings during job 0 1081
Number of latency threshold warnings by checker 0 0
Number of latency threshold warnings for smart IO 0 0
Number of latency threshold warnings for redolog writes 0 0
Current read block IO to be issued (KB) 0 0
Total read block IO to be issued (KB) 0 599867955384
Current write block IO to be issued (KB) 0 0
Total write block IO to be issued (KB) 0 197822797002
Current read blocks in IO (KB) 0 0
Total read block IO issued (KB) 0 599867955384
Current write blocks in IO (KB) 0 0
Total write block IO issued (KB) 0 197822797002
Current read block IO in network send (KB) 0 0
Total read block IO in network send (KB) 0 599867955384
Current write block IO in network send (KB) 0 0
Total write block IO in network send (KB) 0 197822797002
Current block IO being populated in flash (KB) 0 2765920
Total block IO KB populated in flash (KB) 0 32844047616
I/Os queued in IORM for hard disks 0 0
I/Os queued in IORM for flash disks 0 0

[root@dm06celadm01 ~]#

The last 2 lines are also very interesting: they tell you whether IORM is kicking in or not. Might be useful in some cases. Just saying.
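If you want to keep an eye on a few of these counters from a script, the three-column layout described in the help text (metric name, delta, absolute value) is easy to pick apart. Below is a minimal Python sketch, assuming you captured the non-tabular output as text; the function name and the regular expression are my own and not part of cellsrvstat.

```python
import re

# A metric line ends in two integers: the delta and the absolute value.
LINE_RE = re.compile(r"^\s*(?P<name>.+?)\s+(?P<delta>-?\d+)\s+(?P<absolute>-?\d+)\s*$")


def parse_cellsrvstat(text):
    """Return {metric name: (delta, absolute)} for every metric line in text."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            metrics[match.group("name")] = (
                int(match.group("delta")),
                int(match.group("absolute")),
            )
    return metrics


# A few lines copied from the output above.
sample = """\
== Input/Output related stats ==
Number of disk IO errors 0 0
Number of latency threshold warnings during job 0 1081
I/Os queued in IORM for hard disks 0 0
I/Os queued in IORM for flash disks 0 0
"""

stats = parse_cellsrvstat(sample)
# Keep only the IORM queue counters.
iorm_queued = {name: values for name, values in stats.items() if "IORM" in name}
print(iorm_queued)
```

Header lines such as the group title do not end in two integers, so they are simply skipped.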

The exec group is also nice. Once again I will cut out some rows, but the last lines are very interesting as well:

[root@dm06celadm01 ~]# cellsrvstat -stat_group exec
===Current Time=== Tue Feb 21 11:30:17 2017

== Execution related stats ==
Incarnation number 0 3
Number of module version failures 0 0
Number of threads working 0 2
Number of threads waiting for network 0 23
Number of threads waiting for resource 0 9
Number of threads waiting for a mutex 0 112
Number of Jobs executed for each job type
CacheGet 0 3123536972
CachePut 0 1031998876
CloseDisk 0 15376502
OpenDisk 0 20379160
ProcessIoctl 0 304858117
PredicateDiskRead 0 7462707
PredicateDiskWrite 0 36539
PredicateFilter 0 24054836
PredicateCacheGet 0 140219901
PredicateCachePut 0 16917010
FlashCacheMetadataWrite 0 0
RemoteListenerJob 0 0
CacheBackground 0 0
RemoteCellMgrService 0 0
CopyFromRemote 0 30925
...
sparse_bootstrap 0 0
sparse_free_region 0 0
DelegateIO 0 62678
NetworkPoll 0 0
CopySIFromRemote 0 550
SIGetJob 0 720
NetworkDirectoryGC 0 0

SQL ids consuming the most CPU
INT99 dxpwsgys5za27 3
END SQL ids consuming the most CPU

[root@dm06celadm01 ~]#

This tells me which database is asking the most CPU for which query. Might be useful in some cases. Remember… if you do something in an idle environment, then you’re automatically the “top”. But if you suspect something, it’s worth having a look; it might help.
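The top-SQL block at the end of the exec output can be pulled out in the same spirit. This is only a sketch: I assume the block always sits between the two marker lines shown above and that every entry has three fields. I read the first field as the database and the second as the SQL id; the meaning of the third number is not explained in the output, so the code just carries it along as an integer.

```python
START = "SQL ids consuming the most CPU"
END = "END SQL ids consuming the most CPU"


def top_sql_entries(text):
    """Return (database, sql_id, value) tuples found between the two marker lines."""
    entries = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == START:
            inside = True
        elif stripped == END:
            inside = False
        elif inside and stripped:
            parts = stripped.split()
            if len(parts) == 3:  # anything else is ignored rather than guessed at
                database, sql_id, value = parts
                entries.append((database, sql_id, int(value)))
    return entries


sample = """\
NetworkDirectoryGC 0 0

SQL ids consuming the most CPU
INT99 dxpwsgys5za27 3
END SQL ids consuming the most CPU
"""

for database, sql_id, value in top_sql_entries(sample):
    print(database, sql_id, value)
```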

As always, questions, remarks? find me on twitter @vanpupi

Exadata add a new vm


Today a customer pointed out a nice-to-know to me. When he added a new virtual machine to an Exadata OVM cluster, he experienced something odd. The procedure had been tested on a “new installation”, where it worked fine. The basic steps are:

  • Run through OEDA and add the cluster
  • Move the XML files to the dom0, in the same location as the original one
  • Run install.sh with this config

As this is a good customer, he followed the advice of having all passwords changed. The bad thing is… while running install.sh, lots of errors on different components were thrown.
The most remarkable, and even the first one thrown, was:

OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException

So after digging around for a while, it turned out that it was due to the “non-default” passwords for root and celladmin.
After changing the root and celladmin passwords back to the well-known defaults, install.sh liked it and gave the expected success message.

Successfully completed execution of step Validate Configuration File [elapsed Time [Elapsed...

The IB switches suffer from this as well, but you only face that if you are going to upgrade the IB software. So in order to patch them easily, just temporarily reset the passwords to the default and change them back afterwards.

Python … no not the snake – my very first script


Do you know the feeling? “I should do <fill in something cool here>”. Well, I had been facing this for a while with learning Python. I knew that you could do some cool things with it, but never pushed myself to do it. Until now! OTN Appreciation Day! Thanks to Mr Oracle Base, Tim Hall. Some time ago he launched the idea of OTN Appreciation Day, and of course I added my entry as well. You can find it here.

On Tuesday 11 October 2016 my entry was scheduled at 08:30 CEST, and seeing all the other blog posts, I soon realised that this would be a very nice bunch of information. I started copy-pasting the blogs and links, but… as soon as I started doing something else (work!!!) I missed some. That brought me the idea of creating a script. The idea was simple: log in to Twitter, fetch all tweets hashtagged ThanksOTN and then filter out the retweets. Simple, huh? Then… how to do it? Mmm… let’s take the challenge, I’ll do it in Python.

The result is here (improvement needed, Christian! But that will come in time).
I’m happy to have an OmniOS (a Solaris derivative) server at home, and there I have Python available. So let’s go.

I broke it down into some steps. In order to read a Twitter feed you need a Twitter application. To create one, surf to https://apps.twitter.com and, after logging in, click the create application button.

[Screenshot: Twitter app creation]

 

I left only the callback URL blank; for this purpose we don’t need it, I think. If I’m wrong, please let me know.

Then all is done. The next screen gives you an overview of the application you have just created. In the tab “Keys and Access Tokens” you only have to click one more button, “Create my access token”. So that’s it folks, nothing more to be done on the Twitter side.
Just record the following fields:

  • Consumer Key (API Key)
  • Consumer Secret (API Secret)
  • Access Token
  • Access Token Secret

We need these in order to establish a connection to Twitter with tweepy.

Then it’s time to write some code! I assume you already have Python set up; if not, drop me a mail, comment or tweet and I’ll help you out. I had never ever done any Python scripting, so it took a bit of googling.
It turned out I’d need tweepy for this task, and it was easy to install. pip install tweepy was all I needed to enter, and confidence was growing: if it’s going to be as easy as this, I’ll be good!

First we need to import some things:

import tweepy
from tweepy import OAuthHandler
import json

Then (at this point) I only needed a main procedure. I call it “main”. Maybe obvious, but ok 🙂 If you need more procedures, they go right after the import statements.

The main I created looks like this. The comments in the code are my comments for the blog as well:

def main():
    # Variables that contain the user credentials to access the Twitter API
    access_token = "<fill in your own>"
    access_token_secret = "<fill in your own>"
    consumer_key = "<fill in your own>"
    consumer_secret = "<fill in your own>"

    # OAuth process, using the keys and tokens.
    # Here we create an auth object which uses the tweepy OAuthHandler; it needs the consumer key and secret.
    # Then we set the access tokens on the auth object.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    # Creation of the actual interface, using authentication.
    # Here we create the actual connection to Twitter and call it api. It's just a name.
    api = tweepy.API(auth)

    # Search the tweets and display them after removing the RT @'s.
    # Some variables I picked:
    query = '#thanksOTN'
    max_tweets = 1000

    # Here I gather the tweets into a list by running over a tweepy Cursor.
    # (api.search is the name in the tweepy release I used; tweepy 4.x renamed it to api.search_tweets.)
    searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
    # This is just quick'n'dirty, figured out in the meanwhile. A better way would be to
    # use templates and fill them in this way, but hey... this works :-)
    print('<html><body>')
    # Now I have a result set (searched_tweets) and I run over it in a for loop.
    for tweet in searched_tweets:
        # I'm filtering out the retweets in the following line.
        if "RT @" not in tweet.text:
            # I first added an extra if-clause to only list tweets which contained
            # "OTN Appreciation Day:". It later turned out that not everyone put this in,
            # so I commented it out.
            # if "OTN Appreciation Day:" in tweet.text:
            # Here I'm printing the user who has sent the tweet...
            print('<p>Twitter user:', tweet.user.screen_name, '<br />')
            # ...and eventually what they tweeted. The emoji weren't displayed correctly in
            # Python 3.4. Calling str() on the encoded bytes and stripping b' leaves escape
            # sequences and a trailing quote, so non-ASCII characters are written as HTML
            # character references instead. There may be more efficient ways of doing this,
            # so feel free to comment on this.
            tweet_text = tweet.text.encode("ascii", "xmlcharrefreplace").decode("ascii") if tweet.text else ""
            # And finally print the tweet.
            print(tweet_text, '</p>')
    # And close the web page.
    print('</body></html>')

# Finally call main
if __name__ == '__main__':
    main()
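The script keeps the four Twitter values as literals in main. If you would rather not store them in the file, one option is to read them from the environment. The sketch below is only an illustration: the four variable names are hypothetical (pick whatever suits your setup), and the dummy dictionary stands in for the real environment so the example runs without any keys.

```python
import os

# Hypothetical environment variable names, not something tweepy or Twitter prescribes.
REQUIRED = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or fail loudly if one is missing."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}


# Demo with dummy values instead of the real environment.
demo = load_credentials({name: "dummy-" + name.lower() for name in REQUIRED})
print(sorted(demo))
```

In main you would then pass demo["TWITTER_CONSUMER_KEY"] and friends to the OAuthHandler instead of the hardcoded strings.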

And finally this script was scheduled in crontab to run every 30 minutes, with its output redirected to an HTML file.

I should still create a kind of tokeniser to manipulate the tweet_text in order to turn the links into hyperlinks. But hey… that’s something for the future 🙂
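For what it’s worth, such a tokeniser does not need much. Here is a minimal sketch using only the standard library; the function name linkify and the URL pattern are my own choices, the pattern is deliberately naive (anything from http:// or https:// up to the next whitespace), and the t.co address is a made-up placeholder.

```python
import html
import re

# Naive URL pattern: http:// or https:// followed by non-whitespace characters.
URL_RE = re.compile(r"https?://\S+")


def linkify(text):
    """Escape text for HTML and wrap every http(s) URL in an anchor tag."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Plain text before the URL is escaped as-is.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append('<a href="{0}">{0}</a>'.format(url))
        last = match.end()
    # Whatever follows the last URL.
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(linkify("Thanks OTN & friends https://t.co/abc123 #ThanksOTN"))
```

Escaping the surrounding text at the same time means that an ampersand or angle bracket in a tweet cannot break the generated page.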

So this was my very very very first python script. I think it’s a fun language which I’ll be using more and more.

Comments, advice,… are always welcome!

 

Acfs: it’s all about permissions


It all started with the creation of a database on an Oracle Database Appliance (ODA), which failed with the error:

Validation of server pool succeeded.
Registering database with Oracle Restart
PRCR-1006 : Failed to add resource ora.demodb.db for demodb
PRCR-1071 : Failed to register or update resource ora.demodb.db
CRS-2566: User 'oracle' does not have sufficient permissions to operate on resource 'ora.redo.datastore.acfs', which is part of the dependency specification.
DBCA_PROGRESS : DBCA Operation failed.

 

The first question: is it due to running on the ODA, or is it a general cluster issue?
It was easy to verify, as this customer had another ODA on which everything just worked smoothly. So we started to compare the environments. One tiny little thing appeared to be different: the ACL.

On a working ODA:

[grid@ODA_A-1 ~]$ crsctl status resource ora.redo.datastore.acfs -p |grep ACL
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,user:oracle:r-x
[grid@ODA_A-1 ~]$ 

 

On this one:

[grid@ODA_B-1 ~]$ crsctl status resource ora.redo.datastore.acfs -p|grep ACL
ACL=owner:root:rwx,pgrp:root:r-x,other::r--
[grid@ODA_B-1 ~]$ 

Sooo there we have it.
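With only a handful of entries the difference is easy to spot by eye, but the comparison can also be done mechanically. A small sketch, assuming you paste the two ACL lines from the crsctl output above; parse_acl is a helper name I made up.

```python
def parse_acl(line):
    """Split 'ACL=a,b,c' (or just 'a,b,c') into a set of entries."""
    value = line.strip()
    if value.startswith("ACL="):
        value = value[len("ACL="):]
    return {entry for entry in value.split(",") if entry}


working = parse_acl("ACL=owner:root:rwx,pgrp:root:r-x,other::r--,user:oracle:r-x")
failing = parse_acl("ACL=owner:root:rwx,pgrp:root:r-x,other::r--")

print("missing on failing node:", sorted(working - failing))  # ['user:oracle:r-x']
print("extra on failing node:", sorted(failing - working))    # []
```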
The first instinct is to do a crsctl modify or a crsctl setperm.
Let’s switch to a demo system, as this is ACFS related and not ODA related.

So it’s playtime!
On the demo environment we have an acfs volume:

[root@demo-rac12-01 ~]# crsctl status resource ora.dg_advm.advmvol01.acfs
NAME=ora.dg_advm.advmvol01.acfs
TYPE=ora.acfs.type
TARGET=ONLINE , ONLINE , ONLINE
STATE=ONLINE on demo-rac12-01, ONLINE on demo-rac12-02, ONLINE on demo-rac12-03

[root@demo-rac12-01 ~]#

If we verify the ACL we see the same configuration as on the ODA:

[root@demo-rac12-01 ~]# crsctl status resource ora.dg_advm.advmvol01.acfs -p |grep ACL
ACL=owner:root:rwx,pgrp:root:r-x,other::r--
[root@demo-rac12-01 ~]#

Yes I know, I did this as root and you could get this information as grid as well.
So let’s do the instinctive thing and try to modify the resource:

[root@demo-rac12-01 ~]# crsctl modify resource ora.dg_advm.advmvol01.acfs -attr "ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:oracle:r-x'"
CRS-4995:  The command 'Modify  resource' is invalid in crsctl. Use srvctl for this command.
[root@demo-rac12-01 ~]#

And now we have to be careful with googling things. If you start googling this error, you will find several pages suggesting the -unsupported flag. But there is no reason to use it 🙂
By the way, the same error is thrown at you if you try crsctl setperm.

Let’s assume the cluster is right (it mostly is); then a srvctl modify must exist, and indeed there is one!

[root@demo-rac12-01 ~]# srvctl modify filesystem -h

Modifies the configuration for the file system.

Usage: srvctl modify filesystem -device <volume_device> [-user {[/+ | /-]<user> | <user_list>}] [-path <mountpoint_path>] [-node <node_list> | -serverpool <serverpool_list>] [-fsoptions <options>] [-description <description>] [-autostart {ALWAYS|NEVER|RESTORE}] [-force]
-device <volume_device> Volume device path
-user <user>|<user_list> Add (/+) or remove (/-) a single user, or replace the entire set of users (with a comma-separated list) authorized to mount and unmount the file system
-path <mountpoint_path> Mountpoint path
-node <node_list> Comma separated node names
-serverpool <serverpool_list> Comma separated list of server pool names
-fsoptions <fs_options> Comma separated list of file system mount options
-description <description> File system description
-autostart {ALWAYS|NEVER|RESTORE} File system autostart policy
-force Force modification (ignore dependencies)
-help Print usage
[root@demo-rac12-01 ~]#

So it seems we need to find out which device we’re using. This is simple:

[root@demo-rac12-01 ~]# crsctl status resource ora.dg_advm.advmvol01.acfs -p |grep VOLUME_DEVICE
CANONICAL_VOLUME_DEVICE=/dev/asm/advmvol01-438
VOLUME_DEVICE=/dev/asm/advmvol01-438
[root@demo-rac12-01 ~]#

There we have it. So now it’s just syntax. Remember the difference in ACL: we need to add user:oracle:r-x. Sometimes we’re lucky and it’s not too hard.

[root@demo-rac12-01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl status resource ora.dg_advm.advmvol01.acfs -p |grep -i acl
ACL=owner:root:rwx,pgrp:root:r-x,other::r--
[root@demo-rac12-01 ~]# /u01/app/12.1.0.2/grid/bin/srvctl modify filesystem -device /dev/asm/advmvol01-438 -user /+oracle
[root@demo-rac12-01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl status resource ora.dg_advm.advmvol01.acfs -p |grep -i acl
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,user:oracle:r-x
[root@demo-rac12-01 ~]# 
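The lookup of the device and the srvctl call can be glued together in a script. The sketch below only builds the command from a pasted copy of the crsctl -p output and prints it; it does not run anything. parse_profile is a helper name of my own, and the sample text is just the three relevant lines from above.

```python
def parse_profile(text):
    """Turn 'KEY=value' lines into a dict; lines without '=' are skipped."""
    attributes = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            attributes[key.strip()] = value.strip()
    return attributes


# Lines as printed by: crsctl status resource ora.dg_advm.advmvol01.acfs -p
sample = """\
ACL=owner:root:rwx,pgrp:root:r-x,other::r--
CANONICAL_VOLUME_DEVICE=/dev/asm/advmvol01-438
VOLUME_DEVICE=/dev/asm/advmvol01-438
"""

profile = parse_profile(sample)
device = profile["VOLUME_DEVICE"]

# Same srvctl syntax as used above: /+ adds a user, /- removes one.
command = ["srvctl", "modify", "filesystem", "-device", device, "-user", "/+oracle"]
print(" ".join(command))
```

Keeping the command as a list makes it straightforward to hand over to subprocess later, once you are happy with what it prints.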

Removing it isn’t too hard either:


[root@demo-rac12-01 ~]# /u01/app/12.1.0.2/grid/bin/srvctl modify filesystem -device /dev/asm/advmvol01-438 -user /-oracle
[root@demo-rac12-01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl status resource ora.dg_advm.advmvol01.acfs -p |grep -i acl
ACL=owner:root:rwx,pgrp:root:r-x,other::r--
[root@demo-rac12-01 ~]#

 

As always, questions, remarks? find me on twitter @vanpupi

Oracle secure backup. It DOES work on windows


A while ago I was asked to implement a filesystem backup on a Windows server. How hard could it be? Basically it’s not too difficult. Only one minor thing is not too well documented, and I would like to highlight it here.

In this particular setup the service was running under the local system account. Personally I would prefer to see it under a local user account.

In order to be able to perform a successful filesystem backup, we need to teach Oracle Secure Backup how to log on to this system.

Two methods can be chosen:

  • Adapt the currently configured admin user
  • Create an extra user for the Windows filesystem backups

In this case the second option was chosen for clarity, to split the Windows filesystem backups from the Linux ones. It’s a choice.

[Screenshot: osb_configure_user01]

In the users screen click “Add”.

Choose a username, enter a password, change “NDMP server user” from “yes” to “no” and click “Apply”.

You will be redirected to the following screen:

[Screenshot: osb_add_user01]

The 2 buttons at the top/bottom are needed now.

First choose the windows-domains button and you will see the default:

[Screenshot: osb_add_user02]

In this case we don’t have a domain account, so we need to teach it to use the local administrator account:

[Screenshot: osb_add_user03]

The domain name is the hostname of the Windows guest, the user is the local Administrator, and the password is the local administrator’s password.

Then click add and go back to the user configuration.

Click on the “Preauthorized Access” button and edit accordingly:

[Screenshot: osb_add_user04]

 

Then it’s just a matter of configuring this client as you would configure any other one.

It took me some Windows-like thinking, as I was expecting this to be in the documentation.

 

As always, questions, remarks? find me on twitter @vanpupi

 

 

mac os x + 12c thin client


Sometimes it’s time to upgrade. So I decided to add the Oracle 12c thin client to my Mac a couple of weeks ago. Yes I know, I was very happy with the oldie-but-goodie 10g version, but ok.

The 12c instant client is very easy to install. I remember it took quite some time to install the 10g version, so I’m happy about that. Unzip the zip files, create 2 symlinks and add a bit of personal config to my .profile. So far so good.

But as soon as I started to use it, I was unable to connect to databases which had the SEC_CASE_SENSITIVE_LOGON parameter set to true (the default, and I recommend keeping it that way). 2 weeks ago Oracle released a new version of their instant client and this problem is fixed!
The error thrown was:

Enter password:
ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name:

So if you encounter the same, check whether your client was released before 14 June 2016.
The only thing which still bothers me is that the thin client is missing a tnsping utility.

It works if you copy it over from the 10g version, but it’s not that nice.

 

As always, questions, remarks? find me on twitter @vanpupi