28 August 2008

V$SQL

We are told that all of the V$ database views are dynamic, so this one is fairly obvious but very handy. Lets say we have a situation where, for example, somebody has kicked off a monster SELECT statement and you need to know how far the query has got. One way of checking is via the V$SQL view where it turns out that the ROWS_PROCESSED column is updated real time.

For example:



So here, we can see that by running this query several times we can track the number of rows processes (returned) by the SQL in question. If you have up to date optimizer stats, you can compare this against NUM_ROWS column from DBA_TABLES to see how far the query has to run before completion.

10 August 2008

Controlling Data Guard with DGMGRL.

The obvious benefit of Dataguard Broker is the ease of which switchover and failover operations can be performed. The other benefits include managing Data Guard operations via Grid Control plus the general reporting features. Having said that, a lot of DBA's (including myself) still like to use the command line interface to manage our databases. With that in mind, here is a quick guide to the most common commands used to manage Standby and Primary databases via the Data Guard Broker DGMGRL command line tool.

Manage the database state.
Your Dataguard configuration will consist of a Primary database and of course one or more Standby databases. For my setup, I have the following databases in place. Obviously your going to have to substitute your own database names in here:

1 x Primary Oracle database: "ORCL_DG1"
1 x Standby Oracle database: "ORCL_DG2"

Once up and running, DG Broker Primary databases can be in one of the following known states:

"ONLINE", "OFFLINE" or "LOG-TRANSPORT-OFF".

The Standby can also be deemed either:

"READ-ONLY", "OFFLINE", "ONLINE" or "LOG-APPLY-OFF ".

At some point, you will need to switch either the Primary or Standby states or settings:

To shut down the Standby, and alter the Primary's transportation settings to reflect:

DGMGRL> edit database 'ORCL_DG2' set state='OFFLINE';

To Shut down the Primary database:

DGMGRL> edit database 'ORCL_DG1' set state='OFFLINE';

To open the Standby database in READ ONLY mode:

DGMGRL> edit database 'ORCL_DG1' set state='READ-ONLY'
;

To switch the Standby database back to online recovery mode:

DGMGRL> edit database 'ORCL_DG2' set state='ONLINE';

To switch off Redo Transportation from the Primary to the Standby:

DGMGRL> edit database 'ORCL_DG1' set state='LOG-TRANSPORT-OFF';

To switch redo transportation back on:

DGMGRL> edit database 'ORCL_DG1' set state='ONLINE';

To edit specific database properties, for example, set the database archive lag target:

DGMGRL> edit database 'ORCL_DG1' set property 'ArchiveLagTarget'=1200;

To disable the entire broker configuration:

DGMGRL> disable configuration;

To enable the entire configuration:

DGMGRL> enable configuration;

Dataguard reporting.
One thing I do like about Dataguard broker is its reporting capabilities. By default, Grid Control provides some great graphs for visualising performance and configuration. In addition, DGMGRL provides some pretty helpful detail too:

To view the basic broker configuration:

DGMGRL> show configuration;

To view basic detail about a specific database in the configuration:

DGMGRL> show database ORCL_DG1;

DGMGRL> show database ORCL_DG2;

To view in depth detail about a specific database and its current state:

DGMGRL> show database verbose ORCL_DG1;

DGMGRL> show database verbose ORCL_DG2;

To display the current status and any configuration errors:

DGMGRL> show database 'ORCL_DG1' StatusReport;

DGMGRL> show database 'ORCL_DG2' StatusReport;

To provide detail of transportation and apply operations, state and any delays:

DGMGRL> show database 'ORCL_DG1' SendQEntries;

DGMGRL> show database 'ORCL_DG2' RecvQEntries;

To view the current database Top Wait Events:

DGMGRL> show database 'ORCL_DG1' TopWaitEvents;

DGMGRL> show database 'ORCL_DG2' TopWaitEvents;

Role reversal / failover.
Hopefully, you will not need to perform too many of these, but if you do, then the commands are pretty straight forward (which is what we need!):

To perform a Role Reversal:

DGMGRL> switchover to 'ORCL_DG2';

To perform a Failover:

DGMGRL> failover to 'ORCL_DG2';

Getting help.
Aside from the manuals, help can be provided by the aptly named HELP command. Apart from using it to look up all of that lengthy syntax, I find a combination of the help command plus cut and paste is a good way to avoid any DGMGRL related RSI problems....

23 July 2008

Locking schema statistics.

One nice Oracle 10g feature is automatic optimizer statistics gathering.

Out of the box, by calling the GATHER_STATS_JOB during maintenance windows Oracle gathers fresh optimizer statistics for tables where it deems that the current statistics are "stale" (statistics are labelled stale once 10% or more of underlying table rows have been modified in some way). Overall, this is pretty neat. You can even modify the maintenance window start / stop times via Grid Control to suit your specific needs.

However, what about if you want Oracle to perform this automatic statistics gathering for all but a few of your database tables? You may have a few special cases that get loaded / truncated at certain times and therefore you wish to control statistics gathering at certain times, based on your application requirements.

Luckily, the DBMS_STATS package allows us to lock statistics for specific tables or even an entire schema. If you are interested in this, then take a look at the following procedures:

DBMS_STATS.LOCK_SCHEMA_STATS
DBMS_STATS.LOCK_TABLE_STATS
DBMS_STATS.UNLOCK_SCHEMA_STATS
DBMS_STATS.UNLOCK_TABLE_STATS

DBMS_STATS allows you (the DBA or Developer) to unlock a table(s) statistics, gather fresh statistics and then finally lock them up again.

With this kind of approach, the GATHER_STATS_JOB job will automatically gather statistics for most of your database tables, but skip the ones with the locked statistics. This can help protect against any unwanted statistic related performance surprises, something I am always keen to avoid.

14 July 2008

Wait event classifications.

This mini investigation stemmed from a discussion I was having the other day with one of my DBA colleagues:

"Can any of the Data Guard wait events be classified as Idle ?"

Oracle 10g has a total of 878 wait events. Further more, each wait event has been assigned to one of the following classification groups:

Concurrency
System I/O
User I/O
Administrative
Other
Configuration
Scheduler
Cluster
Application
Idle
Network

The data dictionary view, V$EVENT_NAME provides us with all of the wait event names and their related group information. So, to list all of the "Idle" events, we can issue the following SQL:

SELECT name FROM v$event_name
WHERE wait_class='Idle';


As it turns out, there are 62 Idle events, three of which are related to Data Guard:

LNS ASYNC archive log,
LNS ASYNC dest activation,
LNS ASYNC end of log.

Of the 810 remaining "non-Idle" events, just one is classified as "Commit" (log file sync), 590 are classified as "Other" and finally, the most painful sounding is the RAC related "Wait for split-brain resolution" - ouch!

As a side note, Oracle 9i does not have wait event classifications defined by the database itself. However, the Statspack utility does provide similar detail via its STATS$IDLE_EVENT lookup table (which is typically owned by the Perfstat user).

1 July 2008

Enterprise Manager Hang analysis.

In the past, I have seen database locking cause very serious problems. In several cases, databases have become unusable as hundreds of sessions hang, waiting on one or more culprits who are holding a lock on an application "master" table. As a DBA, you need to delve in quickly and work out what is going on as soon as possible. I normally have a bunch of hand written "DBA SQL" scripts ready to use for such occasions. However, there is another (in my opinion, better) approach, the Grid Control "Hang Analysis" page.

Firstly, things have improved over the last few years with this issue in mind. Grid Control can alert us when locking/blocking thresholds are hit. However with the Hang Analysis page, we can quickly see which user session is holding the lock and take action.

To access the Hang Analysis page, connect up to Grid Control, select the database in question, and then select the Performance tab. Scroll to the bottom of the screen and you will find the link labeled "Hang Analysis". When all is good with your system, the page will look fairly plain and boring:




However, if you have a locking issue on your hands, then you will begin to see the page (which is automatically refreshed) provide details of the session holding the lock, plus all of those waiting:





Here, we can see that SID 132 is causing all of our problems and the victims are 149,131,136 and 134 - they are just waiting for 132 to release his/her/its lock(s) before they can continue processing.

At this stage, the Hang Analysis page will highlight the sessions in green, yellow or red. Red being the most serious situation (meaning the sessions are actually hung). At this point, we can click on the red blocker icon for SID 132 and gain the detail we really need and take some action:




We can click on "View Session Details" to find out a extra myriad of detail such as DB User Name, OS Process ID's, Wait States etc:





Finally, by clicking on "Previous SQL" link we can see what on earth our blocker is running:





Having looked at all of that useful information, we can now either call up our user and politely ask them to commit or rollback. More than likely though, if your database has ground to a halt, then return to the previous screen and hit the big nice "Kill Session" button. Most useful!

Now whoever said that us DBA's aren't a ruthless bunch!

16 June 2008

The database black box.

This has to be one of the craziest setups I have heard of in a while...

One of my developer mates was telling me about a new company he is working for and predictably, I had to ask him which database system they are using. As it turns out, his new shop develops under Microsoft SQL Server, but release into Oracle for production. Wow! Now to me, that's just plain madness! It's also pushing what's known as the "Database Black Box" concept to the limit!

Now, I could rant on for a few pages here about why you should have some idea of what is actually going on in your databases engine room, but I would rather point you to a great book on the subject, "Effective Oracle by Design", by Tom Kyte.

As we all know, Tom is a legend in terms Oracle and always provides us with great information and advice. Ultimately, you can take Tom's word for it. Alternatively, you can spend thousands and thousands of Pounds, Euros and Dollars on us DBA's, asking us to come in and solve all of your production performance problems...... Hang on a minute, actually, forget what I said about the book, its erm, just gone out of print.... :)

13 June 2008

ORA-1000 maximum open cursors exceeded.

This is a fairly common Oracle error with one easy fix - up the open_cursors parameter limit in your init.ora or spfile and then go get on with something a bit more interesting. By doing this though, you may be masking over possible greater issues which may come back to bite, normally on a Friday afternoon at around 4:30pm.....

Its always worth delving a little deeper and investigate why the session in question has hit this limit. In the case I was dealing with today, the session in question was running a mass of un-sharable SQL. In fact it had around 7000 SQL statements stored up in the Shared Pool, all syntactically the same except for the final AND clause.

Something along the lines of:

SELECT * FROM apptab WHERE dept=:B1 and ID=543543
SELECT * FROM apptab WHERE dept=:B1 and ID=543544
SELECT * FROM apptab WHERE dept=:B1 and ID=543545

and so on.

The fix was easy, implement another bind variable into the code and pass in the literal values on each execution, such as:

SELECT * FROM apptab WHERE dept=:B1 and ID=:B2

However, the main point is that if I had bumped the open_cursors parameter up and beyond 7000, my problem would appear to go away. However the real issue would be left undiscovered, chewing up and fragmenting my Shared Pool, eventually leading on to those dreaded ORA-04031's.

Ultimately, that's why I like being a DBA - investigating and digging around is the name of the game - One could almost say its a job where you have to be like Columbo (well, almost, perhaps without the dodgy rain coat and cigars).