2008-07-25

DBAs little helper

You might know this situation: Your users call and complain, the DB is hanging. While trying to connect (and making plans what to check first, sessions, waits, locks, ...) you do not get any prompt. Ok, there would be a lot of things you get in mind to check, analyze and then fix. But in fact, the users want their connections back, don't care about current open sessions and just ask to reboot the instance.
Even reboot is not that simple in this situation (have I mentioned the missing prompt in sql*plus?). So the next way is to try kill on os-level (which doesn't do anything at the end, as the process seems to wait for something on OS). A much more entertaining way is kill -9. I hope, if I kill a vital kernel process, the others will do the cleanup and close everything the best way they can (similar to a shutdown abort). But even this doesn't happen. The others are hanging around and doing nothing. At the end I have to kill all processes using -9.
There is just one more thing to do: startup. but, unfortunately, at the login there is one thing missing: 'Connecting to an idle instance' - and a 'startup' just hangs.
Why? Because no one did the cleanup! Do you remember the kill -9 of all, also the last background process? Doing so, I did not gave any process the chance to release the semaphore and the shared memory segments. They are just hanging around. Even no one uses them, sqlplus belives because they exists, there is a running system. So it connects to them, sends messages, but noone ever listens. this makes this sqlplus-instance also hanging.
How can i do the cleanup manually?
The old way, back in those days, when Versions did not contain characters like i or g, is to search all other instances for their ressources using oradebug ipc, and afterwards removing them with ipcs. If anything went wrong - another DB crashed very beautiful and interresting.
The new way, introduced in this millenium, is sysresv, which you can find in $ORACLE_HOME/bin. With this, you can show the semaphor and shared memory segments used by your current os-user, $ORACLE_HOME and $ORACLE_SID (or for multiple SIDs). It can also release the ressources for a given SID. This makes live much easier and more secure in situations like shown above.
I doesn't know why Oracle doesn't mention this binary in its documentation.

Keine Kommentare: