Saturday, 28 July 2012

Background Processes

PMON – The Process Monitor
This process is responsible for cleaning up after abnormally terminated connections. For example, if your dedicated server 'fails' or is killed for some reason, PMON is the process responsible for releasing your resources. PMON will roll back uncommitted work, release locks, and free SGA resources allocated to the failed process.

In addition to cleaning up after aborted connections, PMON is responsible for monitoring the other Oracle background processes and restarting them if necessary (and if possible). If a shared server or a dispatcher fails (crashes), PMON will step in and start another one, after cleaning up for the failed process. PMON watches all of the Oracle processes and either restarts them or terminates the instance as appropriate. For example, terminating the instance is the appropriate response to a failure of the database log writer process, LGWR. This is a serious error, and the safest path of action is to terminate the instance immediately and let normal recovery fix up the data. This is a very rare occurrence and should be reported to Oracle Support immediately.
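
If you want to see which background processes are running in your instance, a quick way is to query V$BGPROCESS; this is a minimal sketch (a process address of '00' means that process is not running):

    select name, description
      from v$bgprocess
     where paddr <> '00';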

The other thing PMON does for the instance, in Oracle 8i, is to register it with the Net8 listener. When an instance starts up, the PMON process polls the well-known port address (unless directed otherwise) to see whether or not a listener is up and running. The well-known/default port used by Oracle is 1521. Now, what happens if the listener is started on some different port? In this case, the mechanism is the same, except that the listener address must be explicitly specified in the init.ora file via the LOCAL_LISTENER parameter. If the listener is started, PMON communicates with the listener and passes to it relevant parameters, such as the service name.
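
As a minimal sketch, assuming the listener was started on port 1526 on the host myhost (both hypothetical), the init.ora entry might look like:

    # init.ora -- point PMON at a non-default listener address
    local_listener = "(ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1526))"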

SMON – The System Monitor

SMON is the process that gets to do all of the jobs no one else wants to do. It is a sort of 'garbage collector' for the database. Some of the jobs it is responsible for include:
❑ Temporary space cleanup – With the advent of 'true' temporary tablespaces, this job has lessened, but it has not gone away. For example, when building an index, the extents allocated for the index during its creation are marked as TEMPORARY. If the CREATE INDEX session is aborted for some reason, SMON is responsible for cleaning them up. There are other operations that create temporary extents that SMON would be responsible for cleaning up as well.

❑ Crash recovery – SMON is responsible for performing crash recovery of a failed instance, upon restart.

❑ Coalescing free space – If you are using dictionary-managed tablespaces, SMON is responsible for taking extents that are free in a tablespace and contiguous with respect to each other, and coalescing them into one 'bigger' free extent. This occurs only on dictionary-managed tablespaces with a default storage clause that has PCTINCREASE set to a non-zero value.

❑ Recovering transactions active against unavailable files – This is similar to its role during database startup. Here, SMON recovers failed transactions that were skipped during instance/crash recovery because a file (or files) was not available to recover. For example, the file may have been on a disk that was unavailable or not mounted. When the file does become available, SMON will recover it.

❑ Instance recovery of a failed node in OPS – In an Oracle Parallel Server configuration, when a node in the cluster goes down (the machine fails), some other instance in the cluster will open that failed instance's redo log files and perform recovery of all data for that failed instance.

❑ Cleans up OBJ$ – OBJ$ is a low-level data dictionary table that contains an entry for almost every object (table, index, trigger, view, and so on) in the database. Many times, there are entries in here that represent deleted objects, or 'not there' objects used in Oracle's dependency mechanism. SMON is the process that removes these rows once they are no longer needed.

❑ Shrinks rollback segments – SMON is the process that will perform the automatic shrinking of a rollback segment back to its OPTIMAL size, if one is set.

❑ 'Offlines' rollback segments – It is possible for the DBA to offline, or make unavailable, a rollback segment that has active transactions. If active transactions are still using this offlined rollback segment, it is not really offlined; it is marked as 'pending offline'. In the background, SMON will periodically try to 'really' offline it, until it succeeds (see the sketch below).

That should give you a flavor of what SMON does. As evidenced by the ps listing of processes I introduced above, SMON can accumulate quite a lot of CPU time over time (the instance from which ps was taken was an active instance that had been up for well over a month). SMON periodically wakes up (or is woken up by the other background processes) to perform these housekeeping chores.
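
As a quick illustration of the last point, the following sketch takes a rollback segment offline and then checks its status (RBS1 is a hypothetical rollback segment name). If transactions were still active in it, the status will show PENDING OFFLINE until SMON can truly offline it:

    alter rollback segment rbs1 offline;

    select segment_name, status
      from dba_rollback_segs
     where segment_name = 'RBS1';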


RECO – Distributed Database Recovery

RECO has a very focused job; it recovers transactions that are left in a prepared state because of a crash or loss of connection during a two-phase commit (2PC). A 2PC is a distributed protocol that allows a modification affecting many disparate databases to be committed atomically. It attempts to close the window for distributed failure as much as possible before committing. In a 2PC between N databases, one of the databases, typically (but not always) the one the client logged in to initially, will be the coordinator.

This one site will ask the other N-1 sites if they are ready to commit; in effect, it asks them to be prepared to commit. Each of the N-1 sites reports back its 'prepared state' as YES or NO. If any one of the sites votes NO, the entire transaction is rolled back. If all sites vote YES, the coordinator broadcasts a message to make the commit permanent on each of the N-1 sites.

If the network fails, or some other error occurs, after a site votes YES (it is prepared to commit) but before it receives the directive from the coordinator to actually commit, the transaction becomes an in-doubt distributed transaction. The 2PC tries to limit the window of time in which this can occur, but cannot remove it. If we have a failure right then and there, the transaction becomes the responsibility of RECO. RECO will try to contact the coordinator of the transaction to discover its outcome. Until it does that, the transaction will remain in its uncommitted state. When the transaction coordinator can be reached again, RECO will either commit the transaction or roll it back.

It should be noted that if the outage is going to persist for an extended period of time and you have outstanding transactions, you can commit or roll them back manually yourself. You might want to do this because an in-doubt distributed transaction can cause writers to block readers – the one time this can happen in Oracle. Your DBA could call the DBA of the other database and ask them to query the status of those in-doubt transactions. Your DBA can then commit or roll them back, relieving RECO of this task.
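
A minimal sketch of that manual resolution (the transaction ID shown is purely hypothetical):

    -- see what is in doubt, and what state it is in
    select local_tran_id, state
      from dba_2pc_pending;

    -- after confirming the outcome with the other site's DBA:
    commit force '1.23.456';
    -- or, if it should be undone:
    -- rollback force '1.23.456';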

CKPT – Checkpoint Process
The checkpoint process does not, as its name might imply, do the checkpoint itself (that is mostly the job of DBWn). It simply assists with the checkpointing process by updating the file headers of the data files. It used to be that CKPT was an optional process, but starting with version 8.0 of the database it is always started, so if you do a ps on UNIX, you'll always see it there. The job of updating the data files' headers with checkpoint information used to belong to LGWR (the log writer); however, as the number of files and the size of databases increased over time, this additional task became too much of a burden for LGWR. If LGWR had to update dozens, hundreds, or even thousands of files, there would be a good chance that sessions waiting to commit would have to wait far too long. CKPT removes this responsibility from LGWR.
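
You can observe the results of CKPT's work by looking at the checkpoint information recorded in the data file headers; a minimal sketch:

    select file#, checkpoint_change#, checkpoint_time
      from v$datafile_header;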

DBWn – Database Block Writer
The Database Block Writer (DBWn) is the background process responsible for writing dirty blocks to disk. DBWn will write dirty blocks from the buffer cache, usually in order to make more room in the cache (to free buffers for reads of other data), or to advance a checkpoint (to move forward the position in an online redo log file from which Oracle would have to start reading, in order to recover the
instance in the event of failure). As we discussed previously, when Oracle switches log files, a checkpoint is signaled. Oracle needs to advance the checkpoint so that it no longer needs the online redo log file it just filled up. If it hasn't been able to do that by the time we need to reuse that redo log file, we get the 'checkpoint not complete' message and we must wait.
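
A quick way to see whether a just-filled log file is still needed is to check its status in V$LOG; a sketch:

    select group#, status, archived
      from v$log;

A status of ACTIVE means the checkpoint protecting that log file has not yet completed, so the file cannot be reused; once it shows INACTIVE, it is free for reuse.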

As you can see, the performance of DBWn can be crucial. If it does not write out blocks fast enough to free buffers for us, we will see waits on the 'free buffer waits' and 'write complete waits' events start to grow.
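
A minimal sketch for watching those two wait events:

    select event, total_waits, time_waited
      from v$system_event
     where event in ('free buffer waits', 'write complete waits');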

We can configure more than one DBWn, up to ten in fact (DBW0 ... DBW9). Most systems run with one database block writer, but larger, multi-CPU systems can make use of more than one. If you do configure more than one DBWn, be sure to also increase the init.ora parameter DB_BLOCK_LRU_LATCHES. This controls the number of LRU list latches (now called touch lists in 8i) – in effect, you want each DBWn to have its own list. If the DBWn processes shared the same list of blocks to write out to disk, they would only end up contending with each other to access this list.
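
A minimal init.ora sketch for a multi-CPU machine (the values are illustrative only, not recommendations):

    # start two database block writers (DBW0 and DBW1)
    db_writer_processes  = 2
    # give each writer its own LRU latch/list
    db_block_lru_latches = 2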
Normally, DBWn uses asynchronous I/O to write blocks to disk. With asynchronous I/O, DBWn gathers up a batch of blocks to be written and gives them to the operating system. DBWn does not wait for the OS to actually write the blocks out; rather, it goes back and collects the next batch to be written.

As the OS completes the writes, it asynchronously notifies DBWn that it completed the write. This allows DBWn to work much faster than if it had to do everything serially. We'll see later, in the Slave Processes section, how we can use I/O slaves to simulate asynchronous I/O on platforms or configurations that do not support it.
I would like to make one final point about DBWn. It will, almost by definition, write out blocks scattered all over disk – DBWn does lots of scattered writes. When you do an update, you'll be modifying index blocks that are stored here and there and data blocks that are randomly distributed on disk as well. 

LGWR, on the other hand, does lots of sequential writes to the redo log. This is an important distinction, and one of the reasons that Oracle has a redo log and a LGWR process. Scattered writes are significantly slower than sequential writes. By having the SGA buffer dirty blocks and the LGWR process do large sequential writes that can recreate these dirty buffers, we achieve an increase in performance. The fact that DBWn does its slow job in the background while LGWR does its faster job while the user waits gives us overall better performance. This is true even though Oracle may technically be doing more I/O than it needs to (writes to the log and to the data files) – the writes to the online redo log could be skipped if, during a commit, Oracle physically wrote the modified blocks out to disk instead.

LGWR – Log Writer

The LGWR process is responsible for flushing to disk the contents of the redo log buffer, located in the SGA. It does this:
❑ Every three seconds, or
❑ Whenever you commit, or
❑ When the redo log buffer is one third full or contains 1 MB of buffered data (a sketch for gauging the resulting write sizes follows this list).
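
Given these triggers, you can get a rough feel for LGWR's average write size from the system statistics; a sketch (the result is approximate – 'redo size' counts bytes generated, not bytes physically written):

    select (select value from v$sysstat where name = 'redo size') /
           (select value from v$sysstat where name = 'redo writes') avg_bytes_per_write
      from dual;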

For these reasons, having an enormous redo log buffer is not practical – Oracle will never be able to use it all. The logs are written to sequentially, as compared to the scattered I/O DBWn must perform, and doing large batch writes like this is much more efficient than doing many scattered writes to various parts of a file. This is one of the main reasons for having a LGWR and redo logs in the first place. The efficiency of writing out just the changed bytes using sequential I/O outweighs the additional I/O incurred. Oracle could just write the database blocks directly to disk when you commit, but that would entail a lot of scattered I/O of full blocks – and this would be significantly slower than letting LGWR write the changes out sequentially.

ARCn – Archive Process

The job of the ARCn process is to copy an online redo log file to another location when LGWR fills it up. These archived redo log files can then be used to perform media recovery. Whereas online redo logs are used to 'fix' the data files in the event of a power failure (when the instance is terminated), archived redo logs are used to 'fix' data files in the event of a hard disk failure. If we lose the disk drive containing the data file /d01/oradata/ora8i/system.dbf, we can go to our backups from last week, restore that old copy of the file, and ask the database to apply all of the archived and online redo logs generated since that backup took place. This will 'catch up' that file with the rest of the data files in our database, and we can continue processing with no loss of data.

ARCn typically copies online redo log files to at least two other locations (redundancy being key to not losing data!). These other locations may be disks on the local machine; more appropriately, at least one will be located on another machine altogether, to guard against catastrophic failure.
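
On releases that support the LOG_ARCHIVE_DEST_n parameters (Enterprise Edition), a minimal init.ora sketch for two such destinations might look like this (the path and service name are hypothetical):

    log_archive_start  = true
    log_archive_dest_1 = 'LOCATION=/d01/oradata/ora8i/arch'
    log_archive_dest_2 = 'SERVICE=standby_db'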

In many cases, these archived redo log files are copied off by some other process to some tertiary storage device, such as tape. They may also be sent to another machine to be applied to a 'standby database', a failover option offered by Oracle.


BSP – Block Server Process
This process is used exclusively in an Oracle Parallel Server (OPS) environment. An OPS is a configuration of Oracle whereby more than one instance mounts and opens the same database. Each instance of Oracle in this case is running on a different machine in a cluster, and they all access, in a read-write fashion, the exact same set of database files.

In order to achieve this, the SGA block buffer caches must be kept consistent with respect to each other. This is the main goal of the BSP. In earlier releases of OPS, this was accomplished via a 'ping'. That is, if a node in the cluster needed a read-consistent view of a block that was locked in exclusive mode by another node, the exchange of data was done via a disk flush (the block was pinged). This was a very expensive operation just to read data. Now, with the BSP, this exchange is done via a very fast cache-to-cache exchange over the cluster's high-speed connection.

LMON – Lock Monitor Process

This process is used exclusively in an OPS environment. The LMON process monitors all instances in a cluster to detect the failure of an instance. It then facilitates the recovery of the global locks held by the failed instance, in conjunction with the distributed lock manager (DLM) employed by the cluster hardware.

LMD – Lock Manager Daemon
This process is used exclusively in an OPS environment. The LMD process controls the global locks and global resources for the block buffer cache in a clustered environment. Other instances will make requests to the local LMD in order to have it release a lock, or to find out who holds a lock. The LMD also handles global deadlock detection and resolution.

LCKn – Lock Process

The LCKn process is used exclusively in an OPS environment. This process is very similar in functionality to the LMD described above, but handles requests for all global resources other than database block buffers.
