category: oracle
2011-05-17 11:25:11
the following aspects, as i understand them, are essential for a basic knowledge of the internals:
locks, latches, enqueues, semaphores, control file structures, redo log files and i/o.
the list is not exhaustive.
locks and enqueues
they are mechanisms that support queuing and concurrency: lock requests are queued
and serviced in first-in-first-out (fifo) order. locks prevent destructive interaction
between transactions accessing the same resource.
resources include two general types of objects: user objects, such as
tables and rows (structures and data), and system objects not visible to users,
such as shared data structures in memory and data dictionary rows.
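to see this kind of blocking in action, here is a minimal two-session sketch; emp, empno and sal are just placeholders from the familiar scott demo schema, and any small table of your own will do:
-- session 1: update a row and hold the resulting tx lock by not committing
update emp set sal = sal where empno = 7369;
-- session 2: the same update now waits on session 1's lock
update emp set sal = sal where empno = 7369;
-- session 1: commit (or rollback) to release the lock; session 2 then proceeds
commit;
while the two sessions are in that state, the scripts below show the holder and the waiter.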
this script lists the users currently holding locks that block other users, the users that are being blocked, and the offending objects (tables, etc).
set heading on
ttitle 'user blocking and waiting for other users'
select
distinct o.object_name,
sh.username||'('||sh.sid||')' "holder",
sw.username||'('||sw.sid||')' "waiter",
decode( lh.lmode, 1, 'null', 2,'row share', 3, 'row exclusive', 4, 'share',
5, 'share row exclusive' , 6, 'exclusive') "lock type"
from all_objects o,
v$session sw,
v$lock lw,
v$session sh,
v$lock lh
where lh.id1 = o.object_id
and lh.id1 = lw.id1
and sh.sid = lh.sid
and sw.sid = lw.sid
and sh.lockwait is null
and sw.lockwait is not null
and lh.type = 'TM'
and lw.type = 'TM'
/
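a quicker way to spot blockers, without joining to all_objects, is the block column of v$lock; a minimal sketch:
select sid, type, id1, id2, lmode, request, block
from v$lock
where block = 1      -- sessions holding a lock somebody else wants
   or request > 0    -- sessions waiting for a lock
order by id1, request;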
this script shows the operating system process that is being locked out.
select
ses.username||'('||ses.sid||')' users, acc.owner owner,
acc.object object, ses.lockwait, prc.osuser os_process
from v$process prc, v$access acc, v$session ses
where prc.addr = ses.paddr
and ses.sid = acc.sid
and ses.lockwait is not null;
this script shows the sql that the users who are currently being blocked are trying to run.
select
ses.username||'('||ses.sid||')' users, acc.owner owner,
acc.object object, ses.lockwait, txt.sql_text sqltext
from v$sqltext txt, v$access acc, v$session ses
where txt.address = ses.sql_address
and txt.hash_value = ses.sql_hash_value
and ses.sid = acc.sid
and ses.lockwait is not null;
latches
latches are low-level internal oracle mechanisms used to protect data structures in the sga from simultaneous access. they are implemented with atomic hardware instructions such as test-and-set. latches are more restrictive than locks in that they are always exclusive. latches are never queued: a process either spins or sleeps until it obtains the latch, or times out.
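latch activity and contention can be examined through v$latch; a minimal sketch that lists the latches with the most sleeps:
select name, gets, misses, sleeps
from v$latch
where sleeps > 0
order by sleeps desc;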
semaphores
semaphores are a unix operating system facility used to control waiting. oracle uses semaphores on hp-ux and solaris to synchronize shadow processes and background processes.
the number of semaphores used by oracle is equal to the number of processes defined in the initialization parameter file.
on aix it is not a semaphore but a post/wait driver that is used to serialize the tasks.
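since the semaphore count tracks the processes parameter, its current setting can be checked from inside the database (show parameter processes in sql*plus gives the same answer):
select name, value
from v$parameter
where name = 'processes';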
hidden parameters
the following sql finds all the hidden parameters for your database. it must be run as the sys user; if it is executed as system or any other user with the dba role, it may fail with ora-00942: table or view does not exist.
select
*
from sys.x$ksppi
where substr(ksppinm,1,1) = '_';
the following query displays parameter names with their current value:
select
a.ksppinm "parameter",
b.ksppstvl "session value",
c.ksppstvl "instance value"
from x$ksppi a,
x$ksppcv b,
x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
and substr(ksppinm,1,1)='_'
order by a.ksppinm;
x$ tables
to get a list of the x$ tables the following query may be used:
select distinct table_name from v$indexed_fixed_column where table_name like 'X$%';
the list may not be complete or accurate, but it represents an attempt to figure out what information these tables contain. one should generally not write queries against them: they are internal to oracle, and oracle may change them without any prior notification. you may get a list of about 178 tables in oracle 9.0.1, 155 in oracle 8.1.6, and so on.
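since v$indexed_fixed_column only covers the x$ tables that have indexed columns, a more complete list can be pulled from v$fixed_table; a minimal sketch:
select name
from v$fixed_table
where name like 'X$%'
order by name;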
sub-systems
kernel subsystems:
opi oracle program interface
kk compilation layer - parse sql, compile pl/sql
kx execution layer - bind and execute sql and pl/sql
k2 distributed execution layer - 2pc handling
npi network program interface
kz security layer - validate privs
kq query layer
rpi recursive program interface
ka access layer
kd data layer
kt transaction layer
kc cache layer
ks services layer
kj lock manager layer
kg generic layer
kv kernel variables (eg. x$kvis and x$kvii)
s or ods operating system dependencies
setting up events:
the following events are frequently used by dbas and oracle support to diagnose problems:
10046 trace name context forever, level 4
trace sql statements and show bind variables in trace output.
10046 trace name context forever, level 8
this shows wait events in the sql trace files
10046 trace name context forever, level 12
this shows both bind variable names and wait events in the sql trace files
1401 trace name errorstack, level 12
1401 trace name errorstack, level 4
1401 trace name processstate
dumps out trace information if an ora-1401 "inserted value too large for column" error occurs. the 1401 can be replaced by any other oracle server error code that you want to trace.
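to actually set one of these events for your own session, alter session can be used; a minimal sketch for 10046 at level 12 (the off form disables it again):
-- enable sql trace with binds and waits for the current session
alter session set events '10046 trace name context forever, level 12';
-- ... run the statements to be traced ...
-- switch the event off again
alter session set events '10046 trace name context off';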
the following events are examples only. they might be version specific, so please check with oracle support before using them:
10210 trace name context forever, level 10
10211 trace name context forever, level 10
10231 trace name context forever, level 10
these events enable block checking, which helps detect and prevent database block corruption
10049 trace name context forever, level 2
memory protect cursor
10210 trace name context forever, level 2
data block check
10211 trace name context forever, level 2
index block check
10235 trace name context forever, level 1
memory heap check
10262 trace name context forever, level 300
allow 300 bytes memory leak for connections
use the unix oerr command to get the description of an event: for example, type "oerr ora 10053" at the command prompt to get the event details.
how can one dump internal database structures?
the following (mostly undocumented) commands can be used to obtain information
about internal database structures.
-- dump control file contents
alter session set events 'immediate trace name controlf level 10'
/
-- dump file headers
alter session set events 'immediate trace name file_hdrs level 10'
/
-- dump redo log headers
alter session set events 'immediate trace name redohdr level 10'
/
-- dump the system state
alter session set events 'immediate trace name systemstate level 10'
/
-- dump the process state
alter session set events 'immediate trace name processstate level 10'
/
-- dump library cache details
alter session set events 'immediate trace name library_cache level 10'
/
-- dump optimizer statistics whenever a sql statement is parsed
-- (hint: change the statement or flush the shared pool to force a re-parse)
alter session set events '10053 trace name context forever, level 1'
/
-- dump a database block (the file/block must first be converted to a dba address)
-- convert file and block number to a dba (database block address), e.g. file 1, block 12:
variable x number
exec :x := dbms_utility.make_data_block_address(1,12);
print x
-- then dump the block, using the dba value as the level (50360894 here is just an example dba)
alter session set events 'immediate trace name blockdump level 50360894'
/
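the resulting trace files are written to the directory pointed to by the user_dump_dest parameter, which can be located like this (or with show parameter user_dump_dest in sql*plus):
select value
from v$parameter
where name = 'user_dump_dest';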
these examples are taken from
oracle internals - 2 (oracle controlfile structures)
this document discusses the controlfile and its structures, continuing the discussion begun in oracle internals - 1.
note that in preparing this document many books and web sites were consulted for information that is generally not found in the oracle documentation. not all of the sources are listed here, but they are acknowledged with honor and dignity.
what does a controlfile consist of?
the controlfile has to be looked at in two parts: part one is the header and part two is its contents / structures.
the header block contains (1) the controlfile block size and (2) the number of blocks in the controlfile. when oracle mounts the database, it reads the header and checks these values against each other; if they do not match, oracle returns an error message reporting corruption of the controlfile.
what is the block size of the controlfile?
the initialization parameter file for every database determines the default block size using db_block_size parameter. this oracle data block size is the default block size for the controlfile.
how many blocks are there in the controlfile?
this number is tallied against the size of the controlfile in bytes at the os level. oracle data blocks are multiples of os blocks, and the os block size is determinable and specific, as documented by the os vendors. so comparing the block count in the header with the size of the controlfile as shown at os level verifies the number of controlfile blocks.
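to check the values in practice, the default data block size can be read from v$parameter; in later releases (10g and up) v$controlfile also exposes the controlfile block size and its length in blocks directly (column names as of those releases):
-- default data block size (any version)
select value from v$parameter where name = 'db_block_size';
-- controlfile block size and length in blocks (10g and later)
select name, block_size, file_size_blks from v$controlfile;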
note: here is an extract from steve on controlfile transactions:
sessions must hold an exclusive lock on the cf enqueue for the duration of controlfile transactions. this prevents concurrent controlfile transactions, and in-flux controlfile reads, because a shared lock on the cf enqueue is needed for controlfile reads. however, there is also a need for recoverability should a process, instance or system failure occur during a controlfile transaction.
for the first record section of the controlfile, the database information entry section, this requirement is trivial, because the database information entry only takes about 210 bytes and is therefore guaranteed to always fit into a single controlfile block that can be written atomically. therefore changes to the database entry can be implicitly committed as they are written, without any recoverability concerns.
maintaining all the information in duplicate provides recoverability for changes to the other controlfile record sections. two physical blocks represent each logical block. one contains the current information, and the other contains either an old copy of the information, or a pending version that is yet to be committed. to keep track of which physical copy of each logical block contains the current information, oracle maintains a block version bitmap with the database information entry in the first record section of the controlfile.
to read information from the controlfile, a session must first read the block version bitmap to determine which physical block to read. then if a change must be made to the logical block, the change is first written to the alternate physical block for that logical block, and then committed by atomically rewriting the block containing the block version bitmap with the bit representing that logical block flipped. when changes need to be made to multiple records in the same controlfile block, such as when updating the checkpoint scn in all online datafiles, those changes are buffered and then written together. note that each controlfile transaction requires at least 4 serial i/o operations against the controlfile, and possibly more if multiple blocks are affected, or if the controlfile is multiplexed and asynchronous i/o is not available. so controlfile transactions are potentially expensive in terms of i/o latency.
whenever a controlfile transaction is committed, the controlfile sequence number is incremented. this number is recorded with the block version bitmap and database information entry in the first record section of the controlfile. it is used in the cache header of each controlfile block in place of an scn to detect possible split blocks from hot backups. it is also used in queries that perform multiple controlfile reads to ensure that a consistent snapshot of the controlfile has been seen. if not, an ora-00235 error is returned.
the controlfile transaction mechanism is not used for updates to the checkpoint heartbeat. instead the size of the checkpoint progress record is overstated as half of the available space in a controlfile block, so that one physical block is allocated to the checkpoint progress record section per thread. then, instead of using pairs of physical blocks to represent each logical block, each checkpoint progress record is maintained in its own physical block so that checkpoint heartbeat writes can be performed and committed atomically without affecting any other data.
the controlfile record sections and their usage can be listed with the following query against x$kccrs (run as sys):
select
addr,
indx,
decode (indx, 0,'database',
1,'ckpt progress',
2,'redo thread',
3,'redo log',
4,'datafile',
5,'filename',
6,'tablespace',
7,'temporary filename',
8,'rman configuration',
9,'log history',
10,'offline range',
11,'archived log',
12,'backup set',
13,'backup piece',
14,'backup datafile',
15,'backup redolog',
16,'datafile copy',
17,'backup corruption',
18,'copy corruption',
19,'deleted object',
20,'proxy copy',
21,'reserved4') record_section,
inst_id,
rslbn,
rsrsz,
rsnum,
rsnus,
rsiol,
rsilw,
rsrlw
from sys.x$kccrs;
this should throw enough light for the reader to understand the role of the controlfile when rman is used with no catalog. the discussion of 'how the updates to the controlfile take place' is deferred, as it is thought fit to cover that part in the redolog files and backup and recovery section.
size of the controlfile
space in the controlfile is reused by overwriting information already contained in it, and the user can exercise a degree of control over this. the initialization parameter 'control_file_record_keep_time' sets the minimum number of days that must have elapsed before a reusable controlfile record slot can be reused. the default is 7 days. if all the slots in a record section are in use and that number of days has not yet elapsed since the timestamp on the earliest entry, then oracle dynamically expands the record section (and thus the controlfile too) to make more slots available, up to a maximum of 65535 slots per section, or the controlfile size limit. (the controlfile size limit is based on the number of blocks that can be represented in the block version bitmap, and is thus most unlikely to be reached.) informational "kccrsz" messages about the dynamic expansion of the controlfile (or the failure to do so) may be seen in the alert log file for the instance. the control_file_record_keep_time parameter can also be set to zero to prevent keep-time related controlfile expansion, if that suits the database maintenance requirements. controlfile backups along with the datafiles are advocated because, even though the resizing takes place under the protection of the cf enqueue, the controlfiles may get corrupted if an instance failure or a system failure occurs during that resizing.
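the current record-section usage, and the keep time that drives the reuse decision, can both be inspected from the data dictionary; a minimal sketch:
-- how long reusable controlfile records are kept (days)
select value
from v$parameter
where name = 'control_file_record_keep_time';
-- slots allocated and used per record section
select type, record_size, records_total, records_used
from v$controlfile_record_section
order by type;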
the contents of the current controlfile can be dumped in text form to a process trace file in the user_dump_dest directory using the controlf dump event; the amount of detail depends on the level used. for example, the following statement is issued against the database from one user session:
alter session set events 'immediate trace name controlf level 3';
(this is for the 8.1.6.3 version)
dump of control files, seq # 22327 = 0x5737
file header:
software vsn=135266304=0x8100000, compatibility vsn=134217728=0x8000000
db id=2607436010=0x9b6a50ea, db name='srinivas'
control seq=22327=0x5737, file size=364=0x16c
file number=0, blksiz=8192, file type=1 control
the above is part of the header information produced when that sql statement is issued.
(this is against the 9.0.1.3.1 version)
dump of control files, seq # 10624 = 0x2980
file header:
software vsn=150994944=0x9000000, compatibility vsn=134217728=0x8000000
db id=3228358760=0xc06cd868, db name='know9i'
control seq=10624=0x2980, file size=366=0x16e
file number=0, blksiz=4096, file type=1 control
oracle internals - 3 (redolog files)
oracle has three important structures that take care of the database, and the oracle internals maintain the structural details of each of them: (1) datafiles, (2) controlfiles and (3) redolog files.
oracle has long allowed the multiplexing of controlfiles and redolog files. the user defines the multiplexed controlfile destinations while creating the initialization parameter file for the database to be created, and the 'create database' command defines the redolog groups, their members and their destinations.
multiplexing, mirroring or otherwise securing the database files is done with additional software, such as raid to stripe, mirror, set parity and/or duplex, emc to replicate the datafiles, and other software that can copy or duplex datafiles. oracle itself provides the capability of backing up the datafiles, or any set of files identified as a backup set, to two destinations using rman, by allocating different channels for parallelization or by setting duplex levels, etc.
as the aim of this article is to discuss oracle internals, only a passing reference is made to those issues, which are discussed at the appropriate places.
redo log buffers
the log_buffer parameter set in the initialization parameter file defines the size, in bytes, of the redo log buffer allocated in the sga. although the size of redo entries is measured in bytes, lgwr writes the redo to the log files on disk in blocks. the size of redo log blocks is fixed in the oracle source code and is operating system specific. oracle's documentation uses the term "operating system block size" to refer to the log block size. normally it is the smallest unit of i/o supported by the operating system for raw i/o, but on some operating systems it is the smallest possible unit of file system based i/o.
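the configured size can be checked from v$parameter, and the actual allocation appears under 'log_buffer' in v$sgastat; a minimal sketch:
select name, value
from v$parameter
where name = 'log_buffer';
select pool, name, bytes
from v$sgastat
where name = 'log_buffer';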
the default log block sizes differ from one operating system to another, and lgwr writes the redo buffers to the log files in that block size. on some operating systems the block size is modifiable.
the log block size is the unit for the setting of the log_checkpoint_interval, _log_io_size and max_dump_file_size parameters, so it is an important constant to know. to find the log block size (which is also the operating system block size used for redo), run the following query while logged in as the sys user.
select max(lebsz) from sys.x$kccle;
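as a worked example of why this unit matters: assuming a 512-byte log block size (a common value on unix ports), setting log_checkpoint_interval = 10000 means a checkpoint is triggered after roughly 10000 * 512 = 5,120,000 bytes, i.e. about 5 mb of redo.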
the value of log_buffer (as set in the parameter file) is reflected in the v$sga and v$sgastat views under the name 'log_buffer'. the following sql statements can be issued to check whether the two totals match.
select sum(value) from v$sga;
select sum(bytes) from v$sgastat;
the discrepancy, if it exists, is because of the log buffer guard pages.
on platforms that support memory protection there are guard pages on each side of the log buffer. these are one memory protection unit in size - normally 4k or 8k. oracle uses an mprotect() system call during instance startup to set the permissions on the guard pages to prot_none. thus any attempt to read from or write to either of the guard pages will return the eacces error and cause the process to abort, even if the effective user id of the process is oracle (steve).
up to 7.x there was a perfect match between these two. as steve says, if the difference is 4k or 8k, it is because of the log buffer guard pages. but what if it exceeds 8k? the answer is still being sought. here is one example:
sql> select sum(value) from v$sga;
sum(value)
----------
137020380
sql> select sum(bytes) from v$sgastat;
sum(bytes)
----------
136990084
sql> select (137020380-136990084)/1024 from dual;
(137020380-136990084)/1024
--------------------------
29.5859375