AIX platform, single-instance Oracle 10.2.0.4: the instance crashed without warning.
The alert.log recorded the following:
mon apr 12 13:25:58 2021
thread 1 advanced to log sequence 140477 (lgwr switch)
current log# 9 seq# 140477 mem# 0: /oradata/orcl/redo09.log
mon apr 12 13:26:38 2021
kcf: write/open error block=0x6706 online=1
file=38 /oradata/orcl/tsdat02_08.dbf
error=27063 txt: 'ibm aix risc system/6000 error: 16: device busy
additional information: -1
additional information: 8192'
automatic datafile offline due to write error on
file 38: /oradata/orcl/tsdat02_08.dbf
mon apr 12 13:26:43 2021
errors in file /home/oracle/admin/orcl/udump/orcl_ora_14155964.trc:
ora-00376: file 38 cannot be read at this time
ora-01110: data file 38: '/oradata/orcl/tsdat02_08.dbf'
ora-00376: file 38 cannot be read at this time
ora-01110: data file 38: '/oradata/orcl/tsdat02_08.dbf'
mon apr 12 13:26:49 2021
errors in file /home/oracle/admin/orcl/bdump/orcl_pmon_34537506.trc:
ora-00376: file 38 cannot be read at this time
ora-01110: data file 38: '/oradata/orcl/tsdat02_08.dbf'
mon apr 12 13:42:08 2021
errors in file /home/oracle/admin/orcl/bdump/orcl_lgwr_53149820.trc:
ora-00494: enqueue [cf] held for too long (more than 900 seconds) by 'inst 1, osid 36307002'
mon apr 12 13:42:09 2021
system state dumped to trace file /home/oracle/admin/orcl/bdump/orcl_lgwr_53149820.trc
killing enqueue blocker (pid=36307002) on resource cf-00000000-00000000
by killing session 552.1
mon apr 12 13:47:11 2021
errors in file /home/oracle/admin/orcl/bdump/orcl_lgwr_53149820.trc:
ora-00494: enqueue [cf] held for too long (more than 900 seconds) by 'inst 1, osid 36307002'
mon apr 12 13:47:12 2021
system state dumped to trace file /home/oracle/admin/orcl/bdump/orcl_lgwr_53149820.trc
killing enqueue blocker (pid=36307002) on resource cf-00000000-00000000
by terminating the process
lgwr: terminating instance due to error 2103
instance terminated by lgwr, pid = 53149820
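The timestamps in the alert.log excerpt are enough to reconstruct how long each stage of the failure took. The following is only a rough sketch: the event descriptions are my own summaries, and the names `EVENTS`, `parse_alert_time` and `elapsed_seconds` are made up for illustration. It measures every event against the first write error.

```python
from datetime import datetime

# Timestamps copied from the alert.log excerpt; the descriptions are
# hand-written summaries, not log text.
EVENTS = [
    ("mon apr 12 13:26:38 2021", "kcf write error on file 38, datafile taken offline"),
    ("mon apr 12 13:42:08 2021", "first ora-00494 reported by lgwr"),
    ("mon apr 12 13:47:11 2021", "second ora-00494 reported by lgwr"),
    ("mon apr 12 13:47:12 2021", "lgwr terminates the instance (error 2103)"),
]


def parse_alert_time(stamp: str) -> datetime:
    """Parse an alert.log timestamp such as 'mon apr 12 13:26:38 2021'."""
    # Drop the weekday and normalise case so '%b' matches the month name.
    _, rest = stamp.split(" ", 1)
    return datetime.strptime(rest.title(), "%b %d %H:%M:%S %Y")


def elapsed_seconds(events):
    """Return (seconds since the first event, description) for each event."""
    start = parse_alert_time(events[0][0])
    return [
        (int((parse_alert_time(stamp) - start).total_seconds()), what)
        for stamp, what in events
    ]


if __name__ == "__main__":
    for secs, what in elapsed_seconds(EVENTS):
        print(f"+{secs:5d}s  {what}")
```

The first ORA-00494 arrives 930 seconds after the write error, just past the 900-second limit quoted in the message itself. That is consistent with the controlfile enqueue having been held since roughly the time of the I/O failure, though the excerpt alone does not prove it.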
As the log shows, I/O on file 38 failed with error=27063 txt: 'ibm aix risc system/6000 error: 16: device busy'.
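The fields in that kcf entry can be pulled apart mechanically. Below is a small sketch (the helper name `decode_kcf_error` is hypothetical, not an Oracle tool) that converts the hexadecimal block number to decimal and maps the OS error number to its symbolic name using Python's `errno` table. It assumes the numbering on the machine running the script matches AIX for this code, which holds for errno 16 on common Unix-like systems.

```python
import errno
import re

# The three relevant lines, copied from the alert.log excerpt.
KCF_LINES = """\
kcf: write/open error block=0x6706 online=1
file=38 /oradata/orcl/tsdat02_08.dbf
error=27063 txt: 'ibm aix risc system/6000 error: 16: device busy
"""


def decode_kcf_error(text: str) -> dict:
    """Extract file number, path, block number and OS errno from a kcf entry."""
    block = re.search(r"block=0x([0-9a-f]+)", text, re.IGNORECASE)
    datafile = re.search(r"file=(\d+)\s+(\S+)", text)
    os_err = re.search(r"error:\s*(\d+):", text)
    code = int(os_err.group(1))
    return {
        "file_no": int(datafile.group(1)),
        "path": datafile.group(2),
        "block": int(block.group(1), 16),  # hex block number as decimal
        "os_errno": code,
        # Symbolic errno name as known to the local platform (assumption:
        # same numbering as on the AIX host).
        "os_errno_name": errno.errorcode.get(code, "unknown"),
    }


if __name__ == "__main__":
    print(decode_kcf_error(KCF_LINES))
```

For the entry above this gives block 26374 of file 38 and `EBUSY`, which matches the "device busy" text in the log and points at the storage layer, not at Oracle itself.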
Check the OS error log:
[ oracle ]host1:/home/oracle/admin/orcl/bdump:> errpt|head
identifier timestamp t c resource_name description
dcb47997 0412142821 t h hdisk112 disk operation error
dcb47997 0412142821 t h hdisk119 disk operation error
b6267342 0412142821 p h hdisk119 disk operation error
b6267342 0412142821 p h hdisk140 disk operation error
b6267342 0412142821 p h hdisk119 disk operation error
b6267342 0412142821 p h hdisk112 disk operation error
dcb47997 0412142721 t h hdisk112 disk operation error
dcb47997 0412142721 t h hdisk112 disk operation error
b6267342 0412142721 p h hdisk119 disk operation error
[ oracle ]host1:/home/oracle/admin/orcl/bdump:> errpt -aj dcb47997|more
---------------------------------------------------------------------------
label: sc_disk_err4
identifier: dcb47997
date/time: mon apr 12 14:28:34 gmt 08:00 2021
sequence number: 847721
machine id: 00f710984c00
node id: host1
class: h
type: temp
wpar: global
resource name: hdisk112
resource class: disk
resource type: clar_fc_vraid
location: u5802.001.9k8n405-p1-c2-t2-w500601690960398d-l3000000000000
vpd:
manufacturer................dgc
machine type and model......vraid
ros level and id............0533
serial number...............cetv2173400018
subsystem vendor/device id..vnx5600
device specific.(pq)........00
device specific.(vs)........255f081ceccl
device specific.(ui)........600601600a804500571d3d8381c2ea11
fru label...................0025
device specific.(z0)........10
device specific.(z1)........10
description
disk operation error
probable causes
media
dasd device
user causes
media defective
recommended actions
for removable media, change media and retry
perform problem determination procedures
failure causes
media
disk drive
recommended actions
for removable media, change media and retry
perform problem determination procedures
detail data
path id
0
sense data
0a00 2800 2117 0920 0004 0004 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 034a 045c 0007 8000 0000 0000 0000 0000 0000 0000 0000 0083 0000
0000 003d 001d
---------------------------------------------------------------------------
Reportedly, the cause was either bad sectors on the disk or a brief interruption of the storage link.
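To weigh those two explanations it can help to see how the errors are spread across disks. The sketch below re-parses the nine `errpt` summary lines from this post (the names `parse_errpt` and `tally` are illustrative). It decodes the `errpt` timestamp, which is laid out as MMDDhhmmYY, and counts entries per disk and error type, where `t` is temporary and `p` is permanent.

```python
from collections import Counter
from datetime import datetime

# Summary lines copied from the `errpt | head` output, header removed.
ERRPT_LINES = """\
dcb47997 0412142821 t h hdisk112 disk operation error
dcb47997 0412142821 t h hdisk119 disk operation error
b6267342 0412142821 p h hdisk119 disk operation error
b6267342 0412142821 p h hdisk140 disk operation error
b6267342 0412142821 p h hdisk119 disk operation error
b6267342 0412142821 p h hdisk112 disk operation error
dcb47997 0412142721 t h hdisk112 disk operation error
dcb47997 0412142721 t h hdisk112 disk operation error
b6267342 0412142721 p h hdisk119 disk operation error
"""


def parse_errpt(text: str) -> list:
    """Split errpt summary lines into dictionaries, one per entry."""
    rows = []
    for line in text.splitlines():
        ident, stamp, etype, eclass, resource, desc = line.split(None, 5)
        rows.append({
            "identifier": ident,
            "time": datetime.strptime(stamp, "%m%d%H%M%y"),  # MMDDhhmmYY
            "type": etype,
            "class": eclass,
            "resource": resource,
            "description": desc,
        })
    return rows


def tally(rows: list) -> Counter:
    """Count entries per (disk, error type)."""
    return Counter((row["resource"], row["type"]) for row in rows)


if __name__ == "__main__":
    for (disk, etype), count in sorted(tally(parse_errpt(ERRPT_LINES)).items()):
        print(f"{disk}  type={etype}  count={count}")
```

In this sample the errors hit three different disks (hdisk112, hdisk119 and hdisk140) within two minutes. That is only an observation about the listing; deciding between a media fault and a path fault still needs the storage team's diagnostics.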
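The first priority was restoring service, so the instance was started directly.
The startup took 20 minutes, but the instance finally came up.
After startup, the alert log kept reporting errors on file 38: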
mon apr 12 14:22:33 2021
errors in file /home/oracle/admin/orcl/bdump/orcl_smon_34537570.trc:
ora-00376: file 38 cannot be read at this time
ora-01110: data file 38: '/oradata/orcl/tsdat02_08.dbf'
oracle instance orcl (pid = 9) - error 376 encountered while recovering transaction (30, 43) on object 1192947.
mon apr 12 14:22:34 2021
errors in file /home/oracle/admin/orcl/bdump/orcl_smon_34537570.trc:
ora-00376: file 38 cannot be read at this time
ora-01110: data file 38: '/oradata/orcl/tsdat02_08.dbf'
oracle instance orcl (pid = 9) - error 376 encountered while recovering transaction (30, 43) on object 1192947.
mon apr 12 14:22:35 2021
errors in file /home/oracle/admin/orcl/bdump/orcl_smon_34537570.trc:
ora-00376: file 38 cannot be read at this time
ora-01110: data file 38: '/oradata/orcl/tsdat02_08.dbf'
Check datafile 38:
select file#,status,bytes/1024/1024,name from v$datafile;
...
37 online 30720 /oradata/orcl/tsdat03_12.dbf
38 recover 30720 /oradata/orcl/tsdat02_08.dbf
39 online 30720 /oradata/orcl/tsdat03_13.dbf
...
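With dozens of datafiles it is easy to miss the one row that is not `online`. As a rough sketch, assuming the query output has been saved as whitespace-separated text in the column order shown (file#, status, size in MB, name), the hypothetical helper `files_needing_attention` below flags every file whose status is neither `online` nor `system`.

```python
# Three sample rows in the column order of the query above; the elided
# rows ("...") in the original output are not reproduced here.
ROWS = """\
37 online 30720 /oradata/orcl/tsdat03_12.dbf
38 recover 30720 /oradata/orcl/tsdat02_08.dbf
39 online 30720 /oradata/orcl/tsdat03_13.dbf
"""

# v$datafile statuses that indicate a usable file.
HEALTHY = {"online", "system"}


def files_needing_attention(text: str) -> list:
    """Return (file#, status, name) for every datafile that is not usable."""
    flagged = []
    for line in text.splitlines():
        file_no, status, _size_mb, name = line.split(None, 3)
        if status.lower() not in HEALTHY:
            flagged.append((int(file_no), status, name))
    return flagged


if __name__ == "__main__":
    for file_no, status, name in files_needing_attention(ROWS):
        print(f"file {file_no} is in status {status}: {name}")
```

On the sample rows this reports only file 38 in status `recover`, which is the file that then has to be recovered and brought back online.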
The database runs in ARCHIVELOG mode and the problem was caught fairly early, so the file was repaired right away:
recover datafile 38;
alter database datafile 38 online;
The repair completed quickly and without problems. Another close call survived.
Reference:
How to Recover Offline Dropped Datafile in ARCHIVELOG mode (Doc ID 286355.1)