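There has been a rash of outages lately. Several databases went down, and no two failures were alike. I'm writing this one down now and will dig deeper when I have time.
A test database suddenly crashed.
First, look at alert.log: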
kkjcre1p: unable to spawn jobq slave process
errors in file /home/ora/diag/rdbms/eastdb/orcl/trace/orcl_cjq0_3462.trc:
process j000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
errors in file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_cjq0_3462.trc:
process w000 died, see its trace file
process j000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
errors in file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_cjq0_3462.trc:
fri apr 16 08:46:18 2021
process j000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
errors in file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_cjq0_3462.trc:
fri apr 16 08:46:21 2021
process w000 died, see its trace file
fri apr 16 08:46:21 2021
pmon (ospid: 3335): terminating the instance due to error 474
fri apr 16 08:46:22 2021
system state dump requested by (instance=1, osid=3335 (pmon)), summary=[abnormal instance termination].
system state dumped to trace file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_diag_3370.trc
instance terminated by pmon, pid = 3335
The key information is error 474 (ORA-00474), which means the SMON process died.
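When the alert log is long, it can help to pull out the termination line and count the "died" messages mechanically. The following is a minimal Python sketch, not part of the original investigation; the function name summarize_alert and the sample text (a shortened, lowercased copy of the excerpt above) are assumptions made for illustration.

```python
import re

# Shortened sample copied from the alert.log excerpt in this post.
ALERT_SAMPLE = """\
process j000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
fri apr 16 08:46:21 2021
process w000 died, see its trace file
fri apr 16 08:46:21 2021
pmon (ospid: 3335): terminating the instance due to error 474
instance terminated by pmon, pid = 3335
"""

# "<process> (ospid: <n>): terminating the instance due to error <code>"
TERMINATION = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<code>\d+)",
    re.IGNORECASE,
)
# "process <name> died, see its trace file"
DIED = re.compile(r"process (?P<name>\w+) died", re.IGNORECASE)


def summarize_alert(text):
    """Return (termination_info, died_counts) found in alert.log text."""
    termination = None
    died = {}
    for line in text.splitlines():
        match = TERMINATION.search(line)
        if match:
            termination = {
                "process": match.group("proc").lower(),
                "ospid": int(match.group("ospid")),
                # Zero-pad to the usual five-digit ORA- form.
                "error": "ORA-%05d" % int(match.group("code")),
            }
            continue
        match = DIED.search(line)
        if match:
            name = match.group("name").lower()
            died[name] = died.get(name, 0) + 1
    return termination, died


if __name__ == "__main__":
    term, died = summarize_alert(ALERT_SAMPLE)
    print(term)
    print(died)
```

The processes that died repeatedly before the termination (j000 and w000 here) are a useful hint that the instance could not spawn new processes for some time before PMON gave up.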
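What does SMON do? It is the system monitor background process: it performs instance recovery at startup and cleans up temporary segments that are no longer in use.
So where do you start when analyzing an SMON failure?
The best place is still the DIAG trace file, here orcl_diag_3370.trc.
Search for "process 13:smon". The 13 is the Oracle process ID on this machine and will differ on other machines.
Then keep searching downward for "session wait history" to see whether there are any abnormal waits: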
session wait history:
elapsed time of 0.263819 sec since current wait
0: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247545 seq_num=7117 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.439011 sec of elapsed time
1: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247544 seq_num=7116 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.253953 sec of elapsed time
2: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247543 seq_num=7115 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.030880 sec of elapsed time
3: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247542 seq_num=7114 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.047717 sec of elapsed time
4: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247541 seq_num=7113 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.007141 sec of elapsed time
5: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247540 seq_num=7112 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.176498 sec of elapsed time
6: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247539 seq_num=7111 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.183811 sec of elapsed time
7: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247538 seq_num=7110 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.088497 sec of elapsed time
8: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247537 seq_num=7109 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.262751 sec of elapsed time
9: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247536 seq_num=7108 snap_id=1
wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
wait times: max=5 min 0 sec
wait counts: calls=1 os=99
occurred after 0.029236 sec of elapsed time
sampled session history of session 66 serial 1
---------------------------------------------------
the sampled session history is constructed by sampling
the target session every 1 second. the sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. if the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. if the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. though we display these consecutive
samples in a single interval the session may not be continuously
idle or not in a wait (the sampling process does not know).

the history is displayed in reverse chronological order.
Nothing abnormal here: every entry is the idle 'smon timer' wait, each lasting the normal 5 minutes.
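The "nothing abnormal" reading can be double-checked by decoding the hexadecimal sleep time: 0x12c is 300 in decimal, which matches the "5 min 0 sec" wait times if the value is in seconds (an assumption based on that match, not on Oracle documentation). Below is a small Python sketch that groups the sleep times by wait event; summarize_waits and the two-entry sample are illustrative names and data, not something the trace tooling provides.

```python
import re

# Two entries copied from the session wait history in this post.
WAIT_SAMPLE = """\
0: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247545 seq_num=7117 snap_id=1
1: waited for 'smon timer'
sleep time=0x12c, failed=0x0, =0x0
wait_id=9247544 seq_num=7116 snap_id=1
"""

EVENT = re.compile(r"^\d+: waited for '(?P<event>[^']+)'")
SLEEP = re.compile(r"sleep time=(?P<hex>0x[0-9a-f]+)", re.IGNORECASE)


def summarize_waits(text):
    """Map each wait event name to the set of decoded sleep times seen."""
    summary = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = EVENT.match(line)
        if match:
            current = match.group("event")
            summary.setdefault(current, set())
            continue
        match = SLEEP.search(line)
        if match and current is not None:
            # int() with base 16 accepts the leading "0x".
            summary[current].add(int(match.group("hex"), 16))
    return summary


if __name__ == "__main__":
    print(summarize_waits(WAIT_SAMPLE))
```

If any event other than 'smon timer' showed up in the result, or the sleep times varied, that entry would be the one to read in full.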
So I turned elsewhere, namely to the PMON trace file.
Jump straight to the bottom:
0bf4f4ec0 00000000 00000000 00000000 00000000 [................]
repeat 113 times
0bf4f55e0 bf4f55e0 00000000 bf4f55e0 00000000 [.uo......uo.....]
0bf4f55f0 00000000 00000000 bf4f55f8 00000000 [.........uo.....]
0bf4f5600 bf4f55f8 00000000 00000000 00000000 [.uo.............]
0bf4f5610 00000000 00000000 00000000 00000000 [................]
repeat 1 times
kjzduptcctx: notifying diag for crash event
----- abridged call stack trace -----
ksedsts()461<-kjzdssdmp()267<-kjzduptcctx()232<-kjzdicrshnfy()53<-ksuitm()1332<-ksulhdcb()499<-ksucln()1243<-ksbrdp()971<-opirip()623<-opidrv()603<-sou2o()103<-opimai_real()266<-ssthrdmain()252<-main()201<-__libc_start_main()253<-_start()36

----- end of abridged call stack trace -----

*** 2021-04-16 08:46:21.779
pmon (ospid: 3335): terminating the instance due to error 474
ksuitm: waiting up to [5] seconds before killing diag(3370)
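The call stack trace is very important for locating the problem.
I suspect the key function here is ksucln().
My guess is that this is SMON's usual line of work again: it hit a problem while cleaning up objects.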
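The abridged stack is printed innermost frame first, with "<-" pointing at each caller, so it is easier to follow when reversed. The Python sketch below splits the line copied from the PMON trace above into (function, offset) pairs; parse_stack and callers_first are hypothetical helper names. It only reorders the text. It does not know what the Oracle-internal functions do, and because the stack comes from the PMON trace it shows PMON's own path to the termination.

```python
# Stack line copied from the PMON trace in this post.
STACK = (
    "ksedsts()461<-kjzdssdmp()267<-kjzduptcctx()232<-kjzdicrshnfy()53"
    "<-ksuitm()1332<-ksulhdcb()499<-ksucln()1243<-ksbrdp()971"
    "<-opirip()623<-opidrv()603<-sou2o()103<-opimai_real()266"
    "<-ssthrdmain()252<-main()201<-__libc_start_main()253<-_start()36"
)


def parse_stack(stack):
    """Split an abridged call stack into (function, offset) tuples.

    Frames are returned in the order printed: innermost call first.
    """
    frames = []
    for part in stack.strip().split("<-"):
        name, _, offset = part.partition("()")
        frames.append((name, int(offset) if offset else None))
    return frames


def callers_first(stack):
    """Return function names from the outermost caller inward."""
    return [name for name, _ in reversed(parse_stack(stack))]


if __name__ == "__main__":
    for depth, name in enumerate(callers_first(STACK)):
        print("  " * depth + name)
```

Read this way, ksucln() calls ksulhdcb(), which calls ksuitm(), and ksuitm is also the name on the final trace line ("ksuitm: waiting up to [5] seconds before killing diag"), so the termination is issued a couple of frames below ksucln().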
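Known issues related to SMON crashes
ORA-474: SMON process terminated with error

1- ORA-00474: SMON process terminated with error during parallel transaction recovery
Solution:
Turn off parallel recovery by adding the following parameter to your init<SID>.ora:
fast_start_parallel_rollback = false
Bounce the instance.
For more details, see:
ORA-600 [15789] and ORA-474 (Doc ID 1094645.1)

2- ORA-600 [504] and ORA-474 causing a database crash; the instance crash may be accompanied by ORA-600 [kcbnew_3] (versions below 11.2.0.2)
Solution:
Upgrade to 10.2.0.5, or to 11.2.0.2 or later,
or
check MOS for the availability of one-off patch 9084487 on your platform.
For more details, see:
ORA-00600 [504] and ORA-474 causing database crash (Doc ID 1209577.1)

3- ORA-600 [13011] and ORA-474 reported in the alert log, where the failing SQL in the trace is similar to "delete from smon_scn_time where scn = (select min(scn) from smon_scn_time)"
Solution:
Run "analyze table smon_scn_time validate structure cascade" and rebuild all of its indexes.
For more details, see:
Instance terminated with error ORA-00474: SMON process terminated with error (Doc ID 1361872.1)
If the error is reported for a different table, try the same solution (analyze the reported table and rebuild its indexes).
For troubleshooting this error in more detail, see:
Understanding and diagnosing ORA-00600 [13011] errors (Doc ID 1392778.1)

4- Instance crash with ORA-474 and ORA-600 [4464] / ORA-600 [4427] (on versions below 11.2.0.2)
This is Bug 11814907 (instance restarted with ORA-00474: SMON process terminated with error), which was closed as a duplicate of Bug 9857702 (which returns ORA-600 [4464]).
Solution:
Upgrade to 11.2.0.2 or later, or apply interim patch 9857702 (if available for your platform).

5- ORA-00600 [kdourp_inorder2] and ORA-00474 reported in the alert log (versions below 11.2)
This is Bug 7627304 (ORA-00600 [kdourp_inorder2] and ORA-00474: SMON process terminated, PMON terminating the instance), which was closed as a duplicate of Bug 7662491 (instance crash / ORA-600 [kddummy_blkchk] hit during recovery).
Solution:
Upgrade to 11.2 or apply interim patch 7662491 (if available for your platform).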
References:
troubleshooting ora-46x and ora-47x xxxx process terminated with error (doc id 1907129.1)
srdc - instance termination (non-rac) issues : checklist of evidence to supply (doc id 2507010.1)
Database System Monitor Process (SMON) (Doc ID 1495163.1)
For crash problems, diagnostic data can be collected with tfactl. Its help output is worth a look as well; it is very extensive.
[oracle@shdb01 ~]$ tfactl diagcollect -srdc -help
service request data collection (srdc).
usage : /opt/oracle.ahf/tfa/bin/tfactl diagcollect -srdc [-tag ] [-z ] [-last | -from -to | -for ] -database
-tag the files will be collected into tagname directory inside
repository
-z the collection zip file will be given this name within the
tfa collection repository
-last files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours
-since same as -last. kept for backward compatibility.
-from "mon/dd/yyyy hh:mm:ss" from
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddthh:mm:ss"
or "yyyy-mm-dd"
-to "mon/dd/yyyy hh:mm:ss" to
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddthh:mm:ss"
or "yyyy-mm-dd"
-for "mon/dd/yyyy" for .
or "yyyy-mm-dd"
can be any of the following,
dbcorrupt required diagnostic data collection for a generic database corruption
listener_services srdc - data collection for tns-12516 / tns-12518 / tns-12519 / tns-12520.
naming_services srdc - data collection for ora-12154 / ora-12514 / ora-12528.
ora-00020 srdc for database ora-00020 maximum number of processes exceeded
ora-00060 srdc for ora-00060. internal error code.
ora-00494 srdc for ora-00494.
ora-00600 srdc for ora-00600. internal error code.
ora-00700 srdc for ora-00700. soft internal error.
ora-01031 srdc - how to collect standard information for ora - 1031 /ora -1017 during sysdba connections
ora-01555 srdc - ora-1555: checklist of evidence to supply (doc id 1682708.1)
ora-01578 srdc - required diagnostic data collection for ora-01578
ora-01628 srdc for database ora-01628 snapshot too old problems
ora-04020 srdc for ora-04020
ora-04021 srdc for ora-04021.
ora-04030 srdc for ora-04030. os process private memory was exhausted.
ora-04031 srdc for ora-04031. more shared memory is needed in the shared/streams pool.
ora-07445 srdc for ora-07445. exception encountered, core dump.
ora-08102 srdc - required diagnostic data collection for ora-08102.
ora-08103 srdc - required diagnostic data collection for ora-08103.
ora-12751 srdc for ora-12751. internal error code.
ora-22924 srdc - ora-22924 or ora-1555 on lob data: checklist of evidence to supply (doc id 1682707.1)
ora-27300 srdc for ora-27300. os system dependent operation:open failed with status: (status).
ora-27301 srdc for ora-27301. os failure message: (message).
ora-27302 srdc for ora-27302. failure occurred at: (module).
ora-30036 srdc for database ora-30036 unable to extend undo tablespace problems
tns-12154 srdc - data collection for tns-12154.
tns-12514 srdc - data collection for tns-12514.
tns-12516 srdc - data collection for tns-12516.
tns-12518 srdc - data collection for tns-12518.
tns-12519 srdc - data collection for tns-12519.
tns-12520 srdc - data collection for tns-12520.
tns-12528 srdc - data collection for tns-12528.
ahf srdc - data collection for orachk or exachk issue, after running orachk -debug or exachk -debug.
crs srdc for crs
crsasm srdc for asm crs related errors
crsasmcell srdc for asm crs cell related errors
dbacl srdc - how to collect standard information for access control lists (acls).
dbaqgen srdc - how to collect information for troubleshooting problem in an oracle advanced queuing environment.
dbaqmon srdc - how to collect information for troubleshooting queue monitor (qmon) issues.
dbaqnotify srdc - how to collect information for troubleshooting notification in an advanced queuing environment.
dbaqperf srdc - how to collect information for troubleshooting performance in an oracle advanced queuing environment.
dbaqpurge srdc - how to collect information for troubleshooting non-purged messages in an advanced queuing environment
dbasm srdc automation: enhance asm/dbfs/dnfs/acfs collections
dbaudit srdc - how to collect standard information for database auditing
dbaum srdc - aum : checklist of evidence to supply (doc id 1682741.1)
dbaumwaitevents srdc - wait events related to undo: checklist of evidence to supply (doc id 1682723.1)
dbawrspace srdc for database awr space problems
dbbeqconnection srdc - bequeath connection issues: checklist of evidence to supply (doc id 1928047.1)
dbdatapatch srdc - data collection for datapatch issues.
dbddlerrors srdc - ddl errors: checklist of evidence to supply
dbemon srdc - how to collect information for troubleshooting event monitor (emon) issues
dbenqdeq srdc - how to collect standard information for advanced queueing issues using tfa collector (recommended) or manual steps
dbexp srdc - how to collect information for troubleshooting export (exp) related problems
dbexpdp srdc - diagnostic collection for datapump export generic issues
dbexpdpapi srdc - diagnostic collection for datapump export api issues
dbexpdpperf srdc - diagnostic collection for datapump export performance issues
dbexpdptts srdc - data to supply for transportable tablespace datapump and original export, import
dbfra srdc - required diagnostic data collection for fra related errors.
dbfs srdc for dbfs.
dbggclassicmode srdc for doc id 1913426.1, 1913376.1 and 1912964.1
dbggintegratedmode srdc for goldengate extract/replicat abends problems.
dbhang srdc for database hang problems
dbimp srdc - diagnostic collection for traditional import issues
dbimpdp srdc - diagnostic collection for datapump import (impdp) generic issues
dbimpdpperf srdc - diagnostic collection for datapump import (impdp) performance issues
dbinstall srdc for oracle rdbms install problems.
dbinstancecrash srdc - instance termination (non-rac) issues : checklist of evidence to supply (doc id 2507010.1)
dbinvalidcomp srdc - invalid components and objects : checklist of evidence to supply
dbinvalidobj srdc - objects getting invalidated: checklist of evidence to supply
dbparameterfiles srdc - parameter files :checklist of evidence to supply.
dbparameters srdc - database parameters: checklist of evidence to supply.
dbpartition srdc - data to supply for create/maintain partitioned/subpartitioned table/index issues
dbpartitionperf srdc - data to supply for slow create/alter/drop commands against partitioned table/index
dbpatchconflict srdc for oracle rdbms patch conflict problems.
dbpatchinstall
dbperf srdc for database performance problems
dbplugincompliance srdc - collect relevant diagnostic information for all compliance related issues within enterprise manager 12c and 13c for oracle database.
dbpreupgrade srdc for database preupgrade problems.
dbprocmgmt srdc - generic process management and related issues: checklist of evidence to supply (doc id 2500734.1)
dbrac srdc for rac specific issues
dbracinst srdc automation: enhance asm/dbfs/dnfs/acfs collections
dbracmin minimal srdc for rac specific issues
dbracperf srdc for rac database performance problems
dbrman srdc - required diagnostic data collection for rman related errors.
dbrmanperf srdc - required diagnostic data collection for rman performance(1671509.1).
dbscn srdc for database scn problems.
dbshutdown srdc - shutdown issues : checklist of evidence to supply (doc id 1906473.1)
dbslowddl srdc - slow ddl: checklist of evidence to supply
dbspatialexportimport srdc - data collection for oracle spatial export/import issues.
dbspatialinstall srdc - data collection for oracle spatial installation issues.
dbsqlperf srdc - how to collect standard information for a sql performance problem using tfa collector.
dbstandalonedbca srdc - dbca issues: checklist of evidence to supply
dbstartup srdc - startup issues: checklist of evidence to supply (doc id 1905616.1)
dbtde srdc - how to collect standard information for transparent data encryption (tde) (doc id 1905607.1)
dbtextinstall srdc - data collection for oracle text installation issues - 12c.
dbtextupgrade srdc - data collection for oracle text upgrade issues - 12c.
dbundocorruption srdc - required diagnostic data collection for undo corruption.
dbunixresources srdc to capture diagnostic data for db issues related to o/s resources
dbupgrade srdc for database upgrade problems.
dbvault srdc - how to collect standard information for database vault
dbwindowsresources srdc - db on windows resources : checklist of evidence to supply.
dbwinservice srdc - oracleservice on windows: checklist of evidence to supply (doc id 1918781.1)
dbxdb srdc for database xdb installation and invalid object problems
dnfs srdc for dnfs.
emagentgeneric srdc - collect trace/log information for enterprise manager management agent generic issues
emagentpatching srdc - collect trace/log information for failures during enterprise manager 13c management agent patching.
emagentperf em srdc - collect diagnostic data for em agent performance issues.
emagentstartup srdc - collecting logs for enterprise manager 13c agent startup errors.
emagtpatchdeploy srdc - collecting log files for em 13c agent or agent patch deployment.
emagtupgpatch srdc - collecting log files for em 13c agent upgrade or local installation or patching.
emcliadd em srdc - errors during the adding of a database/listener/asm target via emcli.
emclusdisc em srdc - cluster target, cluster (rac) database or asm target is not discovered.
emdbaasdeploy srdc - collect trace/log information for failures during database as a service(dbaas) deployment.
emdbsys em srdc - database system target is not discovered/detected/removed/renamed correctly.
emdebugoff srdc for unsetting em debug.
emdebugon srdc for setting em debug.
emfleetpatching srdc - collecting diagnostic data for enterprise manager fleet maintenance patching issues.
emgendisc em srdc - general error is received when discovering or removing a database/listener/asm target.
emmetricalert srdc for em metric events not raised and general metric alert related issues.
emomscrash srdc - collect diagnostic data for all enterprise manager oms crash / restart performance issues.
emomsheap srdc - collecting diagnostic data for enterprise manager oms heap usage alert performance issues.
emomshungcpu srdc - collecting diagnostic data for enterprise manager oms hung or high cpu usage performance issues.
emomspatching srdc - collect trace/log information for failures during enterprise manager 13c oms patching.
empatchplancrt srdc - collecting diagnostic data for enterprise manager patch plan creation issues.
emprocdisc em srdc - database/listener/asm target is not discovered/detected by the discovery process.
emtbsmetric srdc - collect relevant diagnostic information for all tablespace space used (%) metric issues within enterprise manager for oracle database 12c and 13c.
esexalogic srdc - exalogic full exalogs data collection information.
exservice srdc - exadata: storage software service or offload server service failures.
exsmartscan srdc - exadata: smart scan not working issues.
gg_abend srdc for doc id 2650417.1
ggintegratedmodenodb srdc for goldengate extract/replicat abends problems.
gridinfra srdc automation: enhance asm/dbfs/dnfs/acfs collections
gridinfrainst srdc automation: enhance asm/dbfs/dnfs/acfs collections
instterm srdc for instance terminated events, such as ora-00469: ora-00470: ora-00480: ora-00490: ora-00491, ora-00492, ora-00493, ora-00495, ora-00496, ora-00497, ora-00498
internalerror srdc for all other types of internal database errors.
ora1000 srdc - open cursors:checklist of evidence to supply.
ora18 srdc - ora-18 or sessions parameter: checklist of evidence to supply.
ora25319 srdc - how to collect information for troubleshooting an ora-25319 error in an advanced queuing environment.
ora4023 srdc - ora-4023 : checklist of evidence to supply
ora4063 srdc - ora-4063 : checklist of evidence to supply
ora445 srdc - ora-445 or unable to spawn process: checklist of evidence to supply (doc id 2500730.1)
xdb600 srdc - required diagnostic data collection for xdb ora-00600 and ora-07445 internal error issues using tfa collector
xdbinstall srdc - required diagnostic data collection for xdb installation and invalid object for issues for 12c and onward
zlgeneric srdc - zero data loss recovery appliance (zdlra) data collection.
[oracle@shdb01 ~]$
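For this incident the dbinstancecrash profile in the list above looks like the closest match, since it points at the same Doc ID 2507010.1 cited in the references. The Python sketch below only assembles such a command line and does not run anything. It uses the -srdc, -database, -from and -to options and the "yyyy-mm-dd hh:mm:ss" time format shown in the help text. The placement of the profile name directly after -srdc, the database name orcl (taken from the diag path), and the one-hour window are assumptions to adjust for your environment. build_srdc_command is a hypothetical helper name.

```python
import shlex


def build_srdc_command(profile, database, start, end, tfactl="tfactl"):
    """Build (but do not execute) a tfactl SRDC collection command.

    profile  : one of the SRDC names listed by `tfactl diagcollect -srdc -help`
    database : database name to collect for (assumed: orcl)
    start/end: time window in "yyyy-mm-dd hh:mm:ss" format
    """
    return [
        tfactl, "diagcollect",
        "-srdc", profile,
        "-database", database,
        "-from", start,
        "-to", end,
    ]


if __name__ == "__main__":
    # Assumed window: the hour around the 08:46 termination.
    cmd = build_srdc_command(
        "dbinstancecrash", "orcl",
        "2021-04-16 08:00:00", "2021-04-16 09:00:00",
    )
    # shlex.join quotes the timestamps so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting mistakes with the timestamps, which contain spaces; the list can be passed to subprocess.run unchanged once the options have been confirmed against the installed tfactl version.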
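Going through alert.log from the earliest warning onward, I found that no AWR snapshot had been generated since 14:00 on the 15th.
Query dba_hist_active_sess_history to see what the database was busy with before the problem: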
set lines 500
set long 9999
set pages 999
set serveroutput on size 1000000
alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';
alter session set nls_timestamp_format = 'yyyy-mm-dd hh24.mi.ss.ff';
select instance_number, sample_id,sample_time,count(*) cnt
from dba_hist_active_sess_history where sample_time between
to_timestamp('2021/04/15 13:00', 'yyyy/mm/dd hh24:mi') and
to_timestamp('2021/04/16 10:00', 'yyyy/mm/dd hh24:mi')
group by instance_number, sample_id,sample_time
order by instance_number, sample_id,sample_time;
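No data there either (there were hardly any sessions before the crash; what workload would a test database have at 8:30 in the morning?).
The m000 process left no trace file; only the j000 trace reports the following every 2 seconds:
process j000 is dead ... state=ksosp_spawned
The operating system's messages log shows OOM errors from the period when things went wrong: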
apr 11 18:58:27 host auditd[1737]: audit daemon rotating log files
apr 11 22:05:14 host auditd[1737]: audit daemon rotating log files
apr 12 11:09:15 host auditd[1737]: audit daemon rotating log files
apr 12 20:10:36 host auditd[1737]: audit daemon rotating log files
apr 13 13:01:16 host auditd[1737]: audit daemon rotating log files
apr 13 19:23:24 host auditd[1737]: audit daemon rotating log files
apr 14 13:49:20 host auditd[1737]: audit daemon rotating log files
apr 15 12:23:09 host auditd[1737]: audit daemon rotating log files
apr 15 17:34:48 host kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
apr 15 17:34:48 host kernel: oracle cpuset=/ mems_allowed=0
apr 15 17:34:48 host kernel: pid: 3388, comm: oracle tainted: g --------------- t 2.6.32-431.el6.x86_64 #1
apr 15 17:34:48 host kernel: call trace:
apr 15 17:34:48 host kernel: [] ? cpuset_print_task_mems_allowed 0x91/0xb0
apr 15 17:34:48 host kernel: [] ? dump_header 0x90/0x1b0
apr 15 17:34:48 host kernel: [] ? security_real_capable_noaudit 0x3c/0x70
apr 15 17:34:48 host kernel: [] ? oom_kill_process 0x82/0x2a0
apr 15 17:34:48 host kernel: [] ? select_bad_process 0xe1/0x120
apr 15 17:34:48 host kernel: [] ? out_of_memory 0x220/0x3c0
apr 15 17:34:48 host kernel: [] ? __alloc_pages_nodemask 0x8ac/0x8d0
apr 15 17:34:48 host kernel: [] ? alloc_pages_current 0xaa/0x110
apr 15 17:34:48 host kernel: [] ? __page_cache_alloc 0x87/0x90
apr 15 17:34:48 host kernel: [] ? find_get_page 0x1e/0xa0
apr 15 17:34:48 host kernel: [] ? filemap_fault 0x1a7/0x500
apr 15 17:34:48 host kernel: [] ? __do_fault 0x54/0x530
apr 15 17:34:48 host kernel: [] ? handle_pte_fault 0xf7/0xb00
apr 15 17:34:48 host kernel: [] ? rb_reserve_next_event 0xb4/0x370
apr 15 17:34:48 host kernel: [] ? native_sched_clock 0x13/0x80
apr 15 17:34:48 host kernel: [] ? rb_reserve_next_event 0xb4/0x370
apr 15 17:34:48 host kernel: [] ? native_sched_clock 0x13/0x80
apr 15 17:34:48 host kernel: [] ? handle_mm_fault 0x22a/0x300
apr 15 17:34:48 host kernel: [] ? __do_page_fault 0x138/0x480
apr 15 17:34:48 host kernel: [] ? thread_group_times 0x3d/0x120
apr 15 17:34:48 host kernel: [] ? ring_buffer_lock_reserve 0xa2/0x160
apr 15 17:34:48 host kernel: [] ? mmput 0x1e/0x120
apr 15 17:34:48 host kernel: [] ? trace_nowake_buffer_unlock_commit 0x43/0x60
apr 15 17:34:48 host kernel: [] ? ftrace_raw_event_sys_exit 0xb9/0xc0
apr 15 17:34:48 host kernel: [] ? do_page_fault 0x3e/0xa0
apr 15 17:34:48 host kernel: [] ? page_fault 0x25/0x30
apr 15 17:34:48 host kernel: mem-info:
apr 15 17:34:48 host kernel: node 0 dma per-cpu:
apr 15 17:34:48 host kernel: cpu 0: hi: 0, btch: 1 usd: 0
apr 15 17:34:48 host kernel: cpu 1: hi: 0, btch: 1 usd: 0
apr 15 17:34:48 host kernel: cpu 2: hi: 0, btch: 1 usd: 0
apr 15 17:34:48 host kernel: cpu 3: hi: 0, btch: 1 usd: 0
apr 15 17:34:48 host kernel: node 0 dma32 per-cpu:
apr 15 17:34:48 host kernel: cpu 0: hi: 186, btch: 31 usd: 0
apr 15 17:34:48 host kernel: cpu 1: hi: 186, btch: 31 usd: 0
apr 15 17:34:48 host kernel: cpu 2: hi: 186, btch: 31 usd: 0
apr 15 17:34:48 host kernel: cpu 3: hi: 186, btch: 31 usd: 0
apr 15 17:34:48 host kernel: node 0 normal per-cpu:
apr 15 17:34:48 host kernel: cpu 0: hi: 186, btch: 31 usd: 0
apr 15 17:34:48 host kernel: cpu 1: hi: 186, btch: 31 usd: 0
apr 15 17:34:48 host kernel: cpu 2: hi: 186, btch: 31 usd: 23
apr 15 17:34:48 host kernel: cpu 3: hi: 186, btch: 31 usd: 0
apr 15 17:34:48 host kernel: active_anon:463690 inactive_anon:140220 isolated_anon:0
apr 15 17:34:48 host kernel: active_file:245 inactive_file:504 isolated_file:0
apr 15 17:34:48 host kernel: unevictable:0 dirty:11 writeback:0 unstable:0
apr 15 17:34:48 host kernel: free:22140 slab_reclaimable:10286 slab_unreclaimable:84990
apr 15 17:34:48 host kernel: mapped:11740 shmem:46989 pagetables:215832 bounce:0
apr 15 17:34:48 host kernel: node 0 dma free:15684kb min:248kb low:308kb high:372kb active_anon:0kb inactive_anon:0kb active_file:0kb inactive_file:0kb unevictable:0kb isolated(anon):0kb isolated(file):0kb present:15292kb mlocked:0kb dirty:0kb writeback:0kb mapped:0kb shmem:0kb slab_reclaimable:0kb slab_unreclaimable:0kb kernel_stack:0kb pagetables:0kb unstable:0kb bounce:0kb writeback_tmp:0kb pages_scanned:0 all_unreclaimable? yes
apr 15 17:34:48 host kernel: lowmem_reserve[]: 0 3000 4010 4010
apr 15 17:34:48 host kernel: node 0 dma32 free:55048kb min:50372kb low:62964kb high:75556kb active_anon:1627600kb inactive_anon:333656kb active_file:916kb inactive_file:1996kb unevictable:0kb isolated(anon):0kb isolated(file):0kb present:3072096kb mlocked:0kb dirty:32kb writeback:0kb mapped:23000kb shmem:120460kb slab_reclaimable:23224kb slab_unreclaimable:199140kb kernel_stack:27016kb pagetables:538512kb unstable:0kb bounce:0kb writeback_tmp:0kb pages_scanned:0 all_unreclaimable? no
apr 15 17:34:48 host kernel: lowmem_reserve[]: 0 0 1010 1010
apr 15 17:34:48 host kernel: node 0 normal free:17828kb min:16956kb low:21192kb high:25432kb active_anon:227160kb inactive_anon:227224kb active_file:64kb inactive_file:20kb unevictable:0kb isolated(anon):0kb isolated(file):0kb present:1034240kb mlocked:0kb dirty:12kb writeback:0kb mapped:23960kb shmem:67496kb slab_reclaimable:17920kb slab_unreclaimable:140820kb kernel_stack:7400kb pagetables:324816kb unstable:0kb bounce:0kb writeback_tmp:0kb pages_scanned:16 all_unreclaimable? no
apr 15 17:34:48 host kernel: lowmem_reserve[]: 0 0 0 0
apr 15 17:34:48 host kernel: node 0 dma: 1*4kb 4*8kb 2*16kb 2*32kb 3*64kb 0*128kb 0*256kb 0*512kb 1*1024kb 1*2048kb 3*4096kb = 15684kb
apr 15 17:34:48 host kernel: node 0 dma32: 5534*4kb 1983*8kb 236*16kb 85*32kb 110*64kb 28*128kb 0*256kb 0*512kb 0*1024kb 0*2048kb 0*4096kb = 55120kb
apr 15 17:34:48 host kernel: node 0 normal: 4451*4kb 3*8kb 0*16kb 0*32kb 0*64kb 0*128kb 0*256kb 0*512kb 0*1024kb 0*2048kb 0*4096kb = 17828kb
apr 15 17:34:48 host kernel: 50408 total pagecache pages
apr 15 17:34:48 host kernel: 2357 pages in swap cache
apr 15 17:34:48 host kernel: swap cache stats: add 13218790, delete 13216433, find 24808505/26126131
apr 15 17:34:48 host kernel: free swap = 0kb
apr 15 17:34:48 host kernel: total swap = 4194296kb
apr 15 17:34:48 host kernel: 1048560 pages ram
apr 15 17:34:48 host kernel: 67274 pages reserved
apr 15 17:34:48 host kernel: 179030 pages shared
apr 15 17:34:48 host kernel: 816137 pages non-shared
apr 15 17:34:48 host kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
apr 15 17:34:48 host kernel: [ 465] 0 465 2814 1 1 -17 -1000 udevd
apr 15 17:34:48 host kernel: [ 1589] 0 1589 47371 136 0 0 0 vmtoolsd
apr 15 17:34:48 host kernel: [ 1737] 0 1737 23300 73 2 -17 -1000 auditd
apr 15 17:34:48 host kernel: [ 1739] 0 1739 20521 80 1 0 0 audispd
apr 15 17:34:48 host kernel: [ 1740] 0 1740 5301 42 0 0 0 sedispatch
apr 15 17:34:48 host kernel: [ 1814] 0 1814 2705 44 2 0 0 irqbalance
apr 15 17:34:48 host kernel: [ 1833] 32 1833 4759 22 0 0 0 rpcbind
apr 15 17:34:48 host kernel: [ 1942] 0 1942 3396 44 3 -17 -1000 lldpad
free swap = 0kb?
Most likely the host ran out of memory. I'll deploy OSWatcher (OSW) and keep watching.
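The page counts in the OOM dump become easier to judge once converted to MiB. The Python sketch below does that arithmetic with values copied from the messages excerpt above, assuming the usual 4 KiB page size on x86_64. The dictionary keys and the summarize function are illustrative names, not kernel terminology.

```python
PAGE_KB = 4  # assumption: 4 KiB pages (the x86_64 default)

# Values copied from the "mem-info" section of the OOM report in this post.
OOM_FIELDS = {
    "pages_ram": 1048560,
    "pagetables_pages": 215832,
    "free_pages": 22140,
    "free_swap_kb": 0,
    "total_swap_kb": 4194296,
}


def pages_to_mib(pages, page_kb=PAGE_KB):
    """Convert a page count to MiB."""
    return pages * page_kb / 1024


def summarize(fields, page_kb=PAGE_KB):
    """Return a few headline numbers derived from the OOM report."""
    ram = pages_to_mib(fields["pages_ram"], page_kb)
    pagetables = pages_to_mib(fields["pagetables_pages"], page_kb)
    swap_used_kb = fields["total_swap_kb"] - fields["free_swap_kb"]
    return {
        "ram_mib": round(ram),
        "pagetables_mib": round(pagetables),
        "pagetables_pct_of_ram": round(100 * pagetables / ram, 1),
        "free_mib": round(pages_to_mib(fields["free_pages"], page_kb)),
        "swap_used_pct": round(100 * swap_used_kb / fields["total_swap_kb"], 1),
    }


if __name__ == "__main__":
    for key, value in summarize(OOM_FIELDS).items():
        print(key, value)
```

Under that assumption the host has about 4 GiB of RAM, swap is fully used, and page tables alone take roughly a fifth of RAM. A page-table footprint that large is often seen when many processes map a big shared memory area with small pages, but that is only a hypothesis here. The OSWatcher data should show whether it holds.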