pmon terminating the instance due to error 474 (smon)

Category: oracle

2021-04-16 20:01:32


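There has been a burst of outages lately: several databases went down, each in its own way. I am writing this one down first and will dig deeper when I have time.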

A test database suddenly went down.

Start with alert.log:

    kkjcre1p: unable to spawn jobq slave process
    errors in file /home/ora/diag/rdbms/eastdb/orcl/trace/orcl_cjq0_3462.trc:
    process j000 died, see its trace file
    kkjcre1p: unable to spawn jobq slave process
    errors in file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_cjq0_3462.trc:
    process w000 died, see its trace file
    process j000 died, see its trace file
    kkjcre1p: unable to spawn jobq slave process
    errors in file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_cjq0_3462.trc:
    fri apr 16 08:46:18 2021
    process j000 died, see its trace file
    kkjcre1p: unable to spawn jobq slave process
    errors in file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_cjq0_3462.trc:
    fri apr 16 08:46:21 2021
    process w000 died, see its trace file
    fri apr 16 08:46:21 2021
    pmon (ospid: 3335): terminating the instance due to error 474
    fri apr 16 08:46:22 2021
    system state dump requested by (instance=1, osid=3335 (pmon)), summary=[abnormal instance termination].
    system state dumped to trace file /home/ora/diag/rdbms/orcl/orcl/trace/orcl_diag_3370.trc
    instance terminated by pmon, pid = 3335
The key piece of information is error 474 (ORA-00474: SMON process terminated with error), which means SMON died.
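When several instances crash in a short period, it helps to pull the termination line out of each alert.log automatically. The following is a minimal Python sketch, assuming the message format shown in the listing above; the function name `find_terminations` and the sample list are illustrative and not part of any Oracle tool.

```python
import re

# Alert.log line written when a background process brings the instance down.
# Matching is case-insensitive because log copies may differ in case.
TERMINATION = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<code>\d+)",
    re.IGNORECASE,
)

def find_terminations(lines):
    """Return (process, ospid, ORA code) for every instance-termination line."""
    hits = []
    for line in lines:
        m = TERMINATION.search(line)
        if m:
            code = f"ORA-{int(m.group('code')):05d}"
            hits.append((m.group("proc").lower(), int(m.group("ospid")), code))
    return hits

# Three lines taken from the alert.log excerpt in this post.
sample = [
    "process w000 died, see its trace file",
    "pmon (ospid: 3335): terminating the instance due to error 474",
    "instance terminated by pmon, pid = 3335",
]
print(find_terminations(sample))  # [('pmon', 3335, 'ORA-00474')]
```

The ORA code can then be looked up (for example with `oerr ora 474` on the database host) to see which background process failed.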



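What does SMON do? It is the System Monitor background process: it performs instance recovery at startup and cleans up temporary segments that are no longer in use.
So where do you start when analyzing an SMON crash?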

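The best place to start is still the diag trace file, here orcl_diag_3370.trc.
Search for "process 13:smon". The 13 is SMON's Oracle process number (pid) on this machine; it will differ on other machines.
Then keep searching downward for "session wait history" to see whether there are any abnormal waits: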

    session wait history:
            elapsed time of 0.263819 sec since current wait
         0: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247545 seq_num=7117 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.439011 sec of elapsed time
         1: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247544 seq_num=7116 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.253953 sec of elapsed time
         2: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247543 seq_num=7115 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.030880 sec of elapsed time
         3: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247542 seq_num=7114 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.047717 sec of elapsed time
         4: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247541 seq_num=7113 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.007141 sec of elapsed time
         5: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247540 seq_num=7112 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.176498 sec of elapsed time
         6: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247539 seq_num=7111 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.183811 sec of elapsed time
         7: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247538 seq_num=7110 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.088497 sec of elapsed time
         8: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247537 seq_num=7109 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.262751 sec of elapsed time
         9: waited for 'smon timer'
            sleep time=0x12c, failed=0x0, =0x0
            wait_id=9247536 seq_num=7108 snap_id=1
            wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec
            wait times: max=5 min 0 sec
            wait counts: calls=1 os=99
            occurred after 0.029236 sec of elapsed time
        sampled session history of session 66 serial 1
        ---------------------------------------------------
        the sampled session history is constructed by sampling
        the target session every 1 second. the sampling process
        captures at each sample if the session is in a non-idle wait,
        an idle wait, or not in a wait. if the session is in a
        non-idle wait then one interval is shown for all the samples
        the session was in the same non-idle wait. if the
        session is in an idle wait or not in a wait for
        consecutive samples then one interval is shown for all
        the consecutive samples. though we display these consecutive
        samples in a single interval the session may not be continuously
        idle or not in a wait (the sampling process does not know).

        the history is displayed in reverse chronological order.
Nothing abnormal here: all ten entries are the idle 'smon timer' wait, each lasting 5 minutes.
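Reading ten near-identical entries by eye is error-prone, so the wait history can be summarized mechanically. The sketch below is a hedged Python example that counts the events and decodes the hexadecimal `sleep time` parameter. It assumes the parameter is in seconds, which is consistent with `0x12c` = 300 lining up with the reported "5 min 0 sec". The helper name `summarize_waits` is made up for illustration.

```python
import re
from collections import Counter

# "N: waited for '<event>'" lines and the hexadecimal "sleep time" parameter.
WAIT = re.compile(r"(\d+): waited for '([^']+)'")
SLEEP = re.compile(r"sleep time=(0x[0-9a-f]+)", re.IGNORECASE)

def summarize_waits(trace_lines):
    """Count wait events and collect the distinct sleep times (assumed seconds)."""
    events = Counter()
    sleep_seconds = set()
    for line in trace_lines:
        wait = WAIT.search(line)
        if wait:
            events[wait.group(2)] += 1
        sleep = SLEEP.search(line)
        if sleep:
            sleep_seconds.add(int(sleep.group(1), 16))
    return events, sleep_seconds

# A few lines copied from the session wait history in this post.
sample = [
    "     0: waited for 'smon timer'",
    "        sleep time=0x12c, failed=0x0, =0x0",
    "        wait times: snap=5 min 0 sec, exc=5 min 0 sec, total=5 min 0 sec",
    "     1: waited for 'smon timer'",
    "        sleep time=0x12c, failed=0x0, =0x0",
]
events, sleep_seconds = summarize_waits(sample)
print(dict(events))    # {'smon timer': 2}
print(sleep_seconds)   # {300}
```

If every entry is 'smon timer' with the same 300-second sleep, SMON was simply idling on its timer. An event other than the timer would be the first thing to look at.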

So I turned elsewhere: the PMON trace file.
Go straight to the bottom:

    0bf4f4ec0 00000000 00000000 00000000 00000000 [................]
            repeat 113 times
    0bf4f55e0 bf4f55e0 00000000 bf4f55e0 00000000 [.uo......uo.....]
    0bf4f55f0 00000000 00000000 bf4f55f8 00000000 [.........uo.....]
    0bf4f5600 bf4f55f8 00000000 00000000 00000000 [.uo.............]
    0bf4f5610 00000000 00000000 00000000 00000000 [................]
      repeat 1 times
    kjzduptcctx: notifying diag for crash event
    ----- abridged call stack trace -----
    ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksulhdcb()+499<-ksucln()+1243<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+253<-_start()+36

    ----- end of abridged call stack trace -----

    *** 2021-04-16 08:46:21.779
    pmon (ospid: 3335): terminating the instance due to error 474
    ksuitm: waiting up to [5] seconds before killing diag(3370)
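The call stack trace is very important for pinpointing the problem.
My feeling is that the key function here is ksucln().

My guess is that this is still SMON's core job: it ran into a problem while cleaning up objects.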
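To compare stacks across several crashes, the abridged call stack line can be split into frames. This is a small Python sketch under the assumption that frames are separated by `<-` and written as `name()+offset`. The `+` is treated as optional because some copies of the trace lose it. `parse_stack` is an illustrative name, not an Oracle utility.

```python
import re

# One frame looks like "ksucln()+1243"; the "+" may be missing in some copies.
FRAME = re.compile(r"(\w+)\(\)\+?(\d+)")

def parse_stack(text):
    """Return [(function, offset), ...], innermost frame first."""
    return [(name, int(offset)) for name, offset in FRAME.findall(text)]

# The abridged call stack from the PMON trace in this post.
stack = (
    "ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53"
    "<-ksuitm()+1332<-ksulhdcb()+499<-ksucln()+1243<-ksbrdp()+971"
    "<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266"
    "<-ssthrdmain()+252<-main()+201<-__libc_start_main()+253<-_start()+36"
)

frames = parse_stack(stack)
# Print outermost caller first, so the path into ksucln() reads top-down.
for depth, (name, offset) in enumerate(reversed(frames)):
    print(f"{'  ' * depth}{name}+{offset}")
```

The list of function names (without offsets) makes a compact signature that can be pasted into a support-site search or compared with another crash.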


Known issues related to SMON crashes

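ORA-474: SMON process terminated with error

1 - ORA-00474: SMON process terminated with error during parallel transaction recovery

Solution:

Turn off parallel recovery by adding the following parameter to your init<SID>.ora,
fast_start_parallel_rollback = false
and bounce the instance.

For more details, see:

ORA-600 [15789] and ORA-474 (Doc ID 1094645.1)


2 - Instance crash with ORA-600 [504] and ORA-474; ORA-600 [kcbnew_3] may be reported along with them (versions below 11.2.0.2)

Solution:

Upgrade to 10.2.0.5, or to 11.2.0.2 or later.

Check on MOS whether one-off patch 9084487 is available for your platform.

For more details, see:

ORA-00600 [504] and ORA-474 causing a database crash (Doc ID 1209577.1)

3 - ORA-600 [13011] and ORA-474 reported in the alert log, where the failing SQL in the trace is similar to "delete from smon_scn_time where scn = (select min(scn) from smon_scn_time)"

Solution:

Run analyze table smon_scn_time validate structure cascade and rebuild all of its indexes.

For more details, see:

Instance terminated with error ORA-00474: SMON process terminated with error (Doc ID 1361872.1)

If the error is reported for a different table, try the same solution (analyze the reported table and rebuild its indexes).

For troubleshooting this error, see the following document for more details:

Understanding and Diagnosing ORA-00600 [13011] Errors (Doc ID 1392778.1)

4 - Instance crash with ORA-474 and ORA-600 [4464] / ORA-600 [4427] (on versions below 11.2.0.2)

This is Bug 11814907 (instance restarted with ORA-00474: SMON process terminated with error), closed as a duplicate of Bug 9857702 (ORA-600 [4464]).

Solution:

Upgrade to 11.2.0.2 or later, or apply interim patch 9857702 (if available for your platform).

5 - ORA-00600 [kdourp_inorder2] and ORA-00474 reported in the alert log (versions below 11.2)

This is Bug 7627304 (ORA-00600 [kdourp_inorder2] and ORA-00474: SMON process terminated, PMON terminating the instance), closed as a duplicate of Bug 7662491 (instance crash / ORA-600 [kddummy_blkchk] hit during recovery).

Solution:

Upgrade to 11.2 or apply interim patch 7662491 (if available for your platform).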
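The five cases above are all keyed on the ORA-600 argument that accompanies ORA-474, so a quick triage is to scan the alert log for ORA-600 lines and see whether the first argument matches one of them. The Python sketch below encodes only the pairs listed in this post. The table `KNOWN_SMON_CASES`, the function `match_known_cases`, and the ORA-600 sample line are illustrative assumptions, not output from this incident.

```python
import re

# First ORA-600 argument -> reference given in the list above.
KNOWN_SMON_CASES = {
    "15789": "Doc ID 1094645.1",
    "504": "Doc ID 1209577.1",
    "kcbnew_3": "Doc ID 1209577.1",
    "13011": "Doc ID 1361872.1 / Doc ID 1392778.1",
    "4464": "Bug 9857702",
    "4427": "Bug 9857702",
    "kdourp_inorder2": "Bug 7662491",
}

# Captures the first bracketed argument after "ORA-600" / "ORA-00600".
ORA600_ARG = re.compile(r"ora-0*600[^\[\n]*\[([^\]]+)\]", re.IGNORECASE)

def match_known_cases(alert_lines):
    """Return (argument, reference) for ORA-600 arguments found in the table."""
    hits = []
    for line in alert_lines:
        for arg in ORA600_ARG.findall(line):
            key = arg.strip().lower()
            if key in KNOWN_SMON_CASES:
                hits.append((key, KNOWN_SMON_CASES[key]))
    return hits

# Hypothetical alert.log line, NOT taken from this incident.
hypothetical = ["ORA-00600: internal error code, arguments: [13011], [5], [], []"]
print(match_known_cases(hypothetical))

# Lines from this incident contain no ORA-600 at all, so nothing matches.
this_incident = ["pmon (ospid: 3335): terminating the instance due to error 474"]
print(match_known_cases(this_incident))  # []
```

An empty result, as in this incident, suggests the crash is not one of these known bugs and the cause should be sought elsewhere.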


References:
Troubleshooting ORA-46x and ORA-47x XXXX process terminated with error (Doc ID 1907129.1)
SRDC - Instance termination (non-RAC) issues: checklist of evidence to supply (Doc ID 2507010.1)
Database System Monitor Process (SMON) (Doc ID 1495163.1)

For crash problems, tfactl can be used to collect the diagnostics. While at it, take a look at its help output, which is quite extensive.

    [oracle@shdb01 ~]$ tfactl diagcollect -srdc -help
    service request data collection (srdc).
    usage : /opt/oracle.ahf/tfa/bin/tfactl diagcollect -srdc [-tag ] [-z ] [-last | -from -to | -for ] -database
    -tag the files will be collected into tagname directory inside
    repository
    -z the collection zip file will be given this name within the
    tfa collection repository
    -last files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours
    -since same as -last. kept for backward compatibility.
    -from "mon/dd/yyyy hh:mm:ss" from
    or "yyyy-mm-dd hh:mm:ss"
    or "yyyy-mm-ddthh:mm:ss"
    or "yyyy-mm-dd"
    -to "mon/dd/yyyy hh:mm:ss" to
    or "yyyy-mm-dd hh:mm:ss"
    or "yyyy-mm-ddthh:mm:ss"
    or "yyyy-mm-dd"
    -for "mon/dd/yyyy" for .
    or "yyyy-mm-dd"
    can be any of the following,
    dbcorrupt required diagnostic data collection for a generic database corruption
    listener_services srdc - data collection for tns-12516 / tns-12518 / tns-12519 / tns-12520.
    naming_services srdc - data collection for ora-12154 / ora-12514 / ora-12528.
    ora-00020 srdc for database ora-00020 maximum number of processes exceeded
    ora-00060 srdc for ora-00060. internal error code.
    ora-00494 srdc for ora-00494.
    ora-00600 srdc for ora-00600. internal error code.
    ora-00700 srdc for ora-00700. soft internal error.
    ora-01031 srdc - how to collect standard information for ora - 1031 /ora -1017 during sysdba connections
    ora-01555 srdc - ora-1555: checklist of evidence to supply (doc id 1682708.1)
    ora-01578 srdc - required diagnostic data collection for ora-01578
    ora-01628 srdc for database ora-01628 snapshot too old problems
    ora-04020 srdc for ora-04020
    ora-04021 srdc for ora-04021.
    ora-04030 srdc for ora-04030. os process private memory was exhausted.
    ora-04031 srdc for ora-04031. more shared memory is needed in the shared/streams pool.
    ora-07445 srdc for ora-07445. exception encountered, core dump.
    ora-08102 srdc - required diagnostic data collection for ora-08102.
    ora-08103 srdc - required diagnostic data collection for ora-08103.
    ora-12751 srdc for ora-12751. internal error code.
    ora-22924 srdc - ora-22924 or ora-1555 on lob data: checklist of evidence to supply (doc id 1682707.1)
    ora-27300 srdc for ora-27300. os system dependent operation:open failed with status: (status).
    ora-27301 srdc for ora-27301. os failure message: (message).
    ora-27302 srdc for ora-27302. failure occurred at: (module).
    ora-30036 srdc for database ora-30036 unable to extend undo tablespace problems
    tns-12154 srdc - data collection for tns-12154.
    tns-12514 srdc - data collection for tns-12514.
    tns-12516 srdc - data collection for tns-12516.
    tns-12518 srdc - data collection for tns-12518.
    tns-12519 srdc - data collection for tns-12519.
    tns-12520 srdc - data collection for tns-12520.
    tns-12528 srdc - data collection for tns-12528.
    ahf srdc - data collection for orachk or exachk issue, after running orachk -debug or exachk -debug.
    crs srdc for crs
    crsasm srdc for asm crs related errors
    crsasmcell srdc for asm crs cell related errors
    dbacl srdc - how to collect standard information for access control lists (acls).
    dbaqgen srdc - how to collect information for troubleshooting problem in an oracle advanced queuing environment.
    dbaqmon srdc - how to collect information for troubleshooting queue monitor (qmon) issues.
    dbaqnotify srdc - how to collect information for troubleshooting notification in an advanced queuing environment.
    dbaqperf srdc - how to collect information for troubleshooting performance in an oracle advanced queuing environment.
    dbaqpurge srdc - how to collect information for troubleshooting non-purged messages in an advanced queuing environment
    dbasm srdc automation: enhance asm/dbfs/dnfs/acfs collections
    dbaudit srdc - how to collect standard information for database auditing
    dbaum srdc - aum : checklist of evidence to supply (doc id 1682741.1)
    dbaumwaitevents srdc - wait events related to undo: checklist of evidence to supply (doc id 1682723.1)
    dbawrspace srdc for database awr space problems
    dbbeqconnection srdc - bequeath connection issues: checklist of evidence to supply (doc id 1928047.1)
    dbdatapatch srdc - data collection for datapatch issues.
    dbddlerrors srdc - ddl errors: checklist of evidence to supply
    dbemon srdc - how to collect information for troubleshooting event monitor (emon) issues
    dbenqdeq srdc - how to collect standard information for advanced queueing issues using tfa collector (recommended) or manual steps
    dbexp srdc - how to collect information for troubleshooting export (exp) related problems
    dbexpdp srdc - diagnostic collection for datapump export generic issues
    dbexpdpapi srdc - diagnostic collection for datapump export api issues
    dbexpdpperf srdc - diagnostic collection for datapump export performance issues
    dbexpdptts srdc - data to supply for transportable tablespace datapump and original export, import
    dbfra srdc - required diagnostic data collection for fra related errors.
    dbfs srdc for dbfs.
    dbggclassicmode srdc for doc id 1913426.1, 1913376.1 and 1912964.1
    dbggintegratedmode srdc for goldengate extract/replicat abends problems.
    dbhang srdc for database hang problems
    dbimp srdc - diagnostic collection for traditional import issues
    dbimpdp srdc - diagnostic collection for datapump import (impdp) generic issues
    dbimpdpperf srdc - diagnostic collection for datapump import (impdp) performance issues
    dbinstall srdc for oracle rdbms install problems.
    dbinstancecrash srdc - instance termination (non-rac) issues : checklist of evidence to supply (doc id 2507010.1)
    dbinvalidcomp srdc - invalid components and objects : checklist of evidence to supply
    dbinvalidobj srdc - objects getting invalidated: checklist of evidence to supply
    dbparameterfiles srdc - parameter files :checklist of evidence to supply.
    dbparameters srdc - database parameters: checklist of evidence to supply.
    dbpartition srdc - data to supply for create/maintain partitioned/subpartitioned table/index issues
    dbpartitionperf srdc - data to supply for slow create/alter/drop commands against partitioned table/index
    dbpatchconflict srdc for oracle rdbms patch conflict problems.
    dbpatchinstall
    dbperf srdc for database performance problems
    dbplugincompliance srdc - collect relevant diagnostic information for all compliance related issues within enterprise manager 12c and 13c for oracle database.
    dbpreupgrade srdc for database preupgrade problems.
    dbprocmgmt srdc - generic process management and related issues: checklist of evidence to supply (doc id 2500734.1)
    dbrac srdc for rac specific issues
    dbracinst srdc automation: enhance asm/dbfs/dnfs/acfs collections
    dbracmin minimal srdc for rac specific issues
    dbracperf srdc for rac database performance problems
    dbrman srdc - required diagnostic data collection for rman related errors.
    dbrmanperf srdc - required diagnostic data collection for rman performance(1671509.1).
    dbscn srdc for database scn problems.
    dbshutdown srdc - shutdown issues : checklist of evidence to supply (doc id 1906473.1)
    dbslowddl srdc - slow ddl: checklist of evidence to supply
    dbspatialexportimport srdc - data collection for oracle spatial export/import issues.
    dbspatialinstall srdc - data collection for oracle spatial installation issues.
    dbsqlperf srdc - how to collect standard information for a sql performance problem using tfa collector.
    dbstandalonedbca srdc - dbca issues: checklist of evidence to supply
    dbstartup srdc - startup issues: checklist of evidence to supply (doc id 1905616.1)
    dbtde srdc - how to collect standard information for transparent data encryption (tde) (doc id 1905607.1)
    dbtextinstall srdc - data collection for oracle text installation issues - 12c.
    dbtextupgrade srdc - data collection for oracle text upgrade issues - 12c.
    dbundocorruption srdc - required diagnostic data collection for undo corruption.
    dbunixresources srdc to capture diagnostic data for db issues related to o/s resources
    dbupgrade srdc for database upgrade problems.
    dbvault srdc - how to collect standard information for database vault
    dbwindowsresources srdc - db on windows resources : checklist of evidence to supply.
    dbwinservice srdc - oracleservice on windows: checklist of evidence to supply (doc id 1918781.1)
    dbxdb srdc for database xdb installation and invalid object problems
    dnfs srdc for dnfs.
    emagentgeneric srdc - collect trace/log information for enterprise manager management agent generic issues
    emagentpatching srdc - collect trace/log information for failures during enterprise manager 13c management agent patching.
    emagentperf em srdc - collect diagnostic data for em agent performance issues.
    emagentstartup srdc - collecting logs for enterprise manager 13c agent startup errors.
    emagtpatchdeploy srdc - collecting log files for em 13c agent or agent patch deployment.
    emagtupgpatch srdc - collecting log files for em 13c agent upgrade or local installation or patching.
    emcliadd em srdc - errors during the adding of a database/listener/asm target via emcli.
    emclusdisc em srdc - cluster target, cluster (rac) database or asm target is not discovered.
    emdbaasdeploy srdc - collect trace/log information for failures during database as a service(dbaas) deployment.
    emdbsys em srdc - database system target is not discovered/detected/removed/renamed correctly.
    emdebugoff srdc for unsetting em debug.
    emdebugon srdc for setting em debug.
    emfleetpatching srdc - collecting diagnostic data for enterprise manager fleet maintenance patching issues.
    emgendisc em srdc - general error is received when discovering or removing a database/listener/asm target.
    emmetricalert srdc for em metric events not raised and general metric alert related issues.
    emomscrash srdc - collect diagnostic data for all enterprise manager oms crash / restart performance issues.
    emomsheap srdc - collecting diagnostic data for enterprise manager oms heap usage alert performance issues.
    emomshungcpu srdc - collecting diagnostic data for enterprise manager oms hung or high cpu usage performance issues.
    emomspatching srdc - collect trace/log information for failures during enterprise manager 13c oms patching.
    empatchplancrt srdc - collecting diagnostic data for enterprise manager patch plan creation issues.
    emprocdisc em srdc - database/listener/asm target is not discovered/detected by the discovery process.
    emtbsmetric srdc - collect relevant diagnostic information for all tablespace space used (%) metric issues within enterprise manager for oracle database 12c and 13c.
    esexalogic srdc - exalogic full exalogs data collection information.
    exservice srdc - exadata: storage software service or offload server service failures.
    exsmartscan srdc - exadata: smart scan not working issues.
    gg_abend srdc for doc id 2650417.1
    ggintegratedmodenodb srdc for goldengate extract/replicat abends problems.
    gridinfra srdc automation: enhance asm/dbfs/dnfs/acfs collections
    gridinfrainst srdc automation: enhance asm/dbfs/dnfs/acfs collections
    instterm srdc for instance terminated events, such as ora-00469: ora-00470: ora-00480: ora-00490: ora-00491, ora-00492, ora-00493, ora-00495, ora-00496, ora-00497, ora-00498
    internalerror srdc for all other types of internal database errors.
    ora1000 srdc - open cursors:checklist of evidence to supply.
    ora18 srdc - ora-18 or sessions parameter: checklist of evidence to supply.
    ora25319 srdc - how to collect information for troubleshooting an ora-25319 error in an advanced queuing environment.
    ora4023 srdc - ora-4023 : checklist of evidence to supply
    ora4063 srdc - ora-4063 : checklist of evidence to supply
    ora445 srdc - ora-445 or unable to spawn process: checklist of evidence to supply (doc id 2500730.1)
    xdb600 srdc - required diagnostic data collection for xdb ora-00600 and ora-07445 internal error issues using tfa collector
    xdbinstall srdc - required diagnostic data collection for xdb installation and invalid object for issues for 12c and onward
    zlgeneric srdc - zero data loss recovery appliance (zdlra) data collection.
    [oracle@shdb01 ~]$
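Going back through alert.log from the earliest warning, it turns out that AWR snapshots had already stopped being generated at 14:00 on the 15th.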

Query dba_hist_active_sess_history to see what the database was busy with before the problem:

-- SQL*Plus display settings
set lines 500
set long 9999
set pages 999
set serveroutput on size 1000000
alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';
alter session set nls_timestamp_format = 'yyyy-mm-dd hh24.mi.ss.ff';

-- number of active sessions captured in each ASH sample during the window before the crash
select instance_number, sample_id, sample_time, count(*) cnt
from dba_hist_active_sess_history
where sample_time between
      to_timestamp('2021/04/15 13:00', 'yyyy/mm/dd hh24:mi') and
      to_timestamp('2021/04/16 10:00', 'yyyy/mm/dd hh24:mi')
group by instance_number, sample_id, sample_time
order by instance_number, sample_id, sample_time;

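No data there either (there were hardly any sessions before the crash; what workload would a test database have at 8:30 in the morning?).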

The m000 process left no trace file; only the j000 trace repeats this message every 2 seconds:
process j000 is dead ... state=ksosp_spawned

The operating system's messages log shows an OOM error in the period when things went wrong:

    apr 11 18:58:27 host auditd[1737]: audit daemon rotating log files
    apr 11 22:05:14 host auditd[1737]: audit daemon rotating log files
    apr 12 11:09:15 host auditd[1737]: audit daemon rotating log files
    apr 12 20:10:36 host auditd[1737]: audit daemon rotating log files
    apr 13 13:01:16 host auditd[1737]: audit daemon rotating log files
    apr 13 19:23:24 host auditd[1737]: audit daemon rotating log files
    apr 14 13:49:20 host auditd[1737]: audit daemon rotating log files
    apr 15 12:23:09 host auditd[1737]: audit daemon rotating log files
    apr 15 17:34:48 host kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
    apr 15 17:34:48 host kernel: oracle cpuset=/ mems_allowed=0
    apr 15 17:34:48 host kernel: pid: 3388, comm: oracle tainted: g --------------- t 2.6.32-431.el6.x86_64 #1
    apr 15 17:34:48 host kernel: call trace:
    apr 15 17:34:48 host kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
    apr 15 17:34:48 host kernel: [] ? dump_header+0x90/0x1b0
    apr 15 17:34:48 host kernel: [] ? security_real_capable_noaudit+0x3c/0x70
    apr 15 17:34:48 host kernel: [] ? oom_kill_process+0x82/0x2a0
    apr 15 17:34:48 host kernel: [] ? select_bad_process+0xe1/0x120
    apr 15 17:34:48 host kernel: [] ? out_of_memory+0x220/0x3c0
    apr 15 17:34:48 host kernel: [] ? __alloc_pages_nodemask+0x8ac/0x8d0
    apr 15 17:34:48 host kernel: [] ? alloc_pages_current+0xaa/0x110
    apr 15 17:34:48 host kernel: [] ? __page_cache_alloc+0x87/0x90
    apr 15 17:34:48 host kernel: [] ? find_get_page+0x1e/0xa0
    apr 15 17:34:48 host kernel: [] ? filemap_fault+0x1a7/0x500
    apr 15 17:34:48 host kernel: [] ? __do_fault+0x54/0x530
    apr 15 17:34:48 host kernel: [] ? handle_pte_fault+0xf7/0xb00
    apr 15 17:34:48 host kernel: [] ? rb_reserve_next_event+0xb4/0x370
    apr 15 17:34:48 host kernel: [] ? native_sched_clock+0x13/0x80
    apr 15 17:34:48 host kernel: [] ? rb_reserve_next_event+0xb4/0x370
    apr 15 17:34:48 host kernel: [] ? native_sched_clock+0x13/0x80
    apr 15 17:34:48 host kernel: [] ? handle_mm_fault+0x22a/0x300
    apr 15 17:34:48 host kernel: [] ? __do_page_fault+0x138/0x480
    apr 15 17:34:48 host kernel: [] ? thread_group_times+0x3d/0x120
    apr 15 17:34:48 host kernel: [] ? ring_buffer_lock_reserve+0xa2/0x160
    apr 15 17:34:48 host kernel: [] ? mmput+0x1e/0x120
    apr 15 17:34:48 host kernel: [] ? trace_nowake_buffer_unlock_commit+0x43/0x60
    apr 15 17:34:48 host kernel: [] ? ftrace_raw_event_sys_exit+0xb9/0xc0
    apr 15 17:34:48 host kernel: [] ? do_page_fault+0x3e/0xa0
    apr 15 17:34:48 host kernel: [] ? page_fault+0x25/0x30
    apr 15 17:34:48 host kernel: mem-info:
    apr 15 17:34:48 host kernel: node 0 dma per-cpu:
    apr 15 17:34:48 host kernel: cpu 0: hi: 0, btch: 1 usd: 0
    apr 15 17:34:48 host kernel: cpu 1: hi: 0, btch: 1 usd: 0
    apr 15 17:34:48 host kernel: cpu 2: hi: 0, btch: 1 usd: 0
    apr 15 17:34:48 host kernel: cpu 3: hi: 0, btch: 1 usd: 0
    apr 15 17:34:48 host kernel: node 0 dma32 per-cpu:
    apr 15 17:34:48 host kernel: cpu 0: hi: 186, btch: 31 usd: 0
    apr 15 17:34:48 host kernel: cpu 1: hi: 186, btch: 31 usd: 0
    apr 15 17:34:48 host kernel: cpu 2: hi: 186, btch: 31 usd: 0
    apr 15 17:34:48 host kernel: cpu 3: hi: 186, btch: 31 usd: 0
    apr 15 17:34:48 host kernel: node 0 normal per-cpu:
    apr 15 17:34:48 host kernel: cpu 0: hi: 186, btch: 31 usd: 0
    apr 15 17:34:48 host kernel: cpu 1: hi: 186, btch: 31 usd: 0
    apr 15 17:34:48 host kernel: cpu 2: hi: 186, btch: 31 usd: 23
    apr 15 17:34:48 host kernel: cpu 3: hi: 186, btch: 31 usd: 0
    apr 15 17:34:48 host kernel: active_anon:463690 inactive_anon:140220 isolated_anon:0
    apr 15 17:34:48 host kernel: active_file:245 inactive_file:504 isolated_file:0
    apr 15 17:34:48 host kernel: unevictable:0 dirty:11 writeback:0 unstable:0
    apr 15 17:34:48 host kernel: free:22140 slab_reclaimable:10286 slab_unreclaimable:84990
    apr 15 17:34:48 host kernel: mapped:11740 shmem:46989 pagetables:215832 bounce:0
    apr 15 17:34:48 host kernel: node 0 dma free:15684kb min:248kb low:308kb high:372kb active_anon:0kb inactive_anon:0kb active_file:0kb inactive_file:0kb unevictable:0kb isolated(anon):0kb isolated(file):0kb present:15292kb mlocked:0kb dirty:0kb writeback:0kb mapped:0kb shmem:0kb slab_reclaimable:0kb slab_unreclaimable:0kb kernel_stack:0kb pagetables:0kb unstable:0kb bounce:0kb writeback_tmp:0kb pages_scanned:0 all_unreclaimable? yes
    apr 15 17:34:48 host kernel: lowmem_reserve[]: 0 3000 4010 4010
    apr 15 17:34:48 host kernel: node 0 dma32 free:55048kb min:50372kb low:62964kb high:75556kb active_anon:1627600kb inactive_anon:333656kb active_file:916kb inactive_file:1996kb unevictable:0kb isolated(anon):0kb isolated(file):0kb present:3072096kb mlocked:0kb dirty:32kb writeback:0kb mapped:23000kb shmem:120460kb slab_reclaimable:23224kb slab_unreclaimable:199140kb kernel_stack:27016kb pagetables:538512kb unstable:0kb bounce:0kb writeback_tmp:0kb pages_scanned:0 all_unreclaimable? no
    apr 15 17:34:48 host kernel: lowmem_reserve[]: 0 0 1010 1010
    apr 15 17:34:48 host kernel: node 0 normal free:17828kb min:16956kb low:21192kb high:25432kb active_anon:227160kb inactive_anon:227224kb active_file:64kb inactive_file:20kb unevictable:0kb isolated(anon):0kb isolated(file):0kb present:1034240kb mlocked:0kb dirty:12kb writeback:0kb mapped:23960kb shmem:67496kb slab_reclaimable:17920kb slab_unreclaimable:140820kb kernel_stack:7400kb pagetables:324816kb unstable:0kb bounce:0kb writeback_tmp:0kb pages_scanned:16 all_unreclaimable? no
    apr 15 17:34:48 host kernel: lowmem_reserve[]: 0 0 0 0
    apr 15 17:34:48 host kernel: node 0 dma: 1*4kb 4*8kb 2*16kb 2*32kb 3*64kb 0*128kb 0*256kb 0*512kb 1*1024kb 1*2048kb 3*4096kb = 15684kb
    apr 15 17:34:48 host kernel: node 0 dma32: 5534*4kb 1983*8kb 236*16kb 85*32kb 110*64kb 28*128kb 0*256kb 0*512kb 0*1024kb 0*2048kb 0*4096kb = 55120kb
    apr 15 17:34:48 host kernel: node 0 normal: 4451*4kb 3*8kb 0*16kb 0*32kb 0*64kb 0*128kb 0*256kb 0*512kb 0*1024kb 0*2048kb 0*4096kb = 17828kb
    apr 15 17:34:48 host kernel: 50408 total pagecache pages
    apr 15 17:34:48 host kernel: 2357 pages in swap cache
    apr 15 17:34:48 host kernel: swap cache stats: add 13218790, delete 13216433, find 24808505/26126131
    apr 15 17:34:48 host kernel: free swap = 0kb
    apr 15 17:34:48 host kernel: total swap = 4194296kb
    apr 15 17:34:48 host kernel: 1048560 pages ram
    apr 15 17:34:48 host kernel: 67274 pages reserved
    apr 15 17:34:48 host kernel: 179030 pages shared
    apr 15 17:34:48 host kernel: 816137 pages non-shared
    apr 15 17:34:48 host kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
    apr 15 17:34:48 host kernel: [ 465] 0 465 2814 1 1 -17 -1000 udevd
    apr 15 17:34:48 host kernel: [ 1589] 0 1589 47371 136 0 0 0 vmtoolsd
    apr 15 17:34:48 host kernel: [ 1737] 0 1737 23300 73 2 -17 -1000 auditd
    apr 15 17:34:48 host kernel: [ 1739] 0 1739 20521 80 1 0 0 audispd
    apr 15 17:34:48 host kernel: [ 1740] 0 1740 5301 42 0 0 0 sedispatch
    apr 15 17:34:48 host kernel: [ 1814] 0 1814 2705 44 2 0 0 irqbalance
    apr 15 17:34:48 host kernel: [ 1833] 32 1833 4759 22 0 0 0 rpcbind
    apr 15 17:34:48 host kernel: [ 1942] 0 1942 3396 44 3 -17 -1000 lldpad

free swap = 0kb?
Most likely the host ran out of memory. I will deploy OSWatcher (OSW) and keep observing.
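The page counts in the OOM report are easier to judge once converted to MiB. Below is a rough Python sketch using the numbers from the kernel's global summary lines, assuming the usual 4 KiB page size on x86_64 (the free lists in the report are also expressed in 4kb units). The names `report` and `pages_to_mib` are illustrative.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, the x86_64 default

# Page counts copied from the global summary lines of the OOM report.
report = {
    "pages_ram": 1048560,
    "active_anon": 463690,
    "inactive_anon": 140220,
    "pagetables": 215832,
    "shmem": 46989,
    "free": 22140,
}
total_swap_kib = 4194296
free_swap_kib = 0

def pages_to_mib(pages):
    """Convert a page count to MiB under the 4 KiB page assumption."""
    return pages * PAGE_KIB / 1024

for name, pages in report.items():
    print(f"{name:14s} {pages_to_mib(pages):8.1f} MiB")

swap_used_mib = (total_swap_kib - free_swap_kib) / 1024
print(f"swap used      {swap_used_mib:8.1f} MiB of {total_swap_kib / 1024:.1f} MiB")

share = report["pagetables"] / report["pages_ram"]
print(f"page tables take {share:.0%} of RAM")
```

Under that assumption the host has about 4 GiB of RAM, roughly 843 MiB of it held by page tables alone, under 90 MiB free, and all 4 GiB of swap in use. That is consistent with the memory-shortage guess, and the operating-system data collected from now on should show which processes are growing.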