After applying the 20200414 PSU to an 11.2.0.4 RAC, crsctl stat res -t shows ora.drivers.acfs as OFFLINE. Usually nobody uses ACFS, so you can ignore it, and the story could end right here.
[root@db01 ~]# crsctl status resource ora.registry.acfs -f
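All attribute settings look the same before and after patching.
A search on the official MOS suggests it is still related to the OS kernel. But everything was ONLINE right after installation, so why did it go OFFLINE after the PSU upgrade? Your patch broke the system, pay up!
Let's keep analyzing blindly.
Some searching (time to test my luck)
Finding: this resource is started by orarootagent. That bit of knowledge is basically useless.
Accidental finding: the reason ora.diskmon is OFFLINE; see Note 1346881.1 - 11.2.0.3 Grid Infrastructure diskmon Will be Offline by Default in Non-Exadata Environment. Finally some payoff: that particular OFFLINE is now explained.
More searching (time to test my luck again)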
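Someone wrote up what this service actually does (not that I wanted to know): ACFS is a general-purpose, portable cluster file system that runs on many operating systems and is installed as part of the Grid Infrastructure installation in Oracle 11.2 and later. ACFS was first available on Linux and Windows (starting with Oracle 11.2.0.1), and on Solaris and AIX starting with Oracle 11.2.0.2. On Linux/UNIX, ACFS is POSIX and X/Open compliant and can be accessed remotely over NAS protocols such as NFS and CIFS.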
When it is running normally, lsmod shows the modules loaded:
[root@db01 ~]# lsmod | grep oracle
oracleacfs 1990406 0
oracleadvm 250040 0
oracleoks 427672 2 oracleacfs,oracleadvm
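The lsmod check above is easy to script. Here is a minimal sketch, assuming the three module names shown in the output; the function name acfs_missing_modules is made up for illustration and is not an Oracle tool. It reads lsmod-style text on stdin, so it also works on saved output.

```shell
#!/bin/sh
# Print each ADVM/ACFS kernel module that is absent from lsmod-style input.
# Reads stdin; prints nothing when all three modules are present.
# acfs_missing_modules is a hypothetical helper name, not an Oracle tool.
acfs_missing_modules() {
    input=$(cat)
    for mod in oracleoks oracleadvm oracleacfs; do
        # The module name is the first whitespace-separated column of lsmod output.
        if ! printf '%s\n' "$input" | awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            printf '%s\n' "$mod"
        fi
    done
}

# Usage on a live system:
#   lsmod | acfs_missing_modules
```

Empty output means all three drivers are loaded; any name printed is a module that lsmod did not list.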
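So what is this lsmod that everything points to? It is a Linux command; no worries if you don't know it (its relevance comes up later).
Finding: there are several commands starting with acfs, all under $GRID_HOME/bin/. On Linux, for example, typing acfs as the grid user and pressing Tab lists quite a few related maintenance commands. Eye-opening (not that I want to learn them).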
[root@db01 ~]# acfsroot disable      # disable ACFS
[root@db01 ~]# acfsroot uninstall    # uninstall ACFS
acfs-9312: existing advm/acfs installation detected.
acfs-9314: removing previous advm/acfs installation.
acfs-9315: previous advm/acfs components successfully removed.
[root@db01 ~]# acfsload stop     # stop (unload) the ACFS modules
[root@db01 ~]# acfsload start    # start (load) the ACFS modules
acfs-9391: checking for existing advm/acfs installation.
acfs-9392: validating advm/acfs installation files for operating system.
acfs-9393: verifying asm administrator setup.
acfs-9308: loading installed advm/acfs drivers.
acfs-9154: loading 'oracleoks.ko' driver.
fatal: module oracleoks not found.
acfs-9109: oracleoks.ko driver failed to load.
acfs-9127: not all advm/acfs drivers have been loaded.
[root@db01 ~]#
Install ACFS with verbose output:
[root@db01 ~]# acfsroot install -v
acfs-9500: location of oracle home is '/u01/app/11.2/grid' as determined from the internal configuration data
acfs-9300: advm/acfs distribution files found.
acfs-9155: checking for existing 'oracleoks.ko' driver installation.
acfs-9155: checking for existing 'oracleoks.ko' driver installation.
acfs-9312: existing advm/acfs installation detected.
acfs-9314: removing previous advm/acfs installation.
acfs-9315: previous advm/acfs components successfully removed.
acfs-9307: installing requested advm/acfs software.
acfs-9503: advm and acfs driver media location is '/u01/app/11.2/grid/install/usm/oracle/el6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin'
acfs-9504: copying file '/u01/app/11.2/grid/install/usm/oracle/el6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin/oracleadvm.ko' to the path '/lib/modules/2.6.32-696.23.1.el6.x86_64/extra/usm/oracleadvm.ko'
acfs-9504: copying file '/u01/app/11.2/grid/install/usm/oracle/el6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin/oracleoks.ko' to the path '/lib/modules/2.6.32-696.23.1.el6.x86_64/extra/usm/oracleoks.ko'
acfs-9504: copying file '/u01/app/11.2/grid/install/usm/oracle/el6/x86_64/2.6.32-696/2.6.32-696.el6-x86_64/bin/oracleacfs.ko' to the path '/lib/modules/2.6.32-696.23.1.el6.x86_64/extra/usm/oracleacfs.ko'
acfs-9308: loading installed advm/acfs drivers.
acfs-9321: creating udev for advm/acfs.
acfs-9323: creating module dependencies - this may take some time.
acfs-9154: loading 'oracleoks.ko' driver.
fatal: module oracleoks not found.
acfs-9109: oracleoks.ko driver failed to load.
acfs-9428: message 9428 not found; product=usm; facility=acfs
acfs-9310: advm/acfs installation failed.
Everything keeps coming back to the oracleoks.ko file and the acfs-9109 error.
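Searching for this file and painfully comparing the results (find / -name oracleoks.ko) shows that it did indeed change after the PSU was installed. There is a whole paper to be written here: "On the Importance of VM Snapshots for Debugging Efficiency".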
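The painful comparison could be made less painful by recording the driver files before patching and diffing afterwards. This is a rough sketch, assuming the drivers live under /lib/modules/<kernel release>/extra/usm as the acfsroot install -v output shows; usm_driver_report is a hypothetical helper name and the /tmp file names in the usage comment are only examples.

```shell
#!/bin/sh
# List the three ADVM/ACFS driver files for one kernel release, with a
# checksum for each, so the output can be saved before patching and
# diffed afterwards. usm_driver_report is a hypothetical helper name.
# $1 = modules root (normally /lib/modules), $2 = kernel release.
usm_driver_report() {
    dir="$1/$2/extra/usm"
    for f in oracleoks.ko oracleadvm.ko oracleacfs.ko; do
        if [ -f "$dir/$f" ]; then
            # cksum prints: <crc> <size in bytes> <path>
            cksum "$dir/$f"
        else
            printf 'MISSING %s\n' "$dir/$f"
        fi
    done
}

# Usage (file names under /tmp are just examples):
#   usm_driver_report /lib/modules "$(uname -r)" > /tmp/usm_before.txt
#   ... apply the PSU ...
#   usm_driver_report /lib/modules "$(uname -r)" > /tmp/usm_after.txt
#   diff /tmp/usm_before.txt /tmp/usm_after.txt
```

A changed checksum or a MISSING line for the running kernel is the kind of difference the find comparison was hunting for.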
Run the start command:
[root@db01 ~]# crsctl start res ora.drivers.acfs -init
crs-2672: attempting to start 'ora.drivers.acfs' on 'db01'
crs-5016: process "/u01/app/11.2/grid/bin/acfsload" spawned by agent "/u01/app/11.2/grid/bin/orarootagent.bin" for action "start" failed: details at "(:clsn00010:)" in "/u01/app/11.2/grid/log/db01/agent/ohasd/orarootagent_root//orarootagent_root.log"
crs-2674: start of 'ora.drivers.acfs' on 'db01' failed
crs-4000: command start failed, or completed with errors.
The log reads:
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] execcmd ret = 1
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)acfs-9391: checking for existing advm/acfs installation.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)acfs-9392: validating advm/acfs installation files for operating system.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)acfs-9393: verifying asm administrator setup.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)acfs-9308: loading installed advm/acfs drivers.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)acfs-9154: loading 'oracleoks.ko' driver.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)fatal: module oracleoks not found.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)acfs-9109: oracleoks.ko driver failed to load.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)acfs-9127: not all advm/acfs drivers have been loaded.
2020-06-20 04:53:42.154: [ora.drivers.acfs][2848847616]{0:0:605} [start] (:clsn00010:)
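With a log this repetitive, it helps to boil it down to the distinct message codes before looking anything up. A small sketch, assuming GNU grep (for -o) as found on Linux; acfs_codes is a made-up helper name. Matching is case-insensitive and the codes are printed in upper case, in order of first appearance.

```shell
#!/bin/sh
# Extract the distinct ACFS message codes from agent log text on stdin,
# in order of first appearance. acfs_codes is a hypothetical helper name.
acfs_codes() {
    # grep -o prints only the matching part; awk drops repeats.
    grep -oiE 'acfs-[0-9]+' | awk '!seen[toupper($0)]++ { print toupper($0) }'
}

# Usage with the log path reported by crs-5016 above:
#   acfs_codes < /u01/app/11.2/grid/log/db01/agent/ohasd/orarootagent_root/orarootagent_root.log
```

Each number it prints can then be fed to oerr acfs as the grid user.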
The key seems to be the acfs-9109 error, so let's see what it has to say.
[grid@db01 ~]$ oerr acfs 9109
09109, 0, "%s driver failed to load."
// *cause: the driver failed to load.
// *action: view the system specific os kernel log
// (for instance, /var/log/messages on linux, event log on windows).
// if the drivers have not previously been unloaded
// ('crsctl stop crs', 'acfsload stop', 'acfsroot uninstall'), it is
// not possible to reload them.
// if a specific error has occurred, than clear the error condition
// and try again. if the os and\or architecture is not
// supported by the drivers, than contact
// oracle support services for an updated driver package.
[grid@db01 ~]$ oerr acfs 9127
09127, 0, "not all advm/acfs drivers have been loaded."
// *cause: advm/acfs device drivers have been started but not all
// of them are detected as running.
// *action: try 'acfsload stop' followed by 'acfsload start'.
// if that does not start all drivers, than contact oracle support
// services.
After rigorous testing:
[root@db01 ~]# acfsdriverstate version
acfs-9325: driver os kernel version = 2.6.32-696.23.1.el6.x86_64(x86_64).
acfs-9326: driver oracle version = 190625.
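To put the driver's kernel version next to the running kernel when reading the certification matrix, the value can be pulled out of the acfsdriverstate version output. This is only a sketch based on the message layout shown above; acfs_driver_kernel is a hypothetical helper name, and whether a difference between the two values matters is something to check against the matrix rather than assume.

```shell
#!/bin/sh
# Pull the kernel release out of the acfs-9325 line of
# 'acfsdriverstate version' output read from stdin.
# acfs_driver_kernel is a hypothetical helper name.
acfs_driver_kernel() {
    # Capture everything after "kernel version = " up to the "(" or a space.
    sed -n 's/.*kernel version = \([^( ]*\).*/\1/p'
}

# Usage: print both values side by side.
#   echo "driver:  $(acfsdriverstate version | acfs_driver_kernel)"
#   echo "running: $(uname -r)"
```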
The big conclusion:
After acfsload stop, lsmod | grep acfs shows nothing.
After acfsload start, the modules show up again.
So the fix is: upgrade the OS kernel, following the official Doc 1369107.1.
--------------------------------------------------------------
References:
How to Install/Reinstall or Deinstall ACFS Modules/Installation Manually? (Doc ID 1371067.1)
ACFS Support on OS Platforms (Certification Matrix). (Doc ID 1369107.1)
Commands:
crs_stat -p ora.diskmon
crsctl start resource ora.cssd
crsctl modify resource "ora.cssd" -attr "auto_start=1"
or
crsctl modify resource "ora.diskmon" -attr "auto_start=1"
crsctl modify resource "ora.cssd" -attr "auto_start=never"
crsctl modify resource "ora.diskmon" -attr "auto_start=never"
(Seems useful; if nothing above helps at all, have a look at this one)
(This one shows several common ACFS maintenance commands)