前天处理了一启RAC ASM 内存泄露问题,系统为公司最后一套RAC one node 版本为11.2.0.3
这套核心RAC one node系统出现大量 log file switch buffer busy waits等待,并伴随大量cursor: pin S wait on X read by other session。
SID USERNAME MACHINE EVENT PARAM W WT SQL ST LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
10917 PROD_DATA2 JDBC Thin Client buffer busy waits 14/907008/8 0 294 /bgx0f86ucnw8v A 355 7628
13549 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 172 /520mkxqpf15q8 A 294 20514
4456 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 6rsq14x8a7n5r/dsxmfbstazq7u A 525 94053
4521 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 6rsq14x8a7n5r/dsxmfbstazq7u A 508 94052
6831 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 1k8vrykzkvccg/d5dfm29f41s0d A 739 1400
3170 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:12: 1397817350/21363/0 0 391 4uj3xtfzq3a91/gky4m8njtfrcd A 392 536146
4611 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 fqc7yvkzxjzk7/cqb6f9fkfkcsq A 725 932
13448 PROD_DATA2 JDBC Thin Client buffer busy waits 56/645165/1 0 730 fqc7yvkzxjzk7/cqb6f9fkfkcsq A 730 736
940 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 fqc7yvkzxjzk7/cqb6f9fkfkcsq A 710 953
8026 PROD_DATA2 JDBC Thin Client buffer busy waits 20/279682/8 0 441 3f0thwr0rg23c/c4h5dg81ffxyr A 441 94043
9290 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 3f0thwr0rg23c/c4h5dg81ffxyr A 220 473
2750 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 199 b3dmt8325pmvf/bpk6g015uzqzt A 417 81687
3510 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 223 c5n6d774as72b/5ak0xdkuvc65n A 273 536146
2323 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 172 c5n6d774as72b/5ak0xdkuvc65n A 407 536140
3767 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 725 5q60jbr6700u7/g2th2agwfkk1t A 725 1133
10205 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 717 5q60jbr6700u7/g2th2agwfkk1t A 717 1186
10046 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 670 5q60jbr6700u7/g2th2agwfkk1t A 670 1186
10044 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 689 5q60jbr6700u7/g2th2agwfkk1t A 689 24971
6395 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35201/8 0 693 5q60jbr6700u7/g2th2agwfkk1t A 693 1522
9535 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 703 5q60jbr6700u7/g2th2agwfkk1t A 703 42595
6917 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 5q60jbr6700u7/g2th2agwfkk1t A 709 1221
2902 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 700 5q60jbr6700u7/g2th2agwfkk1t A 700 2146
9554 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 5q60jbr6700u7/g2th2agwfkk1t A 726 2147
11662 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:73: 1397817350/21911/0 0 377 cg3urj768yv8g/apun8g5mb9byt A 378 18995
9378 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 cg3urj768yv8g/7g63srczfgfd6 A 627 19057
8698 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:73: 1397817350/21911/0 0 551 cg3urj768yv8g/5smgjxsscf91w A 551 18996
7575 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 204 cg3urj768yv8g/793hdqw3bn04q A 204 19035
...
13028 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 623 cg3urj768yv8g/8s97vpbnuktp2 A 623 19061
...
6659 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 615 cg3urj768yv8g/8s97vpbnuktp2 A 616 19035
5187 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:56: 1397817350/21911/0 0 418 cg3urj768yv8g/793hdqw3bn04q A 418 19069
4933 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 576 cg3urj768yv8g/9quv2rcpqjxwt A 576 18977
4346 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 219 cg3urj768yv8g/8s97vpbnuktp2 A 546 943
....
4082 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:14: 1397817350/21911/0 0 460 cg3urj768yv8g/520mkxqpf15q8 A 460 1522
7267 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:73: 1397817350/21911/0 0 415 cg3urj768yv8g/8s97vpbnuktp2 A 415 919
9532 PROD_DATA2 JDBC Thin Client DFS lock handle 1398145029/21916/0 0 353 2xw94b38cybhv/4001f9ha480pb A 354 358380
9716 PROD_DATA2 JDBC Thin Client DFS lock handle 1398145029/21916/0 0 264 2xw94b38cybhv/4001f9ha480pb A 265 2147
512 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 fhd852bb8agv6/gq6xnzt1spkzp A 654 533810
7001 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 4q0c1a3bsynyw/guwq1fpq9r1dy A 261 536140
11348 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 d3wp3qvbz5bna/1w0c0jhzmp1hh A 226 7299
11164 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 9fwybsrc3h1dd/70k37x7hwg0w5 A 701 1185
3096 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 220 9fwybsrc3h1dd/70k37x7hwg0w5 A 510 1522
8100 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 91r0dubcptymp/dsxmfbstazq7u A 649 95706
8192 PROD_DATA2 JDBC Thin Client buffer busy waits 24/274502/1 0 645 91r0dubcptymp/dsxmfbstazq7u A 645 94041
8783 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 91r0dubcptymp/dsxmfbstazq7u A 664 94039
6667 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 91r0dubcptymp/dsxmfbstazq7u A 685 1272
5445 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 91r0dubcptymp/dsxmfbstazq7u A 688 1185
38 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 199 91r0dubcptymp/dsxmfbstazq7u A 695 1263
9386 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 91r0dubcptymp/dsxmfbstazq7u A 685 95693
8868 PROD_DATA2 JDBC Thin Client buffer busy waits 53/118272/8 0 675 ghy358rghc4ty/3p3z1j33tk594 A 675 486608
6475 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 ghy358rghc4ty/3p3z1j33tk594 A 706 486608
6899 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 ftmka9gnhru7t/1w0c0jhzmp1hh A 738 442302
4003 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 8a5g4frpyags9/1w0c0jhzmp1hh A 401 435103
SID USERNAME MACHINE EVENT PARAM W WT SQL ST LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
861 PROD_DATA2 dcb-srv-0343-vm02 buffer busy waits 60/1078781/1 0 504 fjnq76rpzzpdy/a6tzf0tjrwvqh A 505 20966
12361 PROD_DATA2 xen-4-211-vm05 buffer busy waits 60/1078781/1 0 402 fjnq76rpzzpdy/a6tzf0tjrwvqh A 403 20976
6213 TOAD JDBC Thin Client buffer busy waits 41/34704/571 0 542 5kjukxbrv5w3r/dbkpyhyfyvdpu A 542 1577749
7513 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 cvvzaj3syc76b/1w0c0jhzmp1hh A 703 502831
8015 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 1tg6ux7vc22zm/32bjxyxc0xwfs A 463 899
10817 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 205 1tg6ux7vc22zm/32bjxyxc0xwfs A 720 18977
12268 PROD_DATA2 JDBC Thin Client buffer busy waits 57/1302268/1 0 283 3szwvvzw5aq13/bpaj7frdt63mx A 284 82802
5959 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 3szwvvzw5aq13/bpaj7frdt63mx A 583 75654
10975 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/a4td25ftgtqt8 A 40 51
10994 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/459f3z9u4fb3u A 34 34
10999 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/a4td25ftgtqt8 A 30 30
11015 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/a4td25ftgtqt8 A 8 9
4688 PROD_DATA2 JDBC Thin Client read by other session 20/1806690/1 0 2 4gzs73jtmhyd0/f2n7vrpx4v30v A 4 10
4626 PROD_DATA2 JDBC Thin Client db file sequential read 27/630098/1 0 0 11n7q3tu19vvp/adghpykwhquu2 A 1 23
10566 PROD_DATA2 JDBC Thin Client db file sequential read 57/780833/1 0 0 50731ypu4mu31/adghpykwhquu2 A 1 34
12527 PROD_DATA2 xen-21205-vm02 db file sequential read 28/168522/1 0 0 7szz51xufmuc4/520mkxqpf15q8 A 5 33
5489 PROD_DATA2 JDBC Thin Client read by other session 60/931838/1 0 1 1zbqmb9ukkhpd/459f3z9u4fb3u A 13 13
5063 PROD_DATA2 JDBC Thin Client db file sequential read 17/871948/1 0 2 g8jjuy5wt7h92/64yswjj6cdv5j A 12 16
SID USERNAME MACHINE EVENT PARAM W WT SQL ST LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
10302 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 44 42zra6dwxqwrd/cnm20xg9c333h A 44 44
07008/21474836480
11330 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 15 42zra6dwxqwrd/523rw17fc75fb A 14 38
07008/21474836480
.....
11698 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 2 42zra6dwxqwrd/459f3z9u4fb3u A 1 2
07008/21474836480
6473 PROD_DATA2 JDBC Thin Client read by other session 6/4044/1 0 0 42zra6dwxqwrd/gzcbp0razp0vn A 44 46
10136 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 25 42zra6dwxqwrd/523rw17fc75fb A 25 41 07008/21474836480 36320/21474836480
3669 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 47
36320/21474836480
4854 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 52
36320/21474836480
265 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 50
36320/21474836480
9355 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 52
......
36320/21474836480
4764 rows selected.
主要原因archive进程无法归档。ASM无法分配新的文件标识。
ASMCMD [+arch/item/archivelog/2012_10_19] > cp thread_1_seq_3.257.797108745 aaaaa.test; ASMCMD-8012: can not determine file type for file ORA-00569: Failed to acquire global enqueue. (DBD ERROR: OCIStmtExecute)
结合ASM trace log:
Wed Jun 05 07:39:52 2013 Dumping diagnostic data in directory=[cdmp_20130605073952], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412241]. Wed Jun 05 07:45:58 2013 Dumping diagnostic data in directory=[cdmp_20130605074557], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412242]. Wed Jun 05 07:52:44 2013 Dumping diagnostic data in directory=[cdmp_20130605075244], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412243]. Wed Jun 05 07:57:58 2013 Dumping diagnostic data in directory=[cdmp_20130605075758], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412244]. Wed Jun 05 08:03:59 2013 Dumping diagnostic data in directory=[cdmp_20130605080359], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412245]. Wed Jun 05 08:36:09 2013 SQL> /* ASMCMD */alter diskgroup /*ASMCMD*/ "data" add directory '+data/arch' SUCCESS: /* ASMCMD */alter diskgroup /*ASMCMD*/ "data" add directory '+data/arch' Wed Jun 05 08:42:40 2013 Dumping diagnostic data in directory=[cdmp_20130605084240], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=463685]. Wed Jun 05 08:47:47 2013 Dumping diagnostic data in directory=[cdmp_20130605084746], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=463686]. Wed Jun 05 08:52:18 2013 NOTE: ASM client item_1:item disconnected unexpectedly. NOTE: check client alert log. NOTE: Trace records dumped in trace file /u01/grid/11.2.0/diag/asm/+asm/+ASM1/trace/+ASM1_ora_25588.trc
早上07:39 ASM1 instance 出现问题 导致arch进程无法完成归档 进而导致后续的等待。
(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/oracle/11.2.0/oracle/product/db_1/bin/oracle)(ARGV0=oracleitem_1)(ENVS='ORACLE_HOME=/u01/oracle/11.2.0/oracle/product/db_1,ORACLE_SID=item_1,LD_LIBRARY_PATH=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=item_1)))
2013-06-05 07:39:53.587: [ora.item.db][1607948608] {1:21407:34683} [check] InstConnection::connectInt: server not attached
2013-06-05 07:39:53.717: [ USRTHRD][1607948608] {1:21407:34683} InstConnection:~InstConnection: this 1fc95100
2013-06-05 07:39:53.718: [ora.item.db][1607948608] {1:21407:34683} [check] DbAgent::isDgbHandleFO } status = 16646
Wed Jun 05 07:39:52 2013
ARC3: Error 19504 Creating archive log file to '+ARCH'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance item_1 - Archival Error
ORA-16038: log 7 sequence# 59248 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 7 thread 1: '+DATA/item/onlinelog/group_7.322.797746361'
For ASM2 instance:
Wed Jun 05 07:39:51 2013
Errors in file /u01/grid/11.2.0/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_3061.trc (incident=412241):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Incident details in: /u01/grid/11.2.0/diag/asm/+asm/+ASM2/incident/incdir_412241/+ASM2_lmd0_3061_i412241.trc
GES resource已经无法在shared_pool 中初始化,对于这个问题还是比较少见的 针对ASM instance的shared_pool 4031,在11.2.0.3之后
oracle的建议值为memory_target =1536MB 针对这个参数oracle也没有放出相关的最佳时间文档,相信在下一个release的时候oracle会改变
memory_target的default value .目前默认为:
SQL> show parameter target NAME TYPE VALUE ------------------------------------ --------------------------------- ------------------------------ memory_max_target big integer 1084M memory_target big integer 1084M pga_aggregate_target big integer 0 sga_target big integer 0
由于ASM中的memory 同样要负责GES GCS GRD 等资源的分配,memory fragment是一个不可忽视的问题,只是对于这个case而言大家极少的关注到
ASM 端的4031问题,针对于ASM在11.2.0.3(11.2.0.3 changed some default behaviors)的表现,oracle也应该意识到了这个问题。可以参考如下
文档获取一些有用信息。1536M 应该是oracle确定的一个最小安全值 这个也是得到了研发的认可,没有设置此参数的同学需要注意。
另外针对这个RAC 即使架构为RAC ONE NODE oracle在一些resource control方面仍然是常规RAC的行为,而RAC node NODE 也仅仅是减少了DB端的
一些资源争用。
针对这个case 可以做如下工作:
1 针对11gR2 设置 ASM memory_target 到一个安全的值
2 针对长时间running的database machines 需要做定期重启工作。这次出现问题的机器连续跑了220多天。
3 监控asm log 即使ASM 在11g中的稳定性已经得到了很大的加强。