阅读使人充实,讨论使人敏捷,写作使人精确。DB印象推荐搜索PostgreSQLOracleMySQL分布式>>>锁等待处理概述锁等待是我们比较常见的问题,无论是Oracle、MySQL还是PostgreSQL,都设计了大量的锁来保证并发操作的数据一致性,同时,当锁等待产生时会以队列方式进行,先到先得,后者依序排队等候。在高并发场景中,锁等待往往在某些凑巧的情况下能发挥非常巨大的杀伤力,严重的锁等待队列,简单的行锁等待就可以将系统的CPU打垮。
例如一个表的操作非常频繁,如果刚好前面有一个长事务没有完成,然后又刚好有事务需要获得一个排他锁,那么接下来的频繁DML请求都会被堵塞,如果队列中再来一个DDL,那么后果就更加严重。对于PG来说,解决锁等待通常有以下几种方式:1.设置lock_timeout、deadlock_timeout让等待自动超时,以便进程及时从等待队列中退出。
2.杀掉锁资源的阻塞源头,即持有锁的源头进程holder(或者叫root blocker):select pg_terminate_backend(holder_pid);
注意这里不是杀掉waiter,尤其是在高并发场景,waiter是杀不完的,比如Oracle中杀掉enq: TX - row lock contention的等待会话并非治本办法。
3.最后,就是整改优化业务逻辑了。
锁问题的处理其实并不复杂,关键是如何找到持有锁的源头,下面使用Tbase(基于pgxc)分析演示。
>>>锁等待场景模拟分析
首先模拟一个最简单的行锁等待现场:
在pgxc:cn001节点开启会话session A
显式事务不提交testdb=# begin;
BEGIN
testdb=# update aken set name=aa where name=aa and id =1;
UPDATE 1
testdb=#
在pgxc:cn001节点session B
在session 1未提交的情况下更新同一行,事务挂起:postgres=# \c testdb;
You are now connected to database "testdb" as user "dbmgr".
testdb=# update aken set name=aa where name=aa and id =1;
pgxc:cn002节点session C
在session 1、session2未提交的情况下更新同一行,事务挂起:postgres=# \c testdb
You are now connected to database "testdb" as user "dbmgr".
testdb=# update aken set name=aa where name=aa and id =1;
接着观察DN主节点的日志:
可以在csv日志中直接搜索关键字“waiting”来查看相关信息:-10-29 17:51:51.312 CST,"dbmgr","testdb",12240,coord(0,0),"192.168.155.177:38707",5f9a90b3.2fd0,coord(0,0),1,"UPDATE waiting",
-10-29 17:51:47 CST,60/5771127,1743392585,LOG,00000,"process 12240 still waiting for ShareLock on transaction 1743392584 after 1000.042 ms",
"Process holding the lock: 9724. Wait queue: 12240.",,,,"while updating tuple (0,11) in relation ""aken""",
"UPDATEakenSETname=aa::textWHERE((name=aa::text)AND(id=1))",,,"pgxc:cn001"
-10-29 17:55:47.579 CST,"dbmgr","testdb",2406,coord(0,0),"192.168.136.55:42768",5f9a9192.966,coord(0,0),1,"UPDATE waiting",
-10-29 17:55:30 CST,100/6995248,1743392586,LOG,00000,"process 2406 still waiting for ExclusiveLock on tuple (0,11) of relation 19054 of database 18466 after 1000.044 ms",
"Process holding the lock: 12240. Wait queue: 2406.",,,,,"UPDATE aken SET name = aa::text WHERE ((name = aa::text) AND (id = 1))",,,"pgxc:cn002"
上面的日志可以看出:
1.由pgxc:cn001发起的事务transaction-1743392585的进程12240在请求ShareLock on transaction 1743392584时被进程9724的事务1743392584阻塞;
2.由pgxc:cn002发起的事务transaction-1743392586的进程2406在请求ExclusiveLock on tuple (0,11) of relation 19054 of database 18466时被进程12240阻塞;
3.整个等待链为:进程9724(rootlockholder)--阻塞-->进程12240(ShareLockwaiter)--阻塞-->进程2406(ExclusiveLockwaiter)。
除了查看日志信息,我们也可以在DN主节点进入对应的database通过相关的监控视图查看lockinfo、wait_event信息。
1.查看lock信息:
可以看到各种mode的锁的进程持有、等待请况。postgres=# \c testdb;
You are now connected to database "testdb" as user "dbmgr".
testdb=# select pid,CASE granted WHEN f THEN waiter ELSE holder END as pid_role,mode,pg_blocking_pids(pid) bloker_pid,relation::regclass,virtualxid,virtualtransaction,transactionid from pg_locks
testdb-# where pid<>pg_backend_pid() order by pid,pid_role;
pid | pid_role | mode | bloker_pid | relation | virtualxid | virtualtransaction | transactionid
-------+----------+------------------+------------+----------+-------------+--------------------+---------------
2406 | holder | RowExclusiveLock | {12240} | aken | | 100/6995248 |
2406 | holder | ExclusiveLock | {12240} | | 100/6995248 | 100/6995248 |
2406 | holder | ExclusiveLock | {12240} | | | 100/6995248 | 1743392586
2406 | waiter | ExclusiveLock | {12240} | aken | | 100/6995248 |
9724 | holder | RowExclusiveLock | {} | aken | | 57/6548285 |
9724 | holder | ExclusiveLock | {} | | 57/6548285 | 57/6548285 |
9724 | holder | ExclusiveLock | {} | | | 57/6548285 | 1743392584
12240 | holder | ExclusiveLock | {9724} | | | 60/5771127 | 1743392585
12240 | holder | ExclusiveLock | {9724} | | 60/5771127 | 60/5771127 |
12240 | holder | RowExclusiveLock | {9724} | aken | | 60/5771127 |
12240 | holder | ExclusiveLock | {9724} | aken | | 60/5771127 |
12240 | waiter | ShareLock | {9724} | | | 60/5771127 | 1743392584
(12rows)
testdb=#