做分布式事务的时候用到了存储过程加事务,由于忘记在存储过程中捕获异常rollback,导致了死锁
不过我select ... for update锁的是A表,实际线上却是B表被锁死
sql一块之前理解不深入,debug的过程重新复习一下
bug重现
简化逻辑重现核心bug
锁A表
1 2 3 4 5 6 7 8 9 10 11 client1: mysql> start transaction; Query OK, 0 rows affected (0.01 sec) mysql> select count(1) from xids where xid = "xid_{1825}_wtf" and status = 0 for update; +----------+ | count(1) | +----------+ | 1 | +----------+ 1 row in set (0.01 sec)
查看innodb_trx
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 mysql> use information_schema Database changed mysql> select * from innodb_trx \G; *************************** 1. row *************************** trx_id: 21059318 trx_state: RUNNING trx_started: 2018-12-21 17:12:16 trx_requested_lock_id: NULL trx_wait_started: NULL trx_weight: 2 trx_mysql_thread_id: 22129 trx_query: NULL trx_operation_state: NULL trx_tables_in_use: 0 trx_tables_locked: 1 trx_lock_structs: 2 trx_lock_memory_bytes: 1136 trx_rows_locked: 1 trx_rows_modified: 0 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_adaptive_hash_latched: 0 trx_adaptive_hash_timeout: 0 trx_is_read_only: 0 trx_autocommit_non_locking: 0 1 row in set (0.01 sec)
通过trx_tables_locked字段,可以看到锁住了一个表,也就是我说的表A
事务执行失败
1 2 3 4 5 6 client 1: mysql> update users set balance = balance - convert(1,unsigned) where user_id = 100000000003; ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(`dsp_adlib`.`users`.`balance` - cast(1 as unsigned))' client 2: update users set balance = 0 where user_id = 100000000003;
由于balance是有符号数,和一个无符号相减并且结果小于0导致溢出(之前触发器bug也遇到过。。同一个坑踩了两次)
这个时候client2的update语句等待锁,也就是client1死锁了
查看下事务和锁情况
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 mysql> select * from innodb_trx \G; *************************** 1. row *************************** trx_id: 21059551 trx_state: LOCK WAIT trx_started: 2018-12-21 17:19:41 trx_requested_lock_id: 21059551:566:3:43 trx_wait_started: 2018-12-21 17:19:41 trx_weight: 2 trx_mysql_thread_id: 22131 trx_query: update users set balance = 0 where user_id = 100000000003 trx_operation_state: starting index read trx_tables_in_use: 1 trx_tables_locked: 1 trx_lock_structs: 2 trx_lock_memory_bytes: 1136 trx_rows_locked: 1 trx_rows_modified: 0 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_adaptive_hash_latched: 0 trx_adaptive_hash_timeout: 0 trx_is_read_only: 0 trx_autocommit_non_locking: 0 *************************** 2. row *************************** trx_id: 21059318 trx_state: RUNNING trx_started: 2018-12-21 17:12:16 trx_requested_lock_id: NULL trx_wait_started: NULL trx_weight: 4 trx_mysql_thread_id: 22129 trx_query: NULL trx_operation_state: NULL trx_tables_in_use: 0 trx_tables_locked: 2 trx_lock_structs: 4 trx_lock_memory_bytes: 1136 trx_rows_locked: 2 trx_rows_modified: 0 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_adaptive_hash_latched: 0 trx_adaptive_hash_timeout: 0 trx_is_read_only: 0 trx_autocommit_non_locking: 0 2 rows in set (0.01 sec) ERROR: No query specified mysql> select * from innodb_locks \G *************************** 1. row *************************** lock_id: 21059551:566:3:43 lock_trx_id: 21059551 lock_mode: X lock_type: RECORD lock_table: `dsp_adlib`.`users` lock_index: PRIMARY lock_space: 566 lock_page: 3 lock_rec: 43 lock_data: 100000000003 *************************** 2. row *************************** lock_id: 21059318:566:3:43 lock_trx_id: 21059318 lock_mode: X lock_type: RECORD lock_table: `dsp_adlib`.`users` lock_index: PRIMARY lock_space: 566 lock_page: 3 lock_rec: 43 lock_data: 100000000003 2 rows in set, 1 warning (0.01 sec) mysql> select * from innodb_lock_waits \G *************************** 1. row *************************** requesting_trx_id: 21059551 requested_lock_id: 21059551:566:3:43 blocking_trx_id: 21059318 blocking_lock_id: 21059318:566:3:43 1 row in set, 1 warning (0.01 sec)
可以看到client2事务id 21059551,由于client1事务id 21059318死锁无法继续进行。
client1的事务trx_tables_locked字段由1变为2,即表B被锁死
死锁分析
再次简化流程以后,尝试update成功是否会死锁
1 2 3 4 5 6 client1: start transaction; update users set balance = balance - 1 where user_id = 100000000003; 然后client2执行: update users set balance = 0 where user_id = 100000000003;
发现balance表依然被锁死,那么这个和for update锁没有关系了。
只要事务过程中修改的表,就会被锁
重新阅读了美团的文章Innodb中的事务隔离级别和锁的关系
1 2 3 4 5 6 7 8 9 10 快照读:就是select select * from table ....; 当前读:特殊的读操作,插入/更新/删除操作,属于当前读,处理的都是当前的数据,需要加锁。 select * from table where ? lock in share mode; select * from table where ? for update; insert; update ; delete; ... RR级别中,事务A在update后加锁,事务B无法插入新数据,这样事务A在update前后读的数据保持一致,避免了幻读。这个锁,就是Gap锁。
之前只理解了这篇文章关于RC和RR模式下MVCC的分析(快照读的部分),没有仔细关注过GAP锁。
现在才发现原来事务中的update会导致锁表,而且锁也分很多种,
record锁,单条索引记录上加锁,如果没有索引会导致锁表
gap锁,在索引记录之间的间隙中加锁,或者是在某一条索引记录之前或者之后加锁,并不包括该索引记录本身。
gap lock的机制主要是解决RR模式下的幻读问题
next-key锁,每个next-key锁由record锁和2个gap锁组成
MySQL InnoDB中唯一索引和非唯一索引时的加锁情况
唯一索引和非唯一索引的范围查询会加多个next-key锁
非唯一索引的精确查询会加一个next-key锁
唯一索引的精确查询会加一个record锁
参考资料
查看事务锁:innodb_trx + innodb_locks + innodb_lock_waits
Innodb中的事务隔离级别和锁的关系
MySQL InnoDB中唯一索引和非唯一索引时的加锁情况