Converting the Red-Black Tree into an Order-Statistic Red-Black Tree

Changes to the Insert Operation

• Record every node visited during the downward search for the insertion point; if the insert succeeds, increment each of their sizes by 1
• During a rotation, update the size of the old parent that becomes a child, then update the size of the new parent
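The rotation bookkeeping above can be sketched as follows. This is a minimal illustration, not the actual RBTreeBase code; the `Node` fields and helper names are assumptions. The key point is the update order: recompute the demoted node (the old parent) first, then the new parent, since the latter's size depends on the former's.

```cpp
#include <cassert>

// Hypothetical minimal node for illustration; the field names
// (left, right, parent, size) are assumptions, not the article's code.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    int size = 1; // number of nodes in the subtree rooted here
};

int sizeOf(Node* n) { return n ? n->size : 0; }

// Left-rotate x with its right child y, then repair subtree sizes.
// Color fixups are omitted; only the size maintenance is shown.
void leftRotate(Node*& root, Node* x) {
    Node* y = x->right;
    x->right = y->left;
    if (y->left) y->left->parent = x;
    y->parent = x->parent;
    if (!x->parent) root = y;
    else if (x == x->parent->left) x->parent->left = y;
    else x->parent->right = y;
    y->left = x;
    x->parent = y;
    // size maintenance: demoted node first, then its new parent
    x->size = sizeOf(x->left) + sizeOf(x->right) + 1;
    y->size = sizeOf(y->left) + sizeOf(y->right) + 1;
}
```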

Changes to the Delete Operation

• After finding deleteNextNode, the successor node that will actually be removed, decrement the size of every node on the path from it up to the root

deleteNextNode itself also needs its size decremented by 1; treating it as a size-less virtual node is what lets the size calculations in the subsequent deletion rotations come out correctly

• Rotations: this part of the code was already modified for the insert operation
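The size decrement along the deletion path can be sketched as a simple walk up the parent pointers (again a minimal illustration with assumed field names, not the actual implementation):

```cpp
#include <cassert>

// Hypothetical minimal node; field names are assumptions for this sketch.
struct Node {
    Node* parent = nullptr;
    int size = 1;
};

// After locating the node that will actually be unlinked
// (deleteNextNode in the text), walk parent links up to the
// root and decrement every size along the way. The unlinked
// node itself drops to 0, matching the "virtual node" view.
void decrementSizesToRoot(Node* n) {
    for (; n != nullptr; n = n->parent) --n->size;
}
```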

Performance Tests

Test Subjects

• RBTreeBase<T, false> is the red-black tree without rank support
• RBTreeBase<T, true> is the red-black tree with rank support
• SkipList<T> is my own skip list without rank support
• ZSet<T> is the Redis skip list, with rank support; besides the forward pointers, each node also carries a backward pointer
• Red-black trees support reverse traversal natively, so from this angle the full-featured skip list is bound to take a clear performance hit
• The tests below bear this out: ZSet<T>, as the complete structure, performs about 20% worse than SkipList<T> across the board, piling onto its already unimpressive numbers
• set<T> is the standard-library red-black tree
• unordered_set<T> is the standard-library hash table
• These two are included purely out of curiosity about the performance gap; since the supported operations differ, the comparison carries no real significance

Test Source Code

https://github.com/tedcy/algorithm_test/blob/master/order_set/test.cpp

Order-Statistic (Rank) Tests

Range Traversal by Rank Bounds (rangeByRank) Test

rangeByRank is effectively a combination of getRank and getByRank
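On an order-statistic tree, selecting a node by rank walks down using the subtree sizes. Below is a minimal sketch of a hypothetical getByRank (1-based rank); the node layout and function name are assumptions for illustration, not the tested implementation.

```cpp
#include <cassert>

// Hypothetical order-statistic node; names are illustrative only.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
    int key = 0;
    int size = 1; // nodes in the subtree rooted here
};

int sizeOf(const Node* n) { return n ? n->size : 0; }

// Classic order-statistic selection: at each node, compare the target
// rank k against the left subtree's size to decide where to descend.
// O(log n) on a balanced tree.
const Node* getByRank(const Node* n, int k) {
    while (n) {
        int leftSize = sizeOf(n->left);
        if (k == leftSize + 1) return n;
        if (k <= leftSize) n = n->left;
        else { k -= leftSize + 1; n = n->right; }
    }
    return nullptr; // rank out of range
}
```

A range-by-rank traversal then amounts to selecting the lower bound this way and walking successors until the upper bound.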

Others' Performance Tests

Skip lists, are they really performing as good as Pugh paper claim?

Times have changed a bit since William Pugh wrote his original paper. We see no mention in his paper about the memory hierarchy of the CPU and operating system which has become such a prevalent focus today (now often equally as important as algorithmic complexity).

His input case for benchmarking had a measly 2^16 elements, and hardware back then typically had, at most, 32-bit extended memory addressing available. This made the size of a pointer half the size or smaller than what we’re used to today on 64-bit machines. Meanwhile a string field, e.g., could be just as large, making the ratio between the elements stored in the skip list and the pointers required by a skip node potentially a lot smaller, especially given that we often need a number of pointers per skip node.

C Compilers weren’t so aggressive at optimization back then with respect to things like register allocation and instruction selection. Even average hand-written assembly could often provide a significant benefit in performance. Compiler hints like register and inline actually made a big deal during those times. While this might seem kind of moot since both a balanced BST and skip list implementation would be on equal footing here, optimization of even a basic loop was a more manual process. When optimization is an increasingly manual process, something that is easier to implement is often easier to optimize. Skip lists are often considered to be a lot easier to implement than a balancing tree.

So all of these factors probably had a part in Pugh’s conclusions at the time. Yet times have changed: hardware has changed, operating systems have changed, compilers have changed, more research has been done into these topics, etc.

Today the benefits of locality of reference dominates things to a point where even a linearithmic complexity algorithm can outperform a linear one provided that the former is considerably more cache or page-friendly. Paying close attention to the way the system grabs chunks of memory from upper levels of the memory hierarchy (ex: secondary stage) with slower but bigger memory and down to the little L1 cache line and teeny register is a bigger deal than ever before, and no longer “micro” if you ask me when the benefits can rival algorithmic improvements.

The skip list is potentially crippled here given the considerably larger size of nodes, and just as importantly, the variable size of nodes (which makes them difficult to allocate very efficiently).

Skip List Time Complexity Analysis

Time Complexity of the Regular Tests

Traversal Time Complexity

• First, find the minimum node, which costs $O(\log n)$

• Then repeatedly find the successor node; finding a successor follows one of two patterns

• Downward: if a right child exists, the next node is the leftmost node of the right subtree

• So descend from the right child along left children to the leftmost node
• Upward: if there is no right child, the current node is already the maximum node of some subtree, so climb upward until reaching the parent for which that subtree is a left subtree

• That is, find the first node on the upward path that is its parent's left child, so that the subtree it represents is a left subtree
• As shown in the figures, the green nodes are the ones visited during traversal:

Incidentally, the successor's down/up cases and the predecessor's up/down cases are exactly inverse processes

graph TB
subgraph id[ ]
subgraph sub1[Successor down, predecessor up]
c3((2))---d3((...))
c3((2))---e3((5))
d3((...))---f3((...))
d3((...))---g3((...))
e3((5))---h3((4))
e3((5))---i3((...))
h3((4))---j3((3))
h3((4))---k3((...))
i3((...))---l3((...))
i3((...))---m3((...))
style c3 fill:#9f9,stroke:#333,stroke-width:4px
style e3 fill:#9f9,stroke:#333,stroke-width:4px
style h3 fill:#9f9,stroke:#333,stroke-width:4px
style j3 fill:#9f9,stroke:#333,stroke-width:4px
end
subgraph sub2[Successor up, predecessor down]
a4((7))---e4((2))
e4((2))---h4((...))
e4((2))---i4((4))
h4((...))---j4((...))
h4((...))---k4((...))
i4((4))---l4((...))
i4((4))---m4((6))
a4((7))---b4((...))
b4((...))---c4((...))
b4((...))---d4((...))
style a4 fill:#9f9,stroke:#333,stroke-width:4px
style e4 fill:#9f9,stroke:#333,stroke-width:4px
style i4 fill:#9f9,stroke:#333,stroke-width:4px
style m4 fill:#9f9,stroke:#333,stroke-width:4px
end
end
• During the traversal, every node of the red-black tree is visited once on the way down and once again on the way up, so this part costs about $2n$ steps

• In total, $2n + O(\log n)$ is still roughly $2n$ steps, i.e. $O(n)$
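The successor walk described above can be sketched as the classic BST successor routine (a minimal illustration with assumed field names, not the article's tree code):

```cpp
#include <cassert>

// Minimal BST node for the successor walk; field names are assumptions.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    int key = 0;
};

// In-order successor. "Downward" case: descend into the right subtree
// and go left as far as possible. "Upward" case: climb until we leave
// a left subtree; the parent we stop at is the successor.
Node* successor(Node* n) {
    if (n->right) {
        n = n->right;
        while (n->left) n = n->left;
        return n;
    }
    Node* p = n->parent;
    while (p && n == p->right) { n = p; p = p->parent; }
    return p; // nullptr when n was the maximum
}
```

Iterating this from the minimum node visits each edge at most twice, which is the $2n$ bound above.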

Insert, Delete, and Query Time Complexity

Instrumented Analysis

Per-Element Query Analysis

The red-black tree holding the 4 elements {1,2,3,4} is shown below

graph TB
subgraph id[ ]
subgraph rbt[Red-Black Tree]
a2((2))---b2((1))
a2((2))---c2((3))
c2((3))---e2((nil))
c2((3))---f2((4))
style f2 fill:#f9f,stroke:#333,stroke-width:4px
end
end

• The red-black tree structure is as shown above
• The concrete query process is as follows:

The `=` lines mark the query path

Time Complexity Analysis

• If a node has no pointer at the next level up, the forward search cannot have passed through a higher level at that node, so it can only have arrived from the node to its left
• For example, in the {1,2,3,4} skip list above, when searching for element 3, which has only 1 level: backtracking from level 1 of element 3, the search can only have come from level 1 of element 2
• If a node does have a pointer at the next level up, the forward search cannot have arrived from the left; even in the worst case it descended from the level above down to the current node
• For example, in the same {1,2,3,4} skip list, when searching for element 3: if the current backtracking position is level 1 of element 1, and element 1 has a higher level, the search must first have failed a next lookup on that higher level before descending to the current one

• If node x has no level-(i+1) pointer, the backward walk goes left; this happens with probability 1-p
• If node x has a level-(i+1) pointer, the backward walk goes up; this happens with probability p

Let $C(k)$ denote the average search-path length needed to climb up $k$ levels

• $C(0) = 0$
• $C(k) = (1-p)(C(k) + 1) + p(C(k-1) + 1)$
• Simplifying:
• $C(k) = (C(k) + 1) - (pC(k) + p) + (pC(k-1) + p)$
• $C(k) = C(k) + 1 - pC(k) + pC(k-1)$
• $pC(k) = 1 + pC(k-1)$
• $C(k) = \dfrac{1}{p} + C(k-1)$
• $C(k) = \dfrac{k}{p}$

Substituting $p = 0.25$, with $k = \log_4 n - 1$ levels to climb, gives $(\log_4 n - 1) \cdot 4 = 2\log_2 n - 4$, i.e. roughly $2\log_2 n$
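The closed form can be sanity-checked with a small Monte Carlo simulation of the backward climb: each step goes up with probability $p$ and left otherwise, so climbing one level costs a geometric number of steps with mean $1/p$. This is a sketch of the abstract process above, not tied to any particular skip list implementation.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Simulate the backward search-path climb: from the bottom, each step
// moves up one level with probability p, otherwise left. Returns the
// average number of steps to climb k levels over the given trials;
// the derivation predicts k / p.
double simulateClimb(int k, double p, int trials, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution up(p);
    long long total = 0;
    for (int t = 0; t < trials; ++t) {
        int levels = 0;
        while (levels < k) {
            ++total;            // one step, whether up or left
            if (up(rng)) ++levels;
        }
    }
    return double(total) / trials;
}
```

With $p = 0.25$ and $k = 7$ (i.e. $n = 4^8$), the predicted average is $7 / 0.25 = 28$ steps.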

Summary

Why does Redis use skip lists?

antirez on March 6, 2010 | root | parent | next [–]

There are a few reasons:

1. They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels will make them less memory intensive than btrees.

2. A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.

3. They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

About the Append Only durability & speed, I don’t think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hit is big anyway.

About threads: our experience shows that Redis is mostly I/O bound. I’m using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the “Redis Cluster” solution that I plan to develop in the future.