Implementing HTTP Persistent Connections

Posted on 2024-06-23, updated on 2025-07-30, filed under HTTP

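Our business teams asked for persistent (keep-alive) connection support in the HTTP client of taf.

Five years ago, at my previous company, I wrote a cpp HTTP connection pool, sheep/HTTP_client, to implement RTB for an advertising project.

That implementation was crude and only had to be good enough. This one will be used by the whole company, so the first question is: which version of HTTP persistent connections should it implement?

I leaned toward HTTP/1.1, since it asks little of the server and is the most widely supported, and that is what I ended up choosing. With the version settled, more questions followed: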

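Should Pipelining be implemented?

Do any protocol control fields need to be parsed?

Which configuration fields are needed, and what are their defaults?

Take the pool's default number of idle connections: defaults like this invite relentless questioning from senior engineers. Why 5? Why 10?

Pointing at an existing implementation deflects the blame: why does golang use x? Why does nginx use x?

Should there be one connection pool per domain name, or one per IP?

Back then I wrote an IP-based Client for the RPC framework (one multiplexed connection per IP), then wrapped it in a ClientPool for redis, HTTP and mysql (one connection pool per IP).

Looking at it now, that seems off: shouldn't HTTP persistent connections be pooled per domain name?

Sorting out the sparse-connection problem

That one has been moved to the blog post on the taf framework, because the first version of the HTTP client is not meant to do too much.

These questions are answered below using the RFCs and the golang and nginx implementations. The golang implementation is fairly simple, so its source code is analyzed directly.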

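Which version of HTTP persistent connections to implement

History of HTTP

HTTP/1.0 was specified in RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0, published in 1996.

HTTP/1.1 followed the next year as RFC 2068: Hypertext Transfer Protocol -- HTTP/1.1, which was replaced in 1999 by RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616 stayed the reference for HTTP/1.1 until the RFC 7230 series revised it in 2014 (itself replaced by RFC 9110 and RFC 9112 in 2022).

No new major version appeared for some 16 years, until HTTP/2 arrived in 2015 as RFC 7540: Hypertext Transfer Protocol Version 2 (HTTP/2).

In June 2022, RFC 9113: HTTP/2 and RFC 9114: HTTP/3 were published together with consecutive numbers; the former obsoletes the 2015 HTTP/2 specification (RFC 7540).

Persistent connections in HTTP/1.1

Because the short-lived connections of HTTP/1.0 were very inefficient, HTTP/1.1 introduced persistent connections and added support for Pipelining.

The RFC (RFC 2616) introduces persistent connections as follows: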

8 Connections

8.1 Persistent Connections

8.1.1 Purpose

Prior to persistent connections, a separate TCP connection was established to fetch each URL, increasing the load on HTTP servers and causing congestion on the Internet. The use of inline images and other associated data often require a client to make multiple requests of the same server in a short amount of time. Analysis of these performance problems and results from a prototype implementation are available [26] [30]. Implementation experience and measurements of actual HTTP/1.1 (RFC 2068) implementations show good results [39]. Alternatives have also been explored, for example,T/TCP [27].

Persistent HTTP connections have a number of advantages:

By opening and closing fewer TCP connections, CPU time is saved in routers and hosts (clients, servers, proxies, gateways, tunnels, or caches), and memory used for TCP protocol control blocks can be saved in hosts.

HTTP requests and responses can be pipelined on a connection. Pipelining allows a client to make multiple requests without waiting for each response, allowing a single TCP connection to be used much more efficiently, with much lower elapsed time.

Network congestion is reduced by reducing the number of packets caused by TCP opens, and by allowing TCP sufficient time to determine the congestion state of the network.

Latency on subsequent requests is reduced since there is no time spent in TCP's connection opening handshake.

HTTP can evolve more gracefully, since errors can be reported without the penalty of closing the TCP connection. Clients using future versions of HTTP might optimistically try a new feature, but if communicating with an older server, retry with old semantics after an error is reported.

Its description of Pipelining:

8.1.2.2 Pipelining

A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.

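[Figure: Wikipedia's illustration of HTTP pipelining]

The RFC spells out the advantages of HTTP/1.1 clearly, and with Pipelining it looked unassailable. Yet some 16 years later HTTP/2 bluntly exposed its weaknesses.

Persistent connections in HTTP/2

From the Introduction of the RFC (RFC 9113):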

The performance of applications using the Hypertext Transfer Protocol (HTTP, [HTTP]) is linked to how each version of HTTP uses the underlying transport, and the conditions under which the transport operates.

Making multiple concurrent requests can reduce latency and improve application performance. HTTP/1.0 allowed only one request to be outstanding at a time on a given TCP [TCP] connection. HTTP/1.1 [HTTP/1.1] added request pipelining, but this only partially addressed request concurrency and still suffers from application-layer head-of-line blocking. Therefore, HTTP/1.0 and HTTP/1.1 clients use multiple connections to a server to make concurrent requests.

Furthermore, HTTP fields are often repetitive and verbose, causing unnecessary network traffic as well as causing the initial TCP congestion window to quickly fill. This can result in excessive latency when multiple requests are made on a new TCP connection.

HTTP/2 addresses these issues by defining an optimized mapping of HTTP's semantics to an underlying connection. Specifically, it allows interleaving of messages on the same connection and uses an efficient coding for HTTP fields. It also allows prioritization of requests, letting more important requests complete more quickly, further improving performance.

The resulting protocol is more friendly to the network because fewer TCP connections can be used in comparison to HTTP/1.x. This means less competition with other flows and longer-lived connections, which in turn lead to better utilization of available network capacity. Note, however, that TCP head-of-line blocking is not addressed by this protocol.

Finally, HTTP/2 also enables more efficient processing of messages through use of binary message framing.

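Shortcomings of HTTP/1.1

As the quote shows, HTTP/2 recognizes that HTTP/1.1 Pipelining has no way to match a Rsp to its Req, so responses must come back in request order. If one Rsp is blocked, every Rsp behind it is stuck: head-of-line blocking.

So HTTP/1.1 still needs multiple connections, and every new connection goes through TCP slow start and can easily stall. It is not a perfect design.

HTTP/2 multiplexing

HTTP/2 introduces the concept of a Stream. Each Req and its Rsp are tied together by a Stream ID, which lets many exchanges be multiplexed over a single TCP connection.

A Req is split into multiple Frames for sending, and so is a Rsp. A Frame is a binary packet with the following layout: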

+-----------------------------------------------+
|                 Length (24)                   |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31)                      |
+=+=============================================================+
|                   Frame Payload (0...)                      ...
+---------------------------------------------------------------+
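
As a rough illustration of the layout above, the 9-byte frame header can be decoded as follows. This is only a sketch: the `frameHeader` type and the `parseFrameHeader` function are hypothetical names, not part of any library, and the sample bytes are made up.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameHeader mirrors the fixed 9-byte header in front of every HTTP/2 frame.
// The type name is made up for this example.
type frameHeader struct {
	Length   uint32 // 24-bit payload length
	Type     uint8  // frame type, e.g. 0x0 DATA, 0x1 HEADERS
	Flags    uint8
	StreamID uint32 // 31-bit stream identifier (reserved bit cleared)
}

// parseFrameHeader decodes the first 9 bytes of b.
func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < 9 {
		return frameHeader{}, errors.New("frame header needs 9 bytes")
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff, // drop the R bit
	}, nil
}

func main() {
	// Made-up header: length 12, type HEADERS, flags 0x05, stream 3 with the R bit set.
	raw := []byte{0x00, 0x00, 0x0c, 0x01, 0x05, 0x80, 0x00, 0x00, 0x03}
	h, err := parseFrameHeader(raw)
	fmt.Println(h.Length, h.Type, h.Flags, h.StreamID, err)
}
```

Because every frame carries its Stream ID, the receiver can reassemble interleaved frames per stream, which is what makes multiplexing on one connection possible.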

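[Figure: frames sent by the client] All streams in the figure are odd-numbered, because streams initiated by the client have odd IDs and streams initiated by the server have even IDs.

HTTP/2 flow control

Flow control at the HTTP layer is simpler than TCP's. It only defines something like TCP's receive window: the sender may send at most as much as the receive window allows. When the flow-control window drops to 0, the sender must stop sending data until it receives a WINDOW_UPDATE frame telling it that the window is non-zero again.

I wondered why there is no counterpart to TCP's sender-side window here:

At the TCP layer, without a sender-side window (the congestion window), a sender could keep pouring data into the network when the receiver has died or the path is congested, and make the congestion worse. TCP detects and handles such conditions through mechanisms like retransmission timeouts and acknowledgements. So in TCP the two windows work together: the receive window protects the receiver from overload, and the sender-side window protects the network from congestion.

HTTP/2 runs on top of TCP, so connectivity and congestion are already handled by TCP. HTTP/2 therefore only needs receive windows, to keep the application layer that processes HTTP requests from being overloaded and to control each Stream more precisely; it has no need for a separate sender-side window.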
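
The receive-window rule can be sketched in a few lines. The `sendWindow` type and its methods are hypothetical names used only for illustration, and 65535 is the default initial window size defined by the HTTP/2 specification. A real implementation would keep one such window per stream plus one for the whole connection.

```go
package main

import "fmt"

// sendWindow tracks how many bytes the peer currently allows us to send.
// Hypothetical type for illustration only.
type sendWindow struct {
	avail int64
}

// take reserves up to n bytes and returns how many may be sent right now.
// It returns 0 when the window is exhausted, so the sender has to wait.
func (w *sendWindow) take(n int64) int64 {
	if n > w.avail {
		n = w.avail
	}
	if n < 0 {
		n = 0
	}
	w.avail -= n
	return n
}

// update is what a received WINDOW_UPDATE frame would trigger.
func (w *sendWindow) update(increment int64) {
	w.avail += increment
}

func main() {
	w := &sendWindow{avail: 65535} // default initial window size
	fmt.Println(w.take(70000))     // only part of the data fits in the window
	fmt.Println(w.take(1))         // window is empty, nothing may be sent
	w.update(1000)                 // peer grants 1000 more bytes
	fmt.Println(w.take(500))
}
```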

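Stream priority, header compression, server push and the like have less to do with persistent connections and are skipped here.

Persistent connections in HTTP/3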

Introduction

HTTP semantics ([HTTP]) are used for a broad range of services on the Internet. These semantics have most commonly been used with HTTP/1.1 and HTTP/2. HTTP/1.1 has been used over a variety of transport and session layers, while HTTP/2 has been used primarily with TLS over TCP. HTTP/3 supports the same semantics over a new transport protocol: QUIC.

1.1. Prior Versions of HTTP

HTTP/1.1 ([HTTP/1.1]) uses whitespace-delimited text fields to convey HTTP messages. While these exchanges are human readable, using whitespace for message formatting leads to parsing complexity and excessive tolerance of variant behavior.

Because HTTP/1.1 does not include a multiplexing layer, multiple TCP connections are often used to service requests in parallel. However, that has a negative impact on congestion control and network efficiency, since TCP does not share congestion control across multiple connections.

HTTP/2 ([HTTP/2]) introduced a binary framing and multiplexing layer to improve latency without modifying the transport layer. However, because the parallel nature of HTTP/2's multiplexing is not visible to TCP's loss recovery mechanisms, a lost or reordered packet causes all active transactions to experience a stall regardless of whether that transaction was directly impacted by the lost packet.

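Shortcomings of HTTP/2

Unlike HTTP/1.1, the HTTP/2 specification already anticipates its own weakness: TCP head-of-line blocking.

In TCP all data shares one ordered byte stream, so if a single packet is lost, everything behind it has to wait for the retransmission before it can be delivered.

In QUIC, each request/response pair travels on its own stream, and each stream is ordered independently by its own offsets. A packet lost on one stream therefore does not hold up the delivery of data on other streams, which removes the head-of-line blocking seen with TCP.

Early Google QUIC also experimented with forward error correction (FEC), which rebuilds a lost packet from packets already received instead of waiting for a retransmission. FEC did not make it into the standardized QUIC (RFC 9000), so HTTP/3 relies on QUIC's own loss detection and retransmission.

Built on QUIC, HTTP/3 needs only minor adjustments to HTTP/2 to arrive at what is currently the most complete persistent-connection mechanism.

The choice: HTTP/1.1

HTTP/2 is an intermediate version. It did not fully solve the problem Pipelining has, and in a sense it is even less efficient.

HTTP/1.1 can avoid head-of-line blocking simply by not enabling Pipelining, whereas HTTP/2 is bound to hit TCP head-of-line blocking whenever the network is lossy.

That leaves two candidates for persistent connections, HTTP/1.1 and HTTP/3. Since most servers do not support HTTP/3 yet, implementing HTTP/1.1 persistent connections is enough.

Should Pipelining be implemented?

The drawbacks of HTTP/1.1 Pipelining were sketched above; here is a closer look.

A bare-bones design

The RFC's entire description of Pipelining is one short passage: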

A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.

Clients which assume persistent connections and pipeline immediately after connection establishment SHOULD be prepared to retry their connection if the first pipelined attempt fails. If a client does such a retry, it MUST NOT pipeline before it knows the connection is persistent. Clients MUST also be prepared to resend their requests if the server closes the connection before sending all of the corresponding responses.

Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods (see section 9.1.2). Otherwise, a premature termination of the transport connection could lead to indeterminate results. A client wishing to send a non-idempotent request SHOULD wait to send that request until it has received the response status for the previous request.

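After reading this tiny description, I am left with many implementation questions:

How do you tell whether the server supports Pipelining?

Persistent connections are negotiated explicitly in the protocol. A client that wants one can send

Connection: keep-alive

and a server that will not keep the connection open answers explicitly with

Connection: close

So why is Pipelining designed so carelessly? Had there been any negotiation at all, a client could calmly fall back to a plain connection pool when the server does not support it.

With HTTP/1.1 as designed, the client can only guess from the responses whether the server supports Pipelining, and with Nagle's algorithm enabled on TCP there is nothing to base the guess on.

I call it the chain of suspicion from the Dark Forest theory, computing edition; the outcome is "do not answer" (Pipelining).

How many requests should be sent in one go?

In other words, where is the flow control? Evidently this too is left for the client to guess.

For most of the questions the RFC leaves open, you have to turn to mozilla's HTTP/1.1 Pipelining FAQ for even a little clarification (because most of the RFC text is filler with no detail).

Obstacles on the public Internet: HTTP proxies

Say a proxy has 8 images cached and 10 image requests arrive. Because 2 of them are not cached, all 10 requests have to be forwarded to the server, and the 8 cached copies are wasted.

Also, with multiple layers of HTTP proxies, a single proxy that does not support Pipelining makes it unusable along the whole path.

Obstacles on the internal network: is it really more efficient than a connection pool?

On the server side, the connections of a client pool are handled with IO multiplexing. C10K was a famous problem in the days of HTTP/1.1, but it is not much of a problem today.

Given its head-of-line blocking, Pipelining can at best be icing on the cake for a connection pool that has reached its connection limit.

Level of support

According to Wikipedia's HTTP_pipelining article: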

Internet Explorer 11 does not support pipelining

Mozilla browsers (such as Mozilla Firefox, SeaMonkey and Camino) used to support pipelining; however, it was removed in Firefox 54.

Google Chrome previously supported pipelining, but it has been disabled due to bugs and problems with poorly behaving servers

libcurl previously had limited support for pipelining using the CURLMOPT_PIPELINING option, but this support was removed in version 7.65.0

As you can see, most implementations have said NO to Pipelining, for the reasons listed above.

Do any additional protocol fields need to be parsed?

According to the RFC:

8.1.2.1 Negotiation

An HTTP/1.1 server MAY assume that a HTTP/1.1 client intends to maintain a persistent connection unless a Connection header including the connection-token "close" was sent in the request. If the server chooses to close the connection immediately after sending the response, it SHOULD send a Connection header including the connection-token close.

An HTTP/1.1 client MAY expect a connection to remain open, but would decide to keep it open based on whether the response from a server contains a Connection header with the connection-token close. In case the client does not want to maintain a connection for more than that request, it SHOULD send a Connection header including the connection-token close.

If either the client or the server sends the close token in the Connection header, that request becomes the last one for the connection.

Clients and servers SHOULD NOT assume that a persistent connection is maintained for HTTP versions less than 1.1 unless it is explicitly signaled. See section 19.6.2 for more information on backward compatibility with HTTP/1.0 clients.

In order to remain persistent, all messages on the connection MUST have a self-defined message length (i.e., one not defined by closure of the connection), as described in section 4.4.

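First, it is enough to read the correct message length from the Content-Length Header, a Header that already had to be implemented for HTTP/1.0, so this is not a big deal.

Second, either the server or the client can use the Connection: close Header to declare that it will no longer keep the connection open, and then closes the connection itself.

~~So no additional fields need to be parsed; it is enough to watch for the TCP FIN event and close the connection.~~

Update, 2025-07-30

In practice, a client that does not parse Connection: close runs into trouble when the server does not close the connection immediately (nginx, for example).

In the packet capture, nginx returns the HTTP response in TCP packet 10.

The client acknowledges it in packet 11.

Only in packet 12 does the server send the FIN that closes the connection.

A faulty client does not parse Connection: close out of packet 10 and so does not learn that the server is about to close the connection.

If it reuses the connection to send another request before packet 12 arrives, that connection gets reset (because the server has already closed it).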

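One connection pool per domain name, or per IP?

golang 1.21 source code analysis

~ go version
go version go1.21.0 linux/amd64

Class diagram

The client, net/http.Client, is the higher-level abstraction; it takes care of HTTP details such as Cookies and redirects.

net/http.Transport handles the low-level details of the HTTP/HTTPS protocols, including connection reuse, building requests and sending them.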

classDiagram
direction TB
class Request {
    +Method string //e.g. "GET"
    +URL *url.URL //parsed URL
    +Proto      string //e.g. "HTTP/1.0"
    +Header Header
    +Body io.ReadCloser //byte stream
    +Response *Response
}
class Response {
    
}
class RoundTripper {
    <<interface>>
    RoundTrip(*Request) (*Response, error) sends the Req and receives the Rsp, hence a round trip
}
class Client {
    -send(req *Request, deadline time.Time) (resp *Response, didTimeout func() bool, err error) entry point for sending over TCP
    -do(req *Request) (*Response, error) does some HTTP handling, sets the timeout, calls send
    +Do(req *Request) (*Response, error) public entry point for HTTP calls, no logic, calls do directly
    +Transport RoundTripper //connection pool
    +Timeout time.Duration //timeout setting
}
class Transport {
    +RoundTrip(req *Request) (*Response, error) no logic, calls roundTrip directly
    -roundTrip(req *Request) (*Response, error)
    -getConn(treq *transportRequest, cm connectMethod) (pc *persistConn, err error) core pool logic, acquires a connection
    -idleConn map[connectMethodKey][]*persistConn //one idle pool per connectMethodKey
}
class persistConn {
    -roundTrip(req *transportRequest) (resp *Response, err error) the TCP-level roundTrip implementation
    -writeLoop() handles sending
    -readLoop() handles receiving
}
class connectMethod {
    //built from part of the Request, the "granularity" of the connection pool
    proxyURL     *url.URL
    targetScheme string // "http" or "https"
    targetAddr string
    onlyH1     bool
}
class transportRequest {
    +*Request //embeds Request
    -trace     *httptrace.ClientTrace
    -cancelKey cancelKey
}
Client --> Transport
Client --> Request
Client --> Response
Request --> Response
Transport ..|> RoundTripper
persistConn --* Transport
Transport --> transportRequest
Transport --> connectMethod
transportRequest ..|> Request

Call flow

sequenceDiagram
participant User
participant Client
participant Transport
participant queueForIdleConn as queueForIdleConn function
participant queueForDial as queueForDial function
participant persistConn_roundTrip as persistConn::roundTrip function
User ->>+ Client : Do(req)
Client ->>+ Client : do(Req)
Client ->>+ Client : initialize deadline from the Timeout setting<br>send(req, deadline)
Client ->>+ Transport : rt.RoundTrip(req)
Transport ->>+ Transport : roundTrip(req)
alt if req.URL.Scheme == "https"
Transport ->>+ Transport : https has its own registered RoundTripper type, http2Transport<br>altRT := t.alternateRoundTripper(req)<br>altRT.RoundTrip(req)
Transport ->>- Transport : return resp
Transport ->> User : return early to the user: return resp
else if req.URL.Scheme != "https"
Transport ->>+ Transport : treq and cm both wrap information from req, see the class diagram<br>getConn(tReq, cm)
Transport ->>+ Transport : w := &wantConn{}
par in parallel, get an idle connection
Transport ->>+ queueForIdleConn : queueForIdleConn(w)
queueForIdleConn ->>+ queueForIdleConn : list, ok := t.idleConn[w.key]<br>idle connections are appended to list, so this takes the most recently used one
loop iterate with pconn := list[len(list)-1], then list = list[:len(list)-1] after each round
queueForIdleConn ->> queueForIdleConn : hand it to the wantConn:<br>w.tryDeliver(pconn)<br>if delivery succeeds, t.idleLRU.remove(pconn)
end
queueForIdleConn ->>- queueForIdleConn : return
queueForIdleConn ->>+ queueForIdleConn : no usable connection after iterating, wait in the queue t.idleConnWait[w.key].pushBack(w)
queueForIdleConn ->>- queueForIdleConn : return
queueForIdleConn ->>- Transport : return
and in parallel, dial a new connection
Transport ->>+ queueForDial : queueForDial(w)
queueForDial ->>+ queueForDial : create a new connection and notify w asynchronously<br>go dialConnFor(w)
queueForDial ->>- Transport : return
and wait on w.ready for a usable connection
Transport ->>- Transport : select<br>case <-w.ready:<br>return w.pc, w.err
end
Transport ->>+ persistConn_roundTrip : roundTrip(treq)
persistConn_roundTrip ->>+ persistConn_roundTrip : send to persistConn's asynchronous write goroutine writeLoop()<br>writech <- writeRequest{req, writeErrCh, continueCh}
persistConn_roundTrip ->>- persistConn_roundTrip : send resc, the channel for reading the resp, to persistConn's asynchronous read goroutine readLoop(), which writes the resp into it<br>resc := make(chan responseAndError)<br>pc.reqch <- requestAndChan{request, resc}<br>case re := <-resc:
persistConn_roundTrip ->>- Transport : return re.res
Transport ->>- Transport : return resp
Transport ->>- Client : return resp
Client ->>- Client : return resp
Client ->>- Client : return resp
Client ->>- User: return resp
end

Summary

The granularity of the connection pool is defined by connectMethod:

type connectMethod struct {
    _            incomparable
    proxyURL     *url.URL // nil for no proxy, else full proxy URL
    targetScheme string   // "http" or "https"
    // If proxyURL specifies an http or https proxy, and targetScheme is http (not https),
    // then targetAddr is not included in the connect method key, because the socket can
    // be reused for different targetAddr values.
    targetAddr string
    onlyH1     bool // whether to disable HTTP/2 and force HTTP/1
}

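That is: the proxy address, the scheme, the target address (host and port), and a flag for HTTP/1 only.

One comment points out specifically that when a proxy is configured and the target is plain http (not https), the target address is left out of the key: different domains reached through the same proxy share one connection pool.

That makes sense, lesson learned; I will implement it the same way.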

Which configuration fields are needed, and what are their defaults?

golang

golang uses DefaultTransport as the default connection pool configuration:

var DefaultTransport RoundTripper = &Transport{
    Proxy: ProxyFromEnvironment,
    DialContext: defaultTransportDialContext(&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
    }),
    ForceAttemptHTTP2:     true,
    MaxIdleConns:          100,
    IdleConnTimeout:       90 * time.Second,
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}

So a single field for the number of idle connections, defaulting to 100, is enough.

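What is the eviction policy?

The source analysis above only covered taking connections from the pool, not putting them back. That logic lives in readLoop().

After a complete resp has been read, tryPutIdleConn is called to return the connection to the pool. With unrelated logic omitted, the eviction policy looks like this: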

func (t *Transport) tryPutIdleConn(pconn *persistConn) error {
    ... //unrelated logic omitted
    t.idleConn[key] = append(idles, pconn)
    t.idleLRU.add(pconn) //put pconn at the head of the LRU list (most recently used) and record it in the hash map
    if t.MaxIdleConns != 0 && t.idleLRU.len() > t.MaxIdleConns {
        oldest := t.idleLRU.removeOldest() //evict from the tail of the LRU list and delete its node from the hash map
        oldest.close(errTooManyIdle) //close the connection
        t.removeIdleConnLocked(oldest) //remove this connection from the idleConn list
    }
    ...
    return nil
}
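
To make the policy concrete, here is a small self-contained sketch of an idle pool with the same two properties: reuse takes the most recently used connection for a key, and overflow evicts the least recently used connection overall. All names (`idlePool`, `idleConn`, `put`, `get`) are hypothetical, and locking, timers and real sockets are left out.

```go
package main

import (
	"container/list"
	"fmt"
)

type idleConn struct {
	key string
	id  int
}

// idlePool keeps idle connections in a global LRU list (front = most recent)
// and in per-key stacks (last element = most recent).
type idlePool struct {
	maxIdle int
	lru     *list.List
	byKey   map[string][]*list.Element
}

func newIdlePool(maxIdle int) *idlePool {
	return &idlePool{maxIdle: maxIdle, lru: list.New(), byKey: map[string][]*list.Element{}}
}

// put stores c and returns the evicted connection, or nil if none was evicted.
func (p *idlePool) put(c *idleConn) *idleConn {
	p.byKey[c.key] = append(p.byKey[c.key], p.lru.PushFront(c))
	if p.maxIdle > 0 && p.lru.Len() > p.maxIdle {
		return p.remove(p.lru.Back()) // least recently used; the caller closes it
	}
	return nil
}

// get pops the most recently used idle connection for key, or returns nil.
func (p *idlePool) get(key string) *idleConn {
	els := p.byKey[key]
	if len(els) == 0 {
		return nil
	}
	return p.remove(els[len(els)-1])
}

// remove deletes el from both the LRU list and its per-key stack.
func (p *idlePool) remove(el *list.Element) *idleConn {
	c := el.Value.(*idleConn)
	p.lru.Remove(el)
	els := p.byKey[c.key]
	for i, e := range els {
		if e == el {
			els = append(els[:i], els[i+1:]...)
			break
		}
	}
	if len(els) == 0 {
		delete(p.byKey, c.key)
	} else {
		p.byKey[c.key] = els
	}
	return c
}

func main() {
	p := newIdlePool(2)
	p.put(&idleConn{key: "a", id: 1})
	p.put(&idleConn{key: "b", id: 2})
	evicted := p.put(&idleConn{key: "a", id: 3})
	fmt.Println(evicted.id, p.get("a").id, p.get("a") == nil)
}
```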

nginx

In nginx the idle-connection count is the keepalive directive of upstream, which is off by default.

Syntax:    keepalive connections;
Default:    —
Context:    upstream
This directive appeared in version 1.1.4.

The connections parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed.

It should be particularly noted that the keepalive directive does not limit the total number of connections to upstream servers that an nginx worker process can open. The connections parameter should be set to a number small enough to let upstream servers process new incoming connections as well.

Summary

The idle-connection count is a must-have. It reflects the peak concurrency the pool has to sustain and can be estimated as qps * latency.

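In the details, both golang and nginx use LRU to evict connections beyond the limit, and the golang source deliberately prefers the most recently used connection.

I think the point of this policy is to avoid spreading requests too evenly when there are many idle connections, which would leave every connection sparse and likely to be closed by the peer.

Other settings can be copied over later as needed; the first version should not be overly complex.

If a client request times out, can the persistent connection go back into the pool?

My earlier design closed the connection as soon as a request timed out, because that is simple; putting it back into the pool raises many extra concerns.

But for a general-purpose client, callers may well set a timeout that is too short. If the connection is simply dropped, won't persistent connections degrade into short-lived ones?

Especially during traffic spikes, when timeouts are already likely, wouldn't degrading to short-lived connections cause an avalanche?

As the call flow above shows, the timeout is set in client.go: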

func setRequestCancel(req *Request, rt RoundTripper, deadline time.Time) (stopTimer func(), didTimeout func() bool) {
    ...
    req.ctx, cancelCtx = context.WithDeadline(oldCtx, deadline)
    ...
}

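Since the timeout works by putting a deadline on the req's ctx, it is enough to find where the req's Cancel event is watched.

In the persistent connection's readLoop: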

func (pc *persistConn) readLoop() {
    defer func() { //runs when readLoop returns: closes the connection and removes it from the idle pool
        pc.close(closeErr)
        pc.t.removeIdleConn(pc)
    }()
    
    ...
    
    alive := true
    for alive {
        ...
        select {
        ... //handle the various read events
        case <-rc.req.Cancel:
            alive = false //timeout detected: the loop exits, readLoop returns and the defer above fires
            pc.t.CancelRequest(rc.req)
        case <-rc.req.Context().Done():
            alive = false
            pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
        }
    }
}

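Since go does it the same way, callers simply have to be careful with their timeout settings.

Message framing on persistent connections

In HTTP/1.0, once the Header has been parsed, the server closing the connection marks the end of a complete message.

On an HTTP/1.1 persistent connection the connection stays open, so the Content-Length in the Header is needed to delimit requests and responses.

The RFC also states that in HTTP/1.1 the side that wants to close the connection has to send Connection: close in the Header.

taf's HTTP parsing code has always targeted short-lived connections. When the server's Header has neither Content-Length nor Connection: close,

should the message be treated as unusable and reported as an error?

I put together a go HTTP client and a raw TCP server to see how go behaves.

chunk

Chunked mode exists so that the sender can send as much as it likes, so there is never a Content-Length.

But chunked encoding has its own terminator, 0\r\n\r\n, so I expected no problem.

Server: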

package main

import (
    "bufio"
    "fmt"
    "net"
    "strconv"
)

func handleConnection(conn net.Conn) {
    defer conn.Close()

    // Read request
    reader := bufio.NewReader(conn)
    _, err := reader.ReadString('\n') // Read the request line
    if err != nil {
        fmt.Println("Error reading request:", err)
        return
    }

    // Write response header
    header := "HTTP/1.1 200 OK\r\n" +
              "Content-Type: text/plain\r\n" +
              "Transfer-Encoding: chunked\r\n" +
              "\r\n"

    _, err = conn.Write([]byte(header))
    if err != nil {
        fmt.Println("Error writing response header:", err)
        return
    }

    // Write chunked response body
    chunks := []string{
        "This is the first chunk.",
        "This is the second chunk.",
        "This is the third chunk.",
    }

    for _, chunk := range chunks {
        chunkLen := strconv.FormatInt(int64(len(chunk)), 16) + "\r\n"
        _, err = conn.Write([]byte(chunkLen))
        if err != nil {
            fmt.Println("Error writing chunk length:", err)
            return
        }

        _, err = conn.Write([]byte(chunk + "\r\n"))
        if err != nil {
            fmt.Println("Error writing chunk data:", err)
            return
        }
    }

    // Write last chunk
    _, err = conn.Write([]byte("0\r\n\r\n"))
    if err != nil {
        fmt.Println("Error writing last chunk:", err)
    }
}

func main() {
    listener, err := net.Listen("tcp", ":8080")
    if err != nil {
        fmt.Println("Error starting server:", err)
        return
    }
    defer listener.Close()

    fmt.Println("Server started on port 8080")

    for {
        conn, err := listener.Accept()
        if err != nil {
            fmt.Println("Error accepting connection:", err)
            continue
        }

        go handleConnection(conn)
    }
}

Run:

~ go run client.go 
Response:
This is the first chunk.This is the second chunk.This is the third chunk.

No error.

Non-chunked

Only handleConnection needs to change on the server:

func handleConnection(conn net.Conn) {
    defer conn.Close()
    // Read request
    reader := bufio.NewReader(conn)
    _, err := reader.ReadString('\n') // Read the request line
    if err != nil {
        fmt.Println("Error reading request:", err)
        return
    }

    // Write response
    response := "HTTP/1.1 200 OK\r\n" +
                "Content-Type: text/plain\r\n" +
                "\r\n" +
                "This is a response without Content-Length and Connection: close"

    _, err = conn.Write([]byte(response))
    if err != nil {
        fmt.Println("Error writing response:", err)
    }
}

Running the same client again:

~ go run client.go
Response:
This is a response without Content-Length and Connection: close

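Somewhat to my surprise, the body comes back correctly, so go does not treat this as an error. With neither Content-Length nor chunked encoding, the response body is taken to run until the server closes the connection (the test server closes it right after writing), which also means the connection cannot be reused. I will implement the same behavior.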

Appendix

Pseudocode design draft

class Conn {
    Key key;
    int id;
    ReqPtr reqPtr;
    Socket fd;
}
//the lifetime of reqPtr is owned by timeoutQueue
//the lifetime of Conn is owned by Pool
class Pool {
    hashmap<Key, vector<int>> pool_; //ids of idle conns, per key
    map<int, Conn> conns_;           //all conns, by id
    
    //registered in epoll's idle handling
    asyncGetConn(Key key, ReqPtr reqPtr, onCreateConn) {
        get conn from pool_ with key
        if (conn) {
            conn.reqPtr = reqPtr
            return;
        }
        conn = onCreateConn()
        conn.reqPtr = reqPtr
        conn.key = key
        insert conn into conns_;
    }

    tryPutConn(conn*) {
        if pool_[conn->key].size >= idleSize {
            return false
        }
        push_back to pool_[conn->key]
        return true
    }
    
    remove(conn*) {
        remove conn from pool_[conn->key]
        remove conn.id from conns_
    }
}

void close(conn*) {
    if (conn->reqPtr != nil) remove conn.reqPtr from timeoutQueue
}

void epollLoop {
    get connPtr from epoll data
    if conn->reqPtr != nil {
        if EpollOut {
            doRequest() //send the request; any problem will close(conn)
        }
        if EpollIn {
            doResponse() //read and parse the response; any problem will close(conn)
            if doResponse Finish {
                remove conn->reqPtr from timeoutQueue
                if (!pool_.tryPutConn(conn)) close(conn)
            }
        }
    } else {
        //idle conn: a read event means the peer closed it or sent unexpected data
        if EpollIn {
            check recv <= 0 close
            pool_.remove(conn)
        }
    }
}

void do() {
    auto reqPtr = create Req
    add reqPtr into timeoutQueue //when close conn will release it
    pool_.asyncGetConn(key, reqPtr, []() { //key is derived from the req
        auto conn = create Conn
        conn.doConnect()
        addEpollEvent(&conn)
        return conn
    })
}

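Lifetime management of req

A req follows one of two branches:

Either it times out before getting a connection
- the req is deleted, and if a connection becomes available for it afterwards there is nothing left to send (branch 1 in the diagram below)
Or it gets a connection. The conn then stores a shared pointer to reqPtr rather than a weak pointer, so conn and timeoutQueue appear to co-own reqPtr
- indeed, before the timeoutQueue timeout fires, reqPtr stays alive as long as the conn exists, until the conn is closed by the server or the conn finishes handling the req (whether sending or receiving fails, or parsing succeeds) (branches 2 and 3 in the diagram below)
- but once the timeoutQueue timeout fires, conn->onCancel(reqPtr) makes the conn drop its hold on reqPtr, releasing its reference count (branch 4 in the diagram below)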

flowchart LR
create_req([create req]) --> get_conn{acquire a conn, conn holds the lifetime}
get_conn -- 1 --> not_acquired[not acquired]
not_acquired --> timeout1[timeout]
timeout1 --timeoutQueue.erase()--> remove_timeout_queue[timeoutQueue releases the lifetime]
remove_timeout_queue --> release_req([release req])
get_conn --> acquired[acquired]
acquired --> rw{read and write}
rw -- 4 --> timeout2[timeout]
timeout2 --timeoutQueue.erase()--> remove_timeout_queue1[timeoutQueue releases the lifetime]
remove_timeout_queue1 --conn->onCancel(reqPtr)--> stop_conn1[conn releases the lifetime]
stop_conn1 --> release_req([release req])
rw -- 2 --> stop_conn2[conn closed by the server, conn releases the lifetime]
stop_conn2 --timeoutQueue.erase()--> remove_timeout_queue
rw -- 3 --> return_conn[conn finishes handling req, conn releases the lifetime]
return_conn --timeoutQueue.erase()--> remove_timeout_queue

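Finally, there has to be a fallback for the case where nothing is returned to the caller:

Results are returned either passively (on timeout) or actively.
The timeout entry is removed only after a result has been actively returned, so removing an entry from the timeout queue without any result having been returned is abnormal. In that case an unknown-error result is returned explicitly, to make the problem easier to track down.

Odd requirements from business teams

The server does not support HEAD and handles HEAD requests as GET requests

According to the dedicated HTTP semantics standard, RFC 9110: HTTP Semantics: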

The server SHOULD send the same header fields in response to a HEAD request as it would have sent if the request method had been GET. However, a server MAY omit header fields for which a value is determined only while generating the content. For example, some servers buffer a dynamic response to GET until a minimum amount of data is generated so that they can more efficiently delimit small responses or make late decisions with regard to content selection. Such a response to GET might contain Content-Length and Vary fields, for example, that are not generated within a HEAD response.

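A server may leave Content-Length out of its response to HEAD, so for HEAD the client should ignore Content-Length and put the connection back into the pool as soon as the HTTP headers have been parsed.

Callers can configure each connection to be reused at most once, which forces the connection to be closed right away.

But this is only really useful for very large HTTP responses such as chunked ones. Otherwise, although the client does not parse the surplus content, the application layer has in fact already read it (because it reads whatever is in the buffer). Worse, if the connection has been returned to the pool for reuse, TCP keeps delivering data on it; in my implementation, reading data on an idle connection throws an exception.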