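Value Categories in C++17

A few years ago I wrote an article on C++11 value categories, and some friends pointed out problems in it.

After fixing the small issues and rereading it, I felt it was actually poorly written, with many important points missing. That is partly because C++11 value categories are obscure in themselves, and partly because my own understanding at the time was weak.

C++17 happens to have simplified value categories, so I am simply writing a new article on the subject.

Types and value categories of expressions

An expression has two properties:

  • Type: the static type of the value produced by evaluating the expression
  • Value category: how the value is produced, and how that affects the expression's behavior

Whether one expression can be assigned to another depends not only on their types but also on whether their value categories allow it.

For example, the expression “7” has type int, and the expression “5+2” also has type int.

Yet 7 = 5+2 does not compile: the types match, but the value categories do not. The left operand of the built-in = must be a modifiable lvalue, and 7 is a prvalue.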

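Traditional value categories

The traditional rule of thumb: what appears on the left of = is an lvalue, and what appears on the right is an rvalue.

A look back at C++98 value categories

Before C++11, value categories were barely noticeable.

The C++98 standard defines them as follows (3.10 Lvalues and rvalues):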

Every expression is either an lvalue or an rvalue.

An lvalue refers to an object or function. Some rvalue expressions—those of class or cv-qualified class type—also refer to objects

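This definition is vague: an lvalue refers to an object or function, and some rvalues refer to objects too, so it tells us almost nothing.

The standard then lists rules for which expressions are lvalues and which are rvalues, one of which is: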

Whenever an lvalue appears in a context where an rvalue is expected, the lvalue is converted to an rvalue; see 4.1, 4.2, and 4.3.

Then 4.1 Lvalue-to-rvalue conversion formally defines the conversion from lvalue to rvalue:

An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue.

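So for a variable a, the expression a is an lvalue, but when it appears where an rvalue is expected, such as on the right-hand side of b = a, it is implicitly converted to an rvalue. (An expression like "5+2" does not need this conversion: it is an rvalue to begin with.)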

A look back at C++11 value categories

Starting with C++11, value categories were redefined:

  • An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [Example: If E is an expression of pointer type, then *E is an lvalue expression referring to the object or function to which E points. As another example, the result of calling a function whose return type is an lvalue reference is an lvalue. —end example ]
  • An xvalue (an “eXpiring” value) also refers to an object, usually near the end of its lifetime (so that its resources may be moved, for example). An xvalue is the result of certain kinds of expressions involving rvalue references (8.3.2). [Example: The result of calling a function whose return type is an rvalue reference is an xvalue. —end example ]
  • A glvalue (“generalized” lvalue) is an lvalue or an xvalue.
  • An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
  • A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [Example: The result of calling a function whose return type is not a reference is a prvalue. The value of a literal such as 12, 7.3e5, or true is also a prvalue. —end example ]

Because the standard now has three leaf categories, lvalue, xvalue (eXpiring value) and prvalue (pure rvalue), reading it directly is hard going.

cppreference describes it this way:

With the introduction of move semantics in C++11, value categories were redefined to characterize two independent properties of expressions[5]:

  • has identity: it's possible to determine whether the expression refers to the same entity as another expression, such as by comparing addresses of the objects or the functions they identify (obtained directly or indirectly);
  • can be moved from: move constructor, move assignment operator, or another function overload that implements move semantics can bind to the expression.

In C++11, expressions that:

  • have identity and cannot be moved from are called lvalue expressions;
  • have identity and can be moved from are called xvalue expressions;
  • do not have identity and can be moved from are called prvalue ("pure rvalue") expressions;
  • do not have identity and cannot be moved from are not used[6].

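As you can see, cppreference's definition, built on two properties, is somewhat simpler:

                     has identity (glvalue)   no identity
  movable (rvalue)   xvalue                   prvalue
  not movable        lvalue                   (does not exist)

Value categories thus reduce to two questions: what does it mean to have identity, and what does it mean to be movable?

According to the paper cited by cppreference, https://www.stroustrup.com/terminology.pdf: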

  • “has identity” – i.e. and address, a pointer, the user can determine whether two copies are identical, etc.

  • “can be moved from” – i.e. we are allowed to leave to source of a “copy” in some indeterminate, but valid state

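My understanding:

  • Has identity

    The expression denotes a specific entity, typically a named one: you can determine whether it refers to the same entity as another expression, and for lvalues you can take its address

  • Can be moved from

    The object's resources can be moved into another object; a move is a “copy” that is allowed to leave the source in a valid but indeterminate state

    It can also be explained in terms of identity: the expression either has no identity, or has identity but its lifetime is about to end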

4.1 Lvalue-to-rvalue conversion was also revised:

A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue.

This widens the scope of lvalue-to-rvalue conversion so that xvalues can be converted as well.

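Value categories in C++17

Even after this simplification, C++11 value categories are still confusing: faced with an expression, you first have to think about identity and movability, which is quite obscure. Value categories needed to be simplified.

In addition, C++17 had to change value categories to fix a real problem: RVO can eliminate the move/copy, yet a move/copy constructor was still required.

Hence C++17 reworked value categories.

First, let's look at why value categories had to change.

The conflict between RVO and the C++11 rules

Consider the following code: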

struct NonMovable
{
    NonMovable() noexcept = default;
    NonMovable(NonMovable&&) noexcept = delete;
    NonMovable& operator=(NonMovable&&) noexcept = delete;
};
NonMovable Make()
{
    return NonMovable{};
}
int main() {
    [[maybe_unused]] const auto x = Make();
}

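Before C++17 this is bound to fail to compile: although RVO can eliminate the move/copy, it is only a compiler optimization.

Under the C++11 rules, Make needs a move or copy constructor to return NonMovable{}, and initializing x from Make() needs the move constructor as well.

I often ran into this at work. Since the C++11 rules offer no way around it, I had to give up on RVO and return results through a reference parameter instead; the code was far from elegant, and auto could not be used either.

The fix: a prvalue only initializes an object; it is no longer a temporary and cannot be moved from

The core problem is specifying when copies and moves are elided. Adding yet another value category to describe that would have made the topic even harder to learn.

Since both xvalues and prvalues can be moved from, the existing categories can be redefined instead:

only an xvalue can be moved from. A prvalue cannot be moved from and is no longer a temporary object, but it can be implicitly converted into an xvalue that denotes a temporary object.

So when a prvalue is used to initialize an object of the same type, the copy is elided; in other contexts it is implicitly converted into an xvalue, which can then be moved from.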

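In C++11, movability was what carved out the rvalue category.

In C++17 this notion is de-emphasized:

  has identity (glvalue)   no identity
  xvalue, lvalue           prvalue

Splitting only on whether an expression has identity, into glvalue and prvalue, greatly lowers the cost of understanding.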


Finally, here are the definitions in the C++17 standard:

  • A glvalue (generalized lvalue) is an expression whose evaluation determines the identity of an object, bit-field, or function.
  • A prvalue (pure rvalue) is an expression whose evaluation initializes an object or a bit-field, or computes the value of an operand of an operator, as specified by the context in which it appears, or an expression that has type cv void.
  • An xvalue (eXpiring value) is a glvalue that denotes an object or bit-field whose resources can be reused (usually because it is near the end of its lifetime).
  • An lvalue is a glvalue that is not an xvalue.
  • An rvalue is a prvalue or an xvalue.

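Loosely: a glvalue denotes an object, bit-field, or function that has an identity, while a prvalue is a value that only does its job when it initializes an object.

The standard then added 7.4 Temporary materialization conversion: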

A prvalue of type T can be converted to an xvalue of type T.

This conversion initializes a temporary object of type T from the prvalue by evaluating the prvalue with the temporary object as its result object, and produces an xvalue denoting the temporary object.

T shall be a complete type.

struct X { int n; };
int k = X().n; // OK, X() prvalue is converted to xvalue

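When a prvalue appears where a glvalue is expected, a temporary object is created and initialized from the prvalue, and the result is an xvalue denoting that temporary.

A more detailed explanation can be found in how-does-guaranteed-copy-elision-work: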

Guaranteed copy elision redefines the meaning of a prvalue expression. Pre-C++17, prvalues are temporary objects. In C++17, a prvalue expression is merely something which can materialize a temporary, but it isn't a temporary yet.

If you use a prvalue to initialize an object of the prvalue's type, then no temporary is materialized. When you do return T();, this initializes the return value of the function via a prvalue. Since that function returns T, no temporary is created; the initialization of the prvalue simply directly initializes the return value.

The thing to understand is that, since the return value is a prvalue, it is not an object yet. It is merely an initializer for an object, just like T() is.

When you do T t = Func();, the prvalue of the return value directly initializes the object t; there is no "create a temporary and copy/move" stage. Since Func()'s return value is a prvalue equivalent to T(), t is directly initialized by T(), exactly as if you had done T t = T().

If a prvalue is used in any other way, the prvalue will materialize a temporary object, which will be used in that expression (or discarded if there is no expression). So if you did const T &rt = Func();, the prvalue would materialize a temporary (using T() as the initializer), whose reference would be stored in rt, along with the usual temporary lifetime extension stuff.

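In short:

Since C++17, a prvalue is used only to initialize objects; if a prvalue is used in any other way, it materializes a temporary object (that is, it is implicitly converted into an xvalue).

Because a prvalue only initializes objects, it cannot itself be moved from.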

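NRVO

For RVO, the returned expression is a prvalue, and C++17 defines the behavior: the copy is elided.

NRVO, however, returns a named variable; whether a copy or move happens there is still up to the compiler, and the copy or move constructor must still be accessible.

An example for understanding the C++17 prvalue change

The following code comes from a design used in C++14:

The intent is that NonCopyable cannot be copied but can still be used through its implicit conversion to int; in C++17 this design no longer works as intended.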

struct NonCopyable {
    operator int() {
        return 0;
    }
    NonCopyable() = default;
    NonCopyable(const NonCopyable &) = delete;
    NonCopyable& operator=(const NonCopyable&) = delete;
};
int main() {
    [[maybe_unused]] int i = NonCopyable{}; //1
    [[maybe_unused]] NonCopyable v = NonCopyable{}; //2
}
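  • C++14
    • 1 compiles, because the prvalue is implicitly converted to an int; no copy takes place
    • 2 does not compile, because the prvalue is a temporary that has to be copied, and the copy constructor is deleted
  • C++17: both compile
    • 2 compiles because the prvalue is merely an initializer, not an object, so no copy takes place

This change also shows up in RVO; see the Prvalue semantics ("guaranteed copy elision") section on cppreference: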

Since C++17, a prvalue is not materialized until needed, and then it is constructed directly into the storage of its final destination. This sometimes means that even when the language syntax visually suggests a copy/move (e.g. copy initialization), no copy/move is performed — which means the type need not have an accessible copy/move constructor at all. Examples include:

  • Initializing the returned object in a return statement, when the operand is a prvalue of the same class type (ignoring cv-qualification) as the function return type:

  • In the initialization of an object, when the initializer expression is a prvalue of the same class type (ignoring cv-qualification) as the variable type:

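So C++17 guarantees that in these prvalue cases no copy or move constructor is needed.

Even before C++17 was finalized, all compilers implemented RVO, but the Non-mandatory copy/move (since C++11) elision section states: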

Under the following circumstances, the compilers are permitted, but not required to omit the copy and move(since C++11) construction of class objects even if the copy/move(since C++11) constructor and the destructor have observable side-effects. The objects are constructed directly into the storage where they would otherwise be copied/moved to. This is an optimization: even when it takes place and the copy/move(since C++11) constructor is not called, it still must be present and accessible (as if no optimization happened at all), otherwise the program is ill-formed: ...

In other words, before C++17, RVO and NRVO both still required an accessible copy/move constructor; C++17 removes this requirement entirely for prvalues.

NonCopyable Make() {
    return NonCopyable{};
}
int main() {
    [[maybe_unused]] int i = Make(); //3
    [[maybe_unused]] NonCopyable v = Make(); //4
}
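  • C++14
    • neither compiles: return NonCopyable{}; copy-initializes the return object from a prvalue, which before C++17 requires an accessible copy or move constructor, and the copy constructor is deleted
  • C++17
    • both compile: 3 and 4 are now treated exactly like cases 1 and 2, so since 1 and 2 compile, 3 and 4 compile as well

Aside: how gcc and clang differ on NRVO

Before testing NRVO, first add a copy constructor taking a non-const reference to NonCopyable: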

struct NonCopyable {
    // same as above ...
    NonCopyable(NonCopyable &) = default;
};

Then the NRVO test code:

NonCopyable Make() {
    NonCopyable v;
    return v;
}
int main() {
    [[maybe_unused]] int i = Make(); //3
    [[maybe_unused]] NonCopyable v = Make(); //4
}
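  • gcc 9.0 and above
    • C++14: 3 compiles, 4 does not
    • C++17: both compile
  • clang 10.0 and above, latest msvc, latest icx
    • In both C++14 and C++17, Make itself fails to compile, with an error saying the type cannot be copied
Analysis

Both behaviors are essentially standard-conforming, according to the Non-mandatory copy/move (since C++11) elision section quoted above.

For gcc, the constructor T(T&) does exist, so NRVO can go ahead, and cases 3 and 4 are treated exactly like cases 1 and 2.

clang and the other compilers, by contrast, only accept T(const T&) as the constructor for NRVO.

Workaround

Return without naming a variable, as above. The return {}; form copy-list-initializes the return object directly, so it places no requirement on the copy and move constructors, even in C++14: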

NonCopyable Make() {
    return {};
}

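This gives the same behavior as gcc on every compiler:

  • C++14: 3 compiles, 4 does not
  • C++17: both compile

Understanding value categories through std::move

Source: "Why can a prvalue's destruction be deferred in C++, but not an xvalue's?", answer by ZhiHuReader on Zhihu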

#include <iostream>
#include <utility>

class result {
public:
    result() {
        std::cout << "result" << std::endl;
    }

    ~result() {
        std::cout << "~result" << std::endl;
    }
};

int main() {
    std::cout << "first line of main" << std::endl;
    result&& r = result();
    // result&& r = std::move(result());
    std::cout << "last line of main" << std::endl;
}

In this example, result&& r = result(); is meant to bind the temporary to r and extend its lifetime to that of r. Output:

first line of main
result
last line of main
~result

It works.

With result&& r = std::move(result());, the intent is the same: extend the lifetime of the temporary that goes through std::move to that of r. Output:

first line of main
result
~result
last line of main

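It fails: the temporary is destroyed before the last line of main, leaving r dangling.

Explaining it with the standard

The C++11 standard, 12.2 Temporary objects, paragraph 5, says: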

The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except:

— A temporary bound to a reference member in a constructor’s ctor-initializer (12.6.2) persists until the constructor exits.

— A temporary bound to a reference parameter in a function call (5.2.2) persists until the completion of the full-expression containing the call.

— The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.

Similarly, the C++17 standard, 15.2 Temporary objects, paragraph 6, says:

The third context is when a reference is bound to a temporary(116). The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except :

— A temporary object bound to a reference parameter in a function call (8.2.2) persists until the completion of the full-expression containing the call.

— The lifetime of a temporary bound to the returned value in a function return statement (9.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement .

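Both standards state:

  • Rule 1: a temporary bound to a reference parameter in a function call persists until the completion of the full-expression containing the call.
  • Rule 2: the lifetime of a temporary bound to the returned value in a function return statement is not extended; the temporary is destroyed at the end of the full-expression in the return statement.

Now look at how move is implemented: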

template<typename T>
typename std::remove_reference<T>::type&& move(T&& t) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(t);
}

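In the expression move(result()), result() is a prvalue passed to the parameter t, whose type T&& is an rvalue reference.

As noted earlier, a prvalue is not itself a temporary; when it is used to initialize an rvalue reference, it is implicitly converted into an xvalue denoting a temporary object.

By rule 1, that temporary persists until the full-expression containing the call completes; by rule 2, binding the reference returned by move in result&& r = std::move(result()) does not extend the lifetime of the temporary inside move.

Hence the behavior above.

An intuitive way to think about it

A temporary's lifetime can only be extended by the first reference bound to it; a second reference cannot extend it.

  • The first binding is

    T&& t = result() inside move

    The prvalue initializes an rvalue reference and is implicitly converted into an xvalue denoting a temporary object,

    which keeps the temporary alive

  • The second binding is

    result&& r = std::move(result())

    A call to a function returning && is an xvalue.

    Initializing an rvalue reference from an xvalue does not create a new temporary,

    so the lifetime cannot be extended.

The rule that a second binding cannot extend the lifetime also applies to ordinary (lvalue) references: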

const int& lvalue(const int& v) {
    return v;
}
const int& r = lvalue(1);  // r dangles once this full-expression ends
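  • First binding: the prvalue 1 initializes the reference parameter v and is implicitly converted into an xvalue denoting a temporary, which keeps the temporary alive until the end of the full-expression (rule 1)
  • Second binding: the lvalue returned by lvalue(1) initializes the reference r; no new temporary is created, so the lifetime cannot be extended

The same rule even explains returning a temporary directly: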

const int& lvalue() {
    return {};  // binds the returned reference to a temporary int; compilers warn here
}
const int& r = lvalue();  // r dangles
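  • First binding: the prvalue {} initializes the function's returned reference, so it is materialized into a temporary (an xvalue); by rule 2, however, that temporary is destroyed at the end of the return statement
  • Second binding: the lvalue returned by lvalue() initializes the reference r; no new temporary is created, so the lifetime cannot be extended and r dangles

How to correctly “extend the lifetime of a temporary returned from a function”

A reference to a returned temporary will always end up dangling, so strictly speaking “extending the lifetime of the return value” needs quotation marks.

What actually happens is that the xvalue returned by a reference (or rvalue reference) is used to construct a new object: the copy/move constructor is selected, a new object is created, and only then is the original temporary destroyed.

This is also the usual idiom in production code: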

#include <iostream>
#include <utility>

class result {
public:
    result() {
        std::cout << "result" << std::endl;
    }

    // new: a move constructor
    result(result&&) {
        std::cout << "result &&" << std::endl;
    }

    ~result() {
        std::cout << "~result" << std::endl;
    }
};

int main() {
    std::cout << "first line of main" << std::endl;
    result r = std::move(result());
    std::cout << "last line of main" << std::endl;
}

Output:

first line of main
result
result &&
~result
last line of main
~result

Detecting value categories

You can check the value category of an expression with the standard library:

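#include <iostream>
#include <type_traits>
#include <utility>

// decltype((e)) is T& for an lvalue, T&& for an xvalue, and T for a prvalue.
// A macro is used because decltype must see the original expression;
// inside a function template, a named parameter would always be an lvalue.
#define PRINT_VALUE_CATEGORY(e)                                           \
    do {                                                                  \
        if constexpr (std::is_lvalue_reference_v<decltype((e))>) {        \
            std::cout << "expression is lvalue" << "\n";                  \
        } else if constexpr (std::is_rvalue_reference_v<decltype((e))>) { \
            std::cout << "expression is xvalue" << "\n";                  \
        } else {                                                          \
            std::cout << "expression is prvalue" << "\n";                 \
        }                                                                 \
    } while (0)

// Alternatively, map decltype((e)) to a tag type with std::conditional
struct lvalue_tag { static constexpr const char* label = "lvalue"; };
struct prvalue_tag { static constexpr const char* label = "prvalue"; };
struct xvalue_tag { static constexpr const char* label = "xvalue"; };

template <typename T>
struct value_category {
    using type = typename std::conditional<
        std::is_lvalue_reference<T>::value, lvalue_tag,
        typename std::conditional<std::is_rvalue_reference<T>::value, xvalue_tag, prvalue_tag>::type
    >::type;
};

int main() {
    int x = 0;
    PRINT_VALUE_CATEGORY(x);             // expression is lvalue
    PRINT_VALUE_CATEGORY(std::move(x));  // expression is xvalue
    PRINT_VALUE_CATEGORY(x + 1);         // expression is prvalue
    std::cout << value_category<decltype((std::move(x)))>::type::label << "\n";  // xvalue
}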

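The double parentheses in decltype((e)) are required: if e is just the name of a variable v (an unparenthesized id-expression), decltype(v) yields the declared type of v instead of describing the expression's value category; with decltype((v)), v is treated as an ordinary expression.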

Reference binding combinations

The appendix of C++ Templates: The Complete Guide (2nd Edition) contains the following code, which I found interesting, so I am posting it here:

int& lvalue();   // a call to a function returning int& is an lvalue
int&& xvalue();  // a call to a function returning int&& is an xvalue
int prvalue();   // a call to a function returning non-reference int is a prvalue

int& lref1 = lvalue();   // OK: lvalue reference can bind to an lvalue
int& lref3 = prvalue();  // ERROR: lvalue reference cannot bind to a prvalue
int& lref2 = xvalue();   // ERROR: lvalue reference cannot bind to an xvalue
int&& rref1 = lvalue();  // ERROR: rvalue reference cannot bind to an lvalue
int&& rref2 = prvalue(); // OK: rvalue reference can bind to a prvalue
int&& rref3 = xvalue();  // OK: rvalue reference can bind to an xvalue

References