Paddle Error Message Writing Specification


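Overview of the specification:

Section 1, the error message template, is a recommended reference. Depending on the situation, a more concise wording that is easier for users to understand may be used instead.

Section 2, the mandatory rules, lists the rules that every error message must follow. The first four are monitored by CI.

Section 3, the example library, contains existing PADDLE_ENFORCE checks taken from Paddle and rewritten into compliant form for reference.

Appendix: when the specification is refined later, first record the rationale and the intended change in the appendix, then modify the main text.

Additional notes:

While the specification is being applied, cases that it does not yet cover may come up. It needs to be supplemented and improved continuously, so feedback is welcome.

The current version defines 12 error types. If you find an error that none of them covers, you can apply to add a new type.

The more examples the example library has, the more useful it is as a reference, so contributions of new examples are strongly encouraged.

Matching against the rules is fairly complex, and a compliant message may be flagged as non-compliant. In that case, ask chenwhql(陈威行) to approve.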

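1. Error message template

The messages of PADDLE_ENFORCE_* and PADDLE_THROW should preferably follow the structure below:

Note: what matters is describing the error clearly; the template is only a reference.

Three-part error message (error - expectation - suggestion)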

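Part 1: State the error (required)

State the error directly:
- Recommended wording:
  - A failed, B is not initialized, C does not exist, the value of D is incorrect, E does not match, and so on
    - Example: Mismatched label shape.
- Wording to avoid: A should be ..., B should not be ...
  - When something goes wrong, tell the user first that it went wrong
  - Unless necessary, do not point out the error in a should / should not tone
  - What should or should not hold belongs in Part 2, which describes the expected result
Notes for this part:
1. State which entity an attribute or variable belongs to. For example, for an Op input or output, say which Op it belongs to, and distinguish forward from backward Ops
2. Stating the error tells the user a fact. Magic numbers (numbers with no clear meaning) are generally not allowed here; a plain English sentence is enough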

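Part 2: Compare the expected and actual values (provide whenever possible)

State what input is expected here and what input was actually received
- Example: Expected labels dimension=1. Received 4.
Notes for this part:
1. Provide all necessary information. For a Shape error, for example, print the concrete Shapes for comparison and point out the dimension that is wrong
2. If Part 1 describes a single-valued condition, this part can be omitted. For example, when A is a null pointer or B does not exist, there is no need to add that A is expected to be non-null or that B should exist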

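Part 3: Suggested fix (provide whenever possible)

State what may have caused the error and how to fix it
- Example: Suggested Fix: If your classifier expects one-hot encoding label, check your n_classes argument to the estimator and/or the shape of your label. Otherwise, check the shape of your label.
Notes for this part:
- Suggestions are most useful for common problems, for example
  - startup_program was not run
  - an important parameter was not set
  - the environment may be misconfigured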
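
The three-part template above can be sketched as a small string helper. This is only an illustration: `BuildErrorMessage` is a hypothetical function, not part of Paddle, and the "Suggested Fix:" prefix simply mirrors the example in Part 3.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Paddle): joins the three template parts
// into one message. Only the first part is required; the other two are
// appended when they are provided.
std::string BuildErrorMessage(const std::string& error,
                              const std::string& expected_vs_received = "",
                              const std::string& suggestion = "") {
  std::string message = error;  // Part 1: state the error
  if (!expected_vs_received.empty()) {
    message += " " + expected_vs_received;  // Part 2: expected vs. actual
  }
  if (!suggestion.empty()) {
    message += " Suggested Fix: " + suggestion;  // Part 3: suggested fix
  }
  return message;
}
```

Calling it with only the first argument yields the bare error statement, which matches the case where Part 2 and Part 3 are omitted.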

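2. Mandatory rules

The messages of PADDLE_ENFORCE_* and PADDLE_THROW must follow these rules:

1. In principle, the PADDLE_ENFORCE expression must not be used (monitored by CI)

For details, see the PADDLE_ENFORCE rewriting specification

2. The message must not be omitted or be an empty string (monitored by CI)

Incorrect examples: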

PADDLE_ENFORCE(ctx->HasInput("X"));

PADDLE_ENFORCE(ctx->HasInput("X"), "");

3. The message must not be too short; it must be longer than 20 characters (monitored by CI)

Incorrect example:

PADDLE_ENFORCE(i != nullptr, "I must be set");
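
Rules 2 and 3 lend themselves to a mechanical check. The sketch below is not Paddle's actual CI check, which is not shown here; `IsMessageLongEnough` is a hypothetical helper, and reading "longer than 20 characters" as strictly greater than 20 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical lint helper (not Paddle's CI implementation): returns true
// when a message is non-empty and longer than 20 characters.
bool IsMessageLongEnough(const std::string& message) {
  constexpr std::size_t kMinLength = 20;  // threshold taken from rule 3
  return message.size() > kMinLength;     // an empty string fails as well
}
```

With this reading, "I must be set" (13 characters) is rejected, while "Input(X) of MulOp is not found." passes.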

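4. The error type must be specified (monitored by CI)

12 error types are currently declared (see the detailed examples in Section 3)
- InvalidArgument: invalid argument
- NotFound: target not found
- OutOfRange: out-of-range error
- AlreadyExists: target already exists or is duplicated
- ResourceExhausted: resources exhausted
- PreconditionNotMet: precondition not met
- PermissionDenied: operation not permitted
- ExecutionTimeout: execution timed out
- Unimplemented: feature not implemented
- Unavailable: service unavailable
- Fatal: fatal error
- External: external error, third-party library error

Usage summary: wrap the entire error message string (including the variadic argument list) in platform::errors::ErrorType()

Brief example (note where the closing parentheses go):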

Old: PADDLE_ENFORCE(true, "example: %s", str);

New: PADDLE_ENFORCE(true, platform::errors::InvalidArgument("example: %s", str));

Correct example:

PADDLE_ENFORCE_GT(y_dims.size(), y_num_col_dims,
                      platform::errors::InvalidArgument("The input tensor Y's dimensions of MulOp "
                      "should be larger than y_num_col_dims. But received Y's "
                      "dimensions = %d, Y's shape = [%s], y_num_col_dims = %d.",
                      y_dims.size(), y_dims, y_num_col_dims));

Incorrect example:

PADDLE_ENFORCE_GT(y_dims.size(), y_num_col_dims,
                      "The input tensor Y's dimensions of MulOp "
                      "should be larger than y_num_col_dims. But received Y's "
                      "dimensions = %d, Y's shape = [%s], y_num_col_dims = %d.",
                      y_dims.size(), y_dims, y_num_col_dims);

Note: PADDLE_ENFORCE under __CUDA_ARCH__ does not yet support declaring an error type. If you run into this case, ask a reviewer to approve.
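
To show why the format string and its arguments move together inside the error-type call, here is a minimal stand-alone sketch. Everything in the `sketch` namespace is hypothetical and much simpler than Paddle's real `platform::errors` implementation; the "InvalidArgumentError" label is an assumption modeled on the "UnknownError" string in the appendix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

namespace sketch {

// Simplified stand-in for an error object: a type label plus a message.
struct ErrorSummary {
  std::string type;
  std::string message;
};

// Formats a printf-style message. Arguments must be printf-compatible
// (for example int or const char*, not std::string).
template <typename... Args>
std::string Format(const char* fmt, Args... args) {
  int size = std::snprintf(nullptr, 0, fmt, args...);  // measure first
  if (size < 0) return std::string();  // formatting failed: empty text
  std::string buffer(static_cast<std::size_t>(size) + 1, '\0');
  std::snprintf(&buffer[0], buffer.size(), fmt, args...);
  buffer.resize(static_cast<std::size_t>(size));  // drop the trailing '\0'
  return buffer;
}

// The error-type function receives the whole message: format string and
// variadic arguments together.
template <typename... Args>
ErrorSummary InvalidArgument(const char* fmt, Args... args) {
  return {"InvalidArgumentError", Format(fmt, args...)};
}

// Throws when the two values differ, prefixing the message with its type.
template <typename T, typename U>
void EnforceEq(const T& a, const U& b, const ErrorSummary& error) {
  if (!(a == b)) {
    throw std::runtime_error(error.type + ": " + error.message);
  }
}

}  // namespace sketch
```

In this sketch the check itself only ever sees one extra argument, the error object, which is why the closing parenthesis of the error-type call comes after the last format argument.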

5. Do not use abbreviations of variables defined by C++ developers in messages; spell them out as full English words

Incorrect example:

PADDLE_ENFORCE(forward_pd != nullptr,
               "Fail to find eltwise_fwd_pd in device context");

6. Make sure the message has no grammatical errors

Incorrect example:

PADDLE_ENFORCE(context->HasInput("X"),
               "ArrayToLoDTensorOp must has input X."); //must has?

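3. Error message example library

Developers may understand the rules above differently and may be unsure how to classify an error, so this section provides as many examples as possible for each error type, together with reference wording. Please consult these examples when improving error messages.

1. InvalidArgument - invalid argument

The user passed an illegal argument. This covers all kinds of argument errors and is probably the most common error type.

1.1 ShapeError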

PADDLE_ENFORCE_EQ(
    output_shape[unk_dim_idx] * capacity, -in_size,
    platform::errors::InvalidArgument(
        "The 'shape' attribute in ReshapeOp is invalid. "
        "The input tensor X's size must be divisible by known "
        "capacity of 'shape'. "
        "But received X's shape = [%s], X's size = %d, "
        "'shape' is [%s], known "
        "capacity of 'shape' is %d.",
        in_dims, in_size, framework::make_ddim(shape), capacity));
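
The divisibility condition behind this message can be illustrated with a stand-alone sketch. It is a simplified formulation, not ReshapeOp's actual code or the exact comparison used in the check above: `ResolveUnknownDim` is a hypothetical function that assumes `shape` contains at most one -1 entry and uses std::invalid_argument in place of Paddle's error types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: fills in a single -1 entry of `shape` so that the
// total element count equals `in_size`.
std::vector<int64_t> ResolveUnknownDim(int64_t in_size,
                                       std::vector<int64_t> shape) {
  int64_t capacity = 1;  // product of the known (non -1) dimensions
  bool has_unknown = false;
  std::size_t unknown_index = 0;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] == -1) {
      has_unknown = true;
      unknown_index = i;
    } else {
      capacity *= shape[i];
    }
  }
  if (!has_unknown) return shape;  // nothing to infer
  if (capacity == 0 || in_size % capacity != 0) {
    // Part 1 states the error, Part 2 reports the received values.
    throw std::invalid_argument(
        "The 'shape' attribute is invalid. The input tensor X's size must be "
        "divisible by known capacity of 'shape'. But received X's size = " +
        std::to_string(in_size) + ", known capacity of 'shape' is " +
        std::to_string(capacity) + ".");
  }
  shape[unknown_index] = in_size / capacity;
  return shape;
}
```

For an input of 12 elements and a shape of {3, -1}, the unknown dimension resolves to 4; an input of 10 elements fails the check because 10 is not divisible by 3.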

1.2 Empty argument (empty list, null pointer, etc.)

PADDLE_ENFORCE_NE(vars.empty(), true, platform::errors::InvalidArgument(
                                          "Variable names are empty."));

1.3 Incorrect argument, not equal to the expected value

PADDLE_ENFORCE_GT(batch_size, 0, platform::errors::InvalidArgument(
                                    "Batch size %d is illegal.", batch_size));

PADDLE_ENFORCE_NE(
    num, 0,
    platform::errors::InvalidArgument(
        "The number of ids can not be zero, you need padding "
        "it in data generator, or if there is something wrong with "
        "the data, please check if the data contains unresolvable "
        "characters.\nplease check this error line: %s.",
        str));

1.4 Invalid argument format

PADDLE_ENFORCE_NE(in.format(), MKLDNNMemoryFormat::format_undef,
          platform::errors::InvalidArgument(
              "Input tensor format is invalid. Input tensor should "
              "have specified memory format."));

1.5 Uninitialized argument

PADDLE_ENFORCE_EQ(proto_->IsInitialized(), true,
                  platform::errors::InvalidArgument(
                      "Operator's Proto in op info is not initialized."));

PADDLE_ENFORCE_EQ(
    t->IsInitialized(), true,
    platform::errors::InvalidArgument(
        "The Tensor in the %s Op's Input Variable %s(%s) is "
        "not initialized.",
        Type(), name, ctx.Inputs(name).at(i)));

1.6 Wrong argument type

PADDLE_ENFORCE(
    tmp == *data_type || *data_type == dafault_data_type,
    platform::errors::InvalidArgument(
        "The DataType of %s Op's duplicable Variable %s must be "
        "consistent. The current variable type is (%s), but the "
        "previous variable type is (%s).",
        Type(), name, DataTypeToString(tmp),
        DataTypeToString(*data_type)));

PADDLE_ENFORCE_EQ(
    valid, true,
    platform::errors::InvalidArgument(
        "Tensor holds the wrong type, it holds %s, but desires to be %s.",
        DataTypeToString(type_),
        DataTypeToString(DataTypeTrait<T>::DataType())));

1.7 Argument parsing error

PADDLE_ENFORCE_EQ(success, true,
                  platform::errors::InvalidArgument(
                      "Fail to parse DataFeedDesc from string: %s.",
                      data_feed_desc_str.c_str()));

1.8 LoD error

PADDLE_ENFORCE_GT(lod_level, 0, platform::errors::InvalidArgument(
                                    "Input(X) Tensor of SequencePoolOp "
                                    "does not contain LoD information."));

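2. NotFound - target not found

The requested entity cannot be found, the variable being looked up is empty, an input or output does not exist, and so on

Distinguish this from a null pointer: a variable that cannot be found and a variable that was not assigned correctly are two different things

2.1 Op input or output not found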

PADDLE_ENFORCE_EQ(
    ctx->HasInput("X"), true,
    platform::errors::NotFound("Input(X) of MulOp is not found."));
PADDLE_ENFORCE_EQ(
    ctx->HasInput("Y"), true,
    platform::errors::NotFound("Input(Y) of MulOp is not found."));
PADDLE_ENFORCE_EQ(
    ctx->HasOutput("Out"), true,
    platform::errors::NotFound("Output(Out) of MulOp is not found."));

2.2 Missing node

PADDLE_ENFORCE_NOT_NULL(
    p, platform::errors::NotFound("subgraph has no node %s.", name.c_str()));

2.3 File not found

PADDLE_ENFORCE_GT(file_cnt, 0,
                  platform::errors::NotFound("Input file list is empty."));

2.4 Others

PADDLE_ENFORCE_NOT_NULL(
    var_desc, platform::errors::NotFound("%s is not found.", var_name));

PADDLE_ENFORCE_NOT_NULL(
    proto_,
    platform::errors::NotFound("Operator's Proto has not been registered."));

3. OutOfRange - out-of-range error

PADDLE_ENFORCE_LT(
    i, N, platform::errors::OutOfRange("Array index out of bounds."));

PADDLE_ENFORCE_GT(value, lower_bound_,
                  platform::errors::OutOfRange("Attribute GreaterThan check failed."));

4. AlreadyExists - target already exists / is duplicated

The entity being looked up already exists, or multiple instances were found where only one is allowed

PADDLE_ENFORCE_EQ(
    attrs_.count(attr_name), 0,
    platform::errors::AlreadyExists(
        "The attribute %s has been set in the graph.", attr_name));

PADDLE_ENFORCE_NE(Has(pass_type), true, 
    platform::errors::AlreadyExists(
        "Pass %s has been registered.", pass_type));

PADDLE_ENFORCE_LE(ins.size(), 1UL,
    platform::errors::AlreadyExists(
        "Operator %s's input %s should contain only one variable.", type_, name));
                    
PADDLE_ENFORCE_EQ(
    fused_var_set.count(fused_var_name), 0,
    platform::errors::AlreadyExists(
         "The fused variable already exists."));

5. PermissionDenied - operation not permitted

The current operation is not allowed to be executed.

PADDLE_ENFORCE_NE(a, b, platform::errors::PermissionDenied(
                            "Cannot connect the same node in the graph."));

6. ResourceExhausted - resources exhausted

PADDLE_THROW_BAD_ALLOC(platform::errors::ResourceExhausted(
    "\n\nOut of memory error on GPU %d. "
    "Cannot allocate %s memory on GPU %d, "
    "available memory is only %s.\n\n"
    "Please check whether there is any other process using GPU %d.\n"
    "1. If yes, please stop them, or start PaddlePaddle on another GPU.\n"
    "2. If no, please decrease the batch size of your model.\n",
    place_.device, string::HumanReadableSize(size), place_.device,
    string::HumanReadableSize(avail), place_.device));

PADDLE_THROW_BAD_ALLOC(platform::errors::ResourceExhausted(
     "\n\nOut of memory error on GPU %d. "
     "Cannot allocate %s memory on GPU %d, "
     "available memory is only %s.\n\n"
     "Please check whether there is any other process using GPU %d.\n"
     "1. If yes, please stop them, or start PaddlePaddle on another GPU.\n"
     "2. If no, please try one of the following suggestions:\n"
     "   1) Decrease the batch size of your model.\n"
     "   2) FLAGS_fraction_of_gpu_memory_to_use is %.2lf now, "
     "please set it to a higher value but less than 1.0.\n"
     "      The command is "
     "`export FLAGS_fraction_of_gpu_memory_to_use=xxx`.\n\n",
     gpu_id_, string::HumanReadableSize(size), gpu_id_,
     string::HumanReadableSize(avail), gpu_id_,
     FLAGS_fraction_of_gpu_memory_to_use));

7. PreconditionNotMet - precondition not met

The current operation can only run once certain preconditions are met

PADDLE_ENFORCE_NOT_NULL(
    mutex_for_pick_file_,
    platform::errors::PreconditionNotMet(
        "You should call SetFileListMutex before PickOneFile"));

PADDLE_ENFORCE_NOT_NULL(
    root_scope_,
    platform::errors::PreconditionNotMet(
        "root_scope should be set before creating thread scope."));

PADDLE_ENFORCE_NE(
    fetched_var_it, fetched_vars->end(),
    platform::errors::PreconditionNotMet(
        "Cannot find fetched variable(%s). Perhaps the main_program "
        "is not set to ParallelExecutor.",
        var_name));

PADDLE_ENFORCE_EQ(finish_start_, true,
                  platform::errors::PreconditionNotMet(
                      "Datafeed has not started running yet."));

PADDLE_ENFORCE_NE(framework::product(y_dims), 0,
                  platform::errors::PreconditionNotMet(
                      "The Input variable Y(%s) has not "
                      "been initialized. You may need to confirm "
                      "if you put exe.run(startup_program) "
                      "after optimizer.minimize function.",
                      ctx->Inputs("Y").front()));

PADDLE_ENFORCE_NE(FLAGS_use_ngraph, true,
                  platform::errors::PreconditionNotMet(
                      "Please compile with NGRAPH first to use NGRAPH."));

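8. ExecutionTimeout - execution timed out

Execution took too long to respond, or communication timed out.

No example has been found yet; one will be added later.

9. Unimplemented - feature not yet implemented

Not implemented or supported yet, but may be in the future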

PADDLE_ENFORCE_NE(iter, operations_.end(),
                  platform::errors::Unimplemented(
                      "Operation %s is not supported yet.", op_type));

PADDLE_ENFORCE_EQ(
    all_reduce_ops.size(), grads.size(),
    platform::errors::Unimplemented(
        "The number of all_reduce OpHandle is not equal to the "
        "number of grads. Maybe some gradients are sparse type, "
        "it is not supported currently."));

10. Unavailable - service unavailable

The service is currently unavailable, or the current operation cannot be executed.

10.1 IO error

PADDLE_ENFORCE_NE(file_descriptor, -1, platform::errors::Unavailable(
                                            "Cannot open file %s.", filename));

PADDLE_ENFORCE_EQ(fin.good(), true, platform::errors::Unavailable(
                                        "Cannot open file %s.", filename));

PADDLE_ENFORCE_EQ(
    file.is_open(), true,
    platform::errors::Unavailable("Can not open %s to add benchmark.", path));

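11. Fatal - fatal error

Unexpected, severe errors, such as a segmentation fault.

Reserved for later try-catch handling of unexpected exceptions; developers will not need it for now.

12. External - external / third-party library error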

PADDLE_ENFORCE_CUDA_SUCCESS(
    cudaEventCreate(&event_, cudaEventDisableTiming),
    platform::errors::External(
        "Create event failed in CUDADeviceContextAllocator"));

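4. Specification updates and additions

1. New OP_INOUT_CHECK macro for Op InferShape (2020.04.03)

The input and output checks in Op InferShape have very similar error types and messages, but since most of them had no error type before, they all need to be changed.
A new check macro was added to handle such checks. Usage example: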

OP_INOUT_CHECK(ctx->HasInput("X"), "Input", "X", "Mul");

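Just pass, in order, the condition expression, Input or Output, the name of the Op input or output, and the Op name.
This simplifies the code and reduces everyone's workload, and it also keeps the input and output check messages of all Ops consistent, unifying the earlier variety of wordings and avoiding grammar problems.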
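
To illustrate how one macro can keep these messages uniform, here is a stand-alone sketch. `SKETCH_OP_INOUT_CHECK` is a hypothetical stand-in: the real OP_INOUT_CHECK is defined in Paddle, and its exact message text and exception type may differ from what is assumed here. The message wording follows the NotFound examples in section 2.1.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for OP_INOUT_CHECK. The message is assembled from
// the role ("Input" or "Output"), the variable name and the Op name, so
// every Op reports a missing input or output in the same wording.
#define SKETCH_OP_INOUT_CHECK(expr, role, name, op_type)                  \
  do {                                                                    \
    if (!(expr)) {                                                        \
      throw std::runtime_error(std::string("NotFoundError: ") + (role) + \
                               "(" + (name) + ") of " + (op_type) +      \
                               "Op is not found.");                      \
    }                                                                     \
  } while (0)
```

A failing check such as `SKETCH_OP_INOUT_CHECK(false, "Input", "X", "Mul")` would then report "NotFoundError: Input(X) of MulOp is not found." in this sketch.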

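Appendix

Records other content related to the specification

1. Notes on the Paddle error message improvements (released 2019.11.13)

Example of the original Paddle error message:

(Image: example of the original Paddle error message)

Example of the improved Paddle error message:

(Image: example of the improved Paddle error message)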

2. How to add a new error type

Using the error type UNKNOWN as an example:

Step 1: add the new error code in paddle/fluid/platform/error_codes.proto

UNKNOWN = 13;

Step 2: register the new error type in paddle/fluid/platform/s.h

REGISTER_ERROR(Unknown, UNKNOWN)

Step 3: add the new error string in paddle/fluid/platform/errors.cc

case paddle::platform::error::UNKNOWN:
      return "UnknownError";
      break;

Step 4: use it in code

PADDLE_ENFORCE_EQ(flag, true, platform::errors::Unknown("example"));
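
The four steps above can be mirrored in a self-contained sketch that shows how the pieces fit together. All names in the `sketch` namespace are hypothetical simplifications; Paddle's real enum comes from the .proto file, and its real REGISTER_ERROR macro may generate different code.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Step 1 analogue: the error code. OTHER is only a placeholder so that the
// enum has more than one value; UNKNOWN = 13 follows the example above.
enum class Code { OTHER = 0, UNKNOWN = 13 };

// Simplified error object carrying the code and the message.
struct ErrorSummary {
  Code code;
  std::string message;
};

// Step 2 analogue: one macro line generates a factory function per type.
#define SKETCH_REGISTER_ERROR(FUNC, CODE_NAME)            \
  inline ErrorSummary FUNC(const std::string& message) {  \
    return ErrorSummary{Code::CODE_NAME, message};        \
  }

SKETCH_REGISTER_ERROR(Unknown, UNKNOWN)

// Step 3 analogue: map the code to the string shown to users.
inline std::string CodeToString(Code code) {
  switch (code) {
    case Code::UNKNOWN:
      return "UnknownError";
    default:
      return "Error";  // placeholder for codes not covered in this sketch
  }
}

}  // namespace sketch
```

Step 4 then corresponds to calling the generated factory, for example `sketch::Unknown("example")`, wherever an error object is needed.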

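3. Record of error type extensions

If the existing 12 error types cannot cover an error encountered in practice, you can apply to add a new type and describe it here

Name of the new error type
Use cases for the new error type
PADDLE_ENFORCE examples for the new error type (at least 3)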
