Coding rules forked from CPROVER (https://github.com/diffblue/cbmc)
Here a few minimalistic coding rules for the CPROVER source tree.
- With any changes, consider the impact on other users of the code base. Users frequently carry their own set of patches or build tools on top of the code base. Large-scale changes negatively impact both scenarios.
- Tools that link against the code base can reasonably expect a stable interface. We consider as public interface any objects, procedures, or classes (and their methods) that are meant to be used outside a single directory.
- See below for how to document interfaces and how to mark parts of interfaces as deprecated.
Formatting is enforced using clang-format. For more information about this, see
COMPILING.md
. A brief summary of the formatting rules is given below:
- Use 2 spaces indent, no tabs.
- No lines wider than 80 chars.
- When line is wider, do the following:
- Subsequent lines should be indented two more than the initial line
- Break after
=
if it is part of an assignment- For chained calls, prefer immediately before
.
- For chained calls, prefer immediately before
- For other operators (e.g.
&&
,+
) prefer immediately after the operator - For brackets, break after the bracket
- In the case of function calls, put each argument on a separate line if they do not fit after one line break
- Nested function calls do not need to be broken up into separate lines even if the outer function call does.
- When line is wider, do the following:
- If a method is bigger than 50 lines, break it into parts.
- Put matching
{ }
into the same column, except for initializer lists and lambdas, where you should place{
directly after the closing)
. This is to comply with clang-format, which doesn't support aligned curly braces in these cases. - Spaces around binary operators (
=
,+
,==
...) - Space after comma (parameter lists, argument lists, ...)
- Space after colon inside
for
- For pointers and references, the
*
/&
should be attached to the variable name as opposed to the type. E.g. for a pointer to an int the syntax would be:int *x;
- No whitespaces at end of line
- No whitespaces in blank lines
- Put argument lists on next line (and indent 2 spaces) if too long
- Put parameters on separate lines (and indent 2 spaces) if too long
- Spaces around colon for inheritance, put inherited into separate lines in case of multiple inheritance
- The initializer list follows the constructor with a whitespace around the colon.
if(...)
,else
,for(...)
,do
, andwhile(...)
are always in a separate line- Break expressions in
if
,for
,while
if necessary and align them with the first character following the opening parenthesis - Use
{}
instead of;
for the empty statement - Single line blocks without
{ }
are allowed, but put braces around multi-line blocks - Use blank lines to visually separate logically cohesive code blocks within a function
- Have a newline at the end of a file
- Do not use
/* */
- Each source and header file must start with a comment block stating the
author. See existing source for an example of the format of this block. This
should be followed by a Doxygen
\file
comment:Note that the/// \file /// <Some information about this file goes here>
\file
tag must be immediately followed by a newline in order for Doxygen to relate the comment to the current file. - Each class, member variable and function should be preceded by a Doxygen
comment describing it when it is not immediately obvious what it does. The
format should match the LLVM
guidelines,
with one extension: for functions,
\param
and\return
comments longer than a single line should have subsequent lines indented by two spaces, so that the tags stand out. An example:/// This sentence, until the first dot followed by whitespace, becomes /// the brief description. More detailed text follows. Feel free to /// break this into paragraphs to aid readability. /// \param arg: This parameter doesn't need much description /// \param long_arg: This parameter needs a long description. Extra /// information about the parameter gets indented an extra two columns, /// like this. /// \param [out] out_arg: This parameter is mutated by the function, and /// its value at the beginning of the function is not used /// \param [in,out] in_out_arg: This parameter is mutated by the function, and /// its value at the beginning of the function is used /// \return The return value is literally the value returned by the /// function. For out-parameters, use "\param [out]" or "\param [in,out]". ret_typet function(argt arg, argt long_arg, argt &out_arg, argt &in_out_arg)
- The priority of documentation is readability. Therefore, feel free to use Doxygen features, or to add whitespace for multi-paragraph comment blocks if necessary.
- For functions or methods that provide an interface to a module, a comment block should immediately precede the declaration of the entity it documents. This implies that it will live in the header file. The documentions should focus on the externally visible behavior, and should discuss the implementation only to the extent necessary for understanding usage. The implementation should be explained using comments in the definition of the function or method.
- Functions or methods that are internal to a module are documented in a comment block that precedes the definition.
- The documentation block must immediately precede the entity it documents. Don't insert empty lines between docs and functions, because this will confuse Doxygen.
- Put comments on a separate line
- Use comments to explain the non-obvious
- Use #if 0 for commenting out source code
- Use #ifdef DEBUG to guard debug code
- When deprecating interfaces, use the
DEPRECATED
macro, preferably together withSINCE
. For example,marksDEPRECATED(SINCE(2019, 1, 31, "use other_method() instead")) void deprecated_method();
deprecated_method
as deprecated as of 2019-01-31. Any deprecated functionality should remain in place for at least six months from the date of deprecation. Before deprecating code, all in-tree uses should be replaced or marked as deprecated. - In the brief of function descriptions, prefer imperatives such as
Remove first element...
overRemoves first element...
.
- Identifiers should make clear the purpose of the thing they are naming.
- Identifiers may use the characters
[a-z0-9_]
and should start with a lower-case letter (parameters in constructors may start with_
). - Omit names of parameters or exception objects when they are not used. If
parameter names help documenting an interface, keep the name and use
(void)parameter_name;
in the body of the method. - Use American spelling for identifiers.
- Separate basic words by
_
- Avoid abbreviations (e.g. prefer
symbol_table
tost
). - User defined type identifiers have to be terminated by
t
. Moreover, beforet
may not be_
. - Do not use
m_
prefix nor_
suffix for names of attributes of structured types. - Enum values may use the characters
[A-Z0-9_]
- Avoid unnecessary
#include
s, especially in header files - Prefer forward declaration to includes, but forward declare at the top of the header file rather than in line
- Guard headers with
#ifndef CPROVER_DIRECTORIES_FILE_H
, etc - The corresponding header for a given source file should always be the first
include in the source file. For example, given
foo.h
andfoo.cpp
, the line#include "foo.h"
should precede all other include statements infoo.cpp
. clang-format will enforce this. - Use the C++ versions of C headers (e.g.
cmath
instead ofmath.h
). Some of the C headers use macros instead of functions which can have unexpected consequences.
- Each source file should appear on a separate line
- The final source file should still be followed by a trailing slash
- The last line should be a comment to not be deleted, i.e. should look like:
This ensures the Makefiles can be easily merged.
SRC = source_file.cpp \ source_file2.cpp \ # Empty last line
- Each program contains a
program_name_parse_optionst
class which should contain a definePROGRAM_NAME_PARSE_OPTIONS
which is a string of all the parse options in brackets (with a colon after the bracket if it takes a parameter) - Each parameter should be one per line to yield easy merging
- If parameters are shared between programs, they should be pulled out into a common file and then included using a define
- The defines should be
OPT_FLAG_NAMES
which should go into theOPTIONS
define - The defines should include
HELP_FLAG_NAMES
which should contain the help output in the format:" --flag explanations\n" \ " --another flag more explanation\n" \ <-------30 chars------>
- The defines may include
PARSE_OPTIONS_FLAG_NAMES
which move the options from the command line into the options
- Do not use namespaces, except for anonymous namespaces.
- Prefer use of
using
instead oftypedef
. - Prefer use of
class
instead ofstruct
if there is an invariant between members. - Write type modifiers before the type specifier.
- Make references
const
whenever possible - Make member functions
const
whenever possible - Do not hide base class functions
- When overriding a virtual function, use
override
(withoutvirtual
) - Single argument constructors must be
explicit
- Avoid implicit conversions
- Avoid
friend
declarations - Avoid iterators, use ranged
for
instead - Avoid allocation with
new
/delete
, useunique_ptr
- Avoid pointers, use references
- Avoid
char *
, usestd::string
- For numbers, use
int
,unsigned
,long
,unsigned long
,double
- Use
mp_integer
, notBigInt
- Use the functions in util for conversions between numbers and strings
- Avoid C-style functions. Use classes with an
operator()
instead. - Use
irep_idt
for identifiers (notstd::string
) - Avoid destructive updates if possible. The
irept
has constant time copy. - Use instances of
std::size_t
for comparison with return values of.size()
of STL containers and algorithms, and use them as indices to arrays or vectors. - Do not use default values in public functions
- Use assertions to detect programming errors, e.g. whenever you make assumptions on how your code is used
- Use exceptions only when the execution of the program has to abort because of erroneous user input
- We allow to use 3rd-party libraries directly. No wrapper matching the coding rules is required. Allowed libraries are: STL.
- When throwing, omit the brackets, i.e.
throw "error"
. - Error messages should start with a lower case letter.
- Use the
auto
keyword if and only if one of the following- The type is explicitly repeated on the RHS (e.g. a constructor call)
- Adding the type will increase confusion (e.g. iterators, function pointers)
- Avoid
assert
. If the condition is an actual invariant, use INVARIANT, PRECONDITION, POSTCONDITION, CHECK_RETURN, UNREACHABLE or DATA_INVARIANT (also see the documentation of the macros insrc/util/invariant.h
). If there are possible reasons why it might fail, throw an exception.- Use "should" style statements for messages in invariants (e.g. "array should have a non-zero size") to make both the violation and the expected behavior clear. (As opposed to "no zero size arrays" where it isn't clear if the zero-size array is the problem, or the lack of it).
- The statements should start with a lower case letter.
- All raw pointers (such as those returned by
symbol_tablet::lookup
) are assumed to be non-owning, and should not bedelete
d. Raw pointers that point to heap-allocated memory should be private data members of an object which safely manages the pointer. As such,new
should only be used in constructors, anddelete
in destructors. Never usemalloc
orfree
. - Prefer brace style initialisation (i.e.
type_name{arguments...}
) over parentheses for constructor calls - The C++ Core Guidelines are a useful reference for writing good modern C++.
- Avoid if at all possible using irept methods like
get(ID_name)
, instead cast to a derived type (e.g.class_typet
) and use the wrapper methodget_name
- Use
can_cast_type
/can_cast_expr
instead of directly checking theid()
of anirept
.
- Avoid if possible.
- Use
__LINUX__
,__MACH__
, and_WIN32
to distinguish the architectures. - Don't include architecture-specific header files without
#ifdef
...
- Do not output to
cout
orcerr
directly (except in temporary debug code, and then guard#include <iostream>
by#ifdef DEBUG
) - Derive from
messaget
if the class produces output and use the streams provided (status()
,error()
,debug()
, etc) - Use
\n
instead ofstd::endl
- Unit tests are written using Catch
- For large classes:
- Create a separate file that contains the tests for each method of each class
- The file should be named according to
unit/class/path/class_name/function_name.cpp
- For small classes:
- Create a separate file that contains the tests for all methods of each class
- The file should be named according to
unit/class/path/class_name.cpp
- Catch supports tagging, tests should be tagged with all the following tags:
- [core] should be used for all tests unless the test takes more than 1 second to run, then it should be tagged with [long]
- [folder_name] of the file being tested
- [class_name] of the class being tested
- [function_name] of the function being tested
You are allowed to break rules if you have a good reason to do so.
To install the hook
cp .githooks/pre-commit .git/hooks/pre-commit
or use a symbolic link. Then, when running git commit, you should get the
linter output (if any) before being prompted to enter a commit message. To
bypass the check (e.g. if there was a false positive), add the option
--no-verify
.
Code coverage metrics are provided using gcov and lcov. Ensure that you have installed lcov from http://ltp.sourceforge.net/coverage/lcov.php note for ubuntu lcov is available in the standard apt-get repos.
To get coverage metrics run the following script from the regression directory:
get_coverage.sh
This will:
- Rebuild CBMC with gcov enabled
- Run all the regression tests
- Collate the coverage metrics
- Provide an HTML report of the current coverage
CBMC uses clang-format to ensure that the formatting of new patches is readable and consistent. There are two main ways of running clang-format: remotely, and locally.
When patches are submitted to CBMC, they are automatically run through continuous integration (CI). One of the CI checks will run clang-format over the diff that your pull request introduces. If clang-format finds formatting issues at this point, the build will be failed, and a patch will be produced in the CI output that you can apply to your code so that it conforms to the style guidelines.
To apply the patch, copy and paste it into a local file (patch.txt
) and then
run:
patch -p1 -i patch.txt
Now, you can commit and push the formatting fixes.
To avoid waiting until you've made a PR to find formatting issues, you can install clang-format locally and run it against your code as you are working.
Different versions of clang-format have slightly different behaviors. CBMC uses clang-format-10 as it is available the repositories for Ubuntu 18.04 and Homebrew. To install on a Unix-like system, try installing using the system package manager:
apt-get install clang-format-10 # Run this on Ubuntu, Debian etc.
brew install clang-format@10 # Run this on a Mac with Homebrew installed
If your platform doesn't have a package for clang-format, you can download a pre-built binary, or compile clang-format yourself using the appropriate files from the LLVM Downloads page.
An installer for Windows (along with a Visual Studio plugin) can be found at the LLVM Snapshot Builds page.
Clang-format is distributed with a driver script called git-clang-format. This script can be used to format git diffs (rather than entire files).
After committing some code, it is recommended to run:
git-clang-format upstream/develop
Important: If your branch is based on a branch other than upstream/develop
,
use the name or checksum of that branch instead. It is strongly recommended to
rebase your work onto the tip of the branch it's based on before running
git-clang-format
in this way.
Note: By default, git-clang-format uses the git config variable
clangformat.binary
. If you have multiple versions of clang-format installed,
you might need to update this, or explicitly specify the binary to use via
--binary clang-format-10
.
If your works spans several commits and you'd like to keep the formatting correct in each individual commit, you can automatically rewrite the commits with correct formatting.
The following command will stop at each commit in the range and run clang-format on the diff at that point. This rewrites git history, so it's unsafe, and you should back up your branch before running this command:
git filter-branch --tree-filter 'git-clang-format upstream/develop' \
-- upstream/develop..HEAD
Important: upstream/develop
should be changed in both places in the
command above if your work is based on a different branch. It is strongly
recommended to rebase your work onto the tip of the branch it's based on before
running git-clang-format
in this way.