Version 0.11 ▪ Draft. Feedback is warmly welcome.
Consistency is the last refuge of the unimaginative.
—Oscar Wilde
Elegance is not an ornament worthy of man.
—Seneca
This document is supposed to enumerate a number of conventions aiming at consistency and elegance of C++ code. In general, there is no intention to focus on technical aspects of programming, but rather on look-and-feel issues.
Several well-known practices are put together, combined with personal feelings, preferences and thoughts.
By no means should this be considered a teaching exercise.
And, to emphasize once and till the end of the document, sentences like "Don’t do..." should be taken just as a brief form of something similar to "in general, there is a tendency to conclude that...".
The document is neither academically polished, nor complete. Consider it as a partial memory dump with highlighting.
Before starting, let’s clarify overall predilections.
Should the coding style be universal and invariant, regardless of the circumstances? Or should it be adapted to the environment (operating system, framework, libraries in use, etc.)?
There is no simple answer (this statement applies to many other places, so let’s not repeat it anymore), but as a rule of thumb let’s decide in favor of customs. If the environment is well-designed, it is perhaps better to follow its spirit and guidelines.
Does such an approach make general coding conventions rather irrelevant? Well, partly yes, but definitely not completely. In certain cases the goal is environment-independent development. And even if the environment shapes the code, it usually leaves a number of style-related questions open.
In any case, there should be a place for personal preferences. Formal and unconditional application of (rather cosmetic) rules may seriously damage the joy of coding.
Let’s use the term "logic" to refer to the semantic side of the code (roles of objects, their behavior, et al.), "physics" for its technical nature (variable types, the way parameters are passed, et al.), and "physiologic" for both together.
Nobody doubts that logic needs to be represented, but should it be accompanied with physics (this is especially prominent for naming discussions)?
Focus on logic. Do not overburden names with additional technical information.
Should the same structure be re-used in different contexts?
Well, sounds too fuzzy :-).
Let’s consider concrete cases.
Should {}-bracket policy be the same for classes, functions and control flow statements?
Should variables be named depending on their scope?
Should spacing rules be applied similarly to [], () and {}?
By default, homogeneity is advocated.
If the language itself uses the same lexemes (e.g. {}-brackets) to denote different types of blocks,
those brackets should come with the same formatting.
If the comma is used to separate items, it should be pre- or post-spaced in the same way, regardless of context of enumeration.
On the other hand, if the language uses the same lexemes for completely different things
(e.g. "<" for comparison and for template definition), the homogeneity rule does not apply :-).
Let’s see first what is in our arsenal.
| Notation | Example |
|---|---|
| Standard C / GNU notation | mouse_weight |
| Hungarian notation | iMouseWeight |
| Camel case (or lower Camel case) | mouseWeight |
| Pascal case (or upper Camel case) | MouseWeight |
| All capitals case | MOUSE_WEIGHT |
In addition, there are a number of different prefixes/suffixes which are often used to designate scope, constness or other attributes of a named object.
In this document, mostly Camel and Pascal cases are suggested; all capitals case also has its limited role.
GNU notation is not recommended, although it is widely used in the language itself and its standard libraries
(e.g. double, reinterpret_cast or basic_string).
Unfortunately, it is hard to build non-trivial coding conventions just around GNU notation,
and mixing it with Camel/Pascal cases looks eclectic.
So let's leave GNU notation to the language.
Of course, such approach introduces certain separation between the core and custom code,
and it might be considered as a drawback.
But even if so, it seems to be a smaller evil.
Hungarian notation is not included as it emphasizes types of variables needlessly (besides the fact that it also hardly fits to Camel/Pascal cases).
This is a common practice, despite the fact that for the standard C++ and STL types GNU notation is used
(see also the remark just above).
Examples: MainWindow, LinkedList.
"C" (for "class")
If one does not use "f" for functions or "n" for namespaces, why should one denote classes in a special way?
"Q" (for "Qt")
It is perhaps better to avoid prefixes that identify a class as belonging to a certain library; consider using namespaces instead.
Of course, introducing a short prefix to designate elements of a library, like classes, constants or global functions, has certain advantages, especially for multi-functional toolkits or rich frameworks, which are designed to be used throughout other projects. Moreover, if namespaces were not available, using library prefixes would be one of the best approaches.
However, if, as it is supposed to be, namespaces are already used to split things, such prefixes cause redundancy and some sort of non-normalized code structure. Also, general recommendations are expected to be widely reusable, while unique short prefixes would run out pretty soon.
Interface-suffix for interfaces
Although technically, in terms of C++, interfaces are a particular case of classes, semantically they are rather different.
Use the Interface suffix to emphasize it.
To some extent it can be considered as compensation for the missing keyword interface, which languages like Java or C# have.
Interface-suffix
In case there is a standard, default implementation of an interface, give it the same name, but omit the suffix.
For instance, if there is a standard GUI class which implements ButtonInterface, call it Button.
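The naming pair can be sketched as follows. This is only an illustration: the member functions setCaption() and caption() are hypothetical and are not taken from any real GUI library.

```cpp
#include <string>

// Hypothetical interface: pure virtual functions plus a virtual destructor.
class ButtonInterface
{
public:
    virtual ~ButtonInterface() {}
    virtual void setCaption(const std::string& caption) = 0;
    virtual std::string caption() const = 0;
};

// The default implementation takes the interface name without the suffix.
class Button : public ButtonInterface
{
public:
    virtual void setCaption(const std::string& caption) { _caption = caption; }
    virtual std::string caption() const { return _caption; }

private:
    std::string _caption;
};
```

Client code can then depend on ButtonInterface only, while Button stays the obvious choice when no special implementation is needed.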
Plural forms for classes are somewhat confusing, avoid them.
If a class represents a set, for instance, a collection of stamps, use a name like StampCollection instead of Stamps.
Please note that the meaning is "use Collection instead of plural Stamps", but not "use singular".
The singular form Stamp is used in StampCollection just because of the English language; StampsCollection would not sound right.
Another remark is that this recommendation is not intended to encourage the reader to use custom containers instead of STL or Boost ones by default.
If a class is a derived class that belongs to a certain category but does not deserve a brand-new name, consider adding the name of the class which originates the category.
For instance, if the GreenTurtle class is derived from the SeaTurtle class, which is derived from the Turtle class, which is derived from the Reptile class,
call it GreenTurtle, but neither GreenSeaTurtle nor GreenReptile.
"T" or Type for types
Types are somewhat similar to classes and should be treated alike.
Camel case for variables is both elegant and traditional.
Examples: fileName, i.
:: for global variables
It is important to emphasize the scope of variables.
Instead of naming global variables differently (e.g. Pascal case can be considered), the scope operator is recommended.
If a variable sits, for instance, within a namespace ns, use it with ns::.
Such approach does not introduce extra naming complexity and keeps "name" and "scope" as two different entities.
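A minimal sketch of the idea, assuming a hypothetical global variable counter, a namespace ns holding a variable of the same name, and a hypothetical function nextTicket():

```cpp
int counter = 0;            // global variable, Camel case like any other variable

namespace ns
{
    int counter = 10;       // same name, different scope
}

int nextTicket()
{
    int counter = 100;      // local variable hides the global one
    ::counter += 1;         // "::" makes the global scope explicit
    ns::counter += 1;       // "ns::" makes the namespace scope explicit
    return counter + ::counter + ns::counter;
}
```

All three variables keep the same name; only the scope qualifier tells them apart.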
"_" for member variables
Use "_" as a prefix or suffix of member variables.
This recommendation requires a longer discussion.
There are good reasons why member variables should be distinguishable from regular variables or function parameters. Firstly, the scope as such is an essential characteristic. Secondly, it is very common to name function parameters, regular variables and corresponding member variables similarly, so there is a need to distinguish them.
The appropriate thing would be a class-scope operator (similar to the global scope ::-operator).
However, there is no such thing in C++.
Instead, member variables can be referred to using this->.
Similar practice is common in Java and C# to resolve ambiguity when a member variable has the same name as a regular variable or a function parameter.
But in case of C++ it is visually a bit cumbersome and uncommon. Another approach is to use a prefix or suffix.
A popular pattern suggests a prefix "m" or "m_".
However, an abbreviation of the word "member" hardly finds a way to the heart, similarly to C for class or c for constant.
If we can avoid it in other cases, introducing such abbreviations for member variables does not sound right.
So a simple "_" is suggested, and it can be considered as a simulation of the missing class-scope-operator.
(Cannot resist mentioning that a good symbol for such a hypothetical operator would be ".", and "_" somewhat approximates it :-). )
So far so good, the "_"-prefix for member variables looks attractive.
But there is one thing to have in mind.
There are a number of recommendations not to start variable names with the underscore.
Is it too restrictive or not?
Indeed, variable names should not begin with "_ _" (double underscore) or "_#", where "#" is a capital letter.
Each name that contains a double underscore (_ _) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
—C++ Standard
However if a name starts with "_#" where "#" is a minuscule, it looks perfectly qualified for a member variable.
Still, if it is proven that "_" is an inappropriate prefix, use it as a suffix.
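As an illustration of the prefix variant, here is a sketch with a hypothetical City class. Parameters, accessors and member variables share the same logical name, and only the "_" marks the class scope.

```cpp
#include <string>

class City
{
public:
    // Parameters and members share the logical name; "_" marks the member.
    City(const std::string& name, int population)
        : _name(name), _population(population)
    {
    }

    void setPopulation(int population) { _population = population; }
    int population() const { return _population; }
    const std::string& name() const { return _name; }

private:
    std::string _name;      // "_" followed by a lowercase letter, class scope
    int _population;
};
```

With the suffix variant the members would be called name_ and population_ instead; the rest stays the same.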
Use singular form for a singular object, e.g.
city = new York;
Use plural form for arrays, lists or other collections, e.g.
Polyhedron platonicSolids[5];
It is hard to come up with anything better than i, j, and k for general indexing purpose.
Use them.
Use c for character and x, y, z for coordinates.
Another practice is to use e for exceptions.
(A personal preference is to use x for exceptions, keeping e for events – OK, it is coming from C#.)
This recommendation applies only to "quite local" variables.
Similarly to class naming, consider using "category class name" as a suffix, e.g. cancelButton or titleLabel.
Avoid using names like btnCancel or lblTitle.
It is similar to (if not exactly) Hungarian notation.
This recommendation is especially applicable to GUI member variables.
Prefixes like p for pointers or r for references,
suffixes like Ptr for pointers et al.
are discouraged as they emphasize technical nature of implementation and obscure the logical side.
This also applies to instances of various pointer classes, e.g. shared_ptr or QPointer.
Examples:
Node& head; // not rHead
shared_ptr<Vertex> root; // not rootPtr
void swap(int* left, int* right); // not pLeft, pRight
It is disputable whether constness is such an essential characteristic that it needs to be prominently emphasized.
Basically, constants are, or at least should be considered as, unmodifiable variables.
Looking from this angle, the Camel case applies.
There is a tangible desire not to reflect constness in names, and if an answer had to be given,
it would be "do not distinguish".
However, firstly, nobody seems to be forced to decide right now, and secondly, there is another question.
Even if there is no big need to emphasize constness in general, does it still make sense to denote the "terminal constants",
like Pi or DarkGray?
There is a feeling that it does.
However, since "terminal constants" are rather a semantic thing, it is hard to draw a formal line.
Let's leave the issues open for now, allowing both Camel and Pascal cases.
Enumeration is a particular type.
As for classes and typedef'ed types, Pascal case should be used.
Enumerators (the values within enumerations) are named constants. They should perhaps be treated alike.
It is a common situation when two or more enumerations have homonymous enumerators, e.g.
enum State
enum Direction
A way to resolve ambiguity could be to use State::Unknown and Direction::Unknown.
But, although some compilers allow using enumeration names as qualifiers (at least VC compiles with a warning),
it is beyond the C++ standard (C++0x is supposed to allow it, and even to require explicit scoping for type-safe enumerations).
Consider adding enumeration name, as follows:
enum State
enum Direction
It looks tempting to use enumeration names as prefixes. However, despite some tactical advantages of such an approach, it introduces a certain inconsistency; compare with the recommendation "Consider adding the principal class name to the variable name" (above) and the more general "Use words in the normal language order" (below).
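A possible shape of such enumerations is sketched below, assuming that the enumeration name is appended as a suffix, which is what the remark about prefixes suggests. Only Unknown comes from the text; the other enumerators and the function isKnown() are hypothetical.

```cpp
// Hypothetical enumerators; only "Unknown" is mentioned in the text.
enum State {UnknownState, IdleState, BusyState};
enum Direction {UnknownDirection, LeftDirection, RightDirection};

// Both "unknown" values can now be used in the same scope without a clash.
bool isKnown(State state, Direction direction)
{
    return state != UnknownState && direction != UnknownDirection;
}
```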
If an enumeration consists of non-combinable enumerators (not supposed to be used flag-alike), use a singular form, e.g.
enum Animal {QuickFox, LazyDog};
Otherwise, use plural form, e.g.
enum Styles {Visible, Enabled};
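The difference between the two forms becomes visible once the enumerators are actually combined. The sketch below assumes explicit bit values for Styles and a hypothetical helper hasStyle(); neither is part of the samples above.

```cpp
// Singular: exactly one value at a time.
enum Animal {QuickFox, LazyDog};

// Plural: enumerators are combined as flags; the bit values are an assumption.
enum Styles {Visible = 1, Enabled = 2};

// Checks whether a combination of flags contains the given style.
bool hasStyle(int styles, Styles style)
{
    return (styles & style) != 0;
}
```

A value such as Visible | Enabled describes several styles at once, which is why the plural name reads naturally, while a variable of type Animal always holds a single animal.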
It is debatable whether Camel or Pascal case fits better for function names; both practices are reasonable and widely used.
Since Camel case looks more elegant, let's stick to it.
Examples: getCount(), sleep().
:: for global functions
Similar to variables, use :: wherever appropriate.
As Ian Joyner wrote, "In pure OO languages, namespaces are not needed; classes themselves are namespaces."
Indeed, namespaces can be considered as some type of meta-classes,
intended mostly for grouping together classes, non-member functions and variables, et al.
which are related to each other.
It puts a namespace somewhat close to a particular case of a class,
which already has a similar grouping feature (though not across different files).
Therefore, the same notation as for classes is recommended.
A nickname is supposed to be short and light.
In practice, it is convenient to use one word or an abbreviation,
so Camel case is not distinguishable from GNU notation.
Still, this is Camel case :-).
Example:
namespace math = Science::Mathematics;
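For completeness, a small self-contained sketch of how such a nickname is used. The contents of Science::Mathematics (the constant Pi and the function circleArea()) and the function unitCircleArea() are hypothetical.

```cpp
namespace Science
{
    namespace Mathematics
    {
        const double Pi = 3.14159265358979;

        double circleArea(double radius)
        {
            return Pi * radius * radius;
        }
    }
}

// Short nickname for the long qualified name.
namespace math = Science::Mathematics;

double unitCircleArea()
{
    return math::circleArea(1.0);
}
```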
Preprocessor elements are considered mostly as legacy features and necessary evil.
They do not really belong to the language, and should be reduced to a reasonable minimum.
Capitalization is supposed to emphasize exceptional use of preprocessor, besides the fact that it is a traditional convention.
Underscores drastically improve readability. WATCH_QUICK_FOX is better than WATCHQUICKFOX.
A typical approach is to put the class declaration in a separate .h file and the class definition in a separate .c++ file.
(When classes represent GUI elements, it might be convenient to have yet another file (like a .ui file in case of Qt).)
It is logical and intuitive to name such files after the class name, keeping the same naming convention as for classes.
The motivation is, of course, to prevent confusion and to keep names brief and readable.
Capitalize abbreviations as if they were regular words.
E.g. a variable can be called htmlText.
Alternatives like HTMLText are less readable (and can, as in this case, violate Camel case convention).
In general, classes, types, enumerations are to represent some categories; variables, constants, enumerators are certain objects within these categories.
In human languages, they are described by noun-based sentences.
Functions are usually to represent certain actions, and this is what verbs are for.
Be positive.
Prefer names like isDone() to names like isNotDone().
So as to have names readable and pronounceable naturally, use normal language forms (e.g. OakLeaf, not LeafOak).
Well, conceptually it would be better to use reverse naming, since it reflects logical order of specification: base first, derivatives next.
For instance, "Bambusa vulgaris" would be somewhat better than "common bamboo".
But maybe it is too much.
(I have to change my habits of using tabs :-).)
This seems to be good for readability and not too much for nested constructions. In the dispute "3 vs 4", 4 is advocated due to fundamental practice to prefer powers of 2.
EOL, not EOF
Having last line ended with EOF is slightly inconsistent
and could be less convenient for some operations, like copy-pasting of the entire file contents.
Trailing whitespaces definitely do not improve readability and violate normalization. This applies to whitespaces at the line ends and to empty lines at the file end as well.
There are some recommendations to limit line lengths to 80 characters or so to prevent breaking lines while printing.
The concrete numbers are questionable, but in general it is a good practice to keep line lengths limited in accordance with the environment. Visual comparison of two versions of a file is a good example of a case where shorter lines fit better.
{}-brackets policy
Let's list a few options.
| A | B | C (NOK) | D |
|---|---|---|---|
| ●●●● | ●●●● { | ●●●● | ●●●● |
Options A and B are the most popular.
Advantages of the first one are that it is more readable and also it makes it easier to move entire blocks (which are "set of sequential lines", not "a bracket at the end" + "set of sequential lines").
The second one saves some screen space.
Option C is not recommended, because it introduces unnecessary indentation.
Option D seems to combine advantages of A and B, but looks unusual.
Decide for yourself and use consistently for all structures (classes, functions, control statements, et al.).
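The two most popular options can be sketched on a small function. The layout shown for option A (opening bracket on its own line) is inferred from the remark about moving entire blocks; the function names are hypothetical.

```cpp
// Option A: the opening bracket sits on its own line.
int absoluteA(int value)
{
    if (value < 0)
    {
        return -value;
    }
    return value;
}

// Option B: the opening bracket ends the line that starts the block.
int absoluteB(int value) {
    if (value < 0) {
        return -value;
    }
    return value;
}
```

In both cases the same layout is applied to the function body and to the control statement, in line with the homogeneity rule.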
()-brackets policy
Some options, not necessarily mutually exclusive, are enumerated below.
| A | B | C | D |
|---|---|---|---|
| ●● = ●●●●(●, ●, ●); | ●● = ●●●●(●, | ●● = ●●●●( | ●● = ●●●● |
Variant A is most appropriate for "short lines", B is not much more than breaking A in several lines and is OK as far as line breaking in general is OK.
In case of multiple and/or long-named parameters, variant C can provide a readable alternative to B; feel free to use it.
Similarly to {}-brackets, variant D could be used.
In fact, using the variant D for ()-brackets in combination with the variant C for {}-brackets,
provides a very consistent code structure. But, once again, uncommon :-).
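Variants A, B and C might look as follows in real code. This is a sketch only: the function volume() is hypothetical, and details such as the alignment of continuation lines and the position of the closing bracket are assumptions.

```cpp
int volume(int width, int height, int depth)
{
    return width * height * depth;
}

int sampleVolumes()
{
    // Variant A: everything on one line.
    int a = volume(2, 3, 4);

    // Variant B: variant A broken into several lines.
    int b = volume(2, 3,
                   4);

    // Variant C: line break right after the opening bracket.
    int c = volume(
        2,
        3,
        4);

    return a + b + c;
}
```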
{}-brackets for single statements when applicable
Although some sources recommend using {}-brackets always, this practice seems to blow up code without giving considerable benefits.
else or catch on the same line as }-bracket
Having if-else or try-catch pairs misaligned looks unbalanced and can be disturbing for readers.
Readability is the goal. In general, "natural language" rules are advocated, including:
a space (or EOL) after "," and ";".
Do not add extra spaces after dots; they should be considered as separators rather than regular punctuation signs.
Table style basically means using extra spaces to introduce vertical alignment, like in the following case:
int    i = 0;    // some comment
double d = 0.0;  // another comment
Table style is acceptable for pragmatic readability reasons. Though in theory it is perhaps wrong similarly to the following construction
printf("\
+=================+\n\
| t a b l e |\n\
+=================+\n");
which mixes code-centric and data-centric formatting.
Be aware of the auto-formatting features of source code editors. They may not appreciate the table style.
(-bracket and before )-bracket[-bracket and before ]-bracket
It is not absolutely clear and perhaps subjective if a space after (-bracket and [-bracket and
before )-bracket and ]-bracket makes the code easier to read or vice versa.
Due to lack of means to measure readability objectively, to find a "statistically better" way, let’s leave the choice to the reader. If you cannot define a clear winner, then it is better to omit spaces. Shorter code is better, if equally readable. This is also in line with the "natural language" recommendation.
There is a common practice to add a space between if, for, etc. and the following opening bracket.
Tradition to be respected.
Though, answering the usual argument that "this is done to distinguish between operators and functions",
I would ask to show me a person for whom such distinction is not obvious :-).
(And please allow absence of such spaces in my code, let me keep my accent :-).)
"*" and "&" together with type
Use constructions like int* x instead of int *x (and ditto for reference &).
Of course, prevent confusing constructions like int* x, y.
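The confusion comes from the fact that in int* x, y only x is a pointer, while y is a plain int. A simple way around it, sketched here with a hypothetical function, is one declaration per line:

```cpp
// "int* x, y;" would declare x as a pointer but y as a plain int,
// so each pointer gets its own declaration here.
int sumThroughPointers(int first, int second)
{
    int* x = &first;
    int* y = &second;
    return *x + *y;
}
```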
This is a common practice. It simplifies using the header file as some form of class documentation. Regular users of the class are interested mostly in the public section, and those who derive from the class, in public and protected.
Using headers in this way is not very conceptual perhaps, but quite convenient.
Static class members have different nature and belong to the class itself, not to instances of the class. It is useful to emphasize it by placing them in front of non-static members.
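Both ordering rules can be combined in one declaration. The Counter class below is a hypothetical sketch: sections go from public to private, and within each section static members come first.

```cpp
class Counter
{
public:
    static int instanceCount() { return _instanceCount; }   // static first

    Counter() : _value(0) { ++_instanceCount; }
    ~Counter() { --_instanceCount; }
    void increment() { ++_value; }
    int value() const { return _value; }

protected:
    void reset() { _value = 0; }

private:
    static int _instanceCount;  // static first

    int _value;
};

int Counter::_instanceCount = 0;
```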
It might be reasonable to do for simplifying browsing of source files and for the sake of consistency.
It is common practice to start C++ file with its own header.
This way helps to find if the header contains all necessary include directives and declarations.
A side advantage is that such #include can be used as a "tag",
making it easy to guess the file name when reading (e.g. printed) code.
Split the #include sequence into groups (e.g. framework includes, system includes).
Within each group, sort includes alphabetically.
The typical order looks as follows:
However to be consistent with the "own header first" recommendation, one may perhaps also consider the following unusual order:
virtual keyword for overridden functions
Unfortunately, the C++ language does not ensure that overridden functions are marked explicitly (as e.g. C# does). This could result in funny bugs, for instance if a maintainer of the base class adds a virtual function which is homonymous to one in a derived class.
Although using the keyword virtual explicitly for implicitly virtual functions does not really solve problems,
it at least makes things more visible.
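A short sketch with hypothetical Shape and Circle classes: the keyword in the derived class changes nothing for the compiler, but the reader sees at once that the function is dispatched dynamically.

```cpp
#include <string>

class Shape
{
public:
    virtual ~Shape() {}
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape
{
public:
    // "virtual" is redundant for the compiler, but it tells the reader
    // that this function takes part in dynamic dispatch.
    virtual std::string name() const { return "circle"; }
};

std::string describe(const Shape& shape)
{
    return shape.name();
}
```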
throw() function attribute
Maybe the recommendation should be formulated as "do not use throw() attribute before reading the article of Herb Sutter".
It gives an excellent explanation of the subject.
/*virtual*/ or /*static*/
It might look appropriate to have a comment next to the function definition saying that the function is virtual or static. However, it makes code more cumbersome, and in addition brings the risk of desynchronisation.
Embedding assignments, increments and decrements within other operations can be illustrated with the following samples.
if((c = a[i]) >= '0' && c <= '9') // embedded assignment
d = c - '0';
while(--count > 0) // embedded decrement
a[i++] = b[j++]; // embedded increments
Although such patterns might look quite usual, in many (if not most) cases it is better to avoid them. No doubt, it is tempting to exploit the fact that the assignment, increment and decrement operators return a value. It can make code shorter and may look somewhat equilibristic. However, such embedding is more technical than algorithmic, tends to bring unneeded complexity and decreases maintainability.
Still, there are some situations when embedding might be helpful. Chain assignment is one of such cases. There is nothing wrong in constructions like the following:
x = y = z = 0;
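For comparison, here is how the same kind of work (recognizing a digit, copying elements) might look without embedding. The function names are hypothetical, and the sketch is not meant as an exact equivalent of the fragments above.

```cpp
// Returns the numeric value of a decimal digit character, or -1 otherwise.
int digitValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}

// Copies count elements from source to target, one explicit step per line.
void copyItems(const int* source, int* target, int count)
{
    for (int i = 0; i < count; ++i)
        target[i] = source[i];
}
```

Each line now does one thing, so the loop bounds and the comparison can be checked at a glance.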
Quite often, include guards are constructed based on the class name, for instance:
#ifndef MATRIX_H
#define MATRIX_H // hardly unique name
class Matrix
{
//....
};
#endif
It could easily lead to a conflict if two classes (maybe within different libraries) have the same name.
One alternative is to add a GUID to the define, to get something similar to
MATRIX_H_E4865904_C72C_4975_A445_30D1FAAB7546.
This is reliable, but bulky.
Another way is to include the name of the namespace to which the class belongs,
or, if the class belongs to a nested namespace, names of the entire namespace hierarchy.
For instance, if the Matrix class belongs to the namespace Math::Algebra,
the guarding define can be called MATH_ALGEBRA_MATRIX.
It brings a good balance between readability and uniqueness.
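A header guarded this way could look like the sketch below. The class body is hypothetical and is there only to make the example complete.

```cpp
#ifndef MATH_ALGEBRA_MATRIX
#define MATH_ALGEBRA_MATRIX

namespace Math
{
    namespace Algebra
    {
        // Hypothetical minimal class body, just to make the sketch complete.
        class Matrix
        {
        public:
            Matrix(int rows, int columns) : _rows(rows), _columns(columns) {}
            int rows() const { return _rows; }
            int columns() const { return _columns; }

        private:
            int _rows;
            int _columns;
        };
    }
}

#endif
```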
Casting in C++ can be considered as goto in C.
Try to avoid it.
And if you have to use it, use an appropriately cumbersome _cast construction, not ()-casting.
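When a conversion is really needed, the named cast makes it easy to spot and to search for. A minimal sketch with a hypothetical function:

```cpp
// Integer percentage computed in floating point; the conversions are spelled out.
int percentage(int part, int total)
{
    double ratio = static_cast<double>(part) / static_cast<double>(total);
    return static_cast<int>(ratio * 100.0);   // not (int)(ratio * 100.0)
}
```

The result is truncated towards zero, exactly as with the ()-cast; the only difference is visibility.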
NULL instead of 0
This issue has a lot of controversy.
From several arguments, let’s choose the following.
C++0x is going to introduce nullptr, ensuring that null-pointer and 0 are conceptually different;
and using NULL seriously helps in find-and-replace procedure.
C++ Programming Style Guidelines
http://geosoft.no/development/cppstyle.html
C++ Coding Standard
http://www.possibility.com/Cpp/CppCodingStandard.html
C++0x in Wikipedia
http://en.wikipedia.org/wiki/C%2B%2B0x
Design Guidelines for Developing Class Libraries
http://msdn.microsoft.com/en-us/library/ms229042.aspx
(This document refers to C#, but many recommendations can be applied to C++ as well.)
Ian Joyner. C++?? : A Critique of C++
http://burks.bton.ac.uk/burks/pcinfo/progdocs/cppcrit/
Herb Sutter. A Pragmatic Look at Exception Specifications
http://www.gotw.ca/publications/mill22.htm