On C++
Having programmed in C++ professionally for well over 10 years, I have
learned all of it. I have all the books, I know all the tricks. And I
don’t like it anymore.
Update: This intro apparently made some people see red, because "no man could possibly know all of C++". If that includes you, you can read it as: "I've shipped 4 AAA games on MLOC code bases, and here's my take on the C++ abstractions you can reasonably use in projects that big".
Basically game teams using C++ fall into the same trap every time:
they try to create abstractions with whatever is in the C++ toolbox
and they fail miserably. On the next project they’re a little bit
smarter from the experience so they set out to fix their abstractions
and create new ones. And fail.
Quickly going over the major abstraction mechanisms C++ introduced
over C, I’m arguing that:
-
Templates suck as they cause link-time spam and compile times to
skyrocket. They severely bloat the size of the debug symbol file,
which on large projects can easily reach several hundred megabytes
of data. Abstractions built with templates perform differently
depending on whether compiler optimizations are enabled or not
(ever tried debugging Boost code?). They’re essentially unusable on
large code bases beyond container-of-T and simple functions.
-
RTTI sucks because it doesn’t do what anyone wants, and you can’t
even rely on it returning a type name formatted in a certain way.
-
Classes suck because their guts have to be in headers for all to see.
All these high-level concepts are flawed and you can’t alter their
semantics because they’re set in ISO stone. What all teams do then is
to reinvent all the language components that don’t work for them, and
set up rules forbidding the use of the other features. Every C++ shop
has its own "accepted" subset.
Basically the C++ game developer community is slowly navigating away
from C++'s abstraction patterns. We left operator overloading mostly
in the 90’s (some vector libraries still use it). We ditched RTTI back
in 2001. Exceptions are firmly off as they don’t even work on all
platforms we develop for. A lot of people are advocating that we stop
using member functions to reduce coupling.
This may sound harsh, but to me these are clear signs that C++ isn’t
providing any real cost benefit for us, and that we should be writing
code in other ways.
Coding C
Many C++ game programmers have started to turn towards C (or C-like
C++) to get away from the flaws of C++. Targeting C manually with
larger systems can be a lot of work, because it offers very basic
abstraction facilities. There are functions, enums, tagged types
(structs and unions) and a rudimentary type system, but that’s about
it.
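To make "very basic" concrete, here is roughly the ceiling of what plain C gives you for a small data abstraction: an enum tag plus a union, with every operation dispatching on the tag by hand. The names (Value, value_as_double) are made up for illustration.

```c
#include <assert.h>

/* Hypothetical tagged value: the enum records which union member is live. */
typedef enum { VALUE_INT, VALUE_FLOAT } ValueKind;

typedef struct {
    ValueKind kind;
    union {
        int i;
        float f;
    } as;
} Value;

/* Every operation switches on the tag manually; nothing in the language
   stops a caller from reading the wrong union member. */
double value_as_double(Value v)
{
    switch (v.kind) {
    case VALUE_INT:
        return (double) v.as.i;
    case VALUE_FLOAT:
        return (double) v.as.f;
    }
    assert(0 && "unknown ValueKind");
    return 0.0;
}
```

Anything beyond this (generic containers, cleanup on scope exit, dispatch tables) has to be spelled out by hand every time, which is the gap the rest of this post is about.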
But let’s look at C as a platform for a minute. C is lean, compiles
super fast, it’s supported everywhere and all the tools we need to
ship games such as compilers and debuggers (including the obscure and
proprietary) work well with it. If you need platform-specific
intrinsics to get on with your job, you can rely on the target
platform’s C compiler to provide them.
As C is such a simple, predictable language that works everywhere, it
makes a lot of sense to generate C code. Indeed, many projects have
done so, but the meat of the application has usually still been
written in plain C, as code generation is typically reserved for
language interfaces or parser generators.
The Lisp Way
Having also programmed a lot of Common Lisp over the years, I’ve seen
how the Lisp family of languages deals with extensibility. In Lisp,
you write your own abstractions that become a part of the project’s
language. This remarkable feature is enabled basically through two
simple things:
-
Programs can be treated as data (because they can be thought of as
parse trees)
-
There are macros which transform such data (that is, your programs)
into other programs (implementing the abstractions).
I’m going to suggest something mildly radical: we should prefer C over
C++. But not straight-up C. We should create our own C with the
abstractions we need, built right into the language, customized for
the problems we’re working on.
Amplification
We can apply many of the ideas that make Lisp powerful to C if we drop
its Algol-like syntax. What is the difference between the following
two program fragments?
int my_function(int a, int b) {
    return a + b;
}

(defun my-function ((a int) (b int) (return int))
  (return (+ a b)))
Answer: none, they are equivalent as far as semantics go. The latter
can trivially be parsed (using very simple rules) and transformed into
the former, and so it is still the same C program. This is good news,
because Lisp has shown us that if we represent programs as data, we
can transform that data arbitrarily before evaluating or compiling it.
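To show how little machinery "very simple rules" needs, here is a toy translator in C. It is a hypothetical sketch, not part of c-amplify: it reads prefix arithmetic such as (+ a (* b 2)) and writes the equivalent parenthesized C infix expression. It handles only atoms and binary operators and makes no attempt at error recovery.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical toy translator state (not c-amplify's reader). */
typedef struct {
    const char *src; /* input s-expression text */
    size_t pos;      /* current read position in src */
    char *out;       /* output buffer, always NUL-terminated */
    size_t len;      /* bytes written to out */
    size_t cap;      /* capacity of out, including the NUL */
} Translator;

static void skip_space(Translator *t)
{
    while (isspace((unsigned char) t->src[t->pos]))
        t->pos++;
}

/* Appends n bytes to the output, silently truncating if it is full. */
static void emit(Translator *t, const char *s, size_t n)
{
    if (n > t->cap - t->len - 1)
        n = t->cap - t->len - 1;
    memcpy(t->out + t->len, s, n);
    t->len += n;
    t->out[t->len] = '\0';
}

/* Advances over one atom (a run of non-space, non-paren characters) and
   returns its length; 0 means no atom was found. */
static size_t scan_atom(Translator *t)
{
    size_t start = t->pos;
    while (t->src[t->pos] != '\0' && !isspace((unsigned char) t->src[t->pos])
           && t->src[t->pos] != '(' && t->src[t->pos] != ')')
        t->pos++;
    return t->pos - start;
}

/* Translates one expression: an atom is copied as-is, and a list
   (op lhs rhs) becomes "(lhs op rhs)". Returns 0 on success, -1 on error. */
static int translate_expr(Translator *t)
{
    const char *op;
    size_t n;

    skip_space(t);
    if (t->src[t->pos] != '(') {
        const char *atom = t->src + t->pos;
        n = scan_atom(t);
        if (n == 0)
            return -1;
        emit(t, atom, n);
        return 0;
    }
    t->pos++;                       /* consume '(' */
    skip_space(t);
    op = t->src + t->pos;
    n = scan_atom(t);
    if (n == 0)
        return -1;
    emit(t, "(", 1);
    if (translate_expr(t) != 0)
        return -1;
    emit(t, " ", 1);
    emit(t, op, n);
    emit(t, " ", 1);
    if (translate_expr(t) != 0)
        return -1;
    emit(t, ")", 1);
    skip_space(t);
    if (t->src[t->pos] != ')')      /* only binary operators are supported */
        return -1;
    t->pos++;                       /* consume ')' */
    return 0;
}

/* Entry point: translates src into out (capacity cap > 0). */
int sexpr_to_c(const char *src, char *out, size_t cap)
{
    Translator t = { src, 0, out, 0, cap };
    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    if (translate_expr(&t) != 0)
        return -1;
    skip_space(&t);
    return t.src[t.pos] == '\0' ? 0 : -1;
}
```

A real reader would build a tree first so that macros can rewrite it before anything is printed. This sketch goes straight from text to text only to show that the surface syntax carries no extra meaning.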
In my prototype system, c-amplify, I’m doing exactly this. The system
introduces an "amplification" phase where s-expressions are
transformed to C code before a traditional build system runs.
The c-amplify system has the following major parts:
-
A system definition facility (specifying input files and
dependencies)
-
A reader, parsing ca source files
-
A persistent function and type database which is updated as source
files are amplified.
-
A pretty-printing C code generator — important as we’re going to be
debugging the generated code.
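The core of a pretty-printing generator can be as small as an emitter that tracks brace depth and indents each line. This is a hypothetical sketch in C; the names Emitter and emit_line are invented, and nothing here reflects how c-amplify's printer is actually written.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output state for an indenting C emitter. */
typedef struct {
    char *buf;   /* output buffer, always NUL-terminated */
    size_t len;  /* bytes written so far */
    size_t cap;  /* buffer capacity, including the NUL */
    int depth;   /* current brace nesting level */
} Emitter;

/* Emits one printf-formatted line, indented four spaces per nesting level.
   A format starting with '}' dedents before printing; a format ending in '{'
   indents the lines after it. Output is truncated if the buffer fills. */
void emit_line(Emitter *e, const char *fmt, ...)
{
    va_list ap;
    int n;
    size_t fmt_len = strlen(fmt);

    assert(e->cap > 0 && e->len < e->cap);
    if (fmt[0] == '}' && e->depth > 0)
        e->depth--;
    for (int i = 0; i < e->depth * 4 && e->len + 1 < e->cap; i++)
        e->buf[e->len++] = ' ';
    e->buf[e->len] = '\0';

    va_start(ap, fmt);
    n = vsnprintf(e->buf + e->len, e->cap - e->len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = e->cap - e->len - 1;
        e->len += (size_t) n < room ? (size_t) n : room;
    }
    if (e->len + 1 < e->cap) {
        e->buf[e->len++] = '\n';
        e->buf[e->len] = '\0';
    }
    if (fmt_len > 0 && fmt[fmt_len - 1] == '{')
        e->depth++;
}
```

Feeding it "int %s(void)", "{", "return %d;", "}" produces a properly indented four-line function, which is the level of readability you want when stepping through generated code in a debugger.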
The system is intended to be run incrementally as source files are
changed, re-reading changed files, updating the database and writing
generated output. A traditional build system can then be used to
compile the resulting files.
The persistent database is an interesting component that isn’t
strictly needed for the system but enables a lot of neat features:
-
Type inference. Because all functions and types are known, c-amplify
can easily supply a type-of operator for arbitrary expressions. This
can be used as the basis for type-inferring macros similar to auto in
C++0x or var in C#.
-
Hook functions could be installed that run over the database and do
additional work. For example, instrumenting all writes to a
particular struct field, generating reflection info or performing
project-specific checks on how types or functions are used. The
possibilities are pretty much endless. Remember all those times
you’ve thought: if we could only access this thing in the compiler
we could give an error message if the code does this thing? Well,
with hooks you could.
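At its simplest, a hook could be a callback that the system runs over every record in the database. The sketch below assumes a hypothetical flat table of struct fields, since the real c-amplify database layout isn't described here. It installs two hooks: a project-specific check that counts volatile fields, and a counter for pointer fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical database row describing one struct field. */
typedef struct {
    const char *struct_name;
    const char *field_name;
    const char *type;
} FieldRecord;

/* A hook is a callback plus its own private state. */
typedef struct {
    void (*fn)(const FieldRecord *field, void *state);
    void *state;
} FieldHook;

/* Runs every installed hook over every field in the database. */
void run_field_hooks(const FieldRecord *db, size_t nfields,
                     const FieldHook *hooks, size_t nhooks)
{
    assert(db != NULL || nfields == 0);
    for (size_t h = 0; h < nhooks; h++)
        for (size_t i = 0; i < nfields; i++)
            hooks[h].fn(&db[i], hooks[h].state);
}

/* Example project-specific check: counts fields declared volatile, which a
   team might want to flag for review. state points to an int counter. */
void count_volatile_fields(const FieldRecord *field, void *state)
{
    if (strstr(field->type, "volatile") != NULL)
        (*(int *) state)++;
}

/* Example hook: counts pointer-typed fields. state points to an int. */
void count_pointer_fields(const FieldRecord *field, void *state)
{
    if (strchr(field->type, '*') != NULL)
        (*(int *) state)++;
}
```

A real hook would more likely emit reflection tables or report an error naming the offending field, but the shape stays the same: walk the database and act on what you find.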
RAII, uncluttered
Let’s look at one such macro that solves a real problem: making sure a
file handle is closed in an orderly manner, even when there are
multiple exit points from the block.
In situations like this, C++ fans can’t wait to tell you about
RAII. RAII means creating a stack object of some utility type that
performs resource cleanup in its destructor. If we look at the
AutoFile type we need to type up to implement RAII we find that
exactly one line is providing the abstraction we need (the
destructor); the rest is boilerplate:
#include <cstdio>

class AutoFile // auxiliary type
{
private:
    FILE* f;
public:
    AutoFile(const char* fn, const char* mode) {
        f = fopen(fn, mode);
    }
    ~AutoFile() { if (f) fclose(f); }
    operator FILE*() { return f; }
private:
    AutoFile(const AutoFile&);            // non-copyable
    AutoFile& operator=(const AutoFile&); // non-assignable
};

// later, that same day..
{
    AutoFile file("c:/temp/foo.txt", "w");
    fprintf(file, "Hello, world");
}
What Lisp programmers do to manage resources is to create
block-wrapping macros (usually starting with the word with-). The
macro sits around a body of code, indicating visually that the code
wrapped by the macro will have access to some resource. The macro
expansion is guaranteed to clean up the resource regardless of how the
block terminates. Here’s an example of using such a macro with
c-amplify:
(with-open-file (f "c:/temp/foo.txt" "w")
  (fprintf f "Hello, world"))
If we ask c-amplify to macro-expand this we see that the details of
calling fopen and fclose are being handled as if we had written
out everything by hand:
{
    FILE* f = (FILE *) 0;
    {
        f = fopen("c:/temp/foo.txt", "w");
        fprintf(f, "Hello, world");
    }
cleanup_8_:
    if (f) {
        fclose(f);
    }
}
Even if we add more complex code with multiple return paths, c-amplify
doesn’t let us down:
(defun foo ((return int))
  (with-open-file (f "c:/temp/foo.txt" "w")
    (if (> (rand 10) 5)
        (return 20))
    (fprintf f "Hello, world")
    (return 10)))
This amplifies to the following C code. Note how the with-open-file
block locally redefines what it means to return a value. This C code
is a close representation of what a C++ compiler has to emit when RAII
is used, but as before there are no residual types left.
int foo(void)
{
    FILE* f = (FILE *) 0;
    {
        int result_13_;
        {
            f = fopen("c:/temp/foo.txt", "w");
            if (rand(10) > 5) {
                result_13_ = 20;
                goto cleanup_12_;
            }
            fprintf(f, "Hello, world");
            {
                result_13_ = 10;
                goto cleanup_12_;
            }
        }
    cleanup_12_:
        if (f) {
            fclose(f);
        }
        return result_13_;
    }
}
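With more than one resource, this is the cleanup pattern C programmers otherwise maintain by hand. The function below is hand-written, not c-amplify output, and the name copy_first_byte is made up. It shows two resources released in reverse order of acquisition, with every exit path, including an early one, funnelled through the cleanup labels.

```c
#include <assert.h>
#include <stdio.h>

/* Hand-written sketch (not c-amplify output): two resources, released in
   reverse order of acquisition. Every exit, including the early one for
   empty input, goes through the cleanup labels. Returns 0 on success,
   1 if the source is empty, -1 on any I/O failure. */
int copy_first_byte(const char *src_name, const char *dst_name)
{
    int result = -1;            /* pessimistic default */
    FILE *src = (FILE *) 0;
    FILE *dst = (FILE *) 0;
    int c;

    assert(src_name != NULL && dst_name != NULL);
    src = fopen(src_name, "rb");
    if (!src)
        goto cleanup_src;
    dst = fopen(dst_name, "wb");
    if (!dst)
        goto cleanup_dst;

    c = fgetc(src);
    if (c == EOF) {
        result = 1;             /* early exit: still cleaned up below */
        goto cleanup_dst;
    }
    if (fputc(c, dst) != EOF)
        result = 0;

cleanup_dst:
    if (dst && fclose(dst) != 0)
        result = -1;            /* a failed flush counts as failure */
cleanup_src:
    if (src)
        fclose(src);
    return result;
}
```

Every new resource adds a label, and every new exit must pick the right one. Nesting two with-open-file blocks would hand that bookkeeping to the macro instead.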
One possible c-amplify implementation of the with-open-file macro
looks like this (on a real game project it would of course not use
fopen, but some custom file manager):
(def-c-macro with-open-file ((var file-name mode) &body body)
  `(progn
     (declare (,var = (cast (ptr #$FILE) 0)))
     (unwind-protect
         (progn
           (= ,var (#$fopen ,file-name ,mode))
           ,@body)
       (when ,var
         (#$fclose ,var)))))
The funny #$foo syntax is just a reader macro to facilitate reading
case-sensitive symbols in a special package which corresponds to the C
namespace. The implementation piggy-backs on unwind-protect, which
makes sure that the body code always goes through the cleanup clauses:
(def-c-macro unwind-protect (form &body cleanup-forms)
  (with-c-gensyms (cleanup result)
    `(progn
       (ast-stmt-if (not (current-defun-void-p))
                    (declare (,result *current-return-type*)))
       (macrolet (return (&optional expr)
                   `(progn
                      (ast-stmt
                       (if (not (current-defun-void-p))
                           `(= ,',',result ,,expr)
                           `(cast void ,,expr)))
                      (goto ,',cleanup)))
         ,form)
       (label ,cleanup)
       ,@cleanup-forms
       (return ,result))))
If we decided to add exception handling (through e.g. setjmp or SEH
exceptions), we would only have to touch unwind-protect to enable
exception cleanup in all our RAII-like resource macros. Layering pure
compile-time abstractions like this to create programs is incredibly
powerful.
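As a rough idea of what unwind-protect would then have to cooperate with, here is a minimal, hypothetical setjmp/longjmp "exception" layer in plain C. It is not thread-safe, and the names throw_error and protected_call are invented for this sketch. The point is the contract: the cleanup clause runs whether the body returns normally or throws.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Innermost active handler; NULL when nothing is protected. */
static jmp_buf *current_handler = NULL;
/* Error code of the most recent throw. Passed through a global because the
   C standard only allows setjmp in a few restricted expression forms. */
static int last_error = 0;
/* Counts how often the cleanup clause ran (for demonstration). */
static int cleanup_count = 0;

static void throw_error(int code)
{
    assert(current_handler != NULL && "throw without a handler");
    last_error = code;
    longjmp(*current_handler, 1);
}

static void body(int fail)
{
    if (fail)
        throw_error(42);
}

/* Runs body() and then a cleanup clause that executes on both the normal
   and the thrown path. Returns 0 or the thrown error code. */
int protected_call(int fail)
{
    jmp_buf handler;
    jmp_buf *saved = current_handler; /* restored on exit so handlers nest */
    int status;

    current_handler = &handler;
    if (setjmp(handler) == 0) {
        body(fail);
        status = 0;
    } else {
        status = last_error;
    }
    current_handler = saved;          /* cleanup clause: always reached */
    cleanup_count++;
    return status;
}
```

In the macro version, only unwind-protect would need to know about the handler chain; with-open-file and friends would keep expanding exactly as before.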
Exploiting the database
As I mentioned, there are many advantages to having all your code
parsed with type information sitting around in an in-memory
database. Let’s highlight one such thing, automatic type inference
for local variables. Consider the following c-amplify input:
(defstruct bar
  (x (const restrict volatile ptr int)))
(defstruct foo
  (barp (ptr struct bar)))
(defun my-function ((foop (ptr struct foo)) (return int))
  (let ((xp (-> foop barp x)))
    (return (* xp))))
This amplifies to:
struct bar { int * const volatile restrict x; };
struct foo { struct bar * barp; };
int my_function(struct foo * foop) {
    int * const volatile restrict xp = foop->barp->x;
    return *xp;
}
We can see that the lexical variable xp has the expected type. The
c-amplify system knows how to compute the type of any C expression
(including arithmetic promotion), so this feature can be used
extensively if desired.
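To give a feel for why the database makes this cheap, here is a tiny hypothetical version of just the -> rule, written in C. Types are plain strings, fields live in a lookup table, and the type of (-> foop barp x) is found by walking the chain one field at a time. A real implementation would use structured type objects rather than string matching; the table and function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical database rows: which type each struct field has. */
typedef struct {
    const char *struct_name;
    const char *field;
    const char *type;
} Field;

static const Field fields[] = {
    { "bar", "x",    "int * const volatile restrict" },
    { "foo", "barp", "struct bar *" },
};

/* Type of base->field, where base_type must be spelled "struct NAME *".
   Returns NULL for non-struct-pointer bases or unknown fields. */
static const char *type_of_arrow(const char *base_type, const char *field)
{
    static const char prefix[] = "struct ";
    size_t plen = sizeof prefix - 1;
    size_t n, name_len, i;

    assert(base_type != NULL && field != NULL);
    n = strlen(base_type);
    if (n < plen + 3 || strncmp(base_type, prefix, plen) != 0
        || strcmp(base_type + n - 2, " *") != 0)
        return NULL;
    name_len = n - plen - 2;    /* NAME sits between prefix and " *" */
    for (i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        if (strlen(fields[i].struct_name) == name_len
            && strncmp(fields[i].struct_name, base_type + plen, name_len) == 0
            && strcmp(fields[i].field, field) == 0)
            return fields[i].type;
    }
    return NULL;
}

/* Folds type_of_arrow over a field chain, as in (-> foop barp x). */
const char *type_of_arrow_chain(const char *root_type,
                                const char *const *chain, size_t count)
{
    const char *t = root_type;
    size_t i;

    for (i = 0; i < count && t != NULL; i++)
        t = type_of_arrow(t, chain[i]);
    return t;
}
```

Starting from "struct foo *" with the chain barp, x yields "int * const volatile restrict", which is the type the let above needed to declare xp.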
Including what’s needed
Another great feature of having a complete function/type database is
that generated C files do not need include statements. If a source
file needs a bunch of declarations the generator will just emit them
right there. There’s no need to maintain header files. If code in a
generated file starts using a structure all of a sudden, a copy of its
declaration will automatically pop in to the generated file.
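A generated file in this scheme could look roughly like the hand-written sketch below. It is hypothetical, not actual c-amplify output: there are no #include lines, only the declarations this file uses. The puts prototype stands in for a declaration imported from the C library; declaring a standard function yourself with its exact prototype is valid C.

```c
/* No #include lines: every declaration this file needs is emitted inline. */

/* Stand-in for a declaration imported from the C library. */
int puts(const char *s);

/* Copied in from the type database because this file uses them. */
struct bar { int *x; };
struct foo { struct bar *barp; };

int my_function(struct foo *foop)
{
    int *xp = foop->barp->x;
    return *xp;
}

void greet(void)
{
    puts("Hello, world");
}
```

Because the file depends on nothing but itself, the compiler opens exactly one input file, and its dependency line in a makefile is just the generated file.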
For a full-out implementation of this idea to work, third-party
declarations from the OS and C libraries must be imported into the
amplification database. A separate tool must be devised for this but
it would certainly be possible.
Compiling files generated like this would mean the preprocessor
wouldn’t touch disk except to read the input .c file. Makefiles for
generated files such as these will also be trivial to write as there
are no implicit dependencies.
Further ideas
Here are additional ideas that could be explored within the c-amplify
system:
-
Improved compile-time checking for traditionally dangerous functions
(scanf, printf) (make macros that evaluate the format strings
and types of arguments)
-
Add exception handling to C on top of setjmp, SEH or some other
basic mechanism
-
Annotate structures for real-time tweaking.
-
Generate script language bindings at compile time via macros and
hooks.
-
Inlining/code simplification at amplification time (trig function
simplification, maths)
-
Add a proper sublanguage for vector math. Finally, you could write that
vector math library that fuses a multiply and an add into a single madd
on AltiVec by analyzing the code at compile time, and you wouldn’t need
3000 lines of C++ "expression templates" to do it.
-
If you absolutely must have C++-like classes and templates, you
could implement those too. Classes with single dispatch would be
pretty easy (generate a couple of structs per class), and templates
could be "done right" in the sense that you’d only generate a single
expansion for each instantiated type and dump them all to a single
source file, rather than compiling thousands of instantiations of
std::vector and letting the linker sort through the carnage.
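For the single-dispatch case, "a couple of structs per class" could look like the hand-written C below. One struct holds the class's function pointers (a vtable), and the instance struct's first member points at it. The Shape/Rect names are made up for illustration; in c-amplify, a macro layer would presumably generate this boilerplate.

```c
#include <assert.h>

struct Shape;

/* Per-class method table: one shared instance per class. */
struct ShapeVtbl {
    int (*area)(const struct Shape *self);
};

/* Base part of every instance: points at its class's method table. */
struct Shape {
    const struct ShapeVtbl *vtbl;
};

/* A "subclass": base must be the first member so a struct Rect * can be
   converted to a struct Shape * and back. */
struct Rect {
    struct Shape base;
    int w, h;
};

static int rect_area(const struct Shape *self)
{
    const struct Rect *r = (const struct Rect *) self;
    return r->w * r->h;
}

static const struct ShapeVtbl rect_vtbl = { rect_area };

struct Rect rect_make(int w, int h)
{
    struct Rect r;
    r.base.vtbl = &rect_vtbl;
    r.w = w;
    r.h = h;
    return r;
}

/* A "virtual call": dispatch through the instance's method table. */
int shape_area(const struct Shape *s)
{
    assert(s && s->vtbl);
    return s->vtbl->area(s);
}
```

Calling shape_area(&r.base) on rect_make(3, 4) dispatches to rect_area and returns 12. No RTTI and no hidden layout rules are involved, so the layout is exactly what you wrote.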
Conclusion
If there is a way (no matter how much work it would be) to express the
semantics of an abstraction in C, chances are you can implement it as
a set of macros and hooks in c-amplify.
However, c-amplify is still a prototype and a lot of work remains
before it might be suitable for production use. I hope this rant has
given you some new ideas on how we could design programs. Send your feedback
and flames my way.