Friday, January 22, 2010

Amplifying C

On C++

Having programmed in C++ professionally for well over 10 years I have learned all of it. I have all the books, I know all the tricks. And I don’t like it anymore.

Update: This intro apparently made some people see red, because "no man could possibly know all of C++". If that includes you, you can read it as: "I've shipped 4 AAA games on MLOC code bases, and here's my take on the C++ abstractions you can reasonably use in projects that big".

Basically game teams using C++ fall into the same trap every time: they try to create abstractions with whatever is in the C++ toolbox and they fail miserably. On the next project they’re a little bit smarter from the experience so they set out to fix their abstractions and create new ones. And fail.

Quickly going over the major abstraction mechanisms C++ introduced over C I’m arguing that:

  • Templates suck as they cause link-time spam and compile times to skyrocket. They severely bloat the size of the debug symbol file, which on large projects can easily reach several hundred megabytes of data. Abstractions built with templates perform differently depending on whether compiler optimizations are enabled or not (every tried debugging Boost code?). They’re essentially unusable on large code bases beyond container-of-T and simple functions.

  • RTTI sucks because it doesn’t do what anyone wants, and you can’t even rely on it returning a type name formatted in a certain way.

  • Classes suck because their guts have to be in headers for all to see.

All these high-level concepts are flawed and you can’t alter their semantics because they’re set in ISO stone. What all teams do then is to reinvent all the language components that don’t work for them, and sets up rules forbidding the use of the other features. Every C++ shop has its own "accepted" subset.

Basically the C++ game developer community is slowly navigating away from C++'s abstraction patterns. We left operator overloading mostly in the 90’s (some vector libraries still use it). We ditched RTTI back in 2001. Exceptions are firmly off as they don’t even work on all platforms we develop for. A lot of people are advocating that we stop using member functions to reduce coupling.

This may sound harsh, but to me these are clear signs that C++ isn’t providing any real cost benefit for us, and that we should be writing code in other ways.

Coding C

Many C++ game programmers have started to turn towards C (or C-like C++) to get away from the flaws of C++. Targeting C manually with larger systems can be a lot of work, because it offers very basic abstraction facilities. There are functions, enums, tagged types (structs and unions) and a rudimentary type system, but that’s about it.

But let’s look at C as a platform for a minute. C is lean, compiles super fast, it’s supported everywhere and all the tools we need to ship games such as compilers and debuggers (including the obscure and proprietary) work well with it. If you need platform-specific intrinsics to get on with your job, you can rely on the target platform’s C compiler to provide them.

As C is such a simple, predictable language that works everywhere it makes a lot of sense to generate C code. Indeed many projects have done so, but typically the meat of the application has still been written in plain C as code generation is typically used for language interfaces or parser generators.

The Lisp Way

Having also programmed a lot of Common Lisp over the years, I’ve seen how the Lisp family of languages deals with extensibility. In Lisp, you write your own abstractions that become a part of the project’s language. This remarkable feature is enabled basically through two simple things:

  • Programs can be treated as data (because they can be thought of as parse trees)

  • There are macros which transform such data (that is, your programs) into other programs (implementing the abstractions).

I’m going to suggest something mildly radical: we should prefer C over C++. But not straight-up C. We should create our own C with the abstractions we need, built right into the language, customized for the problems we’re working on.

Amplification

We can apply many of the ideas that make Lisp powerful to C if we drop its Algol-like syntax. What is the difference between the following two program fragments?

int my_function(int a, int b) {
    return a + b;
}
(defun my-function ((a int) (b int) (return int))
  (return (+ a b))

Answer: none, they are equivalent as far as semantics go. The latter can trivially be parsed (using very simple rules) and transformed into the former, and so it is still the same C program. This is good news, because Lisp has shown us that if we represent programs as data, we can transform that data arbitrarily before evaluating or compiling it.

In my prototype system, c-amplify, I’m doing exactly this. The system introduces an "amplification" phase where s-expressions are transformed to C code before a traditional build system runs.

The c-amplify system has the following major parts:

  • A system definition facility (specifying input files and dependencies)

  • A reader, parsing ca source files

  • A persistent function and type database which is updated as source files are amplified.

  • A pretty-printing C code generator — important as we’re going to be debugging the generated code.

The system is intended to be run incrementally as source files are changed, re-reading changed files, updating the database and writing generated output. A traditional build system can then be used to the resulting files.

The persistent database is an interesting component that isn’t strictly needed for the system but enables a lot of neat features:

  • Type inference. Because all functions and types are known, c-amplify can easily supply a type-of operator for arbitrary expressions. This can be used as the basis for type inferring macros similar to auto in C++0x or var in C#.

  • Hook functions could be installed that run over the database and do additional work. For example, instrumenting all writes to a particular struct field, generating reflection info or performing project-specific checks on how types or functions are used. The possibilities are pretty much endless. Remember all those times you’ve thought: if we could only access this thing in the compiler we could give an error message if the code does this thing? Well, with hooks you could.

RAII, uncluttered

Let’s look at one such macro that solves a real problem: making sure a file handle is closed in an orderly manner, even when there are multiple exit points from the block.

In situations like this, C++ fans can’t wait to tell you about RAII. RAII means creating a stack object of some utility type that performs resource cleanup in its destructor. If we look at the AutoFile type we need to type up to implement RAII we find that exactly one line is providing the abstraction we need (the destructor), the rest is boilerplate:

class AutoFile // auxillary type
{
  private:
    FILE* f;

  public:
    AutoFile(const char* fn, const char* mode) {
      f = fopen(fn, mode);
    }
    ~AutoFile() { if (f) fclose(f); }
    operator FILE*() { return f; }

  private:
    AutoFile(const AutoFile&);
    AutoFile& operator=(const AutoFile&);
};

// later, that same day..
{
   AutoFile file("c:/temp/foo.txt", "w");
   fprintf(file, "Hello, world");
}

What Lisp programmers do to manage resources is to create block-wrapping macros (usually starting with the word with-). The macros sits around a body of code, indicating visually that the code wrapped by the macro will have access to some resource. The macro expansion is guaranteed to clean up the resource regardless of how the block terminates. Here’s an example of using such a macro with c-amplify:

(with-open-file (f "c:/temp/foo.txt" "w")
  (fprintf f "Hello, world"))

If we ask c-amplify to macro-expand this we see that the details of calling fopen and fclose are being handled as if we had written out everything by hand:

{
  FILE* f = (FILE *) 0;
  {
    f = fopen("c:/temp/foo.txt", "w");
    fprintf(f, "Hello, world");
  }
cleanup_8_:
  if (file) {
    fclose(file);
  }
}

Even if we add more complex code with multiple return paths, c-amplify doesn’t let us down:

(defun foo ((return int))
  (with-open-file (f "c:/temp/foo.txt" "w")
    (if (> (rand 10) 5)
       (return 20))
    (fprintf f "Hello, world")
    (return 10)))

This amplifies to the following C code. Note how the with-open-file block locally redefines what it means to return a value. This C code is a close representation of what a C++ compiler has to emit when RAII is used but as before there are no residual types left.

int foo(void)
{
  FILE* f = (FILE *) 0;
  {
    int result_13_;
    {
      f = fopen("c:/temp/foo.txt", "w");
      if (rand(10) > 5) {
        result_13_ = 20;
        goto cleanup_12_;
      }
      fprintf(f, "Hello, world");
      {
        result_13_ = 10;
        goto cleanup_12_;
      }
    }
    cleanup_12_:
    if (f) {
      fclose(f);
    }
    return result_13_;
  }
}

One possible c-amplify implementation of the with-open-file macro looks like this (on a real game project it would of course not use fopen, but some custom file manager):

(def-c-macro with-open-file ((var file-name mode) &body body)
  `(progn
     (declare (,var = (cast (ptr #$FILE) 0)))
     (unwind-protect
          (progn
            (= ,var (#$fopen ,file-name ,mode))
            ,@body)
       (when ,var
         (#$fclose ,var)))))

The funny #$foo syntax is just a reader macro to facilitate reading case sensitive symbols in a special package which corresponds to the C namespace. The implementation piggy-backs on unwind-protect, which makes sure that the body code always goes through the cleanup clauses:

(def-c-macro unwind-protect (form &body cleanup-forms)
  (with-c-gensyms (cleanup result)
    `(progn
       (ast-stmt-if (not (current-defun-void-p))
                    (declare (,result *current-return-type*)))
       (macrolet (return (&optional expr)
                         `(progn
                            (ast-stmt
                             (if (not (current-defun-void-p))
                                 `(= ,',',result ,,expr)
                                 `(cast void ,,expr)))
                            (goto ,',cleanup)))
         ,form)
       (label ,cleanup)
       ,@cleanup-forms
       (return ,result))))

If we decided to add exception handling (through e.g. setjmp or SEH exceptions), we only have to touch unwind-protect to enable exception cleanup in all our RAII-like resource macros. Layering pure compile-time abstractions like this to create programs is incredibly powerful.

Exploiting the database

As I mentioned, there are many advantages to having all your code parsed with type information sitting around in an in-memory database. Let’s highlight one such thing, automatic type inference for local variables. Consider the following c-amplify input:

(defstruct bar
  (x (const restrict volatile ptr int)))

(defstruct foo
  (barp (ptr struct bar)))

(defun my-function ((foop (ptr struct foo)) (return int))
  (let ((xp (-> foop barp x)))
     (return (* xp))))

This amplifies to:

struct bar { int * const volatile restrict x; };

struct foo { struct bar * barp; };

int my_function(struct foo * foop) {
  int * const volatile restrict xp = foop->barp->x;
  return *xp;
}

We can see that the lexical variable xp has the expected type. The c-amplify system knows how to compute the type of any C expression (including arithmetic promotion) so the this feature can be used extensively if desired.

Including what’s needed

Another great feature of having a complete function/type database is that generated C files do not need include statements. If a source file needs a bunch of declarations the generator will just emit them right there. There’s no need to maintain header files. If code in a generated file starts using a structure all of a sudden, a copy of its declaration will automatically pop in to the generated file.

For a full-out implementation of this idea to work, 3rd party declarations from the OS and C libraries must be imported into the amplification database. A separate tool must be devised for this but it would certainly be possible.

Compiling files generated like this would mean the the preprocessor wouldn’t touch disk except to read the input c file. Makefiles for generated files such as these will also be trivial to write as there are no implicit dependencies.

Further ideas

Here are additional ideas that could be explored within the c-amplify system:

  • Improved compile-time checking for traditionally dangerous functions (scanf, printf) (make macros that evaluate the format strings and types of arguments)

  • Add exception handling to C on top of setjmp, SEH or some other basic mechanism

  • Annotate structures for real-time tweaking.

  • Generate script language bindings at compile time via macros and hooks.

  • Inlining/code simplification at amplification time (trig function simplification, maths)

  • Add a proper sublanguage for vector math. Finally you can write that vector math library that combines plus and multiply to madd on altivec by analyzing the code at compile time, and you don’t need 3000 lines of C++ "expression templates" to do it

  • If you absolutely must have C++-like classes and templates, you could implement those too. Classes with single dispatch would be pretty easy (generate a couple of structs per class), and templates could be "done right" in the sense that you’d only generate a single expansion for each instantiated type and dump them all to a single source file, rather than compiling thousands of instantiations of std::vector and letting the linker sort through the carnage.

Conclusion

If there is a way (no matter how much work it would be) to express the semantics of an abstraction in C, chances are you can implement it as a set of macros and hooks in c-amplify.

However, c-amplify is still a prototype and a lot of work remains before it might be suitable for production use. I hope this rant has given you some new ideas on how we design programs. Send your feedback and flames my way.

37 comments:

repi said...

Great post! Look forward to hearing more about it both here and at work :)

Jonas said...

Goose bumps :)

p10q said...

That sounds really awesome. I'd recommend you integrate something like ccache into to your system as well (so that compile times are very fast).

Also, from a debugging standpoint you might think about how to link (maybe just by how the c code is named) the c-amplify lisp code and the generated c code. Ideally, it'd be awesome to step through both code at the same time...

That persistent database is a really good idea though. Like you say, opens up so many possibilities.

I noticed you mention gvim on this blog. With that database, you could treat the c-amplify code as basically 'folded' c code (accessing it via a client to the persistent database). That would be pretty awesome...

Anyway, cheers.

EvanM said...

This is some extremely interesting work. I am an experienced C++ programmer, and I share your pains, but some of the things you can do with templates are just too awesome to pass up. I've been loving boost::variant these days, but dear God has it wrecked my compile times! It seems like something equivalent could be implemented with c-amplify relatively painlessly, and without the massive compile times associated with C++ templates. Plus, the automatic header generation would be just delightful.

Any chance that you'll ever open-source your prototype?

David Reynolds said...

This is really cool. I've been doing a lot of work in C lately and I'm not really a fan of C++. Working in a language like lisp to generate C code is great. I'm looking forward to seeing where you take this.

Anonymous said...

How is this different than bigloo?

http://www.bigloo.org

malkia said...

fellow game developer here...

Have you ever used PowerLoom?

To implement PowerLoom we developed a new programming language called STELLA, which is a Strongly Typed, Lisp-like LAnguage that can be translated into Lisp, C++ and Java. PowerLoom is written in STELLA and therefore available in Common-Lisp, C++ and Java versions.

Christopher Armstrong said...

I know people who have prototyped pretty much identical systems and I've worked on one myself. One of the ones I know already had the exceptions implemented with setjmp/longjmp. Though your type inference thing is impressive.

But we really need someone to release and maintain a high-quality implementation of the idea! (hint hint) :)

Colin Curtin said...

typo: for for

This is great, and I'm looking forward to how you go about this. This would not only be awesome for games.

Scott said...

Oracle's in-database Java VM implementation is based on exactly this idea. Lisp is a Gatling Gun for writing C.

WebDevHobo said...

It is a sad thing that people can't get past their love for C++ and see that there is such a thing as a "paradigm".

Every problem has a best way to get to the answer. And that's not always included in the same programming language.

But... y'know... IT'S C++ IT CAN DO ANYTHING AND WINDOWS SUCKS!!!
the argument usually goes.

dep said...

Thanks for all the comments.

@Anonymous: This is different from Bigloo scheme in that it is a way to write fully typed C code, rather than a way to translate untyped Scheme to C. I think of it a translation device with macro capabilities rather than a compiler.

@malkia: I hadn't heard about PowerLoom and STELLA before, I'll have to read up!

Jyaif said...

"stop using member functions to reduce coupling"
What what what?

Ali Sabil said...

Couldn't agree more, I gave up on c++ few years ago, and I did it for the same reasons you are outlining in your post. Nowadays I program in Vala, which is in my opinion what c++ should have been.

patrickdlogan said...

You might want to look at Gambit C-Scheme. There are people writing games in it as well, including for the iPhone. It can get fairly compact: iphone, also someone ported it to the Nintendo DS handheld.

eryx said...

Look at SC
This is expandable C in lisp with lisp syntax where you can use lisp style
macroses and define your own constructs.

It help me a lot in generating my C projects.

Example of function in SC:

(def (fib n) (fn int int)
(if (<= n 2)
(return 1)
(return (+ (fib (- n 2))
(fib (- n 1))))))

=>

int fib (int n)
{
if (n <= 2)
return 1;
else
return fib (n - 2) + fib (n - 1);
}

Structure declaration in SC:

((struct list)
(data int)
(next (ptr (struct list))))

Example of macro in SC:

(%defmacro ntimes (n &body form)
`(for ((def i int 0) (< i ,n) (inc i)) ,@form))

dimatura said...

Check out Lush http://lush.sourceforge.net

Anonymous said...

Or you could use:

shared_ptr f(fopen(fn,mode),fclose);

From tr1 or boost.

Maybe Attending said...
This comment has been removed by the author.
Johan Kotlinski said...

Tried to post yesterday, but guess it fell out. Anyway one more try:

"Classes suck because their guts have to be in headers for all to see."

How do you think Lisp is an improvement over C++ in this regard? In my experience Lisp is more like "everyone sees everything," no real support for information hiding besides wrapping stuff inside functions. Or do you think more of compile and link times?

eryx said...

> Johan Kotlinski said...
> How do you think Lisp is an improvement
> over C++ in this regard?
> In my experience Lisp is more like
> "everyone sees everything,"
> no real support for information hiding
> besides wrapping stuff inside functions.

You can use packages, they have protected(internal) and public(exported) symbols.

For CLOS classes and generics with 50
lines of MOP you can add private slots/methods, "friend" slots/methods
or any other sort of protection you like.

But nobody really need it as I can see.

temoto said...

Great job! The most important part of your rant is constructive one: "C++ is flawed *and i propose this solution...*".

What is especially interesting about your project is types database. On the first look, it looks very promising. But it is unclear how are you going to deal with multiple same names. Are you going to introduce some kind of "import" statement that would remove ambiguity on which structure to use?

NONAME said...

"...I have learned all of it. I have all the books, I know all the tricks"

From this sentence, I stopped taking this article too seriously.

Jason Messer said...

I generally try to pretend that the problem parts of C++ don't exist, but I still love the STL. Used carefully it reduces or eliminates an awful lot of plumbing code.

Any plans to try and salvage the STL for your project?

rampelstinskin said...

Drop Algol-like syntax not necessary.
http://nemerle.org/

Dimitris Menounos said...

I think you would love Vala! It aligns perfectly with your goal of sanely enhancing C with higher level abstractions while staying 99% syntax familiar.

Anonymous said...

Keep posting stuff like this i really like it

Larry Clapp said...

Anecdote re: "knowing all of C++": A friend of mine, who is very smart and very experienced and does *not* claim to know all of C++ once interviewed someone that did.

Interviewer 1, to Candidate (on phone): So, Candidate, how much C++ do you know?
Candidate: All of it.
Interviewer 1: [Taken aback] Really.
Candidate: Yes. Ask me anything.
Interviewer 1: Hold on a minute. Hey Chris, can you come in here, and bring your ARM?

They ask random questions from the Annotated C++ Reference Manual, and Candidate does indeed get them all right, and does indeed seem to have all this knowledge at his mental fingertips.

They hire the guy, and he is very smart and very productive and indeed knows all of C++.

I later worked with him, and asked him about it, and he said that that knowledge had atrophied and he no longer claimed to know all of C++.

So it can happen. :)

Re the article: I like this idea, and along parallel lines have often wondered what Perl would look like in Lisp syntax.

One issue I see is that in situations like this you often end up having to know *both* languages (in this case, CL and C) to get a lot done. I see signs of this in your post, e.g. the "with-open-file" macro. This is "obvious" to a CL'er but not so much to a C or C++ programmer. Beware of undocumented crossover idioms. :)

Larry Clapp said...

Also: ECL is a Common Lisp compiler that generates C code. Just a thought. :)

cl-beer said...

Thank you, I have been trying to articulate this for years. I dislike C++ and I have a few good, but now I can add yours as well. I love lisp (and it's little brother, little scheme) and think it's a perfect match for game development. Oh, and keep up the good fight!

Tim Finer said...

Thanks for the article - I like it, now I need to read more about some of these other free tools that do the same. I can finally appreciate what the excitement of LISP like languages is all about.

Victor Sergienko said...

I also got a C++ background, loved it and finally lost my trust in it.

Why not doing one more step and leaving C at all, for Python, Haskell, Lua or whatever?

Serge Brunov said...

Great article!

Anonymous said...

[... ] is another relavant source of tips on this issue[...]

Anonymous said...

Keep posting stuff like this i really like it

Regan said...

Hey, like the article. I've had many of the same experiences and come to many similar conclusions.

Have you ever tried The D programming language (http://www.digitalmars.com/d).

There is one thing which may put you off, and that is the built in garbage collector. However, it is predictable (only triggers on allocation) and can be disabled entirely. Plus there are plans to have several varieties and make it configurable.

D has classes (single inheritance), interfaces, templates (but the syntax and implementation is vastly superior to C++) and there are no header files.

Compile time speeds are 100x better than C/C++.

Your RAII example can be handled by a 1 line scope(exit) statement.

D has type safe printf/scanf.

D is link time compatible with C, you can link D to C libraries, object files, etc.

D plans to have, if it doesn't already, vector/array math.

D has a feature called 'mixins' which work a bit like LISP macros injecting code into other code.

Last but not least, the syntax is the same as C, and better yet the behaviour of statements in D matches those in C, intentionally, so transition is very easy.

reroute said...

You are my hero. I have been coding in Blub for a long time and have always wanted to do something like this (use Lisp macros and Lisp in general as a metaprogramming platform to spit out Blub architectures and abstractions). Thank you sir, I am inspired.