Monday, November 27, 2006

ODE, .NET, handles and references from hell

I've recently toyed around with .NET in C# and I was excited to see that there are .NET bindings for a lot of free game development libraries in the Tao framework. I was especially interested in playing with the ODE bindings. ODE is a free physics and collision library that is becoming usable for a lot of things. As with many .NET wrappers, the ODE wrapper is just a set of static functions that take System.IntPtr objects. You are supposed to juggle these handles carefully, because each one corresponds to a real unmanaged object. This is all good, but such APIs break down when the unmanaged objects form graphs of their own, which is the case with the ODE API. Assume we have a wrapper for the following API:

public sealed class SomeApi {
    static IntPtr MakeContainer();
    static void DestroyContainer(IntPtr i);
    static IntPtr MakeItem();
    static void DestroyItem(IntPtr i);
    static void AddItem(IntPtr container, IntPtr item);
    // Many other functions.
}

Further, assume that the semantics of a "Container" in this API are that it destroys its contents when it is itself destroyed. Also assume that we've written our wrapper classes for Container and Item so that they correctly call the corresponding cleanup function in the API when objects are disposed and finalized. With the prerequisites out of the way, consider this code fragment:

Item i = new Item();
Container c = new Container();
c.AddItem(i);

What is happening here is that the unmanaged code is forming a reference graph behind the covers, but the available tools for dealing with unmanaged resources can't see such graphs. The .NET process will crash because regardless of whether the item or the container is destroyed/finalized first, the other one still refers to a (now deleted) unmanaged object. Tao's ODE wrapper exhibits these exact problems, making it very hard to reason about code correctness.
Essentially I'm finding ODE much harder to use from C# than from C, because not only do I have to deal with my own code, I also have to consider how ODE manages memory internally at every API invocation and what that does to the validity of my wrapper objects. The only solution I can see is to build wrapper APIs that explicitly maintain the lifetime of handles as well. In the above example, such a wrapper would detect that the "AddItem" operation changes the cleanup origin of the item, so that the item shouldn't be destroyed when the Item wrapper is destroyed, because the container is now responsible for that. Sigh.

Wednesday, June 07, 2006

Non-ignorable return codes

I'd like to have a non-ignorable type that could be used to wrap return values that must be handled, but where using an exception would be too heavyweight. Consider:

bool IntersectLineAndPlane(...);

I'd like to say:

nonignorable<bool> IntersectLineAndPlane(...);

and then have code that doesn't handle the return value produce a compile-time error (or warning) so that mistakes don't slip through into production code. You would think that perhaps something like this would work:
template <typename T>
class nonignorable {
public:
    nonignorable(const T& v) : value_(v), checked_(false) {}
    ~nonignorable() { assert(checked_); }  // complain if the value was never handled

    // check() marks the instance as handled and extracts the value
    friend T check(const nonignorable<T>&);

private:
    T value_;
    mutable bool checked_;
};
This could be used as:
nonignorable<bool> Function() {
    return nonignorable<bool>(false);
}

And in client code:

if (!check(Function())) { /* take action */ }

Alas, this will not work, because there are more destructions going on than just the one accounted for by check() when the nonignorable instance is returned on the stack: the temporary created in the return statement is copied out and then destroyed, and its destructor fires even though the caller handles the returned copy. I think the only way to solve this is to rely on the new C++0x proposal to add move semantics to the language, so that there is effectively only one destruction to account for (in the check() template function).

Friday, March 17, 2006

Mixing Scheme and C

In my quest to find the perfect Scheme/C mix I've found that Bigloo Scheme really goes a long way toward overcoming the language barriers. The cool part is that Bigloo compiles to efficient C code, and you can influence the generated code in a number of ways. For example, it's possible to include headers and declare functions with type annotations straight in the Scheme source code. Since the Scheme files are translated to C when compiled, there's no marshalling to consider. I'm assuming that annotated procedures carry very low overhead compared to Python, which requires an object allocation for everything (even integers). Using DLL symbols with Bigloo on Windows has one quirk, though: the symbols must be imported as macros, and the header file with the __declspec() crud must be included explicitly. This does the trick:
(module my-module
   (extern
      (include "include/mydll.h")
      (macro simple::int (::int) "simple")))
Given these declarations, you can call the function simple as if it were a Scheme procedure, which is awesome. When compiled, the resulting executable links properly against the DLL. I haven't found a tool that can produce these annotations automatically, but writing one should be trivial. I'm slightly confused about how to make a stringent build system for a combined Scheme/C project, but I'm sure it's possible. Armed with such a build tool it might actually become fun to program again.

Tuesday, February 14, 2006

Reducing boilerplate by embedding XSLT

I've recently been working on a custom IDL compiler that can target Python and C++. Think COM, only portable between platforms and targeted towards a select few languages. A lot of patterns that are just boilerplate work immediately appear in both the frontend and the backends of the compiler. Consider the classic implementation of an IDL parser (or any programming language parser, for that matter):
  1. Parse the input, building an abstract syntax tree of the declarations.
  2. Build a type graph and interconnect the type nodes so that types can be resolved.
  3. Resolve all type references (using fixups, or in a separate pass if forward references are allowed).
  4. Invoke one or more backends if the resulting tree is valid.
The backends in turn will walk the tree (via some variation of the Visitor pattern) and produce output. This quickly becomes repetitive and mundane work, and Visitor is well known for the strong coupling and binding it introduces between entities.

In another sandbox project of mine I tried out a completely different approach. Instead of writing yet another AST hierarchy to interface with the frontend, I experimented with XML as an intermediate format in the compiler itself. Given an input file, my LALR parser essentially emits a well-formed XML document using the DOM APIs. This in itself doesn't buy much, but it provides tremendous flexibility down the road. Think of it as capturing the entire AST in a portable format. Indeed, the GCC project has recently added an AST-to-XML converter as part of their (excellent) compiler frontend, so more people are thinking along these lines.

With the AST swimming around in an XML DOM tree, it becomes possible to rewrite repetitive and boring tasks in XML query tools that are designed for those jobs, rather than in a general-purpose programming language such as C++. Whipping up flexible type resolvers and type and syntax checkers with a decent XSLT processor is a lot of fun, and it can be done with a surprisingly small amount of XSLT code. XQuery processors can build statistics and metrics with a minimal amount of code. Another interesting thing about keeping the AST in XML is that it becomes trivial to add whatever meta-data you like in a matter of seconds. That data can travel through the filters (for example in an XML namespace of its own) all the way to the backend, where it can be used to tweak the output. Also, changing the XML AST doesn't require recompiling hundreds of AST classes!

The key concept to making this work is pipelining, which is a common design pattern in compilers anyway.
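As a sketch of one such filter stage (the element names here are hypothetical, not the compiler's actual schema), an XSLT 1.0 type checker can identity-copy the AST while flagging unresolved type references:

```xml
<!-- Copy the AST through unchanged, but replace any <type-ref>
     whose name matches no declared <interface> or <struct> with
     an <error> node that a later pipeline stage can report on. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="type-ref[not(@name = //interface/@name
                                 or @name = //struct/@name)]">
    <error reason="unresolved-type">
      <xsl:copy-of select="@name"/>
    </error>
  </xsl:template>
</xsl:stylesheet>
```

Each filter in the pipeline can stay this small, which is exactly what makes the approach pleasant.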
Connecting the output of one XSLT processing filter to the input of the next nicely captures the pipelining pattern while at the same time enforcing the idea that one piece of code should have a single, well-defined purpose. XSLT also works wonders for code generation backends, because it is designed for transforming data, something C++, Java and Python simply aren't. So what are the drawbacks? Simply put: speed and memory consumption. Clearly, a hand-written implementation of, say, a type resolver will have lower memory consumption and better processing speed than an XSLT-based implementation. Also, while XSLT is a very useful language (especially with version 2), it's not very strong at error reporting or string handling, to name a few areas. If you're willing to strike a deal with the portability devil and settle on a single XSLT library, it's possible to work around most of these problems. Given the prototyping advantages I still think this is a viable implementation strategy for compiler projects.