2001.10.24

One important aspect of rendering fast 3-D graphics is the notion of stripification. Nearly all modern 3-D graphics hardware represents surfaces in terms of coherent triangle meshes, and typically you feed the graphics engine lists of triangles. These triangles are typically represented by vertex lists where a new vertex replaces the second oldest vertex in the current triangle (a "fan") or the new vertex replaces the oldest vertex in the current triangle (a "strip").

We care about generating "optimal" strips because if we can feed the graphics hardware coherent strips and fans of triangles as opposed to discrete triangles, we can reduce bus bandwidth and generally improve rendering speed. The notion of what "optimal" is differs from architecture to architecture, as well as from writer to writer. However the literature on this problem has exploded in the past five years. I thought it might be useful to summarize some of the more interesting work on this problem.

The problem itself is great geeky fun on a long plane flight. Just draw a big messy mesh of triangles and try to figure out how to generate an optimal set of strips and fans from it. Eventually you’ll figure out that a fast, general algorithm is non-obvious. If you check the literature you’ll discover that the problem itself is NP-complete across certain types of triangle meshes; reducing a mesh to a single triangle strip and/or fan is the good old Hamiltonian path problem. So anybody who claims that they have implemented the "optimal" algorithm for stripifying meshes is either up for the first Nobel Prize in mathematics, or mistaken.

One odd result from Reuven Bar-Yehuda and Craig Gotsman proposes that the list of incoming polys should be pushed on a stack. The algorithm puts a nice upper bound on the correct size of that stack, which is on the order of the square root of the mesh size. I have not personally worked with any architectures where this approach was implementable, but it’s interesting nonetheless.

One of the big academic disagreements over the stripification problem seems to be over the definition of "optimization." Hugues Hoppe wrote an interesting article in which he optimizes for render-time cache coherency over a stripified mesh. That makes pretty good sense to me. However, the majority of work these days seems to be about trying to get triangle strips to be as long as possible.

There are a few algorithms that attack the optimization problem by finding quad meshes or by looking for discrete surface edges, and dealing explicitly with these cases. Perhaps the first program to deal with this problem in any way was tomesh.c, which appeared on the SGI Developer’s Toolkit. Nearly all the papers nowadays describe the efficiency of their algorithms with respect to tomesh.c, which is a bit like hunting cows with a shotgun. Another algorithm that tried to exploit some architectural knowledge of the mesh was done by Francine Evans, Steven Skiena, and Amitabh Varshney which treats quads as a special case.

But in my opinion, the most promising and elegant class of algorithms seems to be the ones based on generating a minimal spanning tree of the mesh using a depth-first search algorithm or some variant thereof. Xinyu Xiang, Martin Held, and S. B. Mitchell have gotten the number of vertices per triangle down to around 1.2 for their sample set. James Stewart has figured out how to glue nearby strips to one another with a process called dynamic tunneling. The nice thing about using dfs or similar algorithms is that, in my mind, they’re easier to optimize for strip length, coherent cache usage, coherent texture usage, related normals, or other factors that your particular architecture cares about.

If, after reading this, all you want is some code that implements a reasonable stripifier, check out Brad Grantham’s Triangle Consolidator. It’s not optimal, but then, what code is?