A Practical Implementation of a 3-D Game Engine
Creating a 3-D game engine is not a trivial task as gamers often demand for high quality output with top notch performance in games. In this article, we show you how various real-time rendering algorithms can be applied to implement a practical 3-D game engine. We explore the general architecture of a 3-D engine and discuss the role of a scene graph in a 3-D engine. We will look at scene graph from the software engineering perspective. In particular, we show you the way to design a scene graph that is object-oriented and portable across different rendering engine. Then, we explain the algorithms that we apply to speed up the performance of our 3-D engine. We optimize the 3-D engine on the scene graph and object geometry levels. The algorithms that we propose are expected to perform reasonably well for both static and dynamic scenes. Finally, we give you a brief preview on the possibility of parallel processing in scene graph to create a 3-D engine with multiprocessing capability.
The computer entertainment industry has been experiencing a tremendous growth for the past decades. With the rapid advancement of 3-D accelerator hardware in the past few years, industry players were focusing on producing interactive 3-D games with innovative ideas. 3- D game engine is the core technology that drives those games. In a nutshell, a 3-D engine takes the geometry data of 3-D objects in a game, and shows it on the display device, which typically is a monitor. This process is commonly known as rendering. The 3-D object geometry data is usually defined by a set of vertices, the object material properties (such as diffuse color, specularity, and emissiveness), texture mapping, texture coordinates, and normal vectors. All these data will determine the final appearance of the 3-D object, where they will go through various stages in the 3-D engine processing pipeline (Figure 1).
The high-level scene graph dictates the relationship among objects in the game. It is also an application programming interface (API) to the game programmers for manipulating objects in the game programmatically. Before a scene is rendered, the scene graph must be optimized for rendering. This optimization process typically converts (or compiles) the high-level scene graph into an optimized scene graph, which uses a data structure that is more efficient for rendering. This requires the programmer to specify certain “hints” on each object, such as whether an object is a movable (dynamic) object or stationary (static) object, will the object structure deform over time, and so on. Scene object culling is a process to discard objects that the viewer cannot see, whereas level-of-detail (LOD) control cuts down the insignificant details of objects with respect to the distance from the viewer. The whole idea behind all these is to avoid rendering objects that you cannot see, and reduce the geometry data of distance objects, in order to minimize the data send to the final rendering pipeline, which will improve the rendering performance significantly. These are the major tasks carried out by a 3-D game engine to speed up its performance.
The subsequent tasks, such as lighting, view transformation, clipping, projection transformation, and rasterization, will be undertaken by a rendering engine to complete the rendering pipeline. The rendering engine, often referred as immediate mode rendering engine, is usually implemented with the support of 3-D accelerator hardware. Two major rendering engines available for PC are Direct3D by Microsoft, and OpenGL by Silicon Graphics Inc. To develop a 3-D game on these rendering engines is often tedious and time consuming, given that the programming interface is procedural and hardware oriented. In the following section, we will focus on the three major building blocks of a 3-D engine: (1) the design of a high-level scene graph for our 3-D engine; (2) the algorithms to create an optimized scene graph. (3) how do we discard objects that are not within the viewer’s field of view, as well as controlling the level of detail of objects appropriately.
Design Issues in 3-D Engine
Scene graph design is very important as it has direct implication to the overall usability of the 3-D engine. It defines the way a scene can be modeled by the programmers. A good scene graph design should allow programmers to focus more on scene contents like objects and their arrangement within the scene, thinking of the best way to present them, and forget about the complexities of controlling the rendering pipelines . The programmers will work with the 3-D engine through the application programming interface (API) of the scene graph.
Scene Graph Design
The first issue in scene graph design is thinking of its object representation. As we mentioned earlier, the immediate mode rendering engine like Direct3D and OpenGL tend to have its functionalities incline towards the rendering procedures on graphics hardware. Clearly, this renderer oriented design is not suitable to achieve our goal of a scene graph. An object-oriented scene graph is obviously a better programming model, where we allow programmers to create 3-D games using the notion of objects in the scene. We treat each game object as 3-D object in the scene. Strauss and Carey  had presented a comprehensive framework for an object-oriented scene graph. The basis of this framework is representing the 3-D scenes as graphs (typically directed acyclic graphs) of objects called nodes. There are different node classes (Table 1) and each of them has different properties associated to it. For example, the Cylinder shape node will contains two properties such as radius and height, but the Sphere shape node contains only one property, which is radius. There are certain nodes that can inherit their properties to their child nodes below them. For instance, in Figure 2, it depicts a partial scene graph of a car. The Group node named “Body” has four child nodes that make up the car body. The Texture node contains the properties that define the texture mapping of the car body. These properties will be inherited to the Group node “Door” as well, which make up the door of the car on the car body. Therefore, the car door will have the same texturing mapping as the car body. By using this way, not only we can increase resources reusability, but it is also a natural way to model a scene, especially when we deal with relative positioning of scene objects. The Transform node is used to describe the transformation (position and orientation) of the objects under the parent Group node. This will be the relative transformation to its parent node. In order to get its absolute transformation (transformation relative to the entire scene), the current Transform node will be concatenated (by matrix multiplication) with the Transform node of its parent node. To be able to specify the position and orientation of an object relative to its parent object is usually much easier and natural. This model is commonly known as hierarchical scene modeling  and it is the basis of Bone Animation  that is commonly used for animating character movement in games.
Scene Graph Portability
To ensure that the games can reach as much audience as possible, we must ensure that our 3-D engine is able to run on various platforms and operating systems. Thus, portability is another issue that must be addressed when designing a scene graph. A scene graph must be able to work with any rendering engine on the target platform, without any modification to the codes in the game. Thus, a portable scene graph must be design such that it does not contain any implementations that are dependent to a particular rendering engine. Döllner and Hinrichs  had discussed several practical approaches to generalize the characteristics of different rendering engines, and proposed a general scene graph architecture that supports these systems. They are convinced that by using a generalized scene graphs, most real-time rendering techniques can be integrated in a single scene representation without loosing control over or limiting individual strengths of rendering engines. The portability is achieved by the separation of rendering object and rendering engine. Figure 3 is a simplification of the original scene graphs architecture to illustrate this concept. Similar to Strauss and Carey’s work  discussed above, a rendering object is a node in the scene graph. Rendering objects are organized in a scene graph to compose a complete scene. Döllner and Hinrichs  had extended the rendering objects further into shapes, which include 2-D and 3-D geometric objects that do not provide any rendering functionality, except their geometry descriptions; attributes include appearance attributes (e.g. color, material, texture), transformation attributes (e.g. orientation and position), and lighting attributes (e.g. different kind of light sources), which are rendering specific. During rendering process, the rendering engine will traverse and interpret the scene graph contents, evaluate the attributes of each rendering object and translate them into the algorithms that match the target rendering engine. Therefore, the rendering engine is the only place that contains codes that are specific to a particular rendering engine. To further generalize the rendering engine, these rendering engine specific algorithms can be implemented in an entity called handler, which is a specialization of an attribute. The rendering engine will invoke the methods in these handlers when rendering. Specialized rendering engines can provide optimized implementations of engine methods to exploit the special functionality of the underlying rendering engine. In a bigger picture, scene graphs are described as a parameterized scene content specification. A scene graph can only be evaluated for a given rendering engine. Its contents can be interpreted for different purposes by different rendering engines 
Optimization Techniques in 3-D Engine
Optimizations in this context are exclusively referring to speedup techniques that accelerate scene rendering of our 3-D engine. The optimization process takes the objects in the scene graph, and constructs a specialized data structure to store the objects’ geometry that is best for rendering. This process is commonly referred as scene graph compilation. A compiled (optimized) scene graph should produce result that is output-sensitive. Sudarsky and Gotsman  defined an output-sensitive algorithm if its runtime per frame is O(n + f(N)), where N is the total number of objects in the scene, and n is the number of visible objects, and f(N) << N. Put in a simpler terms, this says that the runtime of an output-sensitive algorithm is linearly proportion merely to the number of visible objects, rather than the total number of objects in the scene. There are two common techniques that we employed to achieve this goal. At the object geometry level, level-of-detail control attempts to render the objects that are farther with less detail (thus the objects contain less data). Therefore, objects that are closer to the viewer will present their complete features clearly, whereas distanced objects will looked coarser with their fine details removed. At the scene graph level, visibility culling avoids rendering objects that are not visible to the viewer. These objects will be discarded in the pre-rendering stage so that they are not sent to the rendering hardware. Visibility culling can be further categorized into back-face culling, view-frustum culling, and occlusion culling. In a nutshell, back-face culling rejects the object’s faces that are facing away from the viewer; view-frustum culling ignores objects that are outside the view volume; occlusion culling attempts to leave out the objects that are blocked completely from the viewer’s line of sight by other objects in the scene (Figure 4).
Scene Graph Optimization
In order to achieve output-sensitivity in the visibility culling algorithms, they cannot simply iterate though all objects in the scene and determine the visible ones. We employed a certain kind of spatial data structure  to group the scene objects in such that, with a single query, the algorithm can decide to accept or reject a particular group of objects in one go. This had suggested the use of hierarchical data structure  to cluster the scene objects into several regions based on their locality. By using this way, a large portion of the objects can be excluded from rendering if the particular region is found hidden or not visible to the viewer. The scene must be preprocessed in order to obtain this hierarchical representation. Given that this preprocessing is very time consuming, it must be done at the initialization stage.
We used octree as the spatial data structure to store objects in our 3-D engine. The main reason why we chose octree as our optimized scene graph is due to its flexibility to extend with various algorithms, based on our review other optimization techniques. Octree uses boxes to spatially divide the objects along three dimensions at a time. At each stage, eight equal-sized boxes are created to partition the scene evenly by three axis-aligned planes (XZ, XY, and YZ planes). Figure 5 illustrates this process in a two dimensional view. This will create a tree with each node containing eight child nodes. After partitioning, the objects are associated to the child node that its corresponding box encloses. If an object is coincidentally intersected with the partitioning planes (Figure 5c), the object can be handled in several ways. One way is to split the object with respect to the partitioning planes, and associate the partial objects depend on its spatial dimension with the child nodes’ bounding box. The splitting algorithm is somewhat complicated and this approach will increase the object counts in the scene as well. Another way is to associate the object in its original form either to the parent node, or every child node that its box encloses. The box subdivision stops when it is empty, or contains only a threshold number of objects (Figure 5d). When the algorithm terminates, the actual objects are all contained in the leaf nodes of the octree, where each leaf node will contain the number of objects that is less than or equals to the specified threshold.
Object Visibility Culling
Frustum culling is carried out by performing initial intersection test  between the viewing-volume and the general (bigger) box enclosing a set of objects, pruning these objects at the early stage if the test fails, otherwise proceed with a finer test on the smaller boxes within it. Octree works very well on static scenes, but it can be adapted easily to work on dynamic scenes. Occlusion culling algorithm can be implemented on top of it to discard another large portion of objects that are not in the viewer’s line of sight, which is especially true for densely populated scene. Zhang had proposed a novel occlusion culling algorithm that uses hierarchical image space occlusion maps . This algorithm consists of two tests: a one-dimensional depth test in the Z-direction, and a two-dimensional overlap test in the image space, to determine whether an object is occluded.
For the two-dimensional overlap test, an occlusion representation is constructed by rendering a set of potentially good occluders, which are identified during scene graph compilation. Zhang had suggested objects that are large or close to the viewer as potential occluders . These occluders are rendered into an off-screen image buffer with a white color on a black background, without any texturing, lighting, and Z-buffering enabled. This operation allows a number of small occluders combined into a large occluder. This rendered image is the highest resolution occlusion map, which forms the base of the occlusion maps hierarchy. The hierarchy is created by recursively sampling from the highestresolution map down to some minimal resolution, as shown in Figure 6. This process can be accelerated by graphics hardware using texture mapping with bilinear interpolation used as minification filter.
The overlap test is carried out by testing the projection of an object’s bounding volume into screenspace bounding rectangle, against the hierarchy at the level where the size of a pixel in the occlusion map is about the same size as the bounding rectangle. If all pixels where the bounding rectangle overlaps with the map are opaque (fully white), then this implies that the object is occluded. Otherwise, the algorithm will recursively check the non-opaque pixel into the sub-pixels at the lower hierarchy level. A unique property of this algorithm is approximate visibility culling, which omits rendering objects that are only visible through small holes in or among occluders. For this, the pixels in the occlusion map are not compared to full opacity (white), but rather against an opacity threshold value (gray-scale). The lower the threshold, the more approximate the culling, resulting objects that are partially visible being omitted from rendering. Zhang had come out with a formula for computing the threshold values of different levels in the hierarchy . This feature will most likely increase the culling rate, when the scene doesn’t require objects to be visible through some small and insignificant gaps among occluders.
The one-dimensional Z-depth is used to check whether an object is behind the selected occluders. The depth estimation buffer proposed by Zhang  divides the screen into a number of rectangular regions. For each region, the Z-value of the farthest vertex among all occluders’ bounding volume within this region is inserted into this buffer. The depth estimation buffer is built at every frame. During rendering, an object passes the depth test only if the Z-value of the nearest vertex of its bounding volume is larger than the stored Z-depth in all regions that the object covers. In order for an object to be occluded, it must pass both the overlap test using hierarchical occlusion maps, and the depth test by depth estimation buffer.
Dynamic Object Optimization
To handle dynamic objects in octree, the most straightforward way is every time when an object moves, it is deleted from the octree, followed by an insertion to its new position. This is not optimal as it involves major modifications of the octree structure. For instance, deletion of an octree node sometimes involves merging nodes that are immediately split again, as seen in Figure 7. In addition, it takes a long path to search for the object insertion node from the root. Instead of updating the entire octree for deletions and creations, Sudarsky  suggested that only the sub-tree whose root is the least common ancestor (LCA) of the object’s old and new positions is updated (Figure 8). For a large scene where the octree is very deep, this method will significantly reduce the time to update the octree, as LCA is expected to be closer to the leaves than to the root.
To avoid updating the octree structure every frame for every dynamic objects, Sudarsky  used a lazy evaluation technique, where we don’t compute anything until it is absolutely necessary. This requires a Temporal Bounding Volume (TBV) associated with each dynamic object. A TBV is a bounding volume guaranteed to contain a dynamic object for some period of time. This period of time is referred as the validity period of the TBV, and the expiration date is the last instant of that period. A TBV is constructed on-the-fly based on some prior knowledge of the object’s movement and behavior. For example, sweep surfaces can be used as the TBVs for objects with preset trajectory; if the maximum velocity and acceleration are known, spheres may be used. Using this technique, the TBVs of dynamic objects are used for the intersection test during frustum culling described above (Section 3.2). A moving object is considered hidden and ignored until: (1) its TBV is visible, which implies that the object itself might be visible; (2) its TBV has expired, meaning that the TBV is no longer guaranteed to contain the object. A priority queue is recommended to store the expiration dates of all TBVs. The validation period must be chosen with care in order to get the most optimum performance. An adaptive algorithm works best in most situations for this case. Initially, we set an arbitrary value for the validation period. If the TBV expires before it is actually visible, it means that the period was too short and a longer validation period will be assigned to the next TBV. In contrast, if the TBV becomes visible before it is expired, the next TBV will have a shorter validation period instead.
Object Geometry Optimization
The geometry of a scene object can be optimized further by attempting to generate different levels of detail (LOD) of the object with respect to its distance from the viewer. The reason behind this is that we don’t need to demonstrate the full detail of an object that is far away and looks relatively small in image space. We only display the complete details for the object when it gets nearer to the viewer, where the fine detail can be evidence clearly. This can significantly reduces the object data by sending only the absolute necessary ones to the rendering engine. Figure 9 illustrates how a Stanford Bunny with LOD control will look like at two different distances from the viewer.
LOD control through object simplification is an imperative technique to increase the performance of our 3-D engine. The two renowned object simplification techniques are decimation [10, 12] and clustering method [11, 13]. Decimation method simplifies the vertices, edges, or triangles based on ordering basis. A list of candidate vertices will be generated, with those that contribute less to the overall appearance placed on top of the list. When a “less important vertex” is removed, the resulting hole will be patched. This process continues until the desire result is achieved. In clustering method, a group of highweighted vertices will be pre-determined. Then, the surrounding vertices that are close to these vertices will be clustered. Only the high-weighted vertices will be sent to the rendering engine, where each of them is tied up to form the meshes. This method is extremely fast but the quality of the simplified model is very low.
The fidelity of the simplification algorithm can be improved by first targeting on small and co-planer meshes during the simplification process. To obtain these meshes, one can use length, area, volume, angle, etc. to measure each vertex with its neighboring vertices. When these coplaners are removed and replaced by large triangles, the number of polygons used decreases but the semantic of the object will still remain the same as the original object. Thus the original appearance of the object is retained as much as possible.
The Future of 3-D Game Engine
So far all the algorithms that we had discussed above are sequential algorithms, which are designed to run on single processor computers. Although majority of the gamers today are using this type of configuration, multiprocessor computers is an emerging technology trend in the industry. Numerous research works had been done recently in an attempt to incorporate parallel processing algorithms into 3-D engine. Rohlf and Helman  explained how different components in a 3-D engine can be optimized using various parallel algorithms. The 3- D engine that they developed uses multiprocessing to partition work among multiple processes, and pipelined processing to manage data. They also demonstrated how processes synchronize their operations between each other under different circumstances.
Igehy  presented an interesting attempt to extend the OpenGL programming interface to support parallel processing. His extension allows multiple graphics contexts to simultaneously draw into the same image. Synchronization primitives are included to enable parallel traversal of an explicitly ordered scene. An implementation that uses this programming interface on a 24 processors system is demonstrated to test its viability.
We had presented the main aspects to implement a feasible 3-D game engine. This engine is portable and has an efficient design that allows game programmers to construct and manipulate scene objects effectively. We also emphasized a lot on the optimization techniques used in the 3-D engine. Output sensitivity is the compulsory characteristic for the algorithms that we employed in the 3-D engine. Optimization is done on both scene and object level to maximize the rendering performance in static and dynamic scenes. Finally, a few additional sources are discussed briefly to further enhance the performance of the 3-D engine using parallel processing.
- H. Sowizral, “Scene Graphs in the New Millennium”, IEEE Computer Graphics and Application, Springer, pp. 23-24, 1995.
- J. Adams, “Creating 3-D Graphics Engines”, Programming Role Playing Games with DirectX, Premier Press, 2002.
- P. Strauss and R. Carey, “An object-oriented 3D Graphics Toolkit”, Computer Graphics (SIGGRAPH ’92), pp. 341- 449, 1992.
- J. Döllner and K. Hinrichs, “A Generalized Scene Graph”, Vision, Modeling, Visualization 2000 (VMV 2000), Premier Press, 2000.
- O. Sudarsky and C. Gotsman, “Output-Sensitive Visibility Algorithms for Dynamic Scenes with Applications to Virtual Reality”, Proceedings of Eurographics ‘96, 15(3), 1996.
- H. Samet, “Applications of Spatial Data Structures”, Computer Graphics, Image Processing and GIS, Addison- Wesley, 1989.
- M. Gomez, “Simple Intersection Tests for Games”, Gamasutra, 1999, http://www.gamasutra.com/features/19991018/Gomez_1.ht m
- T. Kay and J. Kajiya, “Ray Tracing Complex Scenes”, Computer Graphics (SIGGRAPH ’86), pp. 269-278, 1986.
- D. Eberly, “Hierarchical Scene Representations”, 3D Game Engine Design: A Practical Approach to Real-Time, Morgan Kaufmann, pp. 141-167, 2000.
- M. Hussain, Y. Okada, and K, Niijima, “Efficient and Feature-Preserving Triangular Mesh Decimation,” Journal of WSCG, 12(1):167-174, February 2004.
- M. Garland and P. Heckbert, “Surface Simplification using Quadric Error Metric”, Proceedings of SIGGRAPH ’97, pp. 209-216, August 1997.
- W. Schroeder, J. Zarge and W. Lorenson, “Decimation of Triangle Meshes”, Computer Graphics, Vol 26 (SIGGRAPH ’92), pp 65-70, 1992.
- K.L. Low and T.S. Tan, “Model Simplification Using Vertex Clustering”, Proceedings of Symposium on Interactive 3D Graphics, pp.75-82, 1997.
- H. Zhang, D. Manocha, T. Hudson and K.E. Hoff III, “Visibility Culling using Hierarchical Occlusion Maps”, Computer Graphics (SIGGRAPH ’97), pp.77-88, August 1997.
- O. Sudarsky and C. Gotsman, “Dynamic Scene Occlusion Culling”, IEEE Transactions on Visualization and Computer Graphics, 5(1), pp. 13-29, 1999.
- J. Rohlf and J. Helman, “IRIS Performer: A High Performance Multiprocessing Toolkit for Real-Time 3D Graphics”, Proceedings of SIGGRAPH ’94, pp. 381-395, 1994.
- H. Igehy, G. Stoll and P. Hanrahan, “The Design of a Parallel Graphics Interface”, Proceedings of SIGGRAPH ’98, pp. 141-150, 1998.
About The Author
Thomas Cheah is the Principal CTO-for-hire of Procto. He helps business owners and executives to innovate their business model thru strategic technology management so that they get 80% of the benefit for 20% of the cost. If you have innovative ideas but do not the technical expertise, he is your partner to validate and build your digital business models. Thomas believes that constant business innovation is increasingly important in today's business environment so that our business is prepare for rapid change of customer behavior, rising cost, and globalization in order to stay ahead (or away) of competition.
Learn more about Thomas Cheah.