

GPU Terrain Subdivision and Tessellation The full details of our GPU Terrain Subdivision and Tessellation algorithm, along with the full source code and a demo .exe are included in the GPU Pro 4 Advanced Rendering Techniques publication by CRC Press. Abstract This paper presents a GPU based algorithm to perform realtime terrain subdivision and rendering of vast detailed landscapes. This algorithm achieves smooth level of detail (LOD) transitions from any viewpoint, and does not require any preprocessing of data structures on the CPU. 1 Introduction There are a lot of existing terrain rendering and subdivision algorithms that achieve fantastic results, and fast frame rates. Some of these algorithms, however, are limited. They can require the preprocessing of large data sets, constant transferring of large data sets from the CPU to the GPU, limited viewing areas, or complex algorithms to merge together meshes in efforts to avoid cracks and seams as a result of various LOD subdivision techniques. The GPU based algorithm we developed addresses all of the above mentioned limitations, and presents a simple alternative to render highly detailed landscapes, without significant impact on the CPU. We describe a GPU based algorithm to create a subdivided mesh with distance based level of detail (LOD) that can be used for terrain rendering. Data amplification and multiple streamout hardware capability are utilized to repeatedly subdivide an area to achieve a desired level of detail. In addition, culling is also performed at each iteration of the algorithm, therefore avoiding a lot of unnecessary processing or subdivision of areas outside of the viewing frustum. Because the resulting data is retained and refined on the GPU, the CPU is mostly left available to perform other tasks. To show a practical use of this technique, we also utilize a smooth LOD transitioning scheme, and use a procedural terrain generation function to provide realtime rendering of a highly detailed and vast landscape. Our algorithm was heavily inspired by two existing algorithms. The first inspiration came from the great desire to walk through the procedural mountains created by F. Kenton Musgrave [EMPPW*1998], in realtime. The second inspiration came from the visual beauty of the realtime water created by Claes Johanson, in his introduction of the projected grid concept [CJ*2004]. The concept helped form one of the ideas for the basis of our subdivision algorithm, by showcasing effective and efficient vertex placement to display a vast area of seascape. 2 The Algorithm A few terms are used throughout this paper, and are integral to understanding the general algorithm and the related descriptions. Section 2.1 will describe these terms, and provide related calculations. An algorithm overview is provided in section 2.2, and sections 2.3 to 2.5 describe the main separate components of our algorithm. 2.1 Terms & Definitions
Viewable Region Span: To quantify a viewable region span, denoted by θ, we defined the following calculation at point P and applied offset λ:
The above calculation for the viewable region span can be used at any position P and applied offset λ, and is used extensively within the LOD Transition Algorithm (explained in section 2.4). We use this calculation instead of calculating the actual viewing surface area to avoid inconsistencies when viewable regions are viewed from different angles. Maximum Viewable Region Span: The maximum viewable region span, denoted by θ_{max}, is the maximum allowable viewable region span. This value, which is set by the user, is one of the main determining factors of the attainable LOD, and plays a key part both the Subdivision Algorithm (explained in section 2.3), and the LOD Transition Algorithm (explained in section 2.4). Relative Quadrant Code: This code identifies the relative position of a split viewable region in relation to its parent viewable region. This code is utilized by the LOD Transition Algorithm (explained in section 2.4), and calculated in the Subdivision Algorithm (explained in section 2.3). Usually encoded as a 2 bit mask, this code becomes part of the definition of a viewable region. 2.2 Algorithm Overview The algorithm operates in three stages:
2.3 Subdivision Algorithm In Stage 2 of the algorithm, for each iteration, we feed an input stream of viewable regions into the geometry shader stage of the rendering pipeline. The input stream for the first iteration is the initial input stream that was created in Stage 1. All subsequent iterations use the intermediate output stream from the preceding iteration as the input. Two output streams are utilized in the geometry shader. An intermediate output stream is used to hold viewable regions that are intended for further processing. A final output stream is used to hold viewable regions that @require no further processing in the next iteration, and are to be rendered as part of Stage 3 of the algorithm. Within the geometry shader, each viewable region is processed and one of the following three actions are performed:
Note that for the final iteration in Stage 2 of the algorithm, any viewable region not culled must be added to the final output stream, regardless of the viewable region span. This is to ensure that all remaining viewable regions have a chance to be rendered, and are not placed into an intermediate output stream that will not result in any further processing or rendering. Figures 4a4f show examples of a viewable region after a number of iterations through our subdivision algorithm.
2.4 LOD Transition Algorithm In Stage 3 of the algorithm, a stream of viewable regions are rendered. One viewable region R with an applied offset of R_{λ} may be adjacent to another viewable region with an applied offset of ½R_{λ}. Without performing a smooth transition between the two viewable regions, there could be visible discontinuities or other visual anomalies when rendering. We describe a method that offers a smooth transition between the two viewable regions. By rendering a viewable region as a set of quadrilaterals, we are able to morph the quadrilaterals in such as way to make the boundary between the larger and smaller viewable regions indistinguishable. This method eliminates seams, Tjunctions, and visible boundaries between neighboring viewable regions of different sizes. A similar method was described by Philip Strugar [PS*2010], although we have extended it to handle various boundary cases to ensure no cracks form anywhere within the mesh.
Collectively, these nonoverlapping quadrilaterals will cover the same surface area as the viewable region they are created from. The boundary points of each quadrilateral are calculated to align with the boundary points of the neighboring viewable region quadrilaterals. The morphing quadrilateral boundary points ( P_{L}, P_{T}, P_{R}, P_{B}, P_{C} ) are calculated as follows:
Given a viewable region span θ at point P and applied offset λ, θ( P, λ ), we are able to calculate a morphing factor T( P, λ ), by using the following formula:
These general morphing factors will be applied when rendering a viewable region, and assumes that the neighboring viewable regions have the same applied offset R_{λ}. Figure 6 provides a diagram of the various positions used when calculating the general morphing factors. There are two special boundary cases we need to handle when calculating the morphing factors. These special cases arise when one viewable region is adjacent to another viewable region of a larger or smaller applied offset λ. The cases are defined as follows:
The various morphing stages of a single visible region are shown in Figure 9. As a result of how the morphing quadrilateral boundary points are calculated, neighboring visible regions share adjacent boundary points. This results in no cracks or seams when rendering, and smooth transitions when moving through a scene.
The images below show examples of rendered viewable regions without (see Figure 10a) and with (see Figure 10b) the LOD Transition Algorithm in effect.
2.5 Procedural Height Generation Algorithm Any number of methods can be used in conjunction with the subdivision algorithm described in this paper. We chose to base ours on the "Ridgid multifractal terrain model" algorithm described by F. Kenton Musgrave [EMPPW*1998]. The use of this procedural algorithm for us resulted in highly detailed and realistic terrain, as seen in our demo video below. The adaptive nature of the algorithm effectively eliminated high frequency noise or aliasing, and fit quite nicely with our LOD Transition Algorithm. We used a tileable noise texture when the "Ridgid multifractal terrain model" algorithm called for a noise value, and embedded gradient information within the noise texture to speed up the surface normal calculations. The height is calculated at each of the visible region quadrilateral boundary points, resulting in displaced geometry when rendering (see Figure 11c). We calculate the surface normals on a per pixel basis to achieve even greater surface detail (see Figure 11d). 3 Results In our tests, the CPU utilization averaged approximately 1%. Frame rates for our demo averaged between 4050 fps, while running at a resolution of 1024×768. The demo was run on a 2.9GHz AMD Phenom CPU with 8GB system memory, and a 512MB AMD RADEON HD 6450 video card. Figures 11a11e shows some of the various steps taken to create the final rendered scene. In Figure 11e, we added some diffuse lighting from a single light source, and modified the color of the terrain based on the calculated surface normal. Greater amounts of realism can be easily added through any number of techniques, such as the use of textures, but we wanted to focus on the possible detail through the exclusive use of our algorithm.
4 Conclusions The results of our algorithm show promise, and we are continuing our research in this area in efforts to make further gains. A lot of the speed costs involved in our usage example come from the multilayered procedural calculation of the terrain height at the specified locations. Swapping our procedural algorithm out and using a simpler method, such as a single texture based height map, would result in much faster rendering times. When swapping out our procedural height calculation for a simpler height map lookup, we were able to achieve frame rates well above 100 fps at the same resolution. One problem with our approach is the visual phenomenon described as "vertex swimming". This can be noticed when moving towards large features that contain a high frequency of detail. The problem can be greatly reduced by increasing the amount of subdivisions created before rendering (ie: by lowering the θ_{max} value used in the Subdivision Algorithm). When we swapped out our LOD Transition Algorithm and instead added logic to correct the geometry (Tjunctions) at the LOD boundaries, we were able to eliminate the "vertex swimming" phenomenon. Unfortunately, this also meant that the LOD boundaries became noticeable, and in our opinion, were more of a visual distraction than the "vertex swimming" itself. There may be other ways to eliminate this visual distraction, and we are hopeful that this issue will be improved with continued research. Care must be taken to ensure high precision floating point calculations are used throughout this algorithm. Artifacts can sometimes appear if care is not taken in this regards, and floating point errors are inadvertently allowed to compound, which can show up as sporadic pixel noise in the final rendered scene. We solved this problem by using integer based values for our visible region center positions and applied offsets. Our initial applied offsets are set to large factors of two, in order to accommodate the maximum number of iterative subdivisions. By only applying a floating point conversion factor when we required world space coordinates, we were able to completely eliminate all floating point difference errors, while also scaling the data appropriately. Improvements to the subdivision algorithm and resulting data amplification could be made to better utilize hardware depth culling when rendering the resulting mesh. Increasing the subdivision granularity per iteration, and adding the subdivided regions to the output streams in increasing order of average depth (nearest to furthest) would enable the final rendering pass to benefit more from depth culling. References [EMPPW*1998] Ebert D. S., Musgrave F. K., Peachey D., Perlin K., Worley S.: "Texturing and Modeling: A Procedural Approach, Second Edition" Academic Press, 1998. [FS*2010] Filip Strugar: "Continuous DistanceDependent Level of Detail for Rendering Heightmaps (CDLOD)" [CJ*2004] Claes Johanson: "Realtime water rendering Introducing the projected grid concept" Demo All product names are trademarks or registered trademarks of their respective holders. 
© 2018 Mistal Research 