GPU Terrain Subdivision and Tessellation

Terrain Tessellation Example Image


The full details of our GPU Terrain Subdivision and Tessellation algorithm, along with the complete source code and a demo executable, are included in GPU Pro 4: Advanced Rendering Techniques, published by CRC Press.  

Abstract

This paper presents a GPU-based algorithm to perform real-time terrain subdivision and rendering of vast, detailed landscapes.  The algorithm achieves smooth level of detail (LOD) transitions from any viewpoint, and does not require any pre-processing of data structures on the CPU.  

1    Introduction

Many existing terrain rendering and subdivision algorithms achieve fantastic results and fast frame rates.  Some of these algorithms, however, are limited: they can require pre-processing of large data sets, constant transfers of large data sets from the CPU to the GPU, restricted viewing areas, or complex mesh-merging schemes to avoid the cracks and seams that result from various LOD subdivision techniques.  

The GPU-based algorithm we developed addresses all of the above-mentioned limitations, and presents a simple alternative for rendering highly detailed landscapes without significant impact on the CPU.  

We describe a GPU-based algorithm to create a subdivided mesh with distance-based level of detail (LOD) that can be used for terrain rendering.  The data amplification and multiple stream-out capabilities of the hardware are utilized to repeatedly subdivide an area to achieve a desired level of detail.  Culling is also performed at each iteration of the algorithm, avoiding unnecessary processing and subdivision of areas outside of the viewing frustum.  Because the resulting data is retained and refined on the GPU, the CPU is mostly left available to perform other tasks.  To show a practical use of this technique, we also utilize a smooth LOD transitioning scheme, and use a procedural terrain generation function to provide real-time rendering of a highly detailed and vast landscape.  

Our algorithm was heavily inspired by two existing algorithms.  The first inspiration came from the great desire to walk through the procedural mountains created by F. Kenton Musgrave [EMPPW*1998] in real time.  The second inspiration came from the visual beauty of the real-time water created by Claes Johanson in his introduction of the projected grid concept [CJ*2004].  That concept helped form one of the ideas at the basis of our subdivision algorithm, by showcasing effective and efficient vertex placement to display a vast area of seascape.  

2    The Algorithm

Several terms are used throughout this paper and are integral to understanding the general algorithm and the related descriptions.  Section 2.1 describes these terms and provides the related calculations.  An algorithm overview is provided in section 2.2, and sections 2.3 to 2.5 describe the main components of our algorithm.  

2.1    Terms & Definitions

Viewable Region:  A viewable region, denoted by R, is defined as an axis-aligned quadrilateral representing a region that is to be subdivided and/or rendered.  This region is defined by a center position and an applied offset in both the positive and negative directions along each alignment axis (see Figure 2).  The center position is denoted by RC.  The applied offset is denoted by Rλ.  The bounding points of the region are denoted by P1, P2, P3, and P4.  
Figure 2
Figure 2: Example of a viewable region (in grey).
RC = Center Position ( RCx, RCy )
Rλ = Applied Offset
P1 = ( RCx - Rλ, RCy + Rλ )
P2 = ( RCx + Rλ, RCy + Rλ )
P3 = ( RCx + Rλ, RCy - Rλ )
P4 = ( RCx - Rλ, RCy - Rλ )

Viewable Region Span:  To quantify a viewable region span, denoted by θ, we define the following calculation at a point P with applied offset λ:

θ( P, λ ) = | PScreenR - PScreenL |

PScreenL = ( PProjLx, PProjLy ) / PProjLw
PScreenR = ( PProjRx, PProjRy ) / PProjRw
PProjL = PWL × matProjection
PProjR = PWR × matProjection
PWL = ( PWx - λ, PWy )
PWR = ( PWx + λ, PWy )
PW = P × matWorldView

The above calculation for the viewable region span can be used at any position P and applied offset λ, and is used extensively within the LOD Transition Algorithm (explained in section 2.4).  We use this calculation instead of calculating the actual viewing surface area to avoid inconsistencies when viewable regions are viewed from different angles.  
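
To make the span calculation concrete, here is a small C++ sketch that mirrors the formulas above.  The float4 / float4x4 helpers, the row-vector multiply convention, and the function name are assumptions made for this illustration; they are not taken from the published source code.  

#include <cmath>

struct float4   { float x, y, z, w; };
struct float4x4 { float m[4][4]; };

// Row-vector * matrix product, matching the "P × matWorldView" notation above.
static float4 mul(const float4& v, const float4x4& M)
{
    float4 r;
    r.x = v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + v.w * M.m[3][0];
    r.y = v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + v.w * M.m[3][1];
    r.z = v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + v.w * M.m[3][2];
    r.w = v.x * M.m[0][3] + v.y * M.m[1][3] + v.z * M.m[2][3] + v.w * M.m[3][3];
    return r;
}

// theta( P, lambda ): screen-space span of a segment of half-width lambda,
// centered on P and aligned with the view-space x axis.
float ViewableRegionSpan(const float4& P, float lambda,
                         const float4x4& matWorldView,
                         const float4x4& matProjection)
{
    const float4 PW  = mul(P, matWorldView);                 // view-space position
    const float4 PWL = { PW.x - lambda, PW.y, PW.z, 1.0f };  // left end point
    const float4 PWR = { PW.x + lambda, PW.y, PW.z, 1.0f };  // right end point

    const float4 PProjL = mul(PWL, matProjection);
    const float4 PProjR = mul(PWR, matProjection);

    // Perspective divide to screen (normalized device) coordinates.
    const float lx = PProjL.x / PProjL.w, ly = PProjL.y / PProjL.w;
    const float rx = PProjR.x / PProjR.w, ry = PProjR.y / PProjR.w;

    return std::sqrt((rx - lx) * (rx - lx) + (ry - ly) * (ry - ly));
}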

Maximum Viewable Region Span:  The maximum viewable region span, denoted by θmax, is the maximum allowable viewable region span.  This value, which is set by the user, is one of the main determining factors of the attainable LOD, and plays a key part in both the Subdivision Algorithm (explained in section 2.3) and the LOD Transition Algorithm (explained in section 2.4).  

Relative Quadrant Code:  This code identifies the relative position of a split viewable region in relation to its parent viewable region.  This code is calculated in the Subdivision Algorithm (explained in section 2.3) and utilized by the LOD Transition Algorithm (explained in section 2.4).  Usually encoded as a 2-bit mask, this code becomes part of the definition of a viewable region.  

2.2    Algorithm Overview

The algorithm operates in three stages:

Stage 1:
Create the initial input stream of one or more viewable regions containing the area(s) to be viewed.

Stage 2:
Take the initial input stream of viewable regions from Stage 1 and iteratively process each viewable region utilizing the Subdivision Algorithm (explained in section 2.3).

Stage 3:
Render the resulting final output stream of viewable regions from Stage 2.  At this stage, we utilize an LOD Transition Algorithm (explained in section 2.4), as well as a Procedural Height Generation Algorithm (explained in section 2.5) to render the terrain.
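
The overall data flow can be illustrated with the following CPU-side C++ sketch, which treats the streams as simple vectors; in our actual implementation, Stage 2 runs on the GPU using geometry shader stream-out.  The ViewableRegion record and the ProcessRegion routine are hypothetical names introduced for this sketch (ProcessRegion itself is sketched at the end of section 2.3, and float4x4 comes from the span sketch in section 2.1).  

#include <vector>

// Illustrative viewable region record (see section 2.1): center position R_C,
// applied offset R_lambda, and the 2-bit relative quadrant code.
struct ViewableRegion
{
    float    cx, cy;        // R_C
    float    offset;        // R_lambda
    unsigned quadrantCode;  // 0 = R1, 1 = R2, 2 = R3, 3 = R4
};

// Per-region cull / split / output decision; sketched at the end of section 2.3.
void ProcessRegion(const ViewableRegion& r, bool lastIteration,
                   const float4x4& matWorldView, const float4x4& matProjection,
                   float thetaMax, float maxDisplacement,
                   std::vector<ViewableRegion>& intermediateOut,
                   std::vector<ViewableRegion>& finalOut);

// Stage 1 supplies 'input'; Stage 3 renders 'finalOut' afterwards.
void SubdivideRegions(std::vector<ViewableRegion> input, int iterations,
                      const float4x4& matWorldView, const float4x4& matProjection,
                      float thetaMax, float maxDisplacement,
                      std::vector<ViewableRegion>& finalOut)
{
    std::vector<ViewableRegion> intermediate;

    // Stage 2: each iteration consumes the previous iteration's intermediate
    // output stream and appends to either the intermediate or the final stream.
    for (int i = 0; i < iterations; ++i)
    {
        intermediate.clear();
        const bool last = (i == iterations - 1);
        for (const ViewableRegion& r : input)
            ProcessRegion(r, last, matWorldView, matProjection,
                          thetaMax, maxDisplacement, intermediate, finalOut);
        input.swap(intermediate);
        if (input.empty())
            break;  // nothing left to refine
    }
}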

2.3    Subdivision Algorithm

In Stage 2 of the algorithm, for each iteration, we feed an input stream of viewable regions into the geometry shader stage of the rendering pipeline.  The input stream for the first iteration is the initial input stream that was created in Stage 1.  All subsequent iterations use the intermediate output stream from the preceding iteration as the input.  Two output streams are utilized in the geometry shader.  An intermediate output stream is used to hold viewable regions that are intended for further processing.  A final output stream is used to hold viewable regions that require no further processing in the next iteration, and are to be rendered as part of Stage 3 of the algorithm.  

Within the geometry shader, each viewable region is processed and one of the following three actions is performed (an illustrative sketch of this per-region decision is given after Figures 4a-4f below):

Option 1:
If the viewable region is determined to not intersect with the view frustum, it is culled.  The culled viewable region is not added to either the intermediate or the final output stream.  

When determining whether or not the viewable region intersects the view frustum, one must be mindful of any displacements performed during Stage 3 of the algorithm.  When using this algorithm for terrain rendering, each viewable region is extruded into a volume, based on the maximum displacement that may be added in Stage 3.  It is this extruded volume that is then tested for intersection with the view frustum.  

Option 2:
If the viewable region span θ, calculated at the viewable region R center position RC with the applied offset Rλ, θ( RC, Rλ ), is greater than the maximum viewable region span θmax, then the viewable region R is split into four quadrants ( R1, R2, R3, R4 ).  The quadrants, each of which becomes a viewable region itself, are then added to the intermediate output stream to be re-processed at the next iteration.  A special code, unique to each quadrant, is also added to the output stream to identify the relative location of each split viewable region with respect to its parent viewable region.  This extra piece of information, referred to as a relative quadrant code, and usually encoded as a 2-bit mask, is later utilized by the LOD Transition Algorithm (explained in section 2.4) when rendering each viewable region.  

To split the viewable region R (see Figure 3a) into the quadrants (see Figure 3b), we create four new viewable regions ( R1, R2, R3, R4 ) with their respective center positions ( R1C, R2C, R3C, R4C ) and applied offsets ( R1λ, R2λ, R3λ, R4λ ) defined as follows:  

R1C = ( RCx - ½Rλ, RCy + ½Rλ )
R2C = ( RCx + ½Rλ, RCy + ½Rλ )
R3C = ( RCx + ½Rλ, RCy - ½Rλ )
R4C = ( RCx - ½Rλ, RCy - ½Rλ )

R1λ = ½Rλ
R2λ = ½Rλ
R3λ = ½Rλ
R4λ = ½Rλ
Figure 3a
Figure 3a: The viewable region (R).
Figure 3b
Figure 3b: The split viewable regions (R1, R2, R3, R4).

Option 3:
If the viewable region span θ, calculated at the viewable region R center position RC with the applied offset Rλ, θ( RC, Rλ ), is less than or equal to the maximum viewable region span θmax, then the viewable region R is added to the final output stream of viewable regions.  

Note that for the final iteration in Stage 2 of the algorithm, any viewable region not culled must be added to the final output stream, regardless of the viewable region span.  This is to ensure that all remaining viewable regions have a chance to be rendered, and are not placed into an intermediate output stream that will not result in any further processing or rendering.  

Figures 4a-4f show examples of a viewable region after a number of iterations through our subdivision algorithm.

Figure 4a
Figure 4a: After N iterations

Figure 4b
Figure 4b: After N+1 iterations

Figure 4c
Figure 4c: After N+2 iterations

Figure 4d
Figure 4d: After N+3 iterations

Figure 4e
Figure 4e: After N+4 iterations

Figure 4f
Figure 4f: After N+5 iterations
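
Continuing the CPU-side sketch from section 2.2 (an illustration of the logic rather than the actual geometry shader), the per-region decision covering Options 1-3 could be written as follows.  The conservative clip-space culling test, the choice of the z axis for height displacement, and the helper names are assumptions of this sketch; the region is extruded by the maximum displacement before the frustum test, as described under Option 1.  

#include <vector>

// Conservative view-frustum test: transform the 8 corners of the region
// extruded by +/- maxDisplacement, and cull only if all corners lie outside
// the same clip-space plane.
static bool RegionOutsideFrustum(const ViewableRegion& r, float maxDisplacement,
                                 const float4x4& matWorldView,
                                 const float4x4& matProjection)
{
    int outside[6] = { 0, 0, 0, 0, 0, 0 };
    for (int i = 0; i < 8; ++i)
    {
        const float x = r.cx + ((i & 1) ? r.offset : -r.offset);
        const float y = r.cy + ((i & 2) ? r.offset : -r.offset);
        const float h = (i & 4) ? maxDisplacement : -maxDisplacement;
        const float4 p = { x, y, h, 1.0f };   // assumes height along the z axis
        const float4 c = mul(mul(p, matWorldView), matProjection);
        if (c.x < -c.w)  ++outside[0];
        if (c.x >  c.w)  ++outside[1];
        if (c.y < -c.w)  ++outside[2];
        if (c.y >  c.w)  ++outside[3];
        if (c.z <  0.0f) ++outside[4];        // D3D-style 0..w depth range
        if (c.z >  c.w)  ++outside[5];
    }
    for (int k = 0; k < 6; ++k)
        if (outside[k] == 8) return true;
    return false;
}

void ProcessRegion(const ViewableRegion& r, bool lastIteration,
                   const float4x4& matWorldView, const float4x4& matProjection,
                   float thetaMax, float maxDisplacement,
                   std::vector<ViewableRegion>& intermediateOut,
                   std::vector<ViewableRegion>& finalOut)
{
    // Option 1: cull regions whose extruded volume misses the view frustum.
    if (RegionOutsideFrustum(r, maxDisplacement, matWorldView, matProjection))
        return;

    const float4 center = { r.cx, r.cy, 0.0f, 1.0f };
    const float theta = ViewableRegionSpan(center, r.offset,
                                           matWorldView, matProjection);

    // Option 3 (and the final-iteration rule): emit to the final stream.
    if (theta <= thetaMax || lastIteration)
    {
        finalOut.push_back(r);
        return;
    }

    // Option 2: split into four quadrants R1..R4 and emit them, tagged with
    // their relative quadrant code, to the intermediate stream.
    const float h = 0.5f * r.offset;
    const float sx[4] = { -h, +h, +h, -h };   // R1, R2, R3, R4 (see Figure 3b)
    const float sy[4] = { +h, +h, -h, -h };
    for (unsigned q = 0; q < 4; ++q)
        intermediateOut.push_back({ r.cx + sx[q], r.cy + sy[q], h, q });
}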


2.4    LOD Transition Algorithm

In Stage 3 of the algorithm, a stream of viewable regions is rendered.  One viewable region R with an applied offset of Rλ may be adjacent to another viewable region with an applied offset of ½Rλ.  Without performing a smooth transition between the two viewable regions, there could be visible discontinuities or other visual anomalies when rendering.  We describe a method that offers a smooth transition between the two viewable regions.  By rendering a viewable region as a set of quadrilaterals, we are able to morph the quadrilaterals in such a way that the boundary between the larger and smaller viewable regions becomes indistinguishable.  This method eliminates seams, T-junctions, and visible boundaries between neighboring viewable regions of different sizes.  A similar method was described by Filip Strugar [FS*2010], although we have extended it to handle various boundary cases to ensure no cracks form anywhere within the mesh.  

Each viewable region is rendered by splitting it into four quadrilaterals, denoted by Q1, Q2, Q3, and Q4 (see Figure 5).  The boundary points of each quadrilateral, drawn from the static boundary points ( P1, P2, P3, P4 ) and the morphing boundary points ( PL, PT, PR, PB, PC ), are defined as follows:

Q1 = { P1PTPCPL }
Q2 = { PTP2PRPC }
Q3 = { PCPRP3PB }
Q4 = { PLPCPBP4 }

Figure 5
Figure 5: Example of the viewable region quadrilateral boundary points.

Collectively, these non-overlapping quadrilaterals will cover the same surface area as the viewable region they are created from.  The boundary points of each quadrilateral are calculated to align with the boundary points of the neighboring viewable region quadrilaterals.  The morphing quadrilateral boundary points ( PL, PT, PR, PB, PC ) are calculated as follows:

PL = ( P1 × ( 1 - ½TL ) ) + ( P4 × ½TL )
PT = ( P1 × ( 1 - ½TT ) ) + ( P2 × ½TT )
PR = ( P2 × ( 1 - ½TR ) ) + ( P3 × ½TR )
PB = ( P4 × ( 1 - ½TB ) ) + ( P3 × ½TB )
PC = ( P1 × ( 1 - ½TC ) ) + ( P3 × ½TC )

Given a viewable region span θ at point P with applied offset λ, θ( P, λ ), we calculate a morphing factor T( P, λ ) using the following formula:

T( P, λ ) =
    0,                               if θ( P, λ ) ≤ ½θmax
    ( θ( P, λ ) / θmax ) × 2 - 1,    if ½θmax < θ( P, λ ) < θmax
    1,                               if θ( P, λ ) ≥ θmax
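
A direct C++ transcription of this morphing factor, reusing the ViewableRegionSpan sketch from section 2.1, could look like the following; the clamped form is equivalent to the piecewise definition above.  

#include <algorithm>

// T( P, lambda ): 0 below half the maximum span, 1 at or above the maximum
// span, and a linear ramp in between.
float MorphFactor(const float4& P, float lambda, float thetaMax,
                  const float4x4& matWorldView, const float4x4& matProjection)
{
    const float theta = ViewableRegionSpan(P, lambda, matWorldView, matProjection);
    const float t = (theta / thetaMax) * 2.0f - 1.0f;
    return std::min(1.0f, std::max(0.0f, t));   // clamp to [0, 1]
}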

We calculate each of the general morphing factors ( TL, TT, TR, TB, TC ) for a viewable region R with a center position RC and applied offset Rλ as follows:  

TL = T( βL, Rλ )
TT = T( βT, Rλ )
TR = T( βR, Rλ )
TB = T( βB, Rλ )
TC = T( RC, Rλ )

βL = ( RCx - Rλ, RCy )
βT = ( RCx, RCy + Rλ )
βR = ( RCx + Rλ, RCy )
βB = ( RCx, RCy - Rλ )
Figure 6
Figure 6: Diagram of the various positions used with each viewable region when calculating the general morphing factors.

These general morphing factors are applied when rendering a viewable region, and assume that the neighboring viewable regions have the same applied offset Rλ.  Figure 6 provides a diagram of the various positions used when calculating the general morphing factors.  
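
Putting the pieces together, a sketch of the general morphing factors and the resulting morphed boundary points for one viewable region might look like this.  The float2 / Lerp / MakePoint helpers and the array layout are assumptions made for the illustration; the static corners P1..P4 follow the definitions in section 2.1.  

struct float2 { float x, y; };

static float2 Lerp(const float2& a, const float2& b, float t)
{
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t };
}

static float4 MakePoint(float x, float y) { return { x, y, 0.0f, 1.0f }; }

// Computes T_L, T_T, T_R, T_B, T_C and the morphing boundary points
// P_L, P_T, P_R, P_B, P_C for a viewable region (general case only; the two
// boundary cases described below adjust the T values afterwards).
void GeneralMorph(const ViewableRegion& r, float thetaMax,
                  const float4x4& matWorldView, const float4x4& matProjection,
                  float T[5],       // TL, TT, TR, TB, TC
                  float2 Pm[5])     // PL, PT, PR, PB, PC
{
    const float cx = r.cx, cy = r.cy, o = r.offset;

    // Static corners P1..P4 (see Figure 2).
    const float2 P1 = { cx - o, cy + o }, P2 = { cx + o, cy + o };
    const float2 P3 = { cx + o, cy - o }, P4 = { cx - o, cy - o };

    // Edge midpoints betaL, betaT, betaR, betaB and the center R_C (Figure 6).
    T[0] = MorphFactor(MakePoint(cx - o, cy), o, thetaMax, matWorldView, matProjection); // TL
    T[1] = MorphFactor(MakePoint(cx, cy + o), o, thetaMax, matWorldView, matProjection); // TT
    T[2] = MorphFactor(MakePoint(cx + o, cy), o, thetaMax, matWorldView, matProjection); // TR
    T[3] = MorphFactor(MakePoint(cx, cy - o), o, thetaMax, matWorldView, matProjection); // TB
    T[4] = MorphFactor(MakePoint(cx, cy),     o, thetaMax, matWorldView, matProjection); // TC

    // Morphing boundary points, lerped exactly as in the formulas above.
    Pm[0] = Lerp(P1, P4, 0.5f * T[0]);   // PL
    Pm[1] = Lerp(P1, P2, 0.5f * T[1]);   // PT
    Pm[2] = Lerp(P2, P3, 0.5f * T[2]);   // PR
    Pm[3] = Lerp(P4, P3, 0.5f * T[3]);   // PB
    Pm[4] = Lerp(P1, P3, 0.5f * T[4]);   // PC
}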

There are two special boundary cases we need to handle when calculating the morphing factors.  These special cases arise when one viewable region is adjacent to another viewable region of a larger or smaller applied offset λ.  The cases are defined as follows:  

Boundary Case 1:
The viewable region R with applied offset Rλ is adjacent to a smaller viewable region with applied offset ½Rλ.  

We conditionally set the following morphing factors:

TL = 1, if θ( βNL, Rλ ) ≥ θmax
TT = 1, if θ( βNT, Rλ ) ≥ θmax
TR = 1, if θ( βNR, Rλ ) ≥ θmax
TB = 1, if θ( βNB, Rλ ) ≥ θmax

βNL = ( RCx - ( 2 × Rλ ), RCy )
βNT = ( RCx, RCy + ( 2 × Rλ ) )
βNR = ( RCx + ( 2 × Rλ ), RCy )
βNB = ( RCx, RCy - ( 2 × Rλ ) )
Figure 7
Figure 7: Diagram of the various positions used with each viewable region when calculating the morphing factors for Boundary Case 1.

The above conditionals test for the cases where an adjacent viewable region has been split into smaller viewable regions.  We therefore need to lock the affected morphing factors to 1, to ensure that all of the overlapping quadrilateral vertices exactly match those of the smaller adjacent viewable region.  Figure 7 provides a diagram of the various positions used when calculating the morphing factors for Boundary Case 1.  
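
Continuing the illustrative sketch, Boundary Case 1 can be applied as a small post-pass over the T values computed by GeneralMorph above; the function and variable names remain our own.  

// Boundary Case 1: if a neighboring region at the same level has been split
// (its span at our offset meets or exceeds thetaMax), lock the shared-edge
// factor to 1 so the overlapping vertices match the smaller neighbor exactly.
void ApplyBoundaryCase1(const ViewableRegion& r, float thetaMax,
                        const float4x4& matWorldView, const float4x4& matProjection,
                        float T[5])
{
    const float cx = r.cx, cy = r.cy, o = r.offset;

    // Neighbor centers betaNL, betaNT, betaNR, betaNB (see Figure 7).
    const float4 nL = MakePoint(cx - 2.0f * o, cy);
    const float4 nT = MakePoint(cx, cy + 2.0f * o);
    const float4 nR = MakePoint(cx + 2.0f * o, cy);
    const float4 nB = MakePoint(cx, cy - 2.0f * o);

    if (ViewableRegionSpan(nL, o, matWorldView, matProjection) >= thetaMax) T[0] = 1.0f; // TL
    if (ViewableRegionSpan(nT, o, matWorldView, matProjection) >= thetaMax) T[1] = 1.0f; // TT
    if (ViewableRegionSpan(nR, o, matWorldView, matProjection) >= thetaMax) T[2] = 1.0f; // TR
    if (ViewableRegionSpan(nB, o, matWorldView, matProjection) >= thetaMax) T[3] = 1.0f; // TB
}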

Boundary Case 2:
The viewable region with applied offset Rλ is adjacent to a larger viewable region with applied offset ( 2 × Rλ ).  

To test for this case, we need additional information about the current viewable region being rendered.  This is why Stage 2 of our algorithm (explained in section 2.3) stores the relative quadrant code for each viewable region: it allows us to correctly calculate the center positions of larger neighboring viewable regions, and to determine which edges of the viewable region may be adjacent to them.  

Based on the relative quadrant code of the viewable region, we conditionally set the following morphing factors:

R1:
TL = 0, if θ( βFL, ( 2 × Rλ ) ) < θmax
TT = 0, if θ( βFT, ( 2 × Rλ ) ) < θmax
βC = ( RCx + Rλ, RCy - Rλ )

R2:
TT = 0, if θ( βFT, ( 2 × Rλ ) ) < θmax
TR = 0, if θ( βFR, ( 2 × Rλ ) ) < θmax
βC = ( RCx - Rλ, RCy - Rλ )

R3:
TR = 0, if θ( βFR, ( 2 × Rλ ) ) < θmax
TB = 0, if θ( βFB, ( 2 × Rλ ) ) < θmax
βC = ( RCx - Rλ, RCy + Rλ )

R4:
TB = 0, if θ( βFB, ( 2 × Rλ ) ) < θmax
TL = 0, if θ( βFL, ( 2 × Rλ ) ) < θmax
βC = ( RCx + Rλ, RCy + Rλ )

βFL = ( βCx - ( 4 × Rλ ), βCy )
βFT = ( βCx, βCy + ( 4 × Rλ ) )
βFR = ( βCx + ( 4 × Rλ ), βCy )
βFB = ( βCx, βCy - ( 4 × Rλ ) )

The above conditionals test for the cases where an adjacent viewable region has not been split to the same size as the current viewable region, and instead has an applied offset of ( 2 × Rλ ).  We therefore need to lock the affected morphing factors to 0, to ensure that all of the overlapping quadrilateral vertices exactly match those of the larger adjacent viewable region.  Figures 8a-8d provide diagrams of the various positions used with each viewable region when calculating the morphing factors for Boundary Case 2, based on the relative quadrant code of the viewable region.  An illustrative sketch covering this case follows the figures.  

Figure 8a
Figure 8a: Diagram of the various positions used with each viewable region with a relative quadrant code of R1, when calculating the morphing factors for Boundary Case 2.

Figure 8b
Figure 8b: Diagram of the various positions used with each viewable region with a relative quadrant code of R2, when calculating the morphing factors for Boundary Case 2.

Figure 8c
Figure 8c: Diagram of the various positions used with each viewable region with a relative quadrant code of R3, when calculating the morphing factors for Boundary Case 2.

Figure 8d
Figure 8d: Diagram of the various positions used with each viewable region with a relative quadrant code of R4, when calculating the morphing factors for Boundary Case 2.
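
Boundary Case 2 can be sketched in the same illustrative style, using the relative quadrant code stored during subdivision; the particular quadrant-code encoding (0 = R1 through 3 = R4) is an assumption carried over from the earlier sketches.  

// Boundary Case 2: if the neighbor across the parent's outer edge has NOT
// been split down to our size (its span at offset 2*R_lambda is below
// thetaMax), lock the shared-edge factor to 0 so the overlapping vertices
// match the larger neighbor exactly.
void ApplyBoundaryCase2(const ViewableRegion& r, float thetaMax,
                        const float4x4& matWorldView, const float4x4& matProjection,
                        float T[5])
{
    const float o = r.offset;

    // Parent center betaC, reconstructed from the relative quadrant code
    // (0 = R1 top-left ... 3 = R4 bottom-left).
    static const float px[4] = { +1.0f, -1.0f, -1.0f, +1.0f };
    static const float py[4] = { -1.0f, -1.0f, +1.0f, +1.0f };
    const float bcx = r.cx + px[r.quadrantCode] * o;
    const float bcy = r.cy + py[r.quadrantCode] * o;

    // Far neighbor centers betaFL, betaFT, betaFR, betaFB (Figures 8a-8d).
    const float4 fL = MakePoint(bcx - 4.0f * o, bcy);
    const float4 fT = MakePoint(bcx, bcy + 4.0f * o);
    const float4 fR = MakePoint(bcx + 4.0f * o, bcy);
    const float4 fB = MakePoint(bcx, bcy - 4.0f * o);

    // Only two of these are used per quadrant; all four are computed for clarity.
    const bool largeL = ViewableRegionSpan(fL, 2.0f * o, matWorldView, matProjection) < thetaMax;
    const bool largeT = ViewableRegionSpan(fT, 2.0f * o, matWorldView, matProjection) < thetaMax;
    const bool largeR = ViewableRegionSpan(fR, 2.0f * o, matWorldView, matProjection) < thetaMax;
    const bool largeB = ViewableRegionSpan(fB, 2.0f * o, matWorldView, matProjection) < thetaMax;

    switch (r.quadrantCode)
    {
    case 0: if (largeL) T[0] = 0.0f; if (largeT) T[1] = 0.0f; break; // R1: left, top
    case 1: if (largeT) T[1] = 0.0f; if (largeR) T[2] = 0.0f; break; // R2: top, right
    case 2: if (largeR) T[2] = 0.0f; if (largeB) T[3] = 0.0f; break; // R3: right, bottom
    case 3: if (largeB) T[3] = 0.0f; if (largeL) T[0] = 0.0f; break; // R4: bottom, left
    }
}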

The various morphing stages of a single viewable region are shown in Figure 9.  As a result of how the morphing quadrilateral boundary points are calculated, neighboring viewable regions share adjacent boundary points.  This results in no cracks or seams when rendering, and smooth transitions when moving through a scene.  

Figure 9
Figure 9: Examples of a single viewable region through various stages of morphing.  All T values are set to 0.0 on the left, and increase up to 1.0 on the right.  

The images below show examples of rendered viewable regions without (see Figure 10a) and with (see Figure 10b) the LOD Transition Algorithm in effect.  

Figure 10a
Figure 10a: Example of rendered viewable regions without using the LOD Transition Algorithm described in this section.

Figure 10b
Figure 10b: Example of rendered viewable regions using the LOD Transition Algorithm described in this section.


2.5    Procedural Height Generation Algorithm

Any number of height generation methods can be used in conjunction with the subdivision algorithm described in this paper.  We chose to base ours on the "Ridged multifractal terrain model" algorithm described by F. Kenton Musgrave [EMPPW*1998].  

Using this procedural algorithm resulted in highly detailed and realistic terrain, as seen in our demo video below.  The adaptive nature of the algorithm effectively eliminated high-frequency noise and aliasing, and fit quite nicely with our LOD Transition Algorithm.  We used a tileable noise texture wherever the "Ridged multifractal terrain model" algorithm called for a noise value, and embedded gradient information within the noise texture to speed up the surface normal calculations.  
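
For reference, below is a minimal C++ sketch of a ridged multifractal in the spirit of Musgrave's published version [EMPPW*1998]; the Noise3 function (in our case, a lookup into the tileable noise texture) and the parameter defaults are assumptions of this sketch, not the exact code used in our demo.  

#include <cmath>
#include <algorithm>

// Gradient-style noise in [-1, 1]; in our implementation this is a lookup
// into a tileable noise texture that also carries gradient information.
float Noise3(float x, float y, float z);

// Ridged multifractal height, after Musgrave [EMPPW*1998].  'H' controls the
// fractal dimension, 'lacunarity' the frequency step between octaves, and
// 'offset' / 'gain' shape the ridges.
float RidgedMultifractal(float x, float y, float z,
                         int octaves, float H = 1.0f, float lacunarity = 2.0f,
                         float offset = 1.0f, float gain = 2.0f)
{
    float signal = offset - std::fabs(Noise3(x, y, z));
    signal *= signal;                       // sharpen the ridge
    float result = signal;
    float weight = 1.0f;
    float exponent = 1.0f;

    for (int i = 1; i < octaves; ++i)
    {
        x *= lacunarity; y *= lacunarity; z *= lacunarity;
        exponent *= std::pow(lacunarity, -H);   // per-octave spectral weight

        // Successive octaves are damped by the previous signal, which keeps
        // valleys smooth while ridges stay detailed.
        weight = std::min(1.0f, std::max(0.0f, signal * gain));
        signal = offset - std::fabs(Noise3(x, y, z));
        signal *= signal;
        signal *= weight;
        result += signal * exponent;
    }
    return result;
}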

The height is calculated at each of the viewable region quadrilateral boundary points, resulting in displaced geometry when rendering (see Figure 11c).  We calculate the surface normals on a per-pixel basis to achieve even greater surface detail (see Figure 11d).  

3    Results

In our tests, CPU utilization averaged approximately 1%.  Frame rates for our demo averaged between 40 and 50 fps at a resolution of 1024×768.  The demo was run on a 2.9 GHz AMD Phenom CPU with 8 GB of system memory, and a 512 MB AMD Radeon HD 6450 video card.  

Figures 11a-11e show some of the steps taken to create the final rendered scene.  In Figure 11e, we added diffuse lighting from a single light source, and modified the color of the terrain based on the calculated surface normal.  Greater realism can easily be added through any number of techniques, such as the use of textures, but we wanted to focus on the detail attainable through the exclusive use of our algorithm.  

Figure 11a
Figure 11a: The subdivided viewable regions.

Figure 11b
Figure 11b: The subdivided viewable regions shown with the LOD Transition Algorithm applied.

Figure 11c
Figure 11c: Shown with the heights added to the viewable region quadrilateral boundary points.

Figure 11d
Figure 11d: Shown with the per-pixel surface normals.


Figure 11e
Figure 11e: The final scene, with added lighting and procedural coloring.


4    Conclusions

The results of our algorithm show promise, and we are continuing our research in this area in an effort to make further gains.  

Much of the rendering cost in our usage example comes from the multi-layered procedural calculation of the terrain height at the specified locations.  Swapping our procedural algorithm for a simpler method, such as a single texture-based height map, results in much faster rendering times: with a simple height map lookup, we achieved frame rates well above 100 fps at the same resolution.  

One problem with our approach is the visual phenomenon described as "vertex swimming".  This can be noticed when moving towards large features that contain a high frequency of detail.  The problem can be greatly reduced by increasing the number of subdivisions performed before rendering (i.e., by lowering the θmax value used in the Subdivision Algorithm).  When we swapped out our LOD Transition Algorithm and instead added logic to correct the geometry (T-junctions) at the LOD boundaries, we were able to eliminate the "vertex swimming" phenomenon.  Unfortunately, this also meant that the LOD boundaries became noticeable, and in our opinion, were more of a visual distraction than the "vertex swimming" itself.  There may be other ways to eliminate this visual distraction, and we are hopeful that this issue will be improved with continued research.  

Care must be taken to ensure high-precision floating-point calculations are used throughout this algorithm.  If floating-point errors are inadvertently allowed to compound, artifacts can appear as sporadic pixel noise in the final rendered scene.  We solved this problem by using integer-based values for our viewable region center positions and applied offsets.  Our initial applied offsets are set to large powers of two, in order to accommodate the maximum number of iterative subdivisions.  By only applying a floating-point conversion factor when we required world space coordinates, we were able to completely eliminate all floating-point difference errors, while also scaling the data appropriately.  
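
As a small illustration of this fixed-point approach (the names and the conversion scale below are ours, chosen for the sketch), the region records can store integer centers and offsets that are converted to floating point only when world space coordinates are needed:

#include <cstdint>

// Integer-based region record: exact halving during subdivision, so no
// floating-point error can accumulate across iterations.
struct ViewableRegionI
{
    int32_t  cx, cy;       // center, in integer grid units
    int32_t  offset;       // applied offset; initialized to a large power of two
    unsigned quadrantCode;
};

// Single conversion factor from grid units to world units, applied only when
// world space positions are required (e.g. inside the vertex shader).
constexpr float kGridToWorld = 1.0f / 16.0f;   // illustrative scale

inline float WorldX(const ViewableRegionI& r)      { return r.cx * kGridToWorld; }
inline float WorldY(const ViewableRegionI& r)      { return r.cy * kGridToWorld; }
inline float WorldOffset(const ViewableRegionI& r) { return r.offset * kGridToWorld; }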

Improvements to the subdivision algorithm and resulting data amplification could be made to better utilize hardware depth culling when rendering the resulting mesh.  Increasing the subdivision granularity per iteration, and adding the subdivided regions to the output streams in increasing order of average depth (nearest to furthest) would enable the final rendering pass to benefit more from depth culling.  

References

[EMPPW*1998] Ebert D. S., Musgrave F. K., Peachey D., Perlin K., Worley S.:
"Texturing and Modeling: A Procedural Approach, Second Edition"
Academic Press, 1998.

[FS*2010] Filip Strugar:
"Continuous Distance-Dependent Level of Detail for Rendering Heightmaps (CDLOD)"
2010.

[CJ*2004] Claes Johanson:
"Real-Time Water Rendering: Introducing the Projected Grid Concept"
Master of Science thesis, Lund University, 2004.

Demo




All product names are trademarks or registered trademarks of their respective holders.