SIGGRAPH 2011 Notes

© David Gavilan



Sunday, August 7th


11:30 am -1:30 pm 

Informal meeting

Pascal Langlois, Motives in Movement

Sebastien Hillaire, dynamixyz (Expressive Machines)



 * System to facilitate facial animation from mo-cap

 * From a set of expressions (pictures), it creates a space of expressions (eigenspace?)

 * This space is mapped onto the surface of a sphere.

 * It is also clustered, so every color on the surface of the sphere represents a cluster.

 * A path over the sphere is a plausible facial animation, with smooth transitions between expressions.

 * The sphere encodes the idiosyncrasy of the actor, so there are no unnatural expressions for that actor in the expression space.

 * The space is richer than typical FACS.

Motives in Movement


 * Database of facial expressions, for dramatic content.

 * Cleverer categories or semantic annotations than just the typical 7 expressions of FACS. How this works is not disclosed yet.

2-2:30 pm

Talk: Coherent Out-of-Core, Point-Based Global Illumination

Janne Kontkanen, Google, Inc.

Eric Tabellion, Ryan S. Overbeck, PDI/DreamWorks


 * PBGI - check 8-page paper.

 * Octree -> subdivide with Morton Code

 * Connecting points in Morton order draws a Z-curve

 * Parallel PBGI Shading, similar to PantaRay
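A minimal Python sketch of the Morton-order idea in the notes above: interleaving the bits of integer x, y, z cell coordinates gives a single key, and visiting cells in key order traces a Z-curve. The function names and the 10-bits-per-axis default are assumptions for illustration, not taken from the paper.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)        # x bit goes to position 3i
        code |= ((y >> i) & 1) << (3 * i + 1)    # y bit goes to position 3i+1
        code |= ((z >> i) & 1) << (3 * i + 2)    # z bit goes to position 3i+2
    return code


def octree_child(code: int, level: int) -> int:
    """Child index (0..7) at `level`, counting from the finest level (0)."""
    return (code >> (3 * level)) & 7


def z_order(points):
    """Sort integer grid points along the Z-curve."""
    return sorted(points, key=lambda p: morton3d(*p))
```

Each group of 3 bits in the key selects one octree child, which is why sorting by Morton key also groups points by octree node.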

3:45-5:15 pm

Panel: Successful Creative Collaboration Across Time and Space

This panel discusses issues surrounding globally distributed projects in animation, games, and visual effects. Success in these ventures depends on unique production structures, review processes, universal tool sets, and adaptation of artists and engineers to technology-mediated communication. Topics include speculation on possible future work environments and how the rising generation of artists and engineers will influence the collaboration process. Each panelist brings a specific area of expertise to the general topic and represents an organization recognized for successfully advancing industry capability with distributed projects.

Tim McLaughlin, Texas A&M University

Tim Fields, Certain Affinity, Inc.

Jonathan Gibbs, DreamWorks Animation

David A. Parrish, Reel FX Creative Studios

Steve Sullivan, Industrial Light & Magic 


 * Tight schedules and budgets force us to do it

 * Global talent

 * Getting the best people for the job

 * To be truly successful -> Get trust -> Get to know people -> actually fly there to meet them in person

 * Some computer geeks communicate better using the network than face to face

 * Tools for collaboration

  * asset management is usually the biggest issue

  * communication is usually ok

 * Communication: how they do their work is not equal to how they talk about their work

 * People with different perspectives in life give variety and richness to the work

 * Time lag -> to minimize it, duplicate the key people across the teams and keep them in sync (this can become a bottleneck)

 * Cultural differences are good

 * It's important that everyone thinks it's OUR project, not that it's from my group and you are just helping…

 * Collaboration with scientists is important too

 * Creative collaboration makes things better? Yes, because of the access to talent.

 * When growing bigger, it makes sense to pay people just to track information

6–8 pm

Technical Papers Fast Forward


Monday, August 8th


9 am–12:15 pm

Course: Advances in Real-Time Rendering in Games: Part I

Natalya Tatarchuk, Bungie, Inc.

Making game worlds from polygon soup: visibility, spatial hierarchy and rendering challenges

Chen, Silvennoinen, and Tatarchuk, Bungie


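The scene is voxelized and partitioned into regions, which speeds up visibility determination. Umbra's culling technique comes up again in the "Occlusion Culling in Alan Wake" talk.

The talk also covers load balancing. With the previous approach (coarse-grain parallelism), copying the 20MB game state on the Xbox360 was expensive. By using the visibility results to drive better load balancing, the state fits in 2MB.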

* More than pretty pixels: 

  * AI perception, activation, object attachment (~ZIP code), audibility, caching/paging

  * Path-finding, collision/physics, visibility, spatial connectivity, rendering

* In Halo, they used manual portalization, but it's non-trivial and optimized for indoor scenes only!

* Think in a different direction -> polygon soup!

 * Just pieces jammed together

 * no water-tightness, no manual portals, scalable...

* Automation

 * Subdivide the scene, voxelize, segment -> connectivity graph -> simplified volumes

* 2D watershed for pathfinding works very well

* But 3D watershed is harder & slower, +oversegmentation, … 

* Collaboration with Umbra for Automatic portal generation

 * 2 stages: preprocess, runtime

 * Preprocess: discretize scene into voxels, connectivity, propagate, determine portals

 * Volumetric visibility; can query from point or from region

 * Portal culling: traverse 100K+ fast

  * screen space approach

  * fast, 16 pixels at a time with 128-bit SIMD vectors

  * works on PS3, Xbox360, PS Vita

* Game unannounced, they can't show any art…

* Halo Reach

 * coarse-grain parallelism, for xbox360 -> System on a thread

  => copy the 20MB state to render (+ needs double buffering for 2 ticks)

  => impact on performance

 * deterministic engine -> game state -> explicit sync through state mirroring

 * mostly manual load-balancing

 (check slides for graphs!)

* Can we do better?

 * We don't need the entire game state for rendering (-> portals & visibility)

 * drive game extraction and processing based on visibility 

 * no need to double buffer the entire game state

 * better load balancing

 * from 20MB per frame, to just 2MB per frame!

 * Render submission job

* Future work

 * dynamic occluders

 * + use of CPU


 * no preprocessing, eg. use hardware rasterized voxels

Rendering in Cars 2

C. Hall, R. Hall, D. Edwards

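For the previous game, Toy Story 3, an average of 30 fps was good enough, but this time a locked 30 fps was required with 4-player split screen. Lighting uses light probes encoded as Spherical Harmonics. For HDR/tone-mapped rendering, applying the exposure in the lighting phase makes it possible to use a 32-bit buffer. Shadows use 4 cascades, but rendering every model 4 times is wasteful, so static objects are baked into lightmaps and moving objects are rendered into a low-resolution buffer.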

 * Previous game: Toy Story 3

 * 4-player split screen & maintain 30 fps!

 * Lightmaps, Light probes, limited dynamic

Light Probes


  * capture light from a point in space -> cube maps

  * bounce lighting

  * Global probe: for outdoor lighting, either artist defined or captured, and converted to SH

  * Order 3 SH (108 bytes per probe)

 * Render cubemaps on GPU

 * Save as 16F

 * Irradiance volume

 * Coverage is not essential

 * Uniform grid volume: simple structure and easy to implement

 * Fading regions (global ~ volume)

 * Probes outside world have incorrect lighting -> detect!

 * CPU assignment -> blend in CPU -> big objects have incorrect lighting

 * Overlapping volumes -> blending is challenging

  -> time averaging

 * Env maps for reflections

 * Pack direct lighting into probes

 * Lighting overrides, defined by the artists
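A rough Python sketch of what "order 3 SH" means for a light probe, as noted above. Order 3 gives 3 x 3 = 9 basis functions; assuming RGB and 32-bit float coefficients (an assumption that happens to match the 108 bytes in the notes), that is 9 x 3 x 4 = 108 bytes. The basis constants are the standard real SH values; sign and ordering conventions vary between engines, and `project_sh9` is a hypothetical helper, not the talk's code.

```python
import math


def sh9_basis(x: float, y: float, z: float):
    """Real spherical harmonics up to band l=2 for a unit direction."""
    return [
        0.282095,                        # l=0
        0.488603 * y,                    # l=1
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,                # l=2
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def probe_size_bytes(order: int = 3, channels: int = 3, bytes_per_coeff: int = 4) -> int:
    """Storage for one probe: order^2 coefficients per colour channel."""
    return order * order * channels * bytes_per_coeff


def project_sh9(samples):
    """Project radiance samples onto 9 SH coefficients per channel.

    samples: list of ((x, y, z), (r, g, b)), directions assumed to be
    unit length and uniformly distributed over the sphere.
    """
    coeffs = [[0.0, 0.0, 0.0] for _ in range(9)]
    weight = 4.0 * math.pi / len(samples)    # uniform-sampling weight
    for direction, rgb in samples:
        for i, b in enumerate(sh9_basis(*direction)):
            for c in range(3):
                coeffs[i][c] += weight * b * rgb[c]
    return coeffs
```

In practice the samples would come from the texels of the rendered cube map, weighted by their solid angle rather than uniformly.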

Reduce cost for HDR rendering


 * 32bit per pixel

 * Xbox360: 7e3 format

  * banding problem, because of the loss of precision

  -> to avoid this, move the exposure change to the scene render step, instead of doing it as a post-process

  (pre-exposed color)

 * PS3: 16F format

  * Cheaper than LogLuv 



 * 4 viewports, x4 renders….

 * Render less in the shadow maps

 * Lightmap: precalculate everything from static objects

 * + Add Low res SM: 256x256, for dynamic objects

 * MLAA and early stencil culling

 * 1/16 size deferred shadow mask

SPU post processing


 * In Toy Story 3, PS3 GPU lagged behind the Xbox360

 * source textures must reside in main memory….

 * Stereo 3D implemented as SPU postprocessing :)

  * traditional stereo 3D, render scene twice… performance cost x2! x4 for 4-player viewports! 

  * Use depth buffer to do a reprojection

  * problems as a post processing: occlusions and disocclusions, view dependent lighting, translucent objects…

  * human eyes are placed horizontally, so only horizontal shifts are needed (no up-down shifts)

  * Use item buffer, iterate over depth buffer and fill item buffer

  * Fill disocclusion holes with road

  * Stereo 3D for free!
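A toy Python sketch of the depth-based reprojection described above, for a single scanline. Each source pixel is shifted horizontally by a disparity derived from its depth, an item buffer records which source pixel lands on each target pixel (nearest depth wins), and disocclusion holes are filled afterwards. The disparity formula, the `eye_sep` and `focal` parameters, and the "copy the pixel to the left" hole fill are assumptions for illustration; the game fills holes with road instead.

```python
def reproject_row(color, depth, eye_sep, focal):
    """Reproject one scanline for a second eye.

    color, depth: per-pixel lists for the rendered view.
    eye_sep, focal: hypothetical stereo parameters; pixels at depth == focal
    get zero disparity.
    Returns (reprojected colors, item buffer of source indices).
    """
    n = len(color)
    item = [-1] * n                      # -1 marks a disocclusion hole
    for x in range(n):
        shift = int(round(eye_sep * (1.0 - focal / depth[x])))
        tx = x + shift
        # keep the nearest source pixel when several land on the same target
        if 0 <= tx < n and (item[tx] < 0 or depth[x] < depth[item[tx]]):
            item[tx] = x

    out = []
    last = None
    for tx in range(n):
        if item[tx] >= 0:
            last = color[item[tx]]
        out.append(last)                 # holes reuse the pixel to their left

    # holes at the very start of the row have no left neighbour
    first_valid = next((c for c in out if c is not None), None)
    return [c if c is not None else first_valid for c in out], item
```

Only the depth buffer is consulted, which is why translucent objects and view-dependent lighting show up as problems in the list above.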

Two uses of voxels in Little Big Planet 2's Graphics Engine

Alex Evans, Media Molecule

* Simple techniques! Easy to implement

* Bigger constraint is not GPU or CPU, but coding time!!!

* LBP engine requirements:  many local lights, simple code, predictable cost! (user-generated content!)

* Siggraph 2006: LBP0 irradiance volumes -> they didn't use it! 

  -> light leaking (no occlusion)

* Siggraph 2009: LBP1: Light pre-pass renderer

  - for transparency, if you place 2 transparent objects, the one behind disappears -> users made lots of invisible stuff!!! ^^;

* LBP2: proper transparency and god rays!

  - 1st voxelize the scene completely on the GPU

  - thickness is constant, because in LBP everything is like a prism (orthographic)

  - splat lights into a 180p x 8 volume

  - SH0 color intensity, SH1 light direction

  - sample voxels in real time for the skylight ambient occlusion (check code on slides!)

  - single scattering, add to the bloom buffer

   - super lo-res if the screen gets big… :(


 * based on a FLIP integrator

 * coded as 47 different SPU jobs! 

 * each SPU kernel is simple, and the boring stuff done with a PERL script

 * FLIP blends advantages of particle & grid based fluid solvers

 * 64x32x8 grid

 * gas cloud particles voxel volumes

 * gas renders into variance shadow maps

 * 2 hacks to allow for transparency in VSM

 * Gloop / water

 * Voxels & Particles! 

Secrets of CryEngine3 Graphics Technology

Sousa, Kasyan, and Schulz, Crytek

 * Linear correct HDR rendering

 * minimal g-buffer: depth & normals

 * deferred lighting

 * opaque, transparent passes

 * Z-buffer depth caveats

  - hyperbolic distribution -> needs conversion to linear space before using in shaders

  - problem: first person view objects

  - modify depth reconstruction function

  - different depth scale for first person view objects

  - idea: linearly transform VPOS from screen space S directly to target homogeneous space W (shadow space or world space)

    (check code in slides)

 * coverage buffer

  - main occlusion culling system (~ low res depth buffer)

  - downscale Zbuffer on GPU (max filter) after G-buffer pass

  - consoles are perfect for creating this C-buffer (1 frame latency)

  - problem: mismatch between frames

  - using C-buffer CPU reprojection from prev. frame camera (2ms in SPU, 3-4 ms on Xbox360, heavily optimized with vectorized code)

 * Deferred Lighting: Ambient

  * Outdoor/indoor, additive blended

 * Env. probes

  * artists pick important sampling locations

 * GI, SSDO, … presented in 2010

 * Shadow mask for sun

 * Point light shadows rendered directly to light buffer

 * CSM caching:

  * not all the cascades are updated during a single frame (performance reasons, PS3)

  * allows us to have more cascades - better shadow map density distribution

  * cached Shadow Maps use cached Shadow Matrices

  * distant cascades are updated less freq. 

  * last cascade uses VSM

 * Point light shadows -> big texture atlas to pack all SM

 * Soft shadows approx. -> Poisson PCF taps with randomized rotations in shadow space

 * For transparent shadow casters, accumulate alpha values of the casters

 * Translucency map generation 

 * Real time local reflections (RLR)

  - expensive with rasterization (screen-space) 

  - compute reflection vector for each pixel, raymarch along the vector

  (check slides!)

 * Contact shadows: core idea is same as SSDO

 * Deferred skin shading

  - subsurface scattering in screen space

  - self-shadow trick: ray march along screen space light vector

  - soft-alpha test, for hair

 * Camera & object motion blur

  - reprojection for static + velocity buffer for dynamic object (full res => 3ms on consoles)

  - half res

  - velocity buffer dilation

 * Bokeh DoF

 * ultra specs (DX11): 

   - single pass at full screen res for motion blur

   - avoid geometry shader for DoF (slow in some hardware!)

 * Stereo 3D

  - image space approach: reprojection

  - image offsetting in pixel shader

  - for disocclusion, tweak it to look good for the weapon in first person

  - transparent objects are not in the depth buffer! 

  - stereo view comfort 
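Going back to the "Z-buffer depth caveats" item in these CryEngine 3 notes: a small Python sketch of why the hyperbolic depth needs linearizing. It assumes a Direct3D-style projection with depth in [0, 1]; the function names are made up here and this is not the code from the slides.

```python
def hyperbolic_depth(z: float, near: float, far: float) -> float:
    """Depth-buffer value in [0, 1] for view-space distance z (D3D-style)."""
    return (far / (far - near)) * (1.0 - near / z)


def linearize_depth(d: float, near: float, far: float) -> float:
    """Inverse of hyperbolic_depth: recover view-space distance from d."""
    return near * far / (far - d * (far - near))
```

With near = 0.1 and far = 1000, a point only 1 unit away already maps to a depth value above 0.9, so most of the buffer's range is spent close to the camera.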

2–5:15 pm

Course: Advances in Real-Time Rendering in Games: Part 2

The focus of this course, the next installment in the now-established series of SIGGRAPH courses on real-time rendering, is on bridging the game-development community and state-of-the-art 3D graphics research to encourage cross-pollination of knowledge for future games and other interactive applications. Presenters review the best of graphics practices and research from the game-development community and provide practical and production-proven algorithms.

Natalya Tatarchuk, Bungie, Inc.

John White, Electronic Arts Black Box

Colin Barré-Brisebois, Electronic Arts Montréal

Dimitar Lazarov, Treyarch

Vassily Fillipov, Sony Santa Monica

Hugh Malan, CCP Games

Christopher Hall, Robert Hall, David Edwards, Avalanche Software

Eric Penner, Electronic Arts Vancouver

More performance without quality compromise: 5 rendering ideas from Battlefield 3

 * Frostbite 2

 Separable Bokeh DoF

 * Compute CoC from real world camera parameters, and store in alpha

 * Gaussian blur is common in DX9 games

 * Arbitrary blurs in image space are O(n^2)

 * Separable blurs: Gaussian, box, and skewed box

  -> hexagonal blurs: decompose a hexagon into 3 rhombi

   still 7 passes and 6 blurs…. :(

   Reduce to 2 passes: 1) up & down-left; 2) down-left + down-right

 * scatter

 * Hi Z culling
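To illustrate the point above about separable blurs, here is a small pure-Python sketch with a box filter: one horizontal pass followed by one vertical pass gives the same result as the direct 2D filter, using 2(2r+1) taps per pixel instead of (2r+1)^2. The helper names and clamp-to-edge addressing are assumptions; the hexagonal decomposition from the talk is not shown.

```python
def _clamp(v, lo, hi):
    return min(max(v, lo), hi)


def blur_1d(img, radius, horizontal):
    """One box-blur pass along rows (horizontal) or columns (vertical)."""
    h, w = len(img), len(img[0])
    taps = 2 * radius + 1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k in range(-radius, radius + 1):
                sx = _clamp(x + k, 0, w - 1) if horizontal else x
                sy = y if horizontal else _clamp(y + k, 0, h - 1)
                acc += img[sy][sx]
            out[y][x] = acc / taps
    return out


def blur_separable(img, radius):
    """Two 1D passes: 2 * (2r + 1) taps per pixel."""
    return blur_1d(blur_1d(img, radius, True), radius, False)


def blur_2d(img, radius):
    """Direct 2D box blur: (2r + 1)^2 taps per pixel."""
    h, w = len(img), len(img[0])
    taps = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    acc += img[_clamp(y + dy, 0, h - 1)][_clamp(x + dx, 0, w - 1)]
            out[y][x] = acc / taps
    return out
```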

 ZCull reverse reload

 * Hi Z: store low res Z to quickly reject pixels

 * volume rendering; in deferred renderers it is common to reproject points back into the world space

 * Reverse Hi Z, cull fragments that are close

 * Use to render CSM faster! CSM cuboids

 * Min/max SM 

 Chroma sub-sampled image processing

  * Decompose image into lum and chroma

  * to accelerate post processing

  * Reduce down to luma only -> 1/4 of bandwidth required

  -> 1280x720 can be packed into 320x720 luma ARGB (4 luma values per ARGB texel)

 * Future work: use for hexagonal blurs; only perform temporal AA for luma
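A small Python sketch of the luma packing arithmetic above: storing four horizontally adjacent luma samples in the four channels of one texel makes a 1280-wide row 1280 / 4 texels wide, so a luma-only pass touches a quarter of the texels. The channel order and the function names are assumptions, not from the talk.

```python
def pack_luma(luma):
    """Pack each run of 4 adjacent luma samples into one 4-channel texel."""
    packed = []
    for row in luma:
        if len(row) % 4 != 0:
            raise ValueError("row width must be a multiple of 4")
        packed.append([tuple(row[x:x + 4]) for x in range(0, len(row), 4)])
    return packed


def unpack_luma(packed):
    """Inverse of pack_luma: flatten the texels back into luma rows."""
    return [[v for texel in row for v in texel] for row in packed]
```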

 Tile-based deferred shading in Xbox360

  1. divide screen in screen-space tiles

  2. cull analytical lights, per tile

  3. compute lighting for all contributing lights, per tile

  * show each tile in color to designers to check the lighting fits in the budget

  * GPGPU culling -> screen divided in 920 tiles of 32x32 pixels

  * -normalize(v) is faster than normalize(-v)
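A simplified Python sketch of steps 1 and 2 above. It treats each light as a screen-space circle, which is an assumption made to keep the example short (a real implementation would typically cull in view space against per-tile frusta). It also reproduces the tile count in the notes: 1280 / 32 = 40 columns and ceil(720 / 32) = 23 rows, so 40 x 23 = 920 tiles.

```python
import math


def tile_grid(width, height, tile=32):
    """Number of tiles in x and y needed to cover the screen."""
    return math.ceil(width / tile), math.ceil(height / tile)


def cull_lights(lights, width, height, tile=32):
    """Bin lights into tiles.

    lights: list of (cx, cy, radius) circles in pixels (hypothetical
    screen-space bounds of each light).
    Returns {(tile_x, tile_y): [light indices]} for non-empty tiles.
    """
    tiles_x, tiles_y = tile_grid(width, height, tile)
    bins = {}
    for idx, (cx, cy, r) in enumerate(lights):
        # range of tiles touched by the circle's bounding box
        i0 = max(int((cx - r) // tile), 0)
        i1 = min(int((cx + r) // tile), tiles_x - 1)
        j0 = max(int((cy - r) // tile), 0)
        j1 = min(int((cy + r) // tile), tiles_y - 1)
        for j in range(j0, j1 + 1):
            for i in range(i0, i1 + 1):
                # closest point of the tile rectangle to the light centre
                px = min(max(cx, i * tile), (i + 1) * tile)
                py = min(max(cy, j * tile), (j + 1) * tile)
                if (px - cx) ** 2 + (py - cy) ** 2 <= r * r:
                    bins.setdefault((i, j), []).append(idx)
    return bins
```

Step 3 would then loop over each tile's list only, and the per-tile list length is the number a debug view could colour-code for designers.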

 Temporally-stable SSAO 

  * linearize depth for better precision/distribution 

  * sample linear depth texture with linear sampling

  * fast grayscale blur - 8 as 8888, aliasing the AO results from R8

Physically-based Lighting in Call of Duty: Black Ops

 * Forward rendering, 2x MSAA

 * Single pass lighting

 * Constraints: one primary light per surface

  - but unlimited secondary (baked) lights

 * primary specular -> microfacet BRDF

 * secondary specular -> reconstructed from env. probe

 * exposure volumes placed by artists

 * normal map mipping makes surfaces act like giant mirrors!

 -> use variance maps "Lean mapping" I3D

Real-time image quilting: Arbitrary material blends, invisible seams, and no repeats

 * a splat at each vertex

 * blend 3 splats

 * possible for artists to define a custom transition texture

Dynamic Lighting in God of War 3

 * RSX is a bottleneck -> use SPUs

 * Hybrid vertex lights

Pre-integrated Skin Shading

 * Per-channel bent normals

9–11 pm

Conference Reception



MotorStorm Apocalypse: Creating Urban Off Road Racing

3:45–5:15 pm

Technical Papers: Tone Editing

3:45–5:15 pm

The Studio Workshop: How to Write Fast iPhone and Android Shaders in Unity


Tuesday, August 9th


9 am–12:15 pm

Course: Beyond Programmable Shading I

There are strong indications that the future of interactive graphics programming is a more flexible model than today’s OpenGL/Direct3D pipelines. Graphics developers need to have a basic understanding of how to combine emerging parallel programming techniques and more flexible graphics processors with the traditional interactive-rendering pipeline. As the first in a series, this course introduces trends and directions in this emerging field.

Michael Houston, Advanced Micro Devices, Inc.

Aaron Lefohn, Intel Corporation

Why and how is interactive rendering changing

Aaron Lefohn, Intel

* This field changes fast. They rewrite the content of this course every year.

* Key events in this course history: OpenCL & ComputeShader, DX11, GPU 3D pipeline scheduling revealed

* Interactive rendering techniques are created using an inseparable mix of data- and task-parallel algorithms and graphics pipelines. 

* Is the rise of SW graphics temporary? No, the wheel is still turning, but it takes you to a different place every time.

* Trends:

 * Move to low power devices! (Apple's fault?)

 * System on a Chip (SoC), integrated CPU-GPU chips

Research in Games

Peter-Pike Sloan, Disney

* Phases:  Concept ~ X-video -> Pre-production ~ First Playable -> Full Production ~ Vertical Slice

* X-video: concept video (offline rendering) -> done by some external studio

* Split/Second concept: racing + destruction, grand scale, cinematic cameras

* Pre-production research: power plays, baking technology, effects (smoke, etc), destruction…

* things that can go wrong: people moved to other projects, ship date moved, project cancelled, studio closed….

* I3D talks -> gap between research and practical use in games

* Precomputed lighting -> check slides to see what's wrong with research papers

* Papers have sometimes no intuitive parameters -> too hard for artists to set up.

* Shadows -> several blind alleys … 

* SoC

* Stacked Die Memories -> programming to the mem hierarchy, mem in same package, physically bonded; wheel of reincarnation?

The "Power" of Real-Time Rendering

Raja Koduri, Apple

* HW industry has to make some decisions now that impact the whole developer community

* We ate our GPU cake, "vuoi la botte piena e la moglie ubriaca" (Italian: wanting the barrel full and the wife drunk, i.e. having your cake and eating it too); 16+ years of (sugar) high!

* In every GPU generation

 * +performance, and +perf/W

 * +programmability

 * keep compatibility with +8 year-old APIs

* Chip power = Static power + dynamic power

 * system power = CPU + GPU + other

 * static power = leakage of inactive transistors

 * dynamic power = from active switching transistors 

 * Energy = Power(W) * Time ->  when he talks about "power" he means "energy"

* Taxonomy of powers

 * Mobile devices: 0-10 W TDP

 * Mobile computers: 10-60 W TDP

 * Desktops (always plugged): 50-300 W TDP

 * TDP: thermal design power, max amount of power that the thermal system can sustain

* Disruptive transition

 * Sub Moore's law scaling of perf. per W

 * GPUs scale better than CPUs -> "Dark Silicon and the end of multicore scaling" paper in ISCA'11

 * Market shift towards lower power computers: battery life, thermal limits, acoustics, $ KWH bill

 * HW vendors compelled to succeed in mobile markets

 * Dangerous assumption: everything seen in desktops will eventually appear in low-power devices

 -> that's not true! Check HW and how they have already made hard choices

* Differences of chips

 * CPUs prioritize freq, spend N for caches, cores, …

 * GPUs, lower freq and V, spend N for shaders, textures, pixels, etc

 * FixedFunction, lowest N, F, V for a given task

* What to use?

 * If your workload parallelizes well on a GPU, use the GPU

 * Optimize for system energy

* A peek inside the power management black box

 * static power = N * V * e^(-Vt)

  * power off unused areas (power gating) -> when not in use, static power ~= 0

  * fine print: latency with power toggle, impact on performance  

 * dynamic power = A*N*C*F*V^2

  * A: Activity, F: Freq

  * primary OS+HW strategy is to control F & V

* Basic app optimizations for power

 #1 Control frame rate to minimum desired

  * sometimes games spend more power in the menu screen!!! Higher framerate for nothing….

  * Portal 2 option: "laptop power savings"

 #2 min rendering

 #3 don't scatter work in a frame (coalesce)

 #4 avoid spin-loops, eg. CPU waiting on GPU, this looks like real work to CPU, 

  -> eg. occlusion queries. Research papers often get this wrong

* Scheduling for power optimization is a very complex subject

* FixedFunction revenge! premature declaration of death of fixed function in GPU 

* we may need APIs to manage power

? Recommendation for tool programmers to show how much power you are using, etc.

 - currently, they only show voltage, etc. But that's not enough.
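A back-of-the-envelope Python sketch of the dynamic power formula noted earlier in this talk (dynamic power = A*N*C*F*V^2) together with energy = power * time. It ignores static power, and the 20% scaling figures are made-up inputs for illustration only.

```python
def dynamic_power(activity, n_transistors, capacitance, freq, volts):
    """Dynamic power = A * N * C * F * V^2 (units left abstract)."""
    return activity * n_transistors * capacitance * freq * volts ** 2


def energy(power, seconds):
    """Energy = power * time."""
    return power * seconds


def scaled_ratios(freq_scale, volt_scale):
    """Power and energy ratios for a fixed workload when F and V are scaled.

    Assumes run time is inversely proportional to frequency.
    """
    base = dynamic_power(1.0, 1.0, 1.0, 1.0, 1.0)
    scaled = dynamic_power(1.0, 1.0, 1.0, freq_scale, volt_scale)
    power_ratio = scaled / base
    energy_ratio = energy(scaled, 1.0 / freq_scale) / energy(base, 1.0)
    return power_ratio, energy_ratio
```

Under these assumptions, lowering both F and V by 20% cuts dynamic power to about half (0.8^3) while the same work takes 25% longer, so energy drops to about 0.64 of the original. This is the kind of trade-off behind controlling F and V.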

Real-Time Rendering Architecture

Mike Houston, AMD

* GPU != CPU, understand the diff 

* GPU: heterogeneous chip multi-processor, tuned for graphics

* Fragment shaders are massively parallel programs, even if we don't realize that!

* SIMD processing  => share same instructions in the different cores (per fragment)

 -> GPUs are basically ALUs! That's why they are so powerful

* branches -> not all ALUs do useful work! Worst case: 1 order of magnitude below peak performance

* SIMD processing does not imply SIMD instructions 

* Stalls: a core cannot run the next instruction because of a dependency on a prev. op. 

 * Lots of independent fragments -> interleave processing -> when one group stalls, work on another

* "SIMD-engine" -> "wavefront"

* Move data to processors -> Cache hierarchy -> cache reduces latency! Program with care

* Bandwidth is a critical resource

* GPU mem system is designed for throughput

* Stop using memory!!!! Recompute stuff !!! -> sounds bizarre, but it's faster and uses less power

* Texture caches -> reuse stuff!

* Reduce offload cost -> industry shifting rapidly in this direction

* Intel Sandy Bridge, NVIDIA Tegra 2 … 
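A toy Python model of the branching point in these notes: when the fragments sharing one instruction stream disagree on a branch, the group executes both sides with some lanes masked off, so ALU utilization drops. The cost model (fixed cost per branch side, lanes idle while masked) and the function names are simplifying assumptions.

```python
def group_cost(conditions, cost_then, cost_else):
    """Cycles the whole SIMD group spends on an if/else."""
    cost = 0
    if any(conditions):          # at least one lane takes the 'then' side
        cost += cost_then
    if not all(conditions):      # at least one lane takes the 'else' side
        cost += cost_else
    return cost


def utilization(conditions, cost_then, cost_else):
    """Fraction of lane-cycles doing useful work (1.0 = no waste)."""
    total = group_cost(conditions, cost_then, cost_else) * len(conditions)
    if total == 0:
        return 1.0
    useful = sum(cost_then if c else cost_else for c in conditions)
    return useful / total
```

With 16 lanes where a single lane takes an expensive branch and the others do nothing, utilization is 1/16, which is the "order of magnitude below peak" worst case.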

Scheduling the Graphics Pipeline

Jonathan Ragan-Kelley, MIT


* Why GPUs, APIs impose the constraints they do.

* Develop intuition for what they can do well

* Understand key patterns for building your own pipelines

* Graphics workloads are irregular

 - shaders are optimized for regular, self-similar work

 - redistribute tasks to solve this, and dynamically aggregate

* Ordering -> triangles have to be in order

* Ray tracing on a SIMD machine -> scalar ray tracing; Packet tracing

* OptiX

* Rasterization is not programmable because

 - highly irregular

 - it must generate and aggregate regular output

* Can we relax the strict ordering requirements?

* Generic scheduler?

Parallel Programming for Real-Time Graphics

Aaron Lefohn, Intel

* When a parallel programming model abstracts a HW resource, code written in that programming model scales across architectures with varying amounts of that resource

* Definitions: execution context, work, concurrent execution, parallel execution, synchronization, granularity

* Vertex shaders: "pure data parallelism"

* Conventional thread parallelism -> nothing is abstracted

* D3D/OpenGL rendering pipeline -> doesn't expose parallelism to the user; sync allowed between draw calls

-> two extremes! What's interesting is the middle ground

* Explicit SIMD programming

 float16 a, b, c;  c = a+b ;

* SPMD/Implicit SIMD programming

 parallel_for() {}

 concurrent_for() {}

* Task systems (Cilk, TBB, ConcRT, GCD, …) 

 spawn xxTask();

 … sync;

* GPU compute pseudo code

High Performance Graphics on the CPU with ISPC

Matt Pharr, Intel

* new open-source compiler: ispc (Intel SPMD Program Compiler)

* run SPMD programs on the CPU

* C-based syntax (+ parallel_for() )

* Eg. ray tracer in ispc

Software Rasterization on GPUs

Samuli Laine, Jacopo Pantaleoni, NVIDIA

* Work presented in High-Performance Graphics

* They built a full pixel pipeline using CUDA

 * Obey fundamental requirements of GFX pipe, like keeping ordering

 * Hole-free rasterizer

 * As fast as possible!

 * Run everything in parallel

 * min amount of sync

 * Focus on load balancing

 * programmable shading

 * Chunker-style pipeline with 4 stages: triangle setup -> bin raster -> coarse raster -> fine raster

 * Run data in large batches

 * keep data in input order all the time

 * Chunking in 2 stages, to bins and tiles


 * fragment distribution, etc. Check paper

 * They can't match HW raster, but they are close in performance

 * Pros: extensibility (add stuff!), and specialization to individual apps (remove unnecessary stuff!)


 * VoxelPipe: a programmable pipeline for 3D voxelization

 * voxelization: find all voxels overlapped by a triangle, for collisions, indirect illumination, etc.

 * Full-featured pipe for voxelization, like OpenGL for 2d rasterization

 * can do real-time photon mapping

2–5:15 pm

Course: Beyond Programmable Shading II

Michael Houston, Advanced Micro Devices, Inc.

Aaron Lefohn, Marco Salvi, Intel Corporation

Steven G. Parker, NVIDIA Corporation

Chris Wyman, University of Iowa

State-of-the-art talks!

Toward a blurry rasterizer

Jacob Munkberg, Intel

* Simultaneous Motion Blur + DoF -> 5D tuples

* Rasterization: which pixels a triangle covers

* Create an optimized rasterizer for MB&DoF.

* Blurry rasterization: moving triangles & triangles out of focus

* Check Decoupled Sampling paper in Siggraph 2011

* Stochastic rasterization does not come for free

* Efficient shading is a requirement

* Very easy for end-user if implemented in HW

Order-independent transparency

Marco Salvi, Intel

Interactive Global Illumination

Chris Wyman, Univ. of Iowa

User-defined pipelines for ray tracing

Steve Parker, NVIDIA


Panel: What is the right cross-platform abstraction for real-time 3D rendering

Talk: Deferred Shading Technique Using Frostbite in Battlefield 3 and Need for Speed The Run

* Reqs: 4200 spot lights, 400 point lights….

* Use SPU for tile-based deferred lighting (tile size limited by SPU mem)

* Implementation details available at DICE site (Cristina)

* Spotlights -> they have more efficient tile coverage than point lights :)

* 42 ms total SPU budget, ~6 lights/tile

* Small lights moved back to GPU

* LightTile jobs run for 7ms on SPU

* Outdoor lighting implemented on GPU; reuses some classification from SPU local lighting

* HDR clears on RSX are slow -> never clear!

* Trailer NFS: super rich lighting! :)

6:30 –11:00 pm

EA Reception

Electronic Arts Campus, 4330 Sanderson Way, Burnaby (Vancouver), BC V5G 4X1


9-10:30 pm

Out of Core


Paul Strauss, Google, Inc.

Google Body: 3D Human Anatomy in the Browser

Arthur Blume

Won Chun

David Kogan

Vangelis Kokkevis

Nico Weber

Rachel Weinstein Petterson

Roni Zeiger

Google, Inc.

Interactive Indirect Illumination Using Voxel Cone Tracing: An Insight

Cyril Crassin

INRIA Rhone-Alpes

Fabrice Neyret, CNRS/LJK/INRIA

Miguel Sainz

Simon Green, NVIDIA Corporation

Elmar Eisemann

École d’Ingénieurs Télécom ParisTech

Rendering the Interactive Dynamic Natural World of the Game: From Dust

Ronan Bel

Benoît Vimont

Ubisoft Montpellier Studio

Out-of-Core GPU Ray Tracing of Complex Scenes

Kirill Garanzha

Keldysh Institute of Applied Mathematics (Russian Academy of Sciences)

Simon Premoze

Alexander Bely, CentiLeo

3-3:30 pm ("1000 Points of Light", 2-3:30)

Talk: Deferred Shading Technique Using Frostbite in Battlefield 3 and Need for Speed The Run

4:30–5:15 pm

Real-Time Live!

6–8 pm

Computer Animation Festival– Electronic Theater


Wednesday, August 10th



The power of atomic assets 


* Large scale production

* Atomic asset pipeline -> Divide to gain power!

* Push updates vs Pull updates? They started with push, but now they went to pull-updates.

Animation Workflow in Killzone 3: A Fast Facial Retargeting System for Game Characters

Andrea Arghinenti, Guerrilla Games

* Direct connection from mo-cap data to rigs

* Calibration: non-destructive and it takes a second

* the calibration info is stored in anim rig, and it is always available -> you can offset it or scale it

* mo-cap is not perfect -> post animation 

* layered anim, based on poses

* poses are created from mocap data; it's an additive system

* Cons: a high marker set, difficult to shoot (6 months)

Adaptive importance sampling for multi-ray gathering

* minimize noise during ray gathering operation

* AIS: allows importance sampler to adapt to ray occlusion

* spherical mapping -> affinity map

High-resolution relightable buildings from photographs

* reconstruct the geometry and textures (albedo, shading)

* great detail and cheap!

* Limitations: 

  - need diffuse lighting conditions (no direct sunlight!)

  - no specular materials (albedo only); not good for modern buildings (glass)

  - segmentation still requires some effort from the user

  - inaccessible places (roofs)

10:45 am–12:15 pm

Course: Production Volume Rendering 1

This course begins with an introduction to generating and rendering volumes, then presents a production-usable volumetrics toolkit, focusing on the feature set and why those features are desirable. Special emphasis is focused on the approaches taken in tackling efficient data structures, shading approaches, multi-threading/parallelization, holdouts, and motion blurring.

Magnus Wrenninge, Sony Pictures Imageworks

Nafees Bin Zafar, DreamWorks Animation

* They are working on a book that will be released next year 

* Course notes: productionvolumerendering/

* Volumetrics: clouds, mist, dust, steam, smoke, fire

 - tiny bits of matter

 - represented as mass in a volume (density)

 - fuzzy, fluffy, sparkly 

* History: 

  - cloud tank effect in "Close Encounters of the Third Kind", cheap! $20

  - "Independence Day" (1996) uses cloud tank effect

  - Deep shadow maps, siggraph 2000

  - storm cloudscapes in "Stealth"

Volume modeling

 - turning some data into volumetric data

 - voxel buffers (but not only!) -> open source lib, Field3D

* Noise in volume modeling -> Perlin, book "Texture & Modeling"

 - choose parameterization, and apply noise in those coordinates

* Geometry-based

 - rasterization primitives 

 - instantiation-based primitives

* Motion blur in volume modeling

 - correct motion blur is almost impossible to achieve, so cheat!, eg. smear the samples

Volume rendering

 * Signed Distance Fields (aka Level Sets)

 * CSG modeling ops (union, difference, intersection)

 * Gradient of the level set is useful for ray tracing and collision

 * Light interactions: light hits media, absorbed, scattered, freq. changed, media could emit too

 * Transmittance: fraction of light that passes through (T)

 * Opacity: fraction of light that is absorbed (1-T)

 * Beer's Law: relates absorption capacity to T (exponentially)

 * For 1 voxel,

  - sample density

  - calculate T_i = e^(-σρΔx)

  - lookup lighting

  - lookup material color c

  - update T

  - calculate color

  - repeat

 * Raymarcher: the renderer (generates rays and advances them)

 * 3 modules: occlusion (Beer's law), shader (T), integrator (color)

 * Important to let artists explore these parameters (extinction, etc)
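A minimal Python sketch of the per-step loop listed above, using Beer's law T_i = e^(-σρΔx) for each step. The accumulation rule (each step contributes the light it removes from the ray, weighted by the transmittance so far) is one common formulation and an assumption here, as are the scalar color and the function signature; the course toolkit may organize this differently.

```python
import math


def raymarch(densities, step, sigma, lighting, color):
    """March along one ray through a participating medium.

    densities: density samples (rho) along the ray, one per step
    step:      step length (delta x)
    sigma:     extinction coefficient
    lighting:  incoming light at each sample (e.g. from a shadow lookup)
    color:     scalar material color, kept scalar for brevity
    Returns (accumulated radiance, final transmittance T).
    """
    T = 1.0
    radiance = 0.0
    for rho, light in zip(densities, lighting):
        T_i = math.exp(-sigma * rho * step)           # Beer's law for this step
        radiance += T * (1.0 - T_i) * light * color   # light scattered toward the eye
        T *= T_i                                      # update accumulated transmittance
    return radiance, T
```

For a fully lit homogeneous medium with unit color, the accumulated radiance equals 1 - T, which is a handy sanity check when artists tweak the extinction.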

Precomputed lighting

 * use shadow maps to store occlusion

 * raymarching computes transmittance

 * Deep shadow maps (Siggraph 2000)


Course: Applying Color Theory to Digital Media and Visualization

This course highlights the visual impact of specific color combinations, provides practical suggestions on color mixing, and includes a hands-on session that teaches how to build and evaluate color schemes for digital media visualization.

Theresa-Marie Rhyne, Consultant

* Additive color models: RGB

* Subtractive color models: CMYK, RYB (painter's subtractive, not the same blue as in RGB!)

* Color model + color gamut = color space 

* CIE LAB is closely related to Munsell color system

 - Munsell system has no orange!

* HSV, proposed in Siggraph 1978

* Art: fauvism

* ColorBrewer 2.0 to develop colormaps (Cynthia Brewer)

* Adobe's Kuler tool -> it's also a community, you "Like" colormaps

4:30–5:15 pm

Real-Time Live!

6–8 pm

Computer Animation Festival– Electronic Theater


9 am–12:15 pm

Course: Stereoscopy From XY to Z


Clouds in the Skies of Rio

11:30 am–12:30 pm

International Resources Event: CG in Europe

2–5:15 pm

Course: Production Volume Rendering 2


Thursday, August 11th


9–10:45 am

Talk: Hiding Complexity

SESSION CHAIR: Theodore Kim, University of Saskatchewan

Occlusion Culling in Alan Wake

Ari Silvennoinen, Teppo Soininen, Umbra Software Ltd

Markus Mäki, Olli Tervo, Remedy Entertainment, Ltd.

* Rendering challenges

 - Game is set in a large outdoor world

 - dynamic objects

 - local shadow rendering

 - global shadow rendering

* Occlusion culling to get rid of hidden objects, and LOD to lower res of visible ones

* only a subset fits in memory -> streaming

* scalable culling (view frustum, distance, occlusion), fast updates, etc

* Umbra architecture is based on a spatial DB and visibility queries (check fig. in slides)

* Spatial DB: 

 - axis-aligned BSP tree, with varying SAH split planes

 - premultiplied OBB

 - lazy update: mark nodes as dirty, update when actually traversed

 - automatic amortized updates 

* Avoid latency by interleaving work (but not 100% because of GPU starvation)

* Occlusion culling is a hybrid approach

 - screen space error bounds for artifacts

* Dynamic objects live in the same DB

 - temporal bounding volumes

 - take advantage of lazy and amortized updates

 - after a while, if they stop moving, they become static

* Local shadows

 - spheres and frustums

* Global shadows

 - lots of objects + cascades -> high cost

 - cull objects in shadow

 - cull objects which do not cast a visible shadow (+ interesting)

 - "Shadow caster culling for efficient shadow mapping", I3D 2011 (better than version shipped in the game)

* Check "Advances in real time rendering" course slides. 
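
The lazy update mentioned for the spatial DB (mark nodes as dirty, update when actually traversed) can be illustrated with a toy hierarchy. This is only a sketch under simplifying assumptions: bounds are 1D intervals, and the `Node` class, its `move`/`bounds` methods and the `refits` counter are hypothetical names, not Umbra's API.

```python
class Node:
    def __init__(self, children=None, box=None):
        self.children = children or []
        self.parent = None
        self.box = box                      # (lo, hi) interval; 1D for brevity
        self.dirty = bool(self.children)    # inner nodes start without valid bounds
        self.refits = 0                     # how often this node was recomputed
        for c in self.children:
            c.parent = self

    def move(self, box):
        """Leaf update: store the new bounds and flag ancestors, nothing else."""
        self.box = box
        n = self.parent
        while n is not None and not n.dirty:
            n.dirty = True
            n = n.parent

    def bounds(self):
        """Recompute bounds only when a traversal actually needs them."""
        if self.dirty:
            boxes = [c.bounds() for c in self.children]
            self.box = (min(b[0] for b in boxes), max(b[1] for b in boxes))
            self.dirty = False
            self.refits += 1
        return self.box

def query(node, lo, hi, out):
    """Collect leaves whose interval overlaps [lo, hi]."""
    b = node.bounds()
    if b[1] < lo or b[0] > hi:
        return out
    if not node.children:
        out.append(node)
    for c in node.children:
        query(c, lo, hi, out)
    return out
```

Moving an object several times between two queries costs only one refit of its ancestors, which is the point of deferring the work.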

Increasing Scene Complexity: Distributed Vectorized View Culling

Andrew Routledge, Electronic Arts: Blackbox

* Mostly worked on Need for Speed titles

* Need for Speed Undercover visual target goal

 - reflections, 3 cascades, cube maps, viewport for rear-view mirror …

* Why so slow? 

 - for 1000 instances, tested against every view! 

 - processed linearly, etc

* Solution: vectorized approach -> parallelize for high performance

* for current gen consoles

 - reduce memory access and cache misses

 - remove branches

* NFS world

 - 8km x 8km

 - 300m sections (streamed)

 - sections defined by content creators

* Pipeline

 - redesign the data to be cache friendly

 - drop the quadtree in favor of per-section, pre-filtered instance info lists, packed & aligned for optimal cache hits

* Runtime 

 - AABB frustum check for visibility

 - select best culling operation for view

 - submit the section view instances list for culling

 - output stream = visibility * projected pixel size

* Vector implementation

 - prefetch next instance info

 - read AABB data

 - AOS -> SOA

 - (check code in slides)

* Accept if ( pixel size > per view threshold ). Eg. for reflections, more pixels required

* Parallelism, 8ms -> 0.2ms (6 SPUs on PS3)

* Next game: NFS The Run
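
The runtime steps above (AoS -> SoA, AABB frustum check, visibility * projected pixel size, per-view threshold) can be sketched as follows. Python has no SIMD, so this only shows the data layout and the mask-instead-of-branch idea; a real implementation would process several instances per vector register. The names `aos_to_soa` and `cull_view`, the plane convention and the projected-size estimate are assumptions for illustration, not EA's code.

```python
import math

FIELDS = ("cx", "cy", "cz", "ex", "ey", "ez")  # AABB center and half-extents

def aos_to_soa(instances):
    """Transpose per-instance records into one contiguous list per field."""
    return {k: [inst[k] for inst in instances] for k in FIELDS}

def cull_view(soa, planes, eye, pixels_per_unit, threshold):
    """planes: (nx, ny, nz, d) with normals pointing into the view volume."""
    accepted = []
    for i in range(len(soa["cx"])):
        cx, cy, cz = soa["cx"][i], soa["cy"][i], soa["cz"][i]
        ex, ey, ez = soa["ex"][i], soa["ey"][i], soa["ez"][i]
        visible = 1
        for nx, ny, nz, d in planes:
            dist = nx * cx + ny * cy + nz * cz + d
            reach = abs(nx) * ex + abs(ny) * ey + abs(nz) * ez
            visible &= int(dist + reach >= 0.0)   # accumulate a mask, no early-out
        depth = max(math.sqrt((cx - eye[0]) ** 2 + (cy - eye[1]) ** 2
                              + (cz - eye[2]) ** 2), 1e-6)
        pixels = pixels_per_unit * max(ex, ey, ez) / depth   # assumed size estimate
        accepted.append(visible * pixels > threshold)        # per-view threshold
    return accepted
```

A view such as a reflection would simply pass a larger `threshold`, so small instances drop out of that view first.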

Practical Occlusion Culling in Killzone 3

Michal Valient, Guerrilla

* They use software rasterization, running on SPUs

 - render simplified version of the scene geom in the depth buffer

* Previous solution (manually placed portals) did not scale well for outdoor environments

* Works automatically -> can be enabled early in production

* Completely dynamic -> any object can become an occluder

* Maps well to SPUs, no sync issues, no GPU costs related to visibility testing

* Very simple system! No magic involved

* Occluder setup -> similar to what you'd do on GPU

* Rasterization

 - split 640x360 depth buffer into 16-pixel-high strips

 - should be heavily optimized (assembly)

 - compress depth buffer

* Occlusion tests

 - tests happen in parallel

 - spatial Kd-tree for each mesh

 - fast bounding sphere tests (constant time) used for fast rejects + accurate tests (conservative)

* Generating good occluders

 - didn't want artists to create occluders by hand -> fully automatic

 - use physics mesh (created by artists). It's clean, but it tends to be larger than the visual mesh.

 - they built some heuristics to identify good occluders

  - discard anything that is small

  - discard by meta data (clutter, foliage…)

  - discard if surface area is significantly smaller than BB surface area

 - artists can override the process

* Future: creating occluders is hard… voxelization? 

? Would you use the GPU if it were more powerful? -> If there are SPUs, why take budget from the GPU for something that is not making pretty pixels?
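
The occluder-selection heuristics listed above can be sketched as a simple filter over physics meshes. All thresholds, tag names and the `PhysicsMesh` record are hypothetical; the talk did not give concrete values.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PhysicsMesh:
    name: str
    surface_area: float
    bb_size: tuple                    # bounding-box extents (x, y, z)
    tags: frozenset = frozenset()     # meta data, e.g. {"foliage"}
    force: Optional[bool] = None      # artist override: True/False, or None for automatic

def bb_surface_area(size):
    x, y, z = size
    return 2.0 * (x * y + y * z + z * x)

def is_good_occluder(mesh, min_size=2.0, min_fill=0.3,
                     rejected_tags=frozenset({"clutter", "foliage"})):
    if mesh.force is not None:                 # artists can override the process
        return mesh.force
    if max(mesh.bb_size) < min_size:           # discard anything that is small
        return False
    if mesh.tags & rejected_tags:              # discard by meta data
        return False
    # discard if the surface area is much smaller than the bounding-box area
    if mesh.surface_area < min_fill * bb_surface_area(mesh.bb_size):
        return False
    return True
```

The last test rejects sparse shapes such as railings, whose bounding box is large but which hide very little behind them.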

High-Quality Previewing of Shading and Lighting for Killzone 3

Francesco Giordana, Guerrilla Games

* Almost everything made in Maya

* Shaders created by artists with multiple layers

* Maya scene -> Maya Core World -> Viewport renderer

* Maya core world

 - interface to engine

 - holds scene graph

 - holds callbacks manager

 - handles OpenGL contexts

* Maya geometry objects: geom renderable by core renderer; wrapper; no copied data; draw flags, …

* Callbacks manager: keep both scene graphs in sync all the time

* Viewport renderer: traverse the scene graph

? They stumbled upon some Maya API bugs. Feedback, etc., took time. 6/7 months, but it needs continuous updates.

? There are things they don't use. Eg. they can render particles, but they can't control their timeline so they don't use particles. 

2–3:30 pm

Talk: Speed of Light

SESSION CHAIR: Dylan Moore, Apple, Inc.

Runtime Implementation of Modular Radiance Transfer

Bradford J. Loos, University of Utah

Lakulish Antani, University of North Carolina at Chapel Hill

Kenny Mitchell, Disney Interactive Studios

Derek Nowrouzezahrai, Wojciech Jarosz, Disney Research Zürich

Peter-Pike Sloan, Disney Interactive Studios

(conditionally accepted Siggraph Asia 2011)

* Real-time, runs on mobile devices

* Use local parts and replicate

* MRT dictionary (pre-process): Shapes, Prior, U Textures, Interfaces

* Lighting prior: store possible direct lights in a matrix

* U Textures represent the indirect lighting as a dictionary

* Artists define a block map for each level

* Check how to apply in DirectX in the slides

* Pad light map to avoid discontinuities between blocks

* Runtime dictionary ~5.9MB

* Implemented on iOS too. 

 - check optimization details in slides

* App store release: check @farpeek

Next-Generation Image-Based Lighting Using HDR Video

Jonas Unger, Stefan Gustavson, Joel Kronander, Linköpings universitet

Gerhard Bonnet, Gunnar Kaiser, SPHERON-VR AG

* Problem of IBL:

 - A single light probe can't capture the spatial variation in illumination

* Use more light probes -> HDR video to capture

* Time varying IBL, with single light probe, still can't capture spatial variation

* Spatially varying IBL -> use thousands of points

* Recover scene proxy geometry

 - video sequence -> point cloud data -> scene proxy model

* Eg. 2 rooms -> 2h to capture

* Radiance reprojection

* View dependent 4D texture

Triple Depth Culling

Pascal Gautron, Technicolor Research & Innovation

* Real-time visibility

* Z-buffer: fragments discarded after shading! Some pixels may be shaded more than once…

* Triple Depth Culling ~ having an Occlusion Culling Unit (doesn't exist in current HW)

* Buffer alternation: Color+Depth1, Color+Depth2

 - provides access to partial depth info

* Depth batching

 - amortize the cost of alternation

 - bad depth ordering leads to overshading

 - clusters of objects with similar depths tend to span large screen areas

* Programmable culling

 - can implement naive Early-Z culling

 - can be used anywhere in the code

* Cost added with dynamic branching
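
Why depth ordering matters for overshading can be seen with a one-pixel toy model. This sketch assumes fragments that fail the test against the nearest depth seen so far are culled before shading, which is the behavior the culling aims for; the function name `shaded_fragments` is hypothetical.

```python
def shaded_fragments(depths):
    """Count shader invocations for one pixel.

    depths: fragment depths in submission order (smaller = closer).
    A fragment that passes the depth test is shaded, even if a later,
    closer fragment overwrites it.
    """
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if z < nearest:     # passes the depth test: pay for shading
            nearest = z
            shaded += 1
    return shaded

back_to_front = [4.0, 3.0, 2.0, 1.0]
worst = shaded_fragments(back_to_front)           # every fragment gets shaded
best = shaded_fragments(sorted(back_to_front))    # only the closest one is shaded
```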

Non-Uniform Motion Deblurring for Camera Shakes Using Image Registration

Sunghyun Cho, Hojin Cho, Pohang University of Science and Technology

Yu-Wing Tai, Korea Advanced Institute of Science and Technology

Seungyong Lee, Pohang University of Science and Technology

2–5:15 pm (I'll skip 2-3:30 …)

Course: Filtering Approaches for Real-Time Anti-Aliasing

This course includes an overview of both research and industry filter-based, anti-aliasing techniques in games for all modern platforms (AMD and NVIDIA GPUs, PlayStation 3, and Xbox 360), low-level insight to ease adoption of these techniques and give attendees a complete concept-to-implementation roadmap, and deep quality, performance, and ease-of-integration comparisons of each technique.

0:00 Introduction Diego Gutierrez / Jorge Jimenez, Universidad de Zaragoza

0:05 A Directionally Adaptive Edge Anti-Aliasing Filter Jason Yang, Advanced Micro Devices, Inc.

0:20 Morphological Anti-Aliasing (MLAA) Alexander Reshetov, Intel Labs

0:35 Jimenez's MLAA Jorge Jimenez, Universidad de Zaragoza

0:50 Hybrid CPU/GPU MLAA on the Xbox-360 Pete Demoreuille, Double Fine Productions, Inc.

1:05 Low latency MLAA in God of War III

Cedric Perthuis, Sony Computer Entertainment

 * Cost of MSAA2x in GoW was close to 6ms!

 * They used MLAA from Edge library

 * Where to apply MLAA?

  - after opaque and transparent objects, before all post effects

  - on color buffer only, no ID buffer

 * Interleave frame (z-prepass & shadows computed while SPU MLAA is running)

 * SPUs used as GPU coprocessors.

 * Memory -> double memory everywhere

 (live PS3 demo)

1:20 PlayStation Edge MLAA

Tobias Berghoff, Sony Computer Entertainment

 * Released one week after Siggraph 2010

 * Runs on any number of SPUs with little additional overhead

   - split image horizontally over SPU cores, merge results

 * Used in many games, GoW 3, Killzone 3, LBP2, …

 * Edge detection is the key to image quality

 * Instead of a global threshold, use a relative one: compute a threshold for each pixel pair

 * In Killzone 3, they feed the light buffer for edge detection -> false edges

  - do 2-stage predicated thresholding

  - when an edge is found in non-color data, increase sensitivity for color edge detection
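
A rough sketch of relative thresholding and the 2-stage predicated variant described above. The exact formulas used by Edge MLAA were not given in the talk; the scaling by the brighter pixel, the constants, and the use of depth as the non-color data are all assumptions for illustration.

```python
def is_edge(luma_a, luma_b, rel=0.25, floor=0.02, scale=1.0):
    """Edge test with a per-pair threshold relative to the brighter pixel."""
    threshold = max(floor, rel * max(luma_a, luma_b)) * scale
    return abs(luma_a - luma_b) > threshold

def is_edge_predicated(luma_a, luma_b, depth_a, depth_b, depth_eps=0.01):
    """Stage 1: look for an edge in non-color data (depth, by assumption).
    Stage 2: if one is found, lower the color threshold (higher sensitivity)."""
    hint = abs(depth_a - depth_b) > depth_eps
    return is_edge(luma_a, luma_b, scale=0.5 if hint else 1.0)
```

With these assumed constants, a 0.15 luma step between two bright pixels is ignored on its own but accepted when the depth values also disagree.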

1:35 The Saboteur Anti-Aliasing (SPUAA) Henry Yu, Kalloc Studios

( missing ? )

1:50 Break

2:00 Subpixel Reconstruction Antialiasing (SRAA)

Morgan McGuire, NVIDIA Corporation and Williams College

 * Use FXAA!!!! He recommends using FXAA for edge antialiasing!

 * SRAA is research targeting cases where 1 spp (1 sample per pixel) morphological approaches fail

 * SRAA is good for thin wires, etc, that disappear with MLAA. 

  - it connects these pixels, by preserving thickness of lines

 * Requires some MSAA buffer

 * ~1ms on GeForce 560 at 1080p

 * Combine with other methods:

  - run FXAA on 1x color before SRAA

  - keep alpha-blending for billboards

  - primitive AA on lines or select polygons

  - compatible with SSAA, MSAA, CSAA

 * Super-sampled depth is "free"

 * Use R8 ID buffer

2:15 Fast Approximate Anti-Aliasing (FXAA)

Timothy Lottes, NVIDIA Corporation

* 2 algorithms: for consoles, and for PC (+quality)

* 1 shader pass, only color input, only color output

* Deferred rendering -> memory problem if you want to use MSAA!

* It's just a kind of blur, so it's softer than MLAA

* Use luma as ref. to detect pixels not needing AA

 - conditional replacement on PS3

 - pixels that pass, get a 2 tap filter, direction perpendicular to local luma gradient

* 1 ms/ frame on 360

* 1.2 ms/frame on PS3 (because of the branching)

* FXAA 3.11 on the PC is a trimmed-down version of MLAA

 - 0.39 ms/frame on GTX 580

* Teaser for FXAA TSSAA: use previous frame for temporal AA
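
Using luma to skip pixels that need no AA can be sketched as a local-contrast test over the pixel and its four neighbors. This is a sketch in the spirit of that early exit, not the shipped shader: the Rec. 601 luma weights and both threshold values are illustrative assumptions.

```python
def luma(rgb):
    """Approximate perceived brightness (Rec. 601 weights, an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def needs_aa(center, north, south, east, west,
             edge_threshold=0.125, edge_threshold_min=0.0625):
    """Return True if local luma contrast is high enough to bother filtering."""
    lumas = [luma(p) for p in (center, north, south, east, west)]
    contrast = max(lumas) - min(lumas)
    # Relative threshold scaled by the local maximum, with an absolute floor
    # so that dark, noisy regions are not processed.
    return contrast >= max(edge_threshold_min, max(lumas) * edge_threshold)
```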

2:30 Distance-to-edge Anti-Aliasing (DEAA)

Hugh Malan, CCP hf.

* Computes distance to edge in the pixel shader

* Needs extra space 

* For a given triangle, compute distance to edge in the pixel shader

 - distance_x = -v/ddx(v); distance_y = -v/ddy(v)

* Other uses of the DEAA buffer: simulate alpha to coverage
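
The distance formula above can be checked on the CPU by emulating `ddx`/`ddy` with one-pixel forward differences. Here `v` is any interpolant that is linear in screen space and zero on the edge; the function name `edge_distances` and the infinite result for an edge parallel to an axis are assumptions of this sketch.

```python
def edge_distances(v, x, y):
    """Distance in pixels from (x, y) to the zero crossing of v along x and y."""
    v0 = v(x, y)
    ddx = v(x + 1, y) - v0      # emulates the ddx() screen-space derivative
    ddy = v(x, y + 1) - v0      # emulates the ddy() screen-space derivative
    dist_x = -v0 / ddx if ddx != 0.0 else float("inf")
    dist_y = -v0 / ddy if ddy != 0.0 else float("inf")
    return dist_x, dist_y

# Example edge: the line 2x + 4y - 10 = 0
v = lambda x, y: 2.0 * x + 4.0 * y - 10.0
dist_x, dist_y = edge_distances(v, 1.0, 1.0)
# From (1, 1) the edge is 2 pixels away along x and 1 pixel away along y.
```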

2:45 Geometry Buffer Antialiasing (GBAA)

Emil Persson, Avalanche Studios

* Basic idea: use info about where edges are, which is already available

* 1st attempt: GPAA (check shader code in slides)

 - excels on near-horizontal/vertical cases

 - line rasterization, not ideal for performance

* GBAA is similar to DEAA

 - geom info stored to render-target in main pass

 - full screen resolve pass -> fixed cost

 - stores distance to edge in the major direction

  d_dir = d / ( |n.x| > |n.y| ? n.x : n.y )

* GBAA resolve problem: gaps at silhouette edges -> search immediate neighbors for their closest edge

* Shader is quite simple (check slides)

* GBAA can AA any edge, if distance can be computed/estimated

* Resolve pass: 0.09ms on current GPU

* Future work: DX9/ consoles (no geometry shader)

* Highest quality AA of all? 
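
The d_dir formula above converts a perpendicular distance into a distance along the major axis of the edge normal. A small numeric sketch, assuming `d` is the signed perpendicular distance in pixels and (nx, ny) is a unit-length edge normal; the function name and the returned (x, y) offset form are choices of this sketch, not the slide code.

```python
def major_axis_offset(d, nx, ny):
    """Return the (x, y) offset to the edge along the normal's major axis."""
    if abs(nx) > abs(ny):
        return d / nx, 0.0      # edge is closer to vertical: step along x
    return 0.0, d / ny          # edge is closer to horizontal: step along y

# Edge with normal (0.6, 0.8), pixel center 0.4 pixels away perpendicular to it.
offset = major_axis_offset(0.4, 0.6, 0.8)
# Moving 0.5 pixels along y changes the perpendicular distance by 0.5 * 0.8 = 0.4.
```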

2:55 Directionally Localized Anti-Aliasing (DLAA) Dmitry Andreev, LucasArts

3:10 Crysis 2 Anti-Aliasing

Tiago Sousa, Crytek


- Orthogonal and general solutions, no per-platform AA solution

- Play nice with HDR/deferred 

- Sub-pixel accuracy is important

- Low memory footprint

- <2ms

* Temporal AA (Motion Blur)

 - less noticeable aliasing during movement

* A-Buffer SSAA

 - accumulation buffer, brute force

 - robust and best quality

 - can't render scene multiple times, of course!

* Distribute A-Buffer SSAA over frames?

 - even at 60 fps, ghosting occurs

 - minimizing blending: reprojection

 - still disoccluded regions have ghosting

 - reprojection range clamping

* Distributed A-Buffer SSAA caveats

 - not temporally stable

 - alpha blending problematic

 - multi-GPU

* Future work

 - SSAA combo with post-processed AA, ~DLAA?

* ~1.7ms on consoles, + cost of creating the velocity buffer, but that is also used for motion blur
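
Reprojection with clamping can be sketched on a 1D scanline. The talk did not spell out what exactly is clamped; this sketch uses one widely used variant (clamping the reprojected history to the range of the current neighborhood), which is an assumption and not necessarily what shipped in Crysis 2. The function name and integer velocities are also hypothetical.

```python
def temporal_resolve(current, history, velocity, blend=0.5):
    """Blend the current scanline with the reprojected previous frame.

    velocity[x]: integer number of pixels the surface at x moved since the
    previous frame (stands in for the velocity buffer).
    """
    n = len(current)
    out = []
    for x in range(n):
        prev_x = x - velocity[x]                  # reproject into the previous frame
        if 0 <= prev_x < n:
            window = current[max(x - 1, 0):x + 2]
            lo, hi = min(window), max(window)
            h = min(max(history[prev_x], lo), hi)  # clamp stale history -> less ghosting
            out.append((1.0 - blend) * current[x] + blend * h)
        else:
            out.append(current[x])                # no usable history for this pixel
    return out
```

In a static scene the output equals the input, while history that disagrees with the current neighborhood is pulled back into range instead of leaving a ghost.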

3:25 Wrap-up and Discussion / Q & A

3:30 Close


9 am–2:15 pm

Course: Compiler Techniques for Rendering

This course summarizes five cutting-edge projects that apply compiler technology to improve the performance and functionality of renderers and shading systems. Topics include: customizing shading languages for global illumination and other advanced rendering, analysis of shaders so that renderers may perform physically based light transport in correct units, automatic differentiation, and use of LLVM and dynamic code generation for improved shader performance.

Larry Gritz, Sony Pictures Imageworks

Mark Leone, Weta Digital Ltd.

Steven Parker, NVIDIA Corporation

Philipp Slusallek, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH

Bruce Walter, Cornell University


Computer Animation Festival Production Session:

Guerrilla: The Creation of Killzone 3 – Game Production Session


Fluid Dynamics and Lighting Implementation in PixelJunk Shooter 2