Game Developers Conference 2009
-----------------------------------------------------------------------------------

Monday - March 23
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Math tutorial
~~~~~~~~~~~~~~~~
* Parametric equations are nice because you can go back & forward in time and everything is smooth.
* Orientation representation (quaternions)
  - to interpolate between orientations
  - translation -> adding; rotation -> multiplication
  (contents are like Real Time Rendering: comparing matrices, Euler angles, quats, etc.)
Check the slides from the web page.

News
~~~~~
OnLive, game on-demand service:
http://www.gamedaily.com/articles/news/introducing-onlive-and-the-end-of-consoles/?biz=1
Impressions:
* can't notice lag in the controller
* the whole thing slightly freezes sometimes (and that was over LAN...)
* Horikiri: something similar existed in Japan before (SH2/3? supported)

PhyreEngine 2
http://www.ps3center.net/news/2532/sony-unintentionally-unveils-4-new-games/
vegetation support (like Flower)

GameTrak Freedom
http://www.elotrolado.net/noticia_posicionamiento-3d-para-xbox-360-y-ps3_15902

GDC awards
http://www.n4g.com/events_gdc2009/News-300708.aspx

World of Goo also available for Mac ;)
http://www.worldofgoo.com/

Tuesday - March 24
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Insomniac Games Secrets of Console and Playstation 3 Programming
==================================================================

Developing for the CELL - Mike Acton
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1st: Look at the Hardware!!!
- Cell chip: 8/10 of the die area is SPUs -> you want to put your code there
- most of each SPU is dedicated to handling data (big register file, DMA access, ...)
Cell:
- not magic
- not a radical change in high-performance design
- fun to program for
Language & compilers: C++ & assembly (both PPU/SPU; C macros)
Other processors & I/O: GPU, Blu-ray, net... Many assets.
Game vs Engine code: divisions of development.
* Good solutions for the CELL will be good solutions on other platforms -> focus on DATA FLOW.
* High-performance code is easy to port to Cell.
  - if you don't know anything about the data flow, it's gonna be hard to port.

#1 "It's too hard". False. Multiprocessing is not new.
#2 "Cache & DMA data design is too complex". This is because people try to abstract it away
   from programmers. Don't hide it!
   "To DMA" data on the SPU ~ memcpy on the PPU.
   SPU sync: fence (extremely useful), barrier, lock line reservation
   (doesn't exist on the PPU! bad for sync!)
#3 "My code can't be made parallel". Yes, it can.
#4 "It's the language. C/C++ is no good for parallel programming". Bullshit. All you need is
   to be able to communicate with the hardware. Don't hide from the issues, understand them.
#5 "But I'm just doing this one little thing..." If everyone goes and uses the PPU ...
#6 "What's the easiest way to split programs across the SPUs?"

General rules:
1. DATA is more important than CODE.
2. Where there's one, there's more than one. Work with groups of things, not just one object.
   Never write code for one individual case (see the sketch below). E.g. a vector class is
   unoptimizable! The "domain-model design" lie (C++): try to avoid this model.
   -> model how your data actually looks, rather than some idea of how the "world" looks.
3. Software is not a platform.
   * need to understand the HW. SW doesn't live in the ether.
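
A minimal C++ sketch of rule 2 (the batch-first mindset). The struct and field names are
hypothetical, not Insomniac's code; the point is that the update works on a whole contiguous
array at once instead of one object behind a virtual call, which is also what makes it easy
to hand the block to an SPU.

    #include <cstddef>
    #include <vector>

    // Hypothetical data-oriented layout: one plain struct, updated in bulk.
    // No per-object virtual Update(); the loop owns the iteration, so it can be
    // unrolled, software-pipelined, or DMA'd to an SPU as one contiguous block.
    struct Particle {
        float x, y, z;      // position
        float vx, vy, vz;   // velocity
    };

    void updateParticles(Particle* p, std::size_t count, float dt) {
        for (std::size_t i = 0; i < count; ++i) {
            p[i].x += p[i].vx * dt;
            p[i].y += p[i].vy * dt;
            p[i].z += p[i].vz * dt;
        }
    }

    int main() {
        std::vector<Particle> particles(1024, Particle{0, 0, 0, 1, 0, 0});
        updateParticles(particles.data(), particles.size(), 1.0f / 30.0f);
        return 0;
    }
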
* the real difficulty is in the unlearning

Ultimate goal: get everything onto the SPUs (even if incrementally) -> for any kind of task.

SPU gameplay - Joe Valenzuela
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Glossary:
- mobys: no real classes, simple structs. No hierarchy. Data that they need to render, etc.,
  but no game logic.
- update classes: AsyncMobyUpdate (Guppys, AsyncEffect), AggregateUpdate

SPU gameplay difficulties: multiprocessor, NUMA, different ISA, just different
* your virtual functions don't work => you can't DMA your objects
* => your pointers don't work
* your code doesn't compile

* Object-driven update:
    for (all entities) { Entity *e; e->collect_info(); e->update(); }
  can't amortize! Don't do that.
* More modular update:
    for (all) { e->collect_info(); }
    for (all) { e->update(); }
* Aggregate updating
  * group instances by type

SPU gameplay intro: "shaders", like for graphics -> code fragments
AsyncMobyUpdate <- {Guppys (run entirely on SPU), AsyncEffect}
- one code fragment per AI state, for example
Instance data (data that actually gets transformed) vs Common data (common to that update group)

Gameplay "shaders":
- 32k relocatable programs
- makefile-driven process combines code and data into a fragment
- code fragments do: DMA up instances, transform instance state, position, ...
  typical game stuff (preupdate, update, ...)

About instance data:
- not an object, a subset of an update class
- different visibility across PPU/SPU for data inside an update class
- 3 memory addressing modes: direct, direct indexed, indirect indexed

Guppys:
- common use: "bangles" (arms, limbs)
- guppy instance: position/orientation EA, joint remap table, animation joints, ...
AsyncEffect:
- eg. stationary effects

SPU-invoked code:
1. immediate
2. deferred
   - PPU shims: flags set in the SPU update
   - command buffer: small buffer in LS
   - atomic allocators
3. ad hoc

Porting code sucks
- can result in over-abstracted systems
- polymorphism ~ maintain a lot of code...
- design from scratch
?) for multiplatform: still data-centric. Divide at the point where the "transformation"
   deviates per platform.

SPU wrangling: Scheduling and debugging - Jonathan Garrett
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
job-manager:
- submission order
- ring buffer per SPU
job-list: the PPU adds jobs at the start of the buffer, the SPU consumes jobs at the end
  (see the sketch at the end of this talk's notes)
job-triggering:
- lock-line waiting

GPU interaction:
          | frame 0  | frame 1   | frame 2
  PPU     | update 0 | update 1  | update 2
  GPU     | ...      | render 0  | render 1
  TV      | ...      | display 0 | display 1
  SPU     | ...      | assist 0  | assist 1
  (the SPU signals when it's finished)

job-def: struct that defines a job
timeouts
- a PPU watchdog ensures an SPU job completes within a reasonable time
asserts
- print & stop
- adds bloating debug-only code... -> smaller asserts with halt
  -> halts are non-exact and can't be continued from!
exceptions
- own exception handler
- output added to QA reports
SPU ABI
- often need to debug at the asm level
- defines register usage (incl. how parameters are passed between functions)
general SPU debugging
- simplify (always the key!):
  * disable unrelated code,
  * run on a single SPU, ...
- embed debug info in your SPU structs
general mem layout
- the stack grows from high addresses to low
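
A minimal sketch of the job-list idea above: the PPU produces at one end of a per-SPU ring
buffer, the SPU consumes at the other, preserving submission order. The struct and field
names are hypothetical, and the real system spans two processors with DMA and lock-line
waiting rather than one shared address space; this only illustrates the bookkeeping.

    #include <atomic>
    #include <cstdint>

    // Hypothetical job descriptor ("job-def"): a plain struct describing one job.
    struct JobDef {
        uint32_t codeEA;    // effective address of the code fragment
        uint32_t dataEA;    // effective address of the input data
        uint32_t dataSize;  // bytes to DMA in
    };

    // Single-producer / single-consumer ring buffer, one per SPU.
    // The PPU pushes at 'head', the SPU pops at 'tail'.
    template <unsigned N>
    struct JobRing {
        JobDef jobs[N];
        std::atomic<uint32_t> head{0};  // written by the PPU side
        std::atomic<uint32_t> tail{0};  // written by the SPU side

        bool push(const JobDef& j) {    // PPU: submit a job
            uint32_t h = head.load(std::memory_order_relaxed);
            if (h - tail.load(std::memory_order_acquire) == N) return false;  // full
            jobs[h % N] = j;
            head.store(h + 1, std::memory_order_release);
            return true;
        }
        bool pop(JobDef& out) {         // SPU: take the next job in submission order
            uint32_t t = tail.load(std::memory_order_relaxed);
            if (t == head.load(std::memory_order_acquire)) return false;      // empty
            out = jobs[t % N];
            tail.store(t + 1, std::memory_order_release);
            return true;
        }
    };
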
Pre-lighting in Resistance 2 - Mark Lee
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Outline:
- past: G-buffers and pre-lighting
- pre-lighting stages
- implementation tips
- pros and cons

Multipass lighting problem:
- O(M*L)
- too much redundant work:
  * repeat vertex transformation for each light
  * repeat texture lookups for each light
One solution, O(M+L):
    for each mesh
        render mesh
    for each light
        render light

G-buffer
- caches the inputs to the lighting pass in multiple buffers
- all lighting in screen space
- also nice for post-processing (check RTR Third Ed.)

Pre-lighting / light pre-pass (see the pass-flow sketch below)
- like a G-buffer, but
  * caches only a subset of material props (eg. normals and specular power) in an initial
    geometry pass
  * a screen-space pre-lighting pass is done before the main geometry pass

Step 1 - Depth & Normals
- R2 used 2x MSAA
- write out normals while rendering your early depth pass
- use the primary render buffer to store normals
- write specular power into the alpha channel of the normal buffer
  -> use discard in the fragment program for alpha
- the view-space normal myth: store view-space x and y and reconstruct z.
  ! z can go negative due to perspective projection (subtle errors) -> they store x, y, z

Step 2 - Depth resolve
- convert MSAA to non-MSAA resolution
- moved earlier to allow stenciling optimizations on non-MSAA lighting

Step 3 - Accumulate sun shadows
- from static geom, precomputed in lightmaps
- just accumulate sun shadows from dynamic casters
- min blend used to choose the darkest of all inputs
- originally used an 8-bit buffer, then changed to 32-bit for stencil optimizations

Step 4 - Accumulate dynamic lights
- diffuse & specular
- similar approach to the sun shadow buffer
- render all spotlight shadow maps using D16 linear depth
- for each light:
  1. lay down stencil volumes
  2. render a screen-space projected quad covering the light
- single buffer vs MRT, LDR vs HDR (Resistance was LDR :( )
- MSAA vs non-MSAA: diffuse etc. all non-MSAA; specular is 2x supersampled

result = C(mp, sum_i P(l_i, gp))
(composite of the material properties with the sum of per-light pre-lighting terms over the
geometry properties)

Limitations:
- only a limited range of materials can be factored this way
- workarounds:
  * extra storage for extra material properties
  * eg. in R2, skin uses a forward-type render
- blended materials, eg. fur
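
A high-level sketch of how the steps above fit into one frame. Every function here is a
hypothetical stub standing in for a full GPU pass (they are not Insomniac's API), and details
such as the MSAA resolve are reduced to comments; only the ordering matters.

    // Pre-lighting / light pre-pass frame flow (C++ sketch with stub passes).
    struct Light { float x, y, z, radius; };

    static void renderDepthAndNormals() {}   // Step 1: depth + view-space normals, spec power in alpha
    static void resolveDepth() {}            // Step 2: MSAA depth -> non-MSAA for stencil optimizations
    static void accumulateSunShadows() {}    // Step 3: min-blend dynamic casters over baked sun shadow
    static void layDownStencilVolume(const Light&) {}  // mark pixels the light volume can touch
    static void renderLightQuad(const Light&) {}       // screen-space quad: accumulate diffuse + specular
    static void renderSceneGeometry() {}     // second geometry pass: full materials + LAB lookups

    void renderFrame(const Light* lights, int lightCount) {
        renderDepthAndNormals();
        resolveDepth();
        accumulateSunShadows();
        for (int i = 0; i < lightCount; ++i) {   // Step 4: build the light accumulation buffer
            layDownStencilVolume(lights[i]);
            renderLightQuad(lights[i]);
        }
        renderSceneGeometry();
    }
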
Rendering the scene
- the scene is rendered identically to before, with the addition of the lighting and
  sun shadow buffer lookups

Implementation tips
- reconstructing position
  * don't store it in your G-buffer
  * use the z = 1 plane of the view frustum, scaled by linear depth
- reconstructing depth
  * W-buffering not supported on PS3
  * recover z/w (zOverW) (check the "recovering depth" sample in the PS3 SDK)
- stenciling algorithm
  * stencil shadow hardware
  - if the camera goes inside the light volume:
    * switch to the depth-fail stencil test
    * only when we have to, since this disables Z-cull optimizations
      (we need some fudge factor here)

Pros & Cons
G-buffer
  * requires only a single geometry pass
Pre-lighting
  * easier to retrofit into traditional rendering pipelines -> can keep all your current shaders
  * lower memory bandwidth
  * can reuse your primary shaders for forward rendering of alpha
Problems of both:
  * alpha blending is problematic
  * encoding different material types is not elegant

Insomniac Physics - Eric Christensen
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Outline: IG physics systems, "shaders" (code fragments), lib shaders, custom event shaders

Resistance:
- ported from PC to PS3
- PPU heavy
- SPU processes blocked
- physics update:
  * PPU stuff
  * run SPU collision jobs, and just wait ... (sync)
  * PPU stuff
  * run SPU simulation jobs, + PPU simulation for big simulations...
  * PPU processes results
- physics had the largest impact on frame rate

Phase 2: Ratchet & Clank Future
- collision & sim run in a single SPU job
- single sync point
- large PPU window from start of job to end of job
- physics update:
  * start the physics SPU job, and continue
    + simulation
    + update joints
    + DMA results
  * sync
- "shaders" helped free up local store; otherwise the code may fit, but not the data ...
  Think about data first!

Physics interaction "shaders"
- shaders are loaded into LS during the collision process and called via a function table
  (see the dispatch sketch after the Q&A below)
Physics Jacobian shaders
- a shader called from another shader
- constraints are sorted by type
- saved 100k!
Physics solver shaders
- eg. function prototype: SolverSim(SimPool*, Manifold*, dimensions, ManagedLS, ConstraintFunc*, ...)
- get loaded by the main physics kernel
- full sim, IK, or "cheap" objects
Custom event shaders (currently 2)
- anyone can author their own custom event shader for physics

Phase 3: Resistance 2
- immediate and deferred modes
- constraint data streaming
- using library shaders for collision
- physics update:
  * start + update immediate physics jobs
  * PPU work + deferred jobs
- IK runs in immediate mode, because it needs to be tweaked continually by gameplay
- stuff that doesn't need to be computed within one frame is deferred

Constraint data streaming: all the events didn't fit in LS (-> 8 chunks)

Current phase
- building of physics object lists on an SPU
- anything that needs PPU data can be allocated on the SPU
- use of lib shaders for broad-phase collision caching
Looking forward:
- optimize DMAs
- better data organization

?1) strategy for load balancing? Keep things as simple as possible; discuss where the next
    bottleneck is gonna be (after moving physics to the SPUs, the next neck was navigation).
?2) a shader that calls another shader is resident, so it remains in LS after returning
    from the call.
?3) shaders can be loaded dynamically.
?4) jobs ~ coarse systems (physics, etc.); within each job, fragments of code (shaders).
?5) why not balance physics with the PPU? The PPU is already overloaded. The thing that
    remains on the PPU will always be the bottleneck.
?6) how do you draw the line between the assembly / C / C++ approaches? Ad hoc. Case by case...
    Design your data first, and try C++ first.
?7) do you use SPURS? No. Just loading things and throwing them up there is over-solving the
    problem. Dynamic is no good; we want to manage allocation manually, to enforce simplicity.
?8) how would you improve the Cell? Maybe small wishes here and there. But it's not their job
    to complain. You are given a piece of hardware, and your job is to understand it and make
    things work. Saying "no, we don't work like that" is very unprofessional.
?9) the tendency is gonna be to move things from the GPU to the SPUs, because the GPU is
    always busy. You always want to render more and more stuff.
(*) I don't like the word "shader" for small pieces of code (code fragments). It made "some"
    sense on a GPU, but not on Cell.
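
A minimal sketch of the "called via a function table" idea from the physics notes above. The
types, table layout, and dispatch are hypothetical stand-ins (the real code fragments are
relocated into SPU local store); this only illustrates the pattern of dispatching by
interaction type without virtual calls.

    #include <cstdio>

    // Hypothetical contact data handed to a physics "shader" (code fragment).
    struct ContactPair { int bodyA, bodyB; float penetration; };

    // All interaction "shaders" share one signature so they can live in a table.
    typedef void (*InteractionShaderFn)(ContactPair&);

    void sphereVsSphere(ContactPair& c) { std::printf("sphere-sphere %d/%d\n", c.bodyA, c.bodyB); }
    void sphereVsBox(ContactPair& c)    { std::printf("sphere-box    %d/%d\n", c.bodyA, c.bodyB); }

    // Function table indexed by interaction type. On the SPU the entries would be
    // code fragments DMA'd into local store; here they are plain function pointers.
    enum InteractionType { kSphereSphere = 0, kSphereBox = 1, kNumInteractionTypes };
    InteractionShaderFn g_interactionTable[kNumInteractionTypes] = { sphereVsSphere, sphereVsBox };

    void resolveContact(InteractionType type, ContactPair& c) {
        g_interactionTable[type](c);   // dispatch through the table
    }

    int main() {
        ContactPair c{1, 2, 0.01f};
        resolveContact(kSphereBox, c);
        return 0;
    }
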
Null Fairy pisses me off (random talk) - Mike Acton
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How do you optimize if you don't know what you are getting? Check where the data comes from:
the entry points. The fewer entry points a function has, the safer it is.

Technical Goals in R2 (random talk) - Mike Acton
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Huge levels, huge characters, tons of characters, lots of action, massive ships filling the
sky, more dynamic lighting, improved shadows, tons of water, improved cinematics
-> learn things about scaling and LOD
Eg. lots of effort to make nice water, but at human scale. Looked at from a huge monster,
it just looks flat...
-> they didn't think about characters bumping into each other, but tons of characters would
   bump... since they didn't have the mocap data, they overconstrained the navigation system
   so characters would never bump into each other! Navigation became the neck...
-> where to spend resources? background action? foreground? where is the focus?
-> difficult to mix ground space and sky space (for ships). Different far_clip?
-> artists had a hard time placing lights. They wanted physical justification. Sometimes they
   modeled lamps where they were not needed...

Wednesday - March 25
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Discovering new development opportunities
Satoru Iwata - Nintendo
================================================================
Miyamoto's way:
- look for the reason why people have fun in a particular activity (in particular, Miyamoto's
  current hobby)
- make a prototype with a small team (sometimes just one programmer)
- work on multiple projects
- trial and error: no deadlines (2 years or more)
- once it works, move it to the production stage (+ deadline)
* "kidnap" some random Nintendo employee from time to time, and check how she responds to the
  game ("over the shoulder" check).

Wii Fit install base is almost the same as the PS3 install base... (he's asking us to develop
for it)
Wii System Menu 4.0
- SD cards of more than 2GB can be used to store contents
- Arcade emulation added

Impressions: Very interesting talk. It gives an idea of their key to success.

Programming Tips for Scalable Graphics Performance - Intel
================================================================
Myth: optimizing for integrated graphics limits the opportunity of using high-performance GPU
capabilities -> scale your game!
Scale features according to HW capabilities.
Intel Integrated Graphics (IIG) architecture (not interesting)

Indies SIG Roundtable
=================================================================
Goals:
* make some kind of central place where indie developers can find/exchange resources
* list "indie-friendly" middleware companies
* mentorship: how to start a project, admin resources, etc.

Guerrilla Tactics: KILLZONE's Art Tools and Techniques
Jan-Bart van Beek - Art director - Guerrilla Games
================================================================
Killzone 2
~~~~~~~~~~~~
- the E3 2005 trailer was prerendered. No game concept yet. No engine.
- 18 months of full production
  140 Guerrilla Games staff, 50 Sony staff, 5 outsourcing partners: mocap, anim, concept

Deferred Rendering
~~~~~~~~~~~~~~~~~~~
* check "Real Time Rendering 3rd Ed."
* the trailer can be paused, and you can move the camera around
Cons
- costs about 22MB of extra VRAM
- no mixing of alternative lighting models
  * no cartoon rendering
  * no sub-surface scattering
  * no custom fall-off
  * no translucent materials
Pros
- no lighting calcs in the shaders
- no light limit per object
- "infinite" amount of dynamic lights (~350 in the heaviest levels)
- "infinite" amount of shadow-casting lights
  * about 8 active usually
  * dynamic shadows fade out with distance
Myth cons
- no AA: simply solved
- transparencies: done by the secondary and tertiary renderers
  * the secondary one is full-res AA
- geom
- shader flexibility

Shader Creation
~~~~~~~~~~~~~~~~~
Fully adopt Maya's hypershade workflow
- all required Maya shading nodes supported
- WYSIWYG

Level Building Blocks
~~~~~~~~~~~~~~~~~~~~~~
To solve these problems:
- enormous amount of effort: 30 man-months for a multiplayer level
- difficult to art-direct
- very laborious to edit (no repository)
- much time spent on technicalities
Level BB:
- based on UnrealEd's static meshes
- BBs are modelled, shaded, LOD'ed in Maya (outsourced)
- exported into a repository for use by level artists
- level artists place, rotate, scale BBs in Maya
- asset management: assetDB
  * became the primary tool for level art and design
Shader repository - similar to BBs
Impact on workflow
- reduction in cost (3x faster level creation)
- easier art direction process -> higher quality art
- easy global editing of content
- automatic content generation rocks! -> more time to focus on artistic quality

Particle effects (PFX)
~~~~~~~~~~~~~~~~~~~~~~~
- run on SPUs
- handles about 300 systems and 5000 particles per frame
- 200 particle collisions per frame
- particle-driven shader variables
- low-res & full-res buffers to optimize

Color correction
~~~~~~~~~~~~~~~~~
- image-based: uses a 2D image as a LUT
- ColorTweak module: to tweak in real time and on-target (for different TVs)
  * by object: sky, particles, foreground (gun), etc.

Practical SPU Programming in God of War III
======================================================================
Outline
- simulation of game, joypad input, etc.
- scene traversal
- rendering the scene
In one frame, typically, simulation (CPU) runs in parallel with rendering (GPU).
For more than 1 CPU: simulation (CPU0) || scene (CPU1) || render (GPU).
In 99% of cases, you'll be bottlenecked by either the CPU or the GPU
-> create a Helper CPU, and move things from the neck to the other side.
In the Cell, the Helper CPUs are the SPUs:
- have an affinity towards math ops
- memory limitations
- full general-purpose processor (not a co-processor!)
SPU is super fast
- manual optimization can speed things up 48x (the compiler never comes close)
SPU == PPU
- keep code compilable on both platforms (see the sketch below)
Incrementally move parts of the systems to the SPUs.
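
A sketch of what "keep code compilable on both platforms" can look like in practice: one
fetch helper that is a plain memcpy on the PPU and a DMA get on the SPU. This is an
illustration, not Sony Santa Monica's code; it assumes the Cell SDK's spu_mfcio.h intrinsics
(mfc_get, mfc_write_tag_mask, mfc_read_tag_status_all) on the SPU side.

    #include <cstdint>
    #include <cstring>

    #ifdef __SPU__
    #include <spu_mfcio.h>
    #endif

    // Fetch 'size' bytes of game data into local memory. The call site looks the
    // same on PPU and SPU, so the same system code compiles on both processors.
    inline void fetchData(void* local, uint64_t sourceEA, uint32_t size)
    {
    #ifdef __SPU__
        const uint32_t tag = 1;
        mfc_get(local, sourceEA, size, tag, 0, 0);   // kick the DMA into local store
        mfc_write_tag_mask(1 << tag);                // wait for that tag to complete
        mfc_read_tag_status_all();
    #else
        // On the PPU (or any other platform) the "DMA" is just a copy.
        std::memcpy(local, reinterpret_cast<const void*>(static_cast<uintptr_t>(sourceEA)), size);
    #endif
    }

The PPU path also doubles as a debug path: the same job can be stepped through on the PPU
before it ever runs on an SPU, which matches the push-buffer note below ("the SPU version is
also the PPU version").
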
They have an on-screen profiler
- both PPU and SPU profilers are in sync
- allows for easy identification of parallel tasks
- very useful to detect stalled syncs, etc.

Systems on the SPU:
- sim: anim, cloth, collision, procedural textures
- scene: culling, shadows, push buffer generation, meta tasks
- render: geometry conditioning, sound

Offloading the simulation
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Titans:
  * they are moving levels -> collision for the Titans was a neck
  * provide tech to artists and designers
- Cloth sim:
  * independent jobs, naturally parallel (Kratos' loincloth, enemies)
  * one job per cloth sim (across 5 SPUs)
  * job dominated by processing; data volume is very low
  * simply lifted the code from PPU to SPU (DMA call)
- Culling
  * simple frustum checks against bounding spheres
  * still on the PPU: occluder selection, visibility bit processing
- Push buffer generation
  * each SPU fetches a small group of model references (one batch) at a time
  * double-buffered DMA: fetch model B while processing model A
  * masked memory access cost
  * adapted the PPU version to handle interleaved DMA (helped debugging)
  * the SPU version is also the PPU version!

Offloading the GPU
~~~~~~~~~~~~~~~~~~~
- Geometry processing
  * techniques: post processing, vertex processing, SW rasterizers
  * they focused on offloading the cost of the opaque pass
  * the majority comes from vertex processing and lighting -> moved to SPU
  * pass all vertices through the SPUs
  * EDGE: geometry processing library available to all PS3 developers + highly optimized SPU code
  * one job per drawcall
  * a typical frame holds about 3000 geometry jobs
  * most of their vertex shader is in here
  * augmented lighting calculations
  Decompress -> Skinning -> Culling -> Generate normals -> lighting code -> compress to RSX
- Color correction
  * run as a post-effect pass to give a certain (cinematic) look
  * kick an SPU job early on to generate a cube map based on parametric input

Lessons learnt
~~~~~~~~~~~~~~~~
- Go parallel
  * do not special-case the SPU, it's a general-purpose processor
  * offload from the currently bound system (the current neck)
- No premature optimizations!
  * focus on user experience
  * optimize as needed
- Measure speed
  * measure before you jump! The on-screen profiler is your first tool.

?1) SPU scheduling? SPURS? A custom module.
?2) 6th SPU: all for sound, plus some other stuff (eg. the cube map for color correction).
?3) color per vertex? Their own hybrid proprietary lighting model (maybe GDC 2010?).
?4) overlap between cloth and physics? No physics in GOW III. So... none.
?5) need more SPUs? Yes, please. Around 8 ~ 16 SPUs would be nice.
?6) frame rate? Allow frame drops (down to 30fps). 60 fps if possible.
* At the end, they showed us a closed demo of the gameplay.

Thursday - March 26
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

The rendering technology of Killzone 2
Michal Valient - Guerrilla Games
=======================================================================
Outline: (30 fps)
- deferred shading
- a diet for render targets (compress memory)
- dirty lighting tricks
- rendering, memory and SPUs

About deferred shading (check RTR3 and the Guerrilla notes)
1. geometry pass fills the G-buffer (depth, normals, albedo, shininess)
2. lighting pass: accumulate lights + lower-resolution forward rendering for transparency
4xRGBA8 + D24S8 = 18.8MB; with 2x MSAA, 36MB
* packing in Cg: unpack_4ubyte(pack_2half(Normal.xy)) (see the sketch below)
* Light Accumulation Buffer (LAB)
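
The Cg packing trick above stores the two 16-bit halves of Normal.xy bit-for-bit in an RGBA8
target: pack_2half lays two half-floats into one 32-bit word, and unpack_4ubyte reinterprets
that word as four [0,1] bytes, which is exactly what an RGBA8 write keeps. A CPU-side C++
illustration of the bit trick follows; it is not the Cg implementation, uses a simplified
float-to-half conversion (no NaN/denormal handling), and the half/byte ordering is an
assumption.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Minimal float -> half conversion (illustration only; clamps/flushes edge cases).
    static uint16_t floatToHalf(float f) {
        uint32_t bits; std::memcpy(&bits, &f, sizeof bits);
        uint32_t sign = (bits >> 16) & 0x8000u;
        int32_t  exp  = (int32_t)((bits >> 23) & 0xFF) - 127 + 15;
        uint32_t mant = (bits >> 13) & 0x3FFu;
        if (exp <= 0)  return (uint16_t)sign;            // flush tiny values to zero
        if (exp >= 31) return (uint16_t)(sign | 0x7C00); // clamp to infinity
        return (uint16_t)(sign | ((uint32_t)exp << 10) | mant);
    }

    static float halfToFloat(uint16_t h) {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t mant = h & 0x3FFu;
        uint32_t bits = (exp == 0) ? sign : (sign | ((exp - 15 + 127) << 23) | (mant << 13));
        float f; std::memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main() {
        float nx = 0.37f, ny = -0.82f;                 // view-space normal x, y
        // pack_2half: two 16-bit halfs side by side in one 32-bit word
        uint32_t word = (uint32_t)floatToHalf(nx) | ((uint32_t)floatToHalf(ny) << 16);
        // unpack_4ubyte: the same 32 bits seen as four bytes, i.e. what lands in RGBA8
        uint8_t rgba[4];
        std::memcpy(rgba, &word, sizeof word);
        std::printf("stored as RGBA8 bytes: %u %u %u %u\n", rgba[0], rgba[1], rgba[2], rgba[3]);
        // reading back: reassemble the word and split it into the two halfs again
        uint32_t back; std::memcpy(&back, rgba, sizeof back);
        std::printf("recovered normal.xy = %f %f\n",
                    halfToFloat((uint16_t)(back & 0xFFFF)),
                    halfToFloat((uint16_t)(back >> 16)));
        return 0;
    }
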
Lighting pass - the most expensive
- 100+ dynamic lights per frame
- 10+ shadow-casting lights per frame
- AA means more of everything
Avoid hard work where possible
- don't run shaders
  * use your early z/stencil cull unit
  * depth bounds test is the new cool: reject pixels outside the z-range
  * enable conditional rendering: run a fragment query in one pass; if the light is not
    visible, reject.
- optimized light shaders (one for each combination of light features)
- fade out shadows for small lights
- remove small objects from shadow maps
Light pass & MSAA -> in-shader supersampling -> cheaper sampling
  * as fast as non-MSAA (check slides)
Sunlight: fullscreen directional light
  * the sun shadow channel serves for ambient occlusion purposes
  * fake MSAA
  * the scene is also heavily post-processed, so you can't notice the fake stuff
- shadow map rendering
  * for each slice
  * the shadow map changes every frame :( => fix: remove shadow map rotation
    * align shadow maps to WORLD instead of VIEW
    * remove sub-pixel movement
GPU - push buffer (PB) building
  * multiple SPUs building the PB in parallel
  - fixed memory pool: blocks with IDs that the RSX consumes
Conclusion:
- keep it simple and straightforward

?1) light pre-pass compared with deferred rendering? They had it in the beginning; they
    started adding more and more data to the G-buffers, and two geometry passes was too much
    for performance.
?2) color correction for additional "cheating": glow intensity lets them go over intensity 1
    (~ HDR).
?3) they use RSX memory compression.

Making the "impossible" possible
Hideo Kojima
=====================================================================
MSX2 example: max 32 sprites; with more than 8 sprites horizontally aligned, they disappear.
"Combat game", but the bullets would disappear... => convert it into a stealth game.
There's nothing "impossible". Just change the point of view. Or reformulate the problem.

Camera based games: The next generation
Diarmid Campbell
=====================================================================
Outline: camera-based games; what is EyePet?; improving the tech; future research

Camera-based games
- typically you see yourself in the picture
- PS Eye: 60 fps -> a lot of data!
Computer vision - a hard problem!
  * to understand an image: lighting, perspective, occlusion -> don't even try! just extract
    what information you can
- Image differencing -> motion buffer (sketched at the end of this section)
  * sometimes gives false detections, which can be quite problematic
    -> accumulate motion before triggering
    -> but you pay a price in responsiveness
PS Eye + PS3 - higher resolution, higher sensitivity, more computing power
- Optical flow: track points of interest; compute overall motion
  Eg. trigger just using rotation motion
Blurbs - just using motion differencing
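
A minimal sketch of the image-differencing idea above: diff the current and previous
grayscale frames, threshold, and accumulate into a motion buffer that only triggers once
enough motion has built up. Buffer layout, thresholds, and decay values are made-up
illustrations, not the PS Eye library's actual code.

    #include <algorithm>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // One grayscale camera frame, e.g. 320x240 at 60 fps.
    struct Frame { int width, height; std::vector<uint8_t> pixels; };

    // Accumulate thresholded frame differences. A cell only "triggers" after
    // several frames of motion, which filters out one-frame false detections
    // at the cost of a little responsiveness (the trade-off noted above).
    void accumulateMotion(const Frame& current, const Frame& previous,
                          std::vector<uint8_t>& motionBuffer,
                          uint8_t diffThreshold = 24, uint8_t decay = 8)
    {
        const std::size_t n = current.pixels.size();
        motionBuffer.resize(n, 0);
        for (std::size_t i = 0; i < n; ++i) {
            int diff = std::abs(int(current.pixels[i]) - int(previous.pixels[i]));
            if (diff > diffThreshold)
                motionBuffer[i] = uint8_t(std::min(255, motionBuffer[i] + 32));  // build up
            else
                motionBuffer[i] = uint8_t(std::max(0, motionBuffer[i] - decay)); // fade out
        }
    }

    // A cell counts as "moving" only once its accumulated value is high enough.
    bool hasTriggered(const std::vector<uint8_t>& motionBuffer, std::size_t index) {
        return motionBuffer[index] > 128;
    }
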
EyePet
~~~~~~~
(Not included in the talk: AI, rendering, anim, physics)
- a virtual pet in your living room
- create your own toys -> draw toys
  Drawings were too small ...
  Classify pixels into "paper"/"pen"
  - use an adaptive threshold, to avoid gradients of light: I - I*G (the image minus its
    Gaussian-blurred version), then threshold the result
  - vectorize the image (extract lines)
    * but a lot of pixels are left over, or lines are duplicated
      -> take the skeleton first; trace lines with no duplicates
  - turn these lines into a virtual 3D object
- The pet reacts to you:
  * create a grid of motion buttons
  * identify motion close to the pet
  * connect motion over several frames
  * create a tracker object: 3D position, velocity and life
  * trackers follow the object

Future research
~~~~~~~~~~~~~~~~
- head tracking: initialized with libface
  * custom tracking looking just at differences
  * libface itself is not good for tracking, because the size of the box jitters and it's slow
    -> the price you pay for detecting any face
  demo: keeping a ball balanced on your nose
  - this kind of user skill was missing in previous games
  Problems with rotations -> changed to color-based tracking. Init with libface, and use a
  histogram to compare. Problems: similar colors in the background...
  -> combine both approaches
- marker-based AR: Eye of Judgment
  * prototype with ARToolKit
  Problems:
  - markers changing brightness
  - want a small marker
  Discard quads:
  - trace contours
  - take out vertices, one by one, so the shape doesn't change much, until it becomes a quad
  - if the difference is small, keep the quad
  - compute the 4 points
  - calculate the homography
    * render the pattern 4 times, and compare with the image
  Finding the marker may be hard.
  * the adaptive threshold also fails
  * the contour follower may fail even with the correct threshold
    -> test more threshold levels
    -> create a new contour follower. Slow. After optimizing: 60fps on 4 SPUs in assembly...
  -> to deal with occlusion, design a 3D marker, so one 2D marker is always visible.
?1) the libraries may be available to PS3 developers when they are ready.
?2) the next hardware may project infrared light, to detect depth. But too expensive atm.

Tech Artists Roundtable
Jeff Hanna
===================================================================
How do you schedule Tech Artists (TA)?
* say SCRUM and I SCREAM...
* TA left out of Scrum?
* TA morphs into a "support" role -> Scrum cards of support time
* identify the pipeline
* why is it more difficult to schedule TA than tool programmers? TA are part of the art team.
  They work for the artists.
* very organized TA teams can work on several projects, vs. small teams where TA have to do
  artist jobs, etc., as needed.
=> we are never gonna be able to define the role. There are too many roles.
How much of a spec? -> schedule
* tools cleanup
How to start a TA discipline?
* prove it -> pick tasks that address recurrent problems and show that you can reduce bugs
  -> convince them you need more people
  -> make tools for THEM (not for you)
  -> better iterations (the art director agrees)
* what makes a "beautiful" tool?
  - encourages creativity -> artists are there to make art, not to think about technical
    limitations
  - datagrid controls are evil....
  - also, a tool that can be easily extended...
  - how does the artist remember what is what?
    * artists don't wanna read documentation...
    -> the UI should be self-explanatory
    -> a tooltip wiki is also nice
Outsourcing - do you provide your tools to the outsourcer?
  * provide docs? -> a screencast is nicer... if they have Internet!
Where do TA come from?
- former artists? -> bad programming practices? (bad variable names, etc.)
- former programmers? -> don't understand artists' needs?
Designing terror: Inside the Resident Evil 5 production process
Yoshiaki Hirabayashi
==================================================================
Cinematic supervisor. The topic is not the whole production, but something smaller:
[ Real-time movies in RE5 ]

Production flow of RT movies: 1. cooperation with the overseas team; 2. pre-visualization
1) (in slides) MB: MotionBuilder; ENG: in-house tool
   In RE4, mostly Japanese staff (foreign involvement: about half the actors).
   In RE5, more foreign involvement (actors, half the CG team, mocap team, director).
   Problems: different languages, cultures, work styles.
   -> trying to solve the problems by oneself is a waste of time and resources
   -> "middleman": an organization, similar to management, to put things in order
   -> find someone with already established connections, because Japanese companies don't
      know about American companies
   -> if more animators are needed, where to go, etc.
2) previsualization
   They shifted to previsualization because:
   - even a small discrepancy between the teams -> detrimental impact on the final product
   * short production time, high-quality graphics, ...
   By visualizing our goals at an early stage, we ensure that everyone is on the same page
   (unified intent).
   * It's cheaper to make changes in the early stages of development than later in CG.
   VIDEO STORYBOARD
   - storyboard
   - CG storyboard (just for previs, like Hollywood movies)
   - video shooting of rehearsals
   pros
   + unified intent across the teams
   + it also increases the efficiency of mocap and CG production
   + virtual camera as reference
   cons
   - detailed process, so it increases cost

New technology
~~~~~~~~~~~~~~
1) virtual camera
   It's like an AR camera, where you see the CG data interactively by moving the camera.
   -> it reduces editing time, because you can use the data from the virtual camera
   2 types:
   - "body capture system" (in-house), 20% (in Osaka)
   - "InterSense system" (in LA), 80%
2) facial capture
   2005 tests: can Japanese mocap data be used on a Western face?
   -> animators clean the data by hand
   They put the facial capture system inside the voice recording room!
   * they wanted to pay special attention to lip sync
   * it's hard to capture emotions (acting)
   * performance retakes are easy, but lip-sync retakes are hard
   Facial animation flow: similar to the body mocap flow
   * they can treat each step of the facial animation independently (~ Photoshop layers)

Work with MT Framework (Capcom engine)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lighting & Filter
- material color: you can select parts, eg. just the face, and change the specular factor,
  etc. In RE4, they had to change textures.
- light & shadow: infinite (directional) light + light-space shadow maps
  * just one factor is very hard to manipulate
  * splitting it in two is easier for artists to manipulate
  * costly process
- filter: tonemap, bloom, color correct, DOF, motion blur, ...
  Color correct: almost everything you can do in Photoshop (see the LUT sketch below)
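
A common way to get "almost everything you can do in Photoshop" at runtime is to bake the
adjustments into a lookup table and apply it per pixel (the Killzone 2 notes earlier mention
the same image-as-LUT idea). A tiny per-channel C++ sketch follows; real systems usually use
a 3D LUT on the GPU, and nothing here is Capcom's actual implementation.

    #include <cmath>
    #include <cstdint>

    // Build a 256-entry per-channel lookup table from simple grading parameters,
    // then apply it per pixel. Baking the curve once is what makes arbitrary
    // Photoshop-style adjustments cheap at runtime.
    struct ColorLUT {
        uint8_t table[3][256];

        void build(float gamma, const float gain[3], const float lift[3]) {
            for (int c = 0; c < 3; ++c)
                for (int i = 0; i < 256; ++i) {
                    float v = i / 255.0f;
                    v = lift[c] + v * gain[c];                 // lift/gain per channel
                    v = std::pow(v < 0.0f ? 0.0f : v, gamma);  // overall gamma
                    table[c][i] = uint8_t(v > 1.0f ? 255 : v * 255.0f + 0.5f);
                }
        }
        void apply(uint8_t rgb[3]) const {
            rgb[0] = table[0][rgb[0]];
            rgb[1] = table[1][rgb[1]];
            rgb[2] = table[2][rgb[2]];
        }
    };
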
MT Framework demo.

Friday - March 27
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Fast GPU Histogram Analysis for Scene Post Processing
Andy Luedke - Halo
==================================================================
Histogram analysis - for tone adjustment, etc.
Average scene luminance
- varies significantly with small perceived changes in HDR scenes -> use a histogram
  * still limited by a fixed number of bins
- can be generated on the CPU, from a reduced texture
- GPU queries to update histogram bins -> low granularity, delayed scene response
Luminance histogram
- used to find interesting exposure control points:
  * median luminance (50th percentile)
  * bright point (90th ~ 95th percentile)
- slow
- not so great for exposure control
Sorted luminance buffer
- fixes some problems
- expensive to sort on the CPU
- easy to find sorted percentiles
GPU sorting
- sorts multiple channels at once (eg. luminance and depth)
- the sorted buffer remains on the GPU
- bitonic sort works well in pixel shaders: (1/2) * log2(n) * (log2(n) + 1) passes
- scale to slower hardware by reducing the size of the sorting buffer
- bitonic sort works best on power-of-2 textures
  * in their game, 128 x 64
GPU exposure processing
- a shader samples the sorted luminance buffer and outputs updated exposure control values
Local exposure control
- use one channel of the sort buffer as a key for another channel's sort
  RGBA = [ lum, depth, local lum, key ]
- allows you to divide the screen into multiple exposure zones and mix local and global
  adjustments
- use different region masks to customize to your game's needs
  (sketch: screen divided into a central zone and surrounding exposure zones)
Pipeline:
1. render the main scene
2. downsample and compute luminance
3. bitonic sort
4. update exposure controls
5. update tonemapping settings
6. tonemapper
References: GPU Gems, Chapter 37; GPU Gems 2, Chapter 46; "UberFlow: a GPU-based particle engine"

Real-Time Water Dynamics: Practical Rendering of Fluid Simulations
Rama Hoetzlein - UC Santa Barbara
====================================================================
Focus on surface extraction, more than simulation.
* sim -> surface extraction -> rendering
- sim: {grid based, particles}
- surface extraction: marching cubes
- rendering: {ray casting, polygons, point based rendering}
Simulation: grids 38 x 38 x 38 at 37 fps vs. 60,000 particles at 57 fps
Particles (a minimal sketch follows at the end of this talk's notes):
1. compute pressure from neighbors
2. compute forces
3. integrate
Open SPH simulator: www.rchoetzlein.com
Surface extraction:
- metaballs O(kp) & marching cubes O(n^3) -> render O(kp*n^3)
- point-based rendering: "Screen Space Fluid Rendering with Curvature Flow", i3D, NVIDIA
  * no lighting :(
- how to do better? Observations:
  1. water is highly transparent -> custom shader: decrease alpha proportionally to the highlight
  2. shadows and environment maps are critical for depth perception
  3. SSAO and DOF are also important for perception
Sphere Scan Conversion
1. group streams: generate true 3D surfaces for shadows, transparency, etc. Very fast
   * very comparable to marching cubes
2. deform geometry
3. render
Holy Grail: avoid interior particles
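
A bare-bones C++ sketch of the three particle steps listed above (pressure from neighbor
density, forces, integration). It uses an O(n^2) neighbor search, simplified kernels, and
made-up constants purely for illustration; it is not Hoetzlein's simulator, which uses
spatial hashing and proper SPH smoothing kernels.

    #include <cmath>
    #include <vector>

    struct FluidParticle { float x, y, z, vx, vy, vz, density, pressure; };

    void stepSPH(std::vector<FluidParticle>& p, float h, float restDensity,
                 float stiffness, float mass, float dt)
    {
        const float h2 = h * h;
        // 1) density & pressure from neighbors
        for (auto& pi : p) {
            pi.density = 0.0f;
            for (const auto& pj : p) {
                float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
                float r2 = dx * dx + dy * dy + dz * dz;
                if (r2 < h2) {
                    float w = h2 - r2;              // simplified poly6-style kernel
                    pi.density += mass * w * w * w;
                }
            }
            pi.pressure = stiffness * (pi.density - restDensity);
        }
        // 2) pressure forces (plus gravity), 3) integrate
        for (auto& pi : p) {
            float fx = 0.0f, fy = -9.8f * pi.density, fz = 0.0f;
            for (const auto& pj : p) {
                float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
                float r = std::sqrt(dx * dx + dy * dy + dz * dz);
                if (r > 0.0f && r < h) {
                    // repulsive when pressure is positive, pushing pi away from pj
                    float f = mass * (pi.pressure + pj.pressure) / (2.0f * pj.density) * (h - r);
                    fx += f * dx / r; fy += f * dy / r; fz += f * dz / r;
                }
            }
            pi.vx += dt * fx / pi.density;
            pi.vy += dt * fy / pi.density;
            pi.vz += dt * fz / pi.density;
            pi.x += dt * pi.vx; pi.y += dt * pi.vy; pi.z += dt * pi.vz;
        }
    }
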
Style in Rendering: The history and technique behind AFRO SAMURAI's look
Bryan Johnston (Namco) designer, Danny Chan (EA) shader
=========================================================================
Story of the manga & anime adaptation -> videogame adaptation.
Takashi Okazaki gave them freedom to design the game.
* outsourcing for some environments
* they kept stylizing Afro -> more "next-gen" look by using normal maps
Okazaki's work - angularity
ZBrush gives "blobby" surfaces, the opposite of angularity...
Uber shader: diffuse map, normal map, specular map, light map, decal map, emissive map,
cube map, shadow map, point lights, spot lights, directional lights
* it just looked like a pretty puppet... uncanny valley
* distinguish yourself from other games by making your own shaders -> they gave them 2 weeks...
1. quick prototyping: NVIDIA FX Composer
2. collaborative iterations: designers & programmers
3. flexible components: sliders, etc.
4. hand-crafted look
5. reproducibility

Character shader
- put as much as possible of the hand-crafted look into the diffuse texture
  Flats: flat colors + color shift ("shape" of cloth) + grunge layer (dirt) + occlusion layer
  (shadows) + sketch mark layer (lines)
- toon ramp component
  eg. metallic ramp gradient -> tri-tone ramp (white, gray, black); the artist can customize
  how the surface responds to lights
  * the texture is 2-dimensional, so it is possible to use another input, not just light
    (but they didn't use it)
- character-specific hatch texture
  hatch mips ("RTR 3rd Ed." - brush paint mipmap textures)
- extrusion: distance order inversion
  * artifacts around some edges, proportional to distance
  * they didn't use this technique
- border detection
  * needs to be tuned so there aren't too many edges
  * there are always problems with noisy edges
- outline pass (from Namco Japan): screen-space outlining
  screenNormal.xyz = InverseTransposeLocal2Projection * vertexNormal.xyz
- edge light
  * it helps to distinguish him from the background (emphasizes depth, like an unsharp mask
    of depth)
  * it's used to convey health -> the light goes red when he's dying
- light scattering
  * better mood and atmosphere: color = color * extinction + in-scattering
Summary:
- distinguish yourself
- it's easier than you think
- experiment and throw away
- move fast in small teams
- licensors can be cool
?1) the hair uses a separate shader. It looked very bad in the beginning, so they invested
    more time in hair...
?2) they took out the specular component.

The human play machine
Chaim Gingold
=======================================================================
neuroscience
~~~~~~~~~~~~~~~
culture, social, language, make-believe, space, seeking, senses, body, play
play
~~~~~
How do we play? What tools do humans use to play? Where does the body end?
-> extend yourself with tools
space
~~~~~~
How do games handle space? 2D maps, board games, 3D space... hide & seek.
How do we experience space?
- Japanese garden design
- blindfolded
- scale: Katamari Damacy
- impossible space: 無限回路 (infinite circuit)
make-believe
~~~~~~~~~~~~~
SCEI EE: Emotion Engine
intentionality; we fill in the blank spaces; superheroes
Sims 2: ordinary fantasies
boundaries of fictional space
* Mario outside the map (walking up where the score is)
disguise: "wearing" a guitar in Guitar Hero
campfires
senses
~~~~~~~
touch in the dark
World of Goo - phantom sense of touch
visual sense - finding Waldo
sound, taste
social
~~~~~~~~~
competition
reversal of power: Pac-Man (chased/chaser)
cooperation: a team; a common enemy
empathy - look at someone and know how they are feeling -> identification
* can games communicate the same things as movies?
* Wiimote - people feel like they can control it (even if they can't)
nursing instinct -> pets
romantic love
#include "social_emotion.h" doesn't work...
language
~~~~~~~~~~~
initially sounds -> music -> gibberish (Sims) - it sounds as if they were talking
signs: STOP
poetry
Scrabble, Taboo
culture
~~~~~~~~
weirdness
things you wear
Katamari Damacy - stuff doesn't matter, just reconfigure
GTA - social morals
Borat - challenging our world
Wario
conclusions
~~~~~~~~~~~~
DON'T game design -> DO play design
When we play, we experience many things together:
- social awareness, self-awareness, etc.
- our existence, viscerally, in the senses
Responsibility - what are we making? -> make it for good, not evil
(Google his name to find the slides)
Impressions:
~~~~~~~~~~~~~
Super fast. Like a brainstorming shower. Kind of random ideas & keywords.

Rendering Techniques in Gears of War 2
Nicklas Smedberg, Daniel Wright - Epic Games
======================================================================
Epic Games - creators of Unreal Engine 3
Nicklas - Swedish, C64 demoscene, went pro in 1997
Outline
- gore techniques
- blood techniques
- lighting

Gore
~~~~~
Gore goals:
- no seams between breakable pieces (GoW1 used rigid 1-bone skinning)
- dismember skinned meshes (eg. ragdolls)
- hand-modeled internals
- minimal impact on undamaged meshes
Gore mesh
- 2 versions of the skinned mesh
  - undamaged mesh
  - gore mesh with more info:
    * full freedom to hand-model cuts and guts
    * skeleton with breakable joint constraints
      + broken by script code - not using the physics engine (as in GoW 1)
      + info per gore piece (hit points, type of FX, dependency on other pieces, etc.)
      + an extra set of bone weights (4 per vertex)
Tearing off a limb
- switch to the gore mesh
- determine which constraint to break (per case, not physics!)
- get the pair of broken bones
- create a separate vertex buffer for vertex weights
  * unique per gore mesh instance
- for each vertex:
  1. if influenced by a broken bone, copy the 2nd set of weights
  2. otherwise, copy the 1st set of weights (from the original vertex buffer)
- add a physics impulse to torn-off pieces
Data-driven gore
- set up in the Unreal editor
- only used for really big enemies
- used as visual "damage states"
- gore pieces stacked on top of each other
- hit points, impact radius, dependency links (leaf first), etc.
Scripted gore
- set up in gameplay code (UnrealScript)
- only dismember dead people
  * no animation/gameplay consequences
- different damage types break constraints differently
  * complete obliterations (grenades)
  * partial (shotguns)
- headshots spawn additional effects (gibs)
- the chainsaw breaks specific constraints
- ragdolls break constraints based on hit location
Eg. "meatshields", covering yourself with dead bodies, etc.

Blood techniques
~~~~~~~~~~~~~~~~~
Many different techniques in combination:
- projected blood decals
- screen-space blood splatter
- world-space particle effects
- surface-space shader fx
- fluid simulation
- morphing geometry
Improved decal features (see the sketch after this list)
- easier to spawn decals on anything
- supports any artist-created material (shader)
- three decal pools
- heuristic to replace the oldest decal
- proximity check to avoid multiple decals on top of each other
- AABB tree to find triangles within the decal frustum
- index buffers
- frustum clipping by the GPU
- decal visibility test
- decals on fractured meshes
- one drawcall per decal
  * allows for scissoring
  * each decal is fully featured (shaded, etc.)
- statically lit decals re-use lightmaps from the underlying geometry
- dynamically lit decals use additive multi-pass lighting
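
A small sketch of the decal-pool bookkeeping described above (fixed pool, replace the oldest
entry, skip spawns too close to an existing decal). Names and thresholds are invented for
illustration; Epic's actual system also handles materials, lightmaps, and per-mesh triangle
gathering.

    #include <cstdint>

    // Fixed-size decal pool: new decals replace the oldest one, and a proximity
    // check avoids stacking many decals on the same spot.
    struct Decal { float x, y, z; uint32_t spawnFrame; bool active; };

    template <int N>
    struct DecalPool {
        Decal decals[N] = {};

        bool spawn(float x, float y, float z, uint32_t frame, float minDistSq = 0.01f) {
            // Proximity check: don't stack a new decal on top of an existing one.
            for (int i = 0; i < N; ++i) {
                if (!decals[i].active) continue;
                float dx = decals[i].x - x, dy = decals[i].y - y, dz = decals[i].z - z;
                if (dx * dx + dy * dy + dz * dz < minDistSq) return false;
            }
            // Pick a free slot if there is one, otherwise replace the oldest decal.
            int slot = 0;
            for (int i = 0; i < N; ++i) {
                if (!decals[i].active) { slot = i; break; }
                if (decals[i].spawnFrame < decals[slot].spawnFrame) slot = i;
            }
            decals[slot] = Decal{x, y, z, frame, true};
            return true;
        }
    };
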
Eg. blood smears
- project decals on the ground and on cover walls when you're hurt
- project a new decal every 0.5s while moving
- use a fading "paintbrush" effect along the movement direction
- mix with standard blood decals to break up the pattern
Screen-space blood effects
- 3D particle effects that use camera-facing particles
- allows for more dynamic "screen-space" effects (the artist can define the material)
World-space blood effects
- used world-space coords for fake lighting on sprites and gibs
- 4x6 "movie frames" in a sub-UV texture to get realistic motion
eg. surface-space
* the blood color has negative blue and green components
Fluid simulation
- using the Unreal Engine 3 fluid simulation feature
- ability to raise / lower the entire fluid
- simulated on a separate thread
- height field tessellated on the GPU
Morphing blood geometry (for "heart beats" etc.)
- animate individual vertices, not a skeleton
- blend between morph targets

Character lighting
~~~~~~~~~~~~~~~~~~~
Goals:
- integrate closely with the environment
- hundreds of lights
- cinematic look
2 types of lights:
- main dominant lights
- the ones used to fake GI (bounce)
  * evaluate Spherical Harmonics (SH) per pixel

Screen-space ambient occlusion optimizations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- noisy shadows -> spatial filter pass
  * 720p, 16 occlusion samples, full res, 20 filter samples -> too slow
    -> 8 samples, 12 filter samples; multi-frequency occlusion is lost
- downsized render target
  - AO is mostly low frequency -> artifacts around silhouettes
  - temporal filter, to hide artifacts when the camera moves
- reverse reprojection caching
  * doesn't help when nothing is moving
  gotchas:
  * accurate last-frame position
  * bilinear filtering
  * double buffer
  * world-space precision
- computation mask - to avoid doing work when it's not necessary

Star Ocean 4: Flexible Shader Management and Post-Processing
Yoshiharu Gotanda - R&D tri-Ace Inc. (for Square-Enix)
============================================================================
Aska: their 3rd gen engine
* multiple platforms: PC, 360, NDS
* fully synced production environment - results synchronized on the DevKit with Maya
* physically-based camera (focus, grain, exposure, film profile)
Flexible shader management
- artists can create shaders in Maya, without a programmer
- need to train artists; they need to know some physics
- shaders generated at run time
cons:
- explosion of the number of shader variations -> large shader binary
- must create all possible shader variations
Subdivide shader nodes
Post-processing effects
AHSL
- in-house shader language based on HLSL and Cg
- correspondence with Maya
- shader immediate constants
- shader cache; components of the cache: key, constant table, shader binary
  (see the sketch at the end of this section)
- shader profile data -> supports development
Problem:
- compiling shaders at run time
- the size of the cache -> decompress each shader binary at run time
  : separated the shader cache into L1 and L2
  : supported multiple shader cache files
  : created a tool to manage shader files
- the size increased... 50 MB, > 30,000 shader combinations
- Shader Adaptors dominated most of the shader combinations (80%)
Cache file creation -> done by the QA team - a tough problem...
The engine was being developed simultaneously with game development, as for earlier projects
- many shadow algorithms implemented -> artists were using several of them, and for several
  unexpected uses...
System also used for
- particle rendering
- post-processing
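
A minimal sketch of the shader-cache idea above: a key that identifies one shader variation
maps to its compiled binary plus its constant table. Types, the key, and the compile hook are
invented for illustration; tri-Ace's real cache is split into levels, compressed, and built
from QA playthroughs.

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Hypothetical cache entry: what the runtime needs to reuse a compiled shader.
    struct ShaderCacheEntry {
        std::vector<uint8_t> binary;                           // compiled shader microcode
        std::unordered_map<std::string, int> constantTable;    // constant name -> register slot
    };

    class ShaderCache {
    public:
        // The key describes the node graph + options that produced this variation.
        const ShaderCacheEntry* find(uint64_t key) const {
            auto it = entries_.find(key);
            return it == entries_.end() ? nullptr : &it->second;
        }
        void insert(uint64_t key, ShaderCacheEntry entry) {
            entries_.emplace(key, std::move(entry));
        }
    private:
        std::unordered_map<uint64_t, ShaderCacheEntry> entries_;
    };

    // Usage: look up first; only fall back to the (expensive) runtime compile on a miss.
    // ShaderCacheEntry compileShaderVariation(uint64_t key);   // hypothetical compiler hook
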
Flexible post-processing
Why physics? -> unnatural effects from cameras puzzle us (a shot that couldn't have been
taken with a real camera).
Rendering must be processed in linear color space.
* Bokeh is not DoF. (he means the blurriness)
* Gaussian blur is softer than real bokeh (which is flat)
* Blend vs. per-pixel blur (to me, blend looks quite OK) (~ Poisson DoF from ATI)
* Bleeding artifacts -> use a mask image
* The f-stop also decides the shape: more circular when open (f/2.8), hexagonal when
  closed (f/5.6)
* optical vignetting - the corners of the picture are darkened - "cat eye effect"
  (eclipse bokeh) -> compute the attenuation curve and eclipsing ratio
* regular motion blur
* glare filter
* No film profile (Reinhard adaptation) vs. film simulation -> more realistic to simulate
  negative film
* Film grain - multiply blending
  - CMOS noise: "additive" blending; Bayer pattern x white noise
Problems:
- artists couldn't handle so many parameters of a real camera -> they prepared templates
  in SO4 (and gave some seminars)
  * still, some of the simulations were never used
Summary
- real camera -> changing one parameter affects another (isn't that a problem for designers?)
Future:
- diffraction, aberration, ghosting, scatter-based effects

Impression:
- The shader framework may give more flexibility to artists, but it's a nightmare for
  programmers... Unreasonable combinations of shaders. Hard to optimize.
- The camera model is too complicated:
  * for both the designers and the game console...
  * better to fix those "templates" at the beginning, for the sake of both programmers
    and designers.