Game Developers Conference 2009
-----------------------------------------------------------------------------------

Monday - March 23
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Math tutorial
~~~~~~~~~~~~~~~~
* Parametric equations are nice because you can go back & forward in time and everything is smooth.
* Orientation representation (quaternions)
  - to interpolate between orientations
  - translation -> adding; rotation -> multiplication
  (contents are like Real Time Rendering: comparing matrices, Euler angles, quats, etc.)
Check the slides from the web page.

News
~~~~~
OnLive, game on-demand service:
http://www.gamedaily.com/articles/news/introducing-onlive-and-the-end-of-consoles/?biz=1
Impressions:
* can't notice lag in the controller
* the whole thing slightly freezes sometimes (and that was over LAN...)
* Horikiri: something similar existed in Japan before (SH2/3? supported)

PhyreEngine 2
http://www.ps3center.net/news/2532/sony-unintentionally-unveils-4-new-games/
vegetation support (like Flower)

GameTrak Freedom
http://www.elotrolado.net/noticia_posicionamiento-3d-para-xbox-360-y-ps3_15902

GDC awards
http://www.n4g.com/events_gdc2009/News-300708.aspx

World of Goo also available for Mac ;)
http://www.worldofgoo.com/

Tuesday - March 24
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Insomniac Games Secrets of Console and Playstation 3 Programming
==================================================================

Developing for the CELL - Mike Acton
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1st: Look at the Hardware!!!
- Cell chip: 8/10 of the die area is SPUs -> you want to put your code there
- most of each SPU is dedicated to handling data (big register file, DMA access, ...)
Cell:
- not magic
- not a radical change in high-performance design
- fun to program for
Language & compilers: C++ & assembly (both PPU/SPU; C macros)
Other processors & I/O: GPU, Blu-ray, net... Many assets.
Game vs Engine code: divisions of development.
* Good solutions for the CELL will be good solutions on other platforms -> focus on DATA FLOW.
* High-performance code is easy to port to Cell.
  - if you don't know anything about the data flow, it's gonna be hard to port.

#1 "It's too hard". False. Multiprocessing is not new.
#2 "Cache & DMA data design is too complex". This is because people try to abstract it away
   from programmers. Don't hide it!
   "To DMA" data on the SPU ~ memcpy on the PPU.
   SPU sync: fence (extremely useful), barrier, lock line reservation
   (doesn't exist on the PPU! bad for sync!)
#3 "My code can't be made parallel". Yes, it can.
#4 "It's the language. C/C++ is no good for parallel programming". Bullshit. All you need is
   to be able to communicate with the hardware. Don't hide from the issues, understand them.
#5 "But I'm just doing this one little thing..." If everyone goes and uses the PPU ...
#6 "What's the easiest way to split programs across the SPUs?"

General rules:
1. DATA is more important than CODE.
2. Where there's one, there's more than one. Work with groups of things, not just one object.
   Never write code for one individual case (see the sketch below). E.g. a vector class is
   unoptimizable! The "domain-model design" lie (C++): try to avoid this model.
   -> model how your data actually looks, rather than some idea of how the "world" looks.
3. Software is not a platform.
   * need to understand the HW. SW doesn't live in the ether.
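
A minimal C++ sketch of rule 2 (the batch-first mindset). The struct and field names are
hypothetical, not Insomniac's code; the point is that the update works on a whole contiguous
array at once instead of one object behind a virtual call, which is also what makes it easy
to hand the block to an SPU.

    #include <cstddef>
    #include <vector>

    // Hypothetical data-oriented layout: one plain struct, updated in bulk.
    // No per-object virtual Update(); the loop owns the iteration, so it can be
    // unrolled, software-pipelined, or DMA'd to an SPU as one contiguous block.
    struct Particle {
        float x, y, z;      // position
        float vx, vy, vz;   // velocity
    };

    void updateParticles(Particle* p, std::size_t count, float dt) {
        for (std::size_t i = 0; i < count; ++i) {
            p[i].x += p[i].vx * dt;
            p[i].y += p[i].vy * dt;
            p[i].z += p[i].vz * dt;
        }
    }

    int main() {
        std::vector<Particle> particles(1024, Particle{0, 0, 0, 1, 0, 0});
        updateParticles(particles.data(), particles.size(), 1.0f / 30.0f);
        return 0;
    }
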
* the real difficulty is in the unlearning

Ultimate goal: get everything onto the SPUs (even if incrementally) -> for any kind of task.

SPU gameplay - Joe Valenzuela
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Glossary:
- mobys: no real classes, simple structs. No hierarchy. Data that they need to render, etc.,
  but no game logic.
- update classes: AsyncMobyUpdate (Guppys, AsyncEffect), AggregateUpdate

SPU gameplay difficulties: multiprocessor, NUMA, different ISA, just different
* your virtual functions don't work => you can't DMA your objects
* => your pointers don't work
* your code doesn't compile

* Object-driven update:
    for (all entities) { Entity *e; e->collect_info(); e->update(); }
  can't amortize! Don't do that.
* More modular update:
    for (all) { e->collect_info(); }
    for (all) { e->update(); }
* Aggregate updating
  * group instances by type

SPU gameplay intro: "shaders", like for graphics -> code fragments
AsyncMobyUpdate <- {Guppys (run entirely on SPU), AsyncEffect}
- one code fragment per AI state, for example
Instance data (data that actually gets transformed) vs Common data (common to that update group)

Gameplay "shaders":
- 32k relocatable programs
- makefile-driven process combines code and data into a fragment
- code fragments do: DMA up instances, transform instance state, position, ...
  typical game stuff (preupdate, update, ...)

About instance data:
- not an object, a subset of an update class
- different visibility across PPU/SPU for data inside an update class
- 3 memory addressing modes: direct, direct indexed, indirect indexed

Guppys:
- common use: "bangles" (arms, limbs)
- guppy instance: position/orientation EA, joint remap table, animation joints, ...
AsyncEffect:
- eg. stationary effects

SPU-invoked code:
1. immediate
2. deferred
   - PPU shims: flags set in the SPU update
   - command buffer: small buffer in LS
   - atomic allocators
3. ad hoc

Porting code sucks
- can result in over-abstracted systems
- polymorphism ~ maintain a lot of code...
- design from scratch
?) for multiplatform: still data-centric. Divide at the point where the "transformation"
   deviates per platform.

SPU wrangling: Scheduling and debugging - Jonathan Garrett
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
job-manager:
- submission order
- ring buffer per SPU
job-list: the PPU adds jobs at the start of the buffer, the SPU consumes jobs at the end
  (see the sketch at the end of this talk's notes)
job-triggering:
- lock-line waiting

GPU interaction:
          | frame 0  | frame 1   | frame 2
  PPU     | update 0 | update 1  | update 2
  GPU     | ...      | render 0  | render 1
  TV      | ...      | display 0 | display 1
  SPU     | ...      | assist 0  | assist 1
  (the SPU signals when it's finished)

job-def: struct that defines a job
timeouts
- a PPU watchdog ensures an SPU job completes within a reasonable time
asserts
- print & stop
- adds bloating debug-only code... -> smaller asserts with halt
  -> halts are non-exact and can't be continued from!
exceptions
- own exception handler
- output added to QA reports
SPU ABI
- often need to debug at the asm level
- defines register usage (incl. how parameters are passed between functions)
general SPU debugging
- simplify (always the key!):
  * disable unrelated code,
  * run on a single SPU, ...
- embed debug info in your SPU structs
general mem layout
- the stack grows from high addresses to low
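
A minimal sketch of the job-list idea above: the PPU produces at one end of a per-SPU ring
buffer, the SPU consumes at the other, preserving submission order. The struct and field
names are hypothetical, and the real system spans two processors with DMA and lock-line
waiting rather than one shared address space; this only illustrates the bookkeeping.

    #include <atomic>
    #include <cstdint>

    // Hypothetical job descriptor ("job-def"): a plain struct describing one job.
    struct JobDef {
        uint32_t codeEA;    // effective address of the code fragment
        uint32_t dataEA;    // effective address of the input data
        uint32_t dataSize;  // bytes to DMA in
    };

    // Single-producer / single-consumer ring buffer, one per SPU.
    // The PPU pushes at 'head', the SPU pops at 'tail'.
    template <unsigned N>
    struct JobRing {
        JobDef jobs[N];
        std::atomic<uint32_t> head{0};  // written by the PPU side
        std::atomic<uint32_t> tail{0};  // written by the SPU side

        bool push(const JobDef& j) {    // PPU: submit a job
            uint32_t h = head.load(std::memory_order_relaxed);
            if (h - tail.load(std::memory_order_acquire) == N) return false;  // full
            jobs[h % N] = j;
            head.store(h + 1, std::memory_order_release);
            return true;
        }
        bool pop(JobDef& out) {         // SPU: take the next job in submission order
            uint32_t t = tail.load(std::memory_order_relaxed);
            if (t == head.load(std::memory_order_acquire)) return false;      // empty
            out = jobs[t % N];
            tail.store(t + 1, std::memory_order_release);
            return true;
        }
    };
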
Pre-lighting in Resistance 2 - Mark Lee
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Outline:
- past: G-buffers and pre-lighting
- pre-lighting stages
- implementation tips
- pros and cons

Multipass lighting problem:
- O(M*L)
- too much redundant work:
  * repeat vertex transformation for each light
  * repeat texture lookups for each light
One solution, O(M+L):
    for each mesh
        render mesh
    for each light
        render light

G-buffer
- caches the inputs to the lighting pass in multiple buffers
- all lighting in screen space
- also nice for post-processing (check RTR Third Ed.)

Pre-lighting / light pre-pass (see the pass-flow sketch below)
- like a G-buffer, but
  * caches only a subset of material props (eg. normals and specular power) in an initial
    geometry pass
  * a screen-space pre-lighting pass is done before the main geometry pass

Step 1 - Depth & Normals
- R2 used 2x MSAA
- write out normals while rendering your early depth pass
- use the primary render buffer to store normals
- write specular power into the alpha channel of the normal buffer
  -> use discard in the fragment program for alpha
- the view-space normal myth: store view-space x and y and reconstruct z.
  ! z can go negative due to perspective projection (subtle errors) -> they store x, y, z

Step 2 - Depth resolve
- convert MSAA to non-MSAA resolution
- moved earlier to allow stenciling optimizations on non-MSAA lighting

Step 3 - Accumulate sun shadows
- from static geom, precomputed in lightmaps
- just accumulate sun shadows from dynamic casters
- min blend used to choose the darkest of all inputs
- originally used an 8-bit buffer, then changed to 32-bit for stencil optimizations

Step 4 - Accumulate dynamic lights
- diffuse & specular
- similar approach to the sun shadow buffer
- render all spotlight shadow maps using D16 linear depth
- for each light:
  1. lay down stencil volumes
  2. render a screen-space projected quad covering the light
- single buffer vs MRT, LDR vs HDR (Resistance was LDR :( )
- MSAA vs non-MSAA: diffuse etc. all non-MSAA; specular is 2x supersampled

result = C(mp, sum_i P(l_i, gp))
(composite of the material properties with the sum of per-light pre-lighting terms over the
geometry properties)

Limitations:
- only a limited range of materials can be factored this way
- workarounds:
  * extra storage for extra material properties
  * eg. in R2, skin uses a forward-type render
- blended materials, eg. fur
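
A high-level sketch of how the steps above fit into one frame. Every function here is a
hypothetical stub standing in for a full GPU pass (they are not Insomniac's API), and details
such as the MSAA resolve are reduced to comments; only the ordering matters.

    // Pre-lighting / light pre-pass frame flow (C++ sketch with stub passes).
    struct Light { float x, y, z, radius; };

    static void renderDepthAndNormals() {}   // Step 1: depth + view-space normals, spec power in alpha
    static void resolveDepth() {}            // Step 2: MSAA depth -> non-MSAA for stencil optimizations
    static void accumulateSunShadows() {}    // Step 3: min-blend dynamic casters over baked sun shadow
    static void layDownStencilVolume(const Light&) {}  // mark pixels the light volume can touch
    static void renderLightQuad(const Light&) {}       // screen-space quad: accumulate diffuse + specular
    static void renderSceneGeometry() {}     // second geometry pass: full materials + LAB lookups

    void renderFrame(const Light* lights, int lightCount) {
        renderDepthAndNormals();
        resolveDepth();
        accumulateSunShadows();
        for (int i = 0; i < lightCount; ++i) {   // Step 4: build the light accumulation buffer
            layDownStencilVolume(lights[i]);
            renderLightQuad(lights[i]);
        }
        renderSceneGeometry();
    }
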
Rendering the scene
- the scene is rendered identically to before, with the addition of the lighting and
  sun shadow buffer lookups

Implementation tips
- reconstructing position
  * don't store it in your G-buffer
  * use the z = 1 plane of the view frustum, scaled by linear depth
- reconstructing depth
  * W-buffering not supported on PS3
  * recover z/w (zOverW) (check the "recovering depth" sample in the PS3 SDK)
- stenciling algorithm
  * stencil shadow hardware
  - if the camera goes inside the light volume:
    * switch to the depth-fail stencil test
    * only when we have to, since this disables Z-cull optimizations
      (we need some fudge factor here)

Pros & Cons
G-buffer
  * requires only a single geometry pass
Pre-lighting
  * easier to retrofit into traditional rendering pipelines -> can keep all your current shaders
  * lower memory bandwidth
  * can reuse your primary shaders for forward rendering of alpha
Problems of both:
  * alpha blending is problematic
  * encoding different material types is not elegant

Insomniac Physics - Eric Christensen
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Outline: IG physics systems, "shaders" (code fragments), lib shaders, custom event shaders

Resistance:
- ported from PC to PS3
- PPU heavy
- SPU processes blocked
- physics update:
  * PPU stuff
  * run SPU collision jobs, and just wait ... (sync)
  * PPU stuff
  * run SPU simulation jobs, + PPU simulation for big simulations...
  * PPU processes results
- physics had the largest impact on frame rate

Phase 2: Ratchet & Clank Future
- collision & sim run in a single SPU job
- single sync point
- large PPU window from start of job to end of job
- physics update:
  * start the physics SPU job, and continue
    + simulation
    + update joints
    + DMA results
  * sync
- "shaders" helped free up local store; otherwise the code may fit, but not the data ...
  Think about data first!

Physics interaction "shaders"
- shaders are loaded into LS during the collision process and called via a function table
  (see the dispatch sketch after the Q&A below)
Physics Jacobian shaders
- a shader called from another shader
- constraints are sorted by type
- saved 100k!
Physics solver shaders
- eg. function prototype: SolverSim(SimPool*, Manifold*, dimensions, ManagedLS, ConstraintFunc*, ...)
- get loaded by the main physics kernel
- full sim, IK, or "cheap" objects
Custom event shaders (currently 2)
- anyone can author their own custom event shader for physics

Phase 3: Resistance 2
- immediate and deferred modes
- constraint data streaming
- using library shaders for collision
- physics update:
  * start + update immediate physics jobs
  * PPU work + deferred jobs
- IK runs in immediate mode, because it needs to be tweaked continually by gameplay
- stuff that doesn't need to be computed within one frame is deferred

Constraint data streaming: all the events didn't fit in LS (-> 8 chunks)

Current phase
- building of physics object lists on an SPU
- anything that needs PPU data can be allocated on the SPU
- use of lib shaders for broad-phase collision caching
Looking forward:
- optimize DMAs
- better data organization

?1) strategy for load balancing? Keep things as simple as possible; discuss where the next
    bottleneck is gonna be (after moving physics to the SPUs, the next neck was navigation).
?2) a shader that calls another shader is resident, so it remains in LS after returning
    from the call.
?3) shaders can be loaded dynamically.
?4) jobs ~ coarse systems (physics, etc.); within each job, fragments of code (shaders).
?5) why not balance physics with the PPU? The PPU is already overloaded. The thing that
    remains on the PPU will always be the bottleneck.
?6) how do you draw the line between the assembly / C / C++ approaches? Ad hoc. Case by case...
    Design your data first, and try C++ first.
?7) do you use SPURS? No. Just loading things and throwing them up there is over-solving the
    problem. Dynamic is no good; we want to manage allocation manually, to enforce simplicity.
?8) how would you improve the Cell? Maybe small wishes here and there. But it's not their job
    to complain. You are given a piece of hardware, and your job is to understand it and make
    things work. Saying "no, we don't work like that" is very unprofessional.
?9) the tendency is gonna be to move things from the GPU to the SPUs, because the GPU is
    always busy. You always want to render more and more stuff.
(*) I don't like the word "shader" for small pieces of code (code fragments). It made "some"
    sense on a GPU, but not on Cell.
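
A minimal sketch of the "called via a function table" idea from the physics notes above. The
types, table layout, and dispatch are hypothetical stand-ins (the real code fragments are
relocated into SPU local store); this only illustrates the pattern of dispatching by
interaction type without virtual calls.

    #include <cstdio>

    // Hypothetical contact data handed to a physics "shader" (code fragment).
    struct ContactPair { int bodyA, bodyB; float penetration; };

    // All interaction "shaders" share one signature so they can live in a table.
    typedef void (*InteractionShaderFn)(ContactPair&);

    void sphereVsSphere(ContactPair& c) { std::printf("sphere-sphere %d/%d\n", c.bodyA, c.bodyB); }
    void sphereVsBox(ContactPair& c)    { std::printf("sphere-box    %d/%d\n", c.bodyA, c.bodyB); }

    // Function table indexed by interaction type. On the SPU the entries would be
    // code fragments DMA'd into local store; here they are plain function pointers.
    enum InteractionType { kSphereSphere = 0, kSphereBox = 1, kNumInteractionTypes };
    InteractionShaderFn g_interactionTable[kNumInteractionTypes] = { sphereVsSphere, sphereVsBox };

    void resolveContact(InteractionType type, ContactPair& c) {
        g_interactionTable[type](c);   // dispatch through the table
    }

    int main() {
        ContactPair c{1, 2, 0.01f};
        resolveContact(kSphereBox, c);
        return 0;
    }
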
Null Fairy pisses me off (random talk) - Mike Acton
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How do you optimize if you don't know what you are getting? Check where the data comes from:
the entry points. The fewer entry points a function has, the safer it is.

Technical Goals in R2 (random talk) - Mike Acton
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Huge levels, huge characters, tons of characters, lots of action, massive ships filling the
sky, more dynamic lighting, improved shadows, tons of water, improved cinematics
-> learn things about scaling and LOD
Eg. lots of effort to make nice water, but at human scale. Looked at from a huge monster,
it just looks flat...
-> they didn't think about characters bumping into each other, but tons of characters would
   bump... since they didn't have the mocap data, they overconstrained the navigation system
   so characters would never bump into each other! Navigation became the neck...
-> where to spend resources? background action? foreground? where is the focus?
-> difficult to mix ground space and sky space (for ships). Different far_clip?
-> artists had a hard time placing lights. They wanted physical justification. Sometimes they
   modeled lamps where they were not needed...

Wednesday - March 25
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Discovering new development opportunities
Satoru Iwata - Nintendo
================================================================
Miyamoto's way:
- look for the reason why people have fun in a particular activity (in particular, Miyamoto's
  current hobby)
- make a prototype with a small team (sometimes just one programmer)
- work on multiple projects
- trial and error: no deadlines (2 years or more)
- once it works, move it to the production stage (+ deadline)
* "kidnap" some random Nintendo employee from time to time, and check how she responds to the
  game ("over the shoulder" check).

Wii Fit install base is almost the same as the PS3 install base... (he's asking us to develop
for it)
Wii System Menu 4.0
- SD cards of more than 2GB can be used to store contents
- Arcade emulation added

Impressions: Very interesting talk. It gives an idea of their key to success.

Programming Tips for Scalable Graphics Performance - Intel
================================================================
Myth: optimizing for integrated graphics limits the opportunity of using high-performance GPU
capabilities -> scale your game!
Scale features according to HW capabilities.
Intel Integrated Graphics (IIG) architecture (not interesting)

Indies SIG Roundtable
=================================================================
Goals:
* make some kind of central place where indie developers can find/exchange resources
* list "indie-friendly" middleware companies
* mentorship: how to start a project, admin resources, etc.

Guerrilla Tactics: KILLZONE's Art Tools and Techniques
Jan-Bart van Beek - Art director - Guerrilla Games
================================================================
Killzone 2
~~~~~~~~~~~~
- the E3 2005 trailer was prerendered. No game concept yet. No engine.
- 18 months of full production
  140 Guerrilla Games staff, 50 Sony staff, 5 outsourcing partners: mocap, anim, concept

Deferred Rendering
~~~~~~~~~~~~~~~~~~~
* check "Real Time Rendering 3rd Ed."
* the trailer can be paused, and you can move the camera around
Cons
- costs about 22MB of extra VRAM
- no mixing of alternative lighting models
  * no cartoon rendering
  * no sub-surface scattering
  * no custom fall-off
  * no translucent materials
Pros
- no lighting calcs in the shaders
- no light limit per object
- "infinite" amount of dynamic lights (~350 in the heaviest levels)
- "infinite" amount of shadow-casting lights
  * about 8 active usually
  * dynamic shadows fade out with distance
Myth cons
- no AA: simply solved
- transparencies: done by the secondary and tertiary renderers
  * the secondary one is full-res AA
- geom
- shader flexibility

Shader Creation
~~~~~~~~~~~~~~~~~
Fully adopt Maya's hypershade workflow
- all required Maya shading nodes supported
- WYSIWYG

Level Building Blocks
~~~~~~~~~~~~~~~~~~~~~~
To solve these problems:
- enormous amount of effort: 30 man-months for a multiplayer level
- difficult to art-direct
- very laborious to edit (no repository)
- much time spent on technicalities
Level BB:
- based on UnrealEd's static meshes
- BBs are modelled, shaded, LOD'ed in Maya (outsourced)
- exported into a repository for use by level artists
- level artists place, rotate, scale BBs in Maya
- asset management: assetDB
  * became the primary tool for level art and design
Shader repository - similar to BBs
Impact on workflow
- reduction in cost (3x faster level creation)
- easier art direction process -> higher quality art
- easy global editing of content
- automatic content generation rocks! -> more time to focus on artistic quality

Particle effects (PFX)
~~~~~~~~~~~~~~~~~~~~~~~
- run on SPUs
- handles about 300 systems and 5000 particles per frame
- 200 particle collisions per frame
- particle-driven shader variables
- low-res & full-res buffers to optimize

Color correction
~~~~~~~~~~~~~~~~~
- image-based: uses a 2D image as a LUT
- ColorTweak module: to tweak in real time and on-target (for different TVs)
  * by object: sky, particles, foreground (gun), etc.

Practical SPU Programming in God of War III
======================================================================
Outline
- simulation of game, joypad input, etc.
- scene traversal
- rendering the scene
In one frame, typically, simulation (CPU) runs in parallel with rendering (GPU).
For more than 1 CPU: simulation (CPU0) || scene (CPU1) || render (GPU).
In 99% of cases, you'll be bottlenecked by either the CPU or the GPU
-> create a Helper CPU, and move things from the neck to the other side.
In the Cell, the Helper CPUs are the SPUs:
- have an affinity towards math ops
- memory limitations
- full general-purpose processor (not a co-processor!)
SPU is super fast
- manual optimization can speed things up 48x (the compiler never comes close)
SPU == PPU
- keep code compilable on both platforms (see the sketch below)
Incrementally move parts of the systems to the SPUs.
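
A sketch of what "keep code compilable on both platforms" can look like in practice: one
fetch helper that is a plain memcpy on the PPU and a DMA get on the SPU. This is an
illustration, not Sony Santa Monica's code; it assumes the Cell SDK's spu_mfcio.h intrinsics
(mfc_get, mfc_write_tag_mask, mfc_read_tag_status_all) on the SPU side.

    #include <cstdint>
    #include <cstring>

    #ifdef __SPU__
    #include <spu_mfcio.h>
    #endif

    // Fetch 'size' bytes of game data into local memory. The call site looks the
    // same on PPU and SPU, so the same system code compiles on both processors.
    inline void fetchData(void* local, uint64_t sourceEA, uint32_t size)
    {
    #ifdef __SPU__
        const uint32_t tag = 1;
        mfc_get(local, sourceEA, size, tag, 0, 0);   // kick the DMA into local store
        mfc_write_tag_mask(1 << tag);                // wait for that tag to complete
        mfc_read_tag_status_all();
    #else
        // On the PPU (or any other platform) the "DMA" is just a copy.
        std::memcpy(local, reinterpret_cast<const void*>(static_cast<uintptr_t>(sourceEA)), size);
    #endif
    }

The PPU path also doubles as a debug path: the same job can be stepped through on the PPU
before it ever runs on an SPU, which matches the push-buffer note below ("the SPU version is
also the PPU version").
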
They have an on-screen profiler
- both PPU and SPU profilers are in sync
- allows for easy identification of parallel tasks
- very useful to detect stalled syncs, etc.

Systems on the SPU:
- sim: anim, cloth, collision, procedural textures
- scene: culling, shadows, push buffer generation, meta tasks
- render: geometry conditioning, sound

Offloading the simulation
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Titans:
  * they are moving levels -> collision for the Titans was a neck
  * provide tech to artists and designers
- Cloth sim:
  * independent jobs, naturally parallel (Kratos' loincloth, enemies)
  * one job per cloth sim (across 5 SPUs)
  * job dominated by processing; data volume is very low
  * simply lifted the code from PPU to SPU (DMA call)
- Culling
  * simple frustum checks against bounding spheres
  * still on the PPU: occluder selection, visibility bit processing
- Push buffer generation
  * each SPU fetches a small group of model references (one batch) at a time
  * double-buffered DMA: fetch model B while processing model A
  * masked memory access cost
  * adapted the PPU version to handle interleaved DMA (helped debugging)
  * the SPU version is also the PPU version!

Offloading the GPU
~~~~~~~~~~~~~~~~~~~
- Geometry processing
  * techniques: post processing, vertex processing, SW rasterizers
  * they focused on offloading the cost of the opaque pass
  * the majority comes from vertex processing and lighting -> moved to SPU
  * pass all vertices through the SPUs
  * EDGE: geometry processing library available to all PS3 developers + highly optimized SPU code
  * one job per drawcall
  * a typical frame holds about 3000 geometry jobs
  * most of their vertex shader is in here
  * augmented lighting calculations
  Decompress -> Skinning -> Culling -> Generate normals -> lighting code -> compress to RSX
- Color correction
  * run as a post-effect pass to give a certain (cinematic) look
  * kick an SPU job early on to generate a cube map based on parametric input

Lessons learnt
~~~~~~~~~~~~~~~~
- Go parallel
  * do not special-case the SPU, it's a general-purpose processor
  * offload from the currently bound system (the current neck)
- No premature optimizations!
  * focus on user experience
  * optimize as needed
- Measure speed
  * measure before you jump! The on-screen profiler is your first tool.

?1) SPU scheduling? SPURS? A custom module.
?2) 6th SPU: all for sound, plus some other stuff (eg. the cube map for color correction).
?3) color per vertex? Their own hybrid proprietary lighting model (maybe GDC 2010?).
?4) overlap between cloth and physics? No physics in GOW III. So... none.
?5) need more SPUs? Yes, please. Around 8 ~ 16 SPUs would be nice.
?6) frame rate? Allow frame drops (down to 30fps). 60 fps if possible.
* At the end, they showed us a closed demo of the gameplay.

Thursday - March 26
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

The rendering technology of Killzone 2
Michal Valient - Guerrilla Games
=======================================================================
Outline: (30 fps)
- deferred shading
- a diet for render targets (compress memory)
- dirty lighting tricks
- rendering, memory and SPUs

About deferred shading (check RTR3 and the Guerrilla notes)
1. geometry pass fills the G-buffer (depth, normals, albedo, shininess)
2. lighting pass: accumulate lights + lower-resolution forward rendering for transparency
4xRGBA8 + D24S8 = 18.8MB; with 2x MSAA, 36MB
* packing in Cg: unpack_4ubyte(pack_2half(Normal.xy)) (see the sketch below)
* Light Accumulation Buffer (LAB)
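
The Cg packing trick above stores the two 16-bit halves of Normal.xy bit-for-bit in an RGBA8
target: pack_2half lays two half-floats into one 32-bit word, and unpack_4ubyte reinterprets
that word as four [0,1] bytes, which is exactly what an RGBA8 write keeps. A CPU-side C++
illustration of the bit trick follows; it is not the Cg implementation, uses a simplified
float-to-half conversion (no NaN/denormal handling), and the half/byte ordering is an
assumption.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Minimal float -> half conversion (illustration only; clamps/flushes edge cases).
    static uint16_t floatToHalf(float f) {
        uint32_t bits; std::memcpy(&bits, &f, sizeof bits);
        uint32_t sign = (bits >> 16) & 0x8000u;
        int32_t  exp  = (int32_t)((bits >> 23) & 0xFF) - 127 + 15;
        uint32_t mant = (bits >> 13) & 0x3FFu;
        if (exp <= 0)  return (uint16_t)sign;            // flush tiny values to zero
        if (exp >= 31) return (uint16_t)(sign | 0x7C00); // clamp to infinity
        return (uint16_t)(sign | ((uint32_t)exp << 10) | mant);
    }

    static float halfToFloat(uint16_t h) {
        uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
        uint32_t exp  = (h >> 10) & 0x1Fu;
        uint32_t mant = h & 0x3FFu;
        uint32_t bits = (exp == 0) ? sign : (sign | ((exp - 15 + 127) << 23) | (mant << 13));
        float f; std::memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main() {
        float nx = 0.37f, ny = -0.82f;                 // view-space normal x, y
        // pack_2half: two 16-bit halfs side by side in one 32-bit word
        uint32_t word = (uint32_t)floatToHalf(nx) | ((uint32_t)floatToHalf(ny) << 16);
        // unpack_4ubyte: the same 32 bits seen as four bytes, i.e. what lands in RGBA8
        uint8_t rgba[4];
        std::memcpy(rgba, &word, sizeof word);
        std::printf("stored as RGBA8 bytes: %u %u %u %u\n", rgba[0], rgba[1], rgba[2], rgba[3]);
        // reading back: reassemble the word and split it into the two halfs again
        uint32_t back; std::memcpy(&back, rgba, sizeof back);
        std::printf("recovered normal.xy = %f %f\n",
                    halfToFloat((uint16_t)(back & 0xFFFF)),
                    halfToFloat((uint16_t)(back >> 16)));
        return 0;
    }
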
Lighting pass - the most expensive
- 100+ dynamic lights per frame
- 10+ shadow-casting lights per frame
- AA means more of everything
Avoid hard work where possible
- don't run shaders
  * use your early z/stencil cull unit
  * depth bounds test is the new cool: reject pixels outside the z-range
  * enable conditional rendering: run a fragment query in one pass; if the light is not
    visible, reject.
- optimized light shaders (one for each combination of light features)
- fade out shadows for small lights
- remove small objects from shadow maps
Light pass & MSAA -> in-shader supersampling -> cheaper sampling
  * as fast as non-MSAA (check slides)
Sunlight: fullscreen directional light
  * the sun shadow channel serves for ambient occlusion purposes
  * fake MSAA
  * the scene is also heavily post-processed, so you can't notice the fake stuff
- shadow map rendering
  * for each slice
  * the shadow map changes every frame :( => fix: remove shadow map rotation
    * align shadow maps to WORLD instead of VIEW
    * remove sub-pixel movement
GPU - push buffer (PB) building
  * multiple SPUs building the PB in parallel
  - fixed memory pool: blocks with IDs that the RSX consumes
Conclusion:
- keep it simple and straightforward

?1) light pre-pass compared with deferred rendering? They had it in the beginning; they
    started adding more and more data to the G-buffers, and two geometry passes was too much
    for performance.
?2) color correction for additional "cheating": glow intensity lets them go over intensity 1
    (~ HDR).
?3) they use RSX memory compression.

Making the "impossible" possible
Hideo Kojima
=====================================================================
MSX2 example: max 32 sprites; with more than 8 sprites horizontally aligned, they disappear.
"Combat game", but the bullets would disappear... => convert it into a stealth game.
There's nothing "impossible". Just change the point of view. Or reformulate the problem.

Camera based games: The next generation
Diarmid Campbell
=====================================================================
Outline: camera-based games; what is EyePet?; improving the tech; future research

Camera-based games
- typically you see yourself in the picture
- PS Eye: 60 fps -> a lot of data!
Computer vision - a hard problem!
  * to understand an image: lighting, perspective, occlusion -> don't even try! just extract
    what information you can
- Image differencing -> motion buffer (sketched at the end of this section)
  * sometimes gives false detections, which can be quite problematic
    -> accumulate motion before triggering
    -> but you pay a price in responsiveness
PS Eye + PS3 - higher resolution, higher sensitivity, more computing power
- Optical flow: track points of interest; compute overall motion
  Eg. trigger just using rotation motion
Blurbs - just using motion differencing
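
A minimal sketch of the image-differencing idea above: diff the current and previous
grayscale frames, threshold, and accumulate into a motion buffer that only triggers once
enough motion has built up. Buffer layout, thresholds, and decay values are made-up
illustrations, not the PS Eye library's actual code.

    #include <algorithm>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // One grayscale camera frame, e.g. 320x240 at 60 fps.
    struct Frame { int width, height; std::vector<uint8_t> pixels; };

    // Accumulate thresholded frame differences. A cell only "triggers" after
    // several frames of motion, which filters out one-frame false detections
    // at the cost of a little responsiveness (the trade-off noted above).
    void accumulateMotion(const Frame& current, const Frame& previous,
                          std::vector<uint8_t>& motionBuffer,
                          uint8_t diffThreshold = 24, uint8_t decay = 8)
    {
        const std::size_t n = current.pixels.size();
        motionBuffer.resize(n, 0);
        for (std::size_t i = 0; i < n; ++i) {
            int diff = std::abs(int(current.pixels[i]) - int(previous.pixels[i]));
            if (diff > diffThreshold)
                motionBuffer[i] = uint8_t(std::min(255, motionBuffer[i] + 32));  // build up
            else
                motionBuffer[i] = uint8_t(std::max(0, motionBuffer[i] - decay)); // fade out
        }
    }

    // A cell counts as "moving" only once its accumulated value is high enough.
    bool hasTriggered(const std::vector<uint8_t>& motionBuffer, std::size_t index) {
        return motionBuffer[index] > 128;
    }
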
EyePet
~~~~~~~
(Not included in the talk: AI, rendering, anim, physics)
- a virtual pet in your living room
- create your own toys -> draw toys
  Drawings were too small ...
  Classify pixels into "paper"/"pen"
  - use an adaptive threshold, to avoid gradients of light: I - I*G (the image minus its
    Gaussian-blurred version), then threshold the result
  - vectorize the image (extract lines)
    * but a lot of pixels are left over, or lines are duplicated
      -> take the skeleton first; trace lines with no duplicates
  - turn these lines into a virtual 3D object
- The pet reacts to you:
  * create a grid of motion buttons
  * identify motion close to the pet
  * connect motion over several frames
  * create a tracker object: 3D position, velocity and life
  * trackers follow the object

Future research
~~~~~~~~~~~~~~~~
- head tracking: initialized with libface
  * custom tracking looking just at differences
  * libface itself is not good for tracking, because the size of the box jitters and it's slow
    -> the price you pay for detecting any face
  demo: keeping a ball balanced on your nose
  - this kind of user skill was missing in previous games
  Problems with rotations -> changed to color-based tracking. Init with libface, and use a
  histogram to compare. Problems: similar colors in the background...
  -> combine both approaches
- marker-based AR: Eye of Judgment
  * prototype with ARToolKit
  Problems:
  - markers changing brightness
  - want a small marker
  Discard quads:
  - trace contours
  - take out vertices, one by one, so the shape doesn't change much, until it becomes a quad
  - if the difference is small, keep the quad
  - compute the 4 points
  - calculate the homography
    * render the pattern 4 times, and compare with the image
  Finding the marker may be hard.
  * the adaptive threshold also fails
  * the contour follower may fail even with the correct threshold
    -> test more threshold levels
    -> create a new contour follower. Slow. After optimizing: 60fps on 4 SPUs in assembly...
  -> to deal with occlusion, design a 3D marker, so one 2D marker is always visible.
?1) the libraries may be available to PS3 developers when they are ready.
?2) the next hardware may project infrared light, to detect depth. But too expensive atm.

Tech Artists Roundtable
Jeff Hanna
===================================================================
How do you schedule Tech Artists (TA)?
* say SCRUM and I SCREAM...
* TA left out of Scrum?
* TA morphs into a "support" role -> Scrum cards of support time
* identify the pipeline
* why is it more difficult to schedule TA than tool programmers? TA are part of the art team.
  They work for the artists.
* very organized TA teams can work on several projects, vs. small teams where TA have to do
  artist jobs, etc., as needed.
=> we are never gonna be able to define the role. There are too many roles.
How much of a spec? -> schedule
* tools cleanup
How to start a TA discipline?
* prove it -> pick tasks that address recurrent problems and show that you can reduce bugs
  -> convince them you need more people
  -> make tools for THEM (not for you)
  -> better iterations (the art director agrees)
* what makes a "beautiful" tool?
  - encourages creativity -> artists are there to make art, not to think about technical
    limitations
  - datagrid controls are evil....
  - also, a tool that can be easily extended...
  - how does the artist remember what is what?
    * artists don't wanna read documentation...
    -> the UI should be self-explanatory
    -> a tooltip wiki is also nice
Outsourcing - do you provide your tools to the outsourcer?
  * provide docs? -> a screencast is nicer... if they have Internet!
Where do TA come from?
- former artists? -> bad programming practices? (bad variable names, etc.)
- former programmers? -> don't understand artists' needs?
Designing terror: Inside the Resident Evil 5 production process
Yoshiaki Hirabayashi
==================================================================
Cinematic supervisor. The topic is not the whole production, but something smaller:
[ Real-time movies in RE5 ]

Production flow of RT movies: 1. cooperation with the overseas team; 2. pre-visualization
1) (in slides) MB: MotionBuilder; ENG: in-house tool
   In RE4, mostly Japanese staff (foreign involvement: about half the actors).
   In RE5, more foreign involvement (actors, half the CG team, mocap team, director).
   Problems: different languages, cultures, work styles.
   -> trying to solve the problems by oneself is a waste of time and resources
   -> "middleman": an organization, similar to management, to put things in order
   -> find someone with already established connections, because Japanese companies don't
      know about American companies
   -> if more animators are needed, where to go, etc.
2) previsualization
   They shifted to previsualization because:
   - even a small discrepancy between the teams -> detrimental impact on the final product
   * short production time, high-quality graphics, ...
   By visualizing our goals at an early stage, we ensure that everyone is on the same page
   (unified intent).
   * It's cheaper to make changes in the early stages of development than later in CG.
   VIDEO STORYBOARD
   - storyboard
   - CG storyboard (just for previs, like Hollywood movies)
   - video shooting of rehearsals
   pros
   + unified intent across the teams
   + it also increases the efficiency of mocap and CG production
   + virtual camera as reference
   cons
   - detailed process, so it increases cost

New technology
~~~~~~~~~~~~~~
1) virtual camera
   It's like an AR camera, where you see the CG data interactively by moving the camera.
   -> it reduces editing time, because you can use the data from the virtual camera
   2 types:
   - "body capture system" (in-house), 20% (in Osaka)
   - "InterSense system" (in LA), 80%
2) facial capture
   2005 tests: can Japanese mocap data be used on a Western face?
   -> animators clean the data by hand
   They put the facial capture system inside the voice recording room!
   * they wanted to pay special attention to lip sync
   * it's hard to capture emotions (acting)
   * performance retakes are easy, but lip-sync retakes are hard
   Facial animation flow: similar to the body mocap flow
   * they can treat each step of the facial animation independently (~ Photoshop layers)

Work with MT Framework (Capcom engine)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lighting & Filter
- material color: you can select parts, eg. just the face, and change the specular factor,
  etc. In RE4, they had to change textures.
- light & shadow: infinite (directional) light + light-space shadow maps
  * just one factor is very hard to manipulate
  * splitting it in two is easier for artists to manipulate
  * costly process
- filter: tonemap, bloom, color correct, DOF, motion blur, ...
  Color correct: almost everything you can do in Photoshop (see the LUT sketch below)
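
A common way to get "almost everything you can do in Photoshop" at runtime is to bake the
adjustments into a lookup table and apply it per pixel (the Killzone 2 notes earlier mention
the same image-as-LUT idea). A tiny per-channel C++ sketch follows; real systems usually use
a 3D LUT on the GPU, and nothing here is Capcom's actual implementation.

    #include <cmath>
    #include <cstdint>

    // Build a 256-entry per-channel lookup table from simple grading parameters,
    // then apply it per pixel. Baking the curve once is what makes arbitrary
    // Photoshop-style adjustments cheap at runtime.
    struct ColorLUT {
        uint8_t table[3][256];

        void build(float gamma, const float gain[3], const float lift[3]) {
            for (int c = 0; c < 3; ++c)
                for (int i = 0; i < 256; ++i) {
                    float v = i / 255.0f;
                    v = lift[c] + v * gain[c];                 // lift/gain per channel
                    v = std::pow(v < 0.0f ? 0.0f : v, gamma);  // overall gamma
                    table[c][i] = uint8_t(v > 1.0f ? 255 : v * 255.0f + 0.5f);
                }
        }
        void apply(uint8_t rgb[3]) const {
            rgb[0] = table[0][rgb[0]];
            rgb[1] = table[1][rgb[1]];
            rgb[2] = table[2][rgb[2]];
        }
    };
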
MT Framework demo.

Friday - March 27
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Fast GPU Histogram Analysis for Scene Post Processing
Andy Luedke - Halo
==================================================================
Histogram analysis - for tone adjustment, etc.
Average scene luminance
- varies significantly with small perceived changes in HDR scenes -> use a histogram
  * still limited by a fixed number of bins
- can be generated on the CPU, from a reduced texture
- GPU queries to update histogram bins -> low granularity, delayed scene response
Luminance histogram
- used to find interesting exposure control points:
  * median luminance (50th percentile)
  * bright point (90th ~ 95th percentile)
- slow
- not so great for exposure control
Sorted luminance buffer
- fixes some problems
- expensive to sort on the CPU
- easy to find sorted percentiles
GPU sorting
- sorts multiple channels at once (eg. luminance and depth)
- the sorted buffer remains on the GPU
- bitonic sort works well in pixel shaders: (1/2) * log2(n) * (log2(n) + 1) passes
- scale to slower hardware by reducing the size of the sorting buffer
- bitonic sort works best on power-of-2 textures
  * in their game, 128 x 64
GPU exposure processing
- a shader samples the sorted luminance buffer and outputs updated exposure control values
Local exposure control
- use one channel of the sort buffer as a key for another channel's sort
  RGBA = [ lum, depth, local lum, key ]
- allows you to divide the screen into multiple exposure zones and mix local and global
  adjustments
- use different region masks to customize to your game's needs
  (sketch: screen divided into a central zone and surrounding exposure zones)
Pipeline:
1. render the main scene
2. downsample and compute luminance
3. bitonic sort
4. update exposure controls
5. update tonemapping settings
6. tonemapper
References: GPU Gems, Chapter 37; GPU Gems 2, Chapter 46; "UberFlow: a GPU-based particle engine"

Real-Time Water Dynamics: Practical Rendering of Fluid Simulations
Rama Hoetzlein - UC Santa Barbara
====================================================================
Focus on surface extraction, more than simulation.
* sim -> surface extraction -> rendering
- sim: {grid based, particles}
- surface extraction: marching cubes
- rendering: {ray casting, polygons, point based rendering}
Simulation: grids 38 x 38 x 38 at 37 fps vs. 60,000 particles at 57 fps
Particles (a minimal sketch follows at the end of this talk's notes):
1. compute pressure from neighbors
2. compute forces
3. integrate
Open SPH simulator: www.rchoetzlein.com
Surface extraction:
- metaballs O(kp) & marching cubes O(n^3) -> render O(kp*n^3)
- point-based rendering: "Screen Space Fluid Rendering with Curvature Flow", i3D, NVIDIA
  * no lighting :(
- how to do better? Observations:
  1. water is highly transparent -> custom shader: decrease alpha proportionally to the highlight
  2. shadows and environment maps are critical for depth perception
  3. SSAO and DOF are also important for perception
Sphere Scan Conversion
1. group streams: generate true 3D surfaces for shadows, transparency, etc. Very fast
   * very comparable to marching cubes
2. deform geometry
3. render
Holy Grail: avoid interior particles
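
A bare-bones C++ sketch of the three particle steps listed above (pressure from neighbor
density, forces, integration). It uses an O(n^2) neighbor search, simplified kernels, and
made-up constants purely for illustration; it is not Hoetzlein's simulator, which uses
spatial hashing and proper SPH smoothing kernels.

    #include <cmath>
    #include <vector>

    struct FluidParticle { float x, y, z, vx, vy, vz, density, pressure; };

    void stepSPH(std::vector<FluidParticle>& p, float h, float restDensity,
                 float stiffness, float mass, float dt)
    {
        const float h2 = h * h;
        // 1) density & pressure from neighbors
        for (auto& pi : p) {
            pi.density = 0.0f;
            for (const auto& pj : p) {
                float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
                float r2 = dx * dx + dy * dy + dz * dz;
                if (r2 < h2) {
                    float w = h2 - r2;              // simplified poly6-style kernel
                    pi.density += mass * w * w * w;
                }
            }
            pi.pressure = stiffness * (pi.density - restDensity);
        }
        // 2) pressure forces (plus gravity), 3) integrate
        for (auto& pi : p) {
            float fx = 0.0f, fy = -9.8f * pi.density, fz = 0.0f;
            for (const auto& pj : p) {
                float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
                float r = std::sqrt(dx * dx + dy * dy + dz * dz);
                if (r > 0.0f && r < h) {
                    // repulsive when pressure is positive, pushing pi away from pj
                    float f = mass * (pi.pressure + pj.pressure) / (2.0f * pj.density) * (h - r);
                    fx += f * dx / r; fy += f * dy / r; fz += f * dz / r;
                }
            }
            pi.vx += dt * fx / pi.density;
            pi.vy += dt * fy / pi.density;
            pi.vz += dt * fz / pi.density;
            pi.x += dt * pi.vx; pi.y += dt * pi.vy; pi.z += dt * pi.vz;
        }
    }
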
Style in Rendering: The history and technique behind AFRO SAMURAI's look
Bryan Johnston (Namco) designer, Danny Chan (EA) shader
=========================================================================
Story of the manga & anime adaptation -> videogame adaptation.
Takashi Okazaki gave them freedom to design the game.
* outsourcing for some environments
* they kept stylizing Afro -> more "next-gen" look by using normal maps
Okazaki's work - angularity
ZBrush gives "blobby" surfaces, the opposite of angularity...
Uber shader: diffuse map, normal map, specular map, light map, decal map, emissive map,
cube map, shadow map, point lights, spot lights, directional lights
* it just looked like a pretty puppet... uncanny valley
* distinguish yourself from other games by making your own shaders -> they gave them 2 weeks...
1. quick prototyping: NVIDIA FX Composer
2. collaborative iterations: designers & programmers
3. flexible components: sliders, etc.
4. hand-crafted look
5. reproducibility

Character shader
- put as much as possible of the hand-crafted look into the diffuse texture
  Flats: flat colors + color shift ("shape" of cloth) + grunge layer (dirt) + occlusion layer
  (shadows) + sketch mark layer (lines)
- toon ramp component
  eg. metallic ramp gradient -> tri-tone ramp (white, gray, black); the artist can customize
  how the surface responds to lights
  * the texture is 2-dimensional, so it is possible to use another input, not just light
    (but they didn't use it)
- character-specific hatch texture
  hatch mips ("RTR 3rd Ed." - brush paint mipmap textures)
- extrusion: distance order inversion
  * artifacts around some edges, proportional to distance
  * they didn't use this technique
- border detection
  * needs to be tuned so there aren't too many edges
  * there are always problems with noisy edges
- outline pass (from Namco Japan): screen-space outlining
  screenNormal.xyz = InverseTransposeLocal2Projection * vertexNormal.xyz
- edge light
  * it helps to distinguish him from the background (emphasizes depth, like an unsharp mask
    of depth)
  * it's used to convey health -> the light goes red when he's dying
- light scattering
  * better mood and atmosphere: color = color * extinction + in-scattering
Summary:
- distinguish yourself
- it's easier than you think
- experiment and throw away
- move fast in small teams
- licensors can be cool
?1) the hair uses a separate shader. It looked very bad in the beginning, so they invested
    more time in hair...
?2) they took out the specular component.

The human play machine
Chaim Gingold
=======================================================================
neuroscience
~~~~~~~~~~~~~~~
culture, social, language, make-believe, space, seeking, senses, body, play
play
~~~~~
How do we play? What tools do humans use to play? Where does the body end?
-> extend yourself with tools
space
~~~~~~
How do games handle space? 2D maps, board games, 3D space... hide & seek.
How do we experience space?
- Japanese garden design
- blindfolded
- scale: Katamari Damacy
- impossible space: 無限回路 (infinite circuit)
make-believe
~~~~~~~~~~~~~
SCEI EE: Emotion Engine
intentionality; we fill in the blank spaces; superheroes
Sims 2: ordinary fantasies
boundaries of fictional space
* Mario outside the map (walking up where the score is)
disguise: "wearing" a guitar in Guitar Hero
campfires
senses
~~~~~~~
touch in the dark
World of Goo - phantom sense of touch
visual sense - finding Waldo
sound, taste
social
~~~~~~~~~
competition
reversal of power: Pac-Man (chased/chaser)
cooperation: a team; a common enemy
empathy - look at someone and know how they are feeling -> identification
* can games communicate the same things as movies?
* Wiimote - people feel like they can control it (even if they can't)
nursing instinct -> pets
romantic love
#include "social_emotion.h" doesn't work...
language
~~~~~~~~~~~
initially sounds -> music -> gibberish (Sims) - it sounds as if they were talking
signs: STOP
poetry
Scrabble, Taboo
culture
~~~~~~~~
weirdness
things you wear
Katamari Damacy - stuff doesn't matter, just reconfigure
GTA - social morals
Borat - challenging our world
Wario
conclusions
~~~~~~~~~~~~
DON'T game design -> DO play design
When we play, we experience many things together:
- social awareness, self-awareness, etc.
- our existence, viscerally, in the senses
Responsibility - what are we making? -> make it for good, not evil
(Google his name to find the slides)
Impressions:
~~~~~~~~~~~~~
Super fast. Like a brainstorming shower. Kind of random ideas & keywords.

Rendering Techniques in Gears of War 2
Nicklas Smedberg, Daniel Wright - Epic Games
======================================================================
Epic Games - creators of Unreal Engine 3
Nicklas - Swedish, C64 demoscene, went pro in 1997
Outline
- gore techniques
- blood techniques
- lighting

Gore
~~~~~
Gore goals:
- no seams between breakable pieces (GoW1 used rigid 1-bone skinning)
- dismember skinned meshes (eg. ragdolls)
- hand-modeled internals
- minimal impact on undamaged meshes
Gore mesh
- 2 versions of the skinned mesh
  - undamaged mesh
  - gore mesh with more info:
    * full freedom to hand-model cuts and guts
    * skeleton with breakable joint constraints
      + broken by script code - not using the physics engine (as in GoW 1)
      + info per gore piece (hit points, type of FX, dependency on other pieces, etc.)
      + an extra set of bone weights (4 per vertex)
Tearing off a limb
- switch to the gore mesh
- determine which constraint to break (per case, not physics!)
- get the pair of broken bones
- create a separate vertex buffer for vertex weights
  * unique per gore mesh instance
- for each vertex:
  1. if influenced by a broken bone, copy the 2nd set of weights
  2. otherwise, copy the 1st set of weights (from the original vertex buffer)
- add a physics impulse to torn-off pieces
Data-driven gore
- set up in the Unreal editor
- only used for really big enemies
- used as visual "damage states"
- gore pieces stacked on top of each other
- hit points, impact radius, dependency links (leaf first), etc.
Scripted gore
- set up in gameplay code (UnrealScript)
- only dismember dead people
  * no animation/gameplay consequences
- different damage types break constraints differently
  * complete obliterations (grenades)
  * partial (shotguns)
- headshots spawn additional effects (gibs)
- the chainsaw breaks specific constraints
- ragdolls break constraints based on hit location
Eg. "meatshields", covering yourself with dead bodies, etc.

Blood techniques
~~~~~~~~~~~~~~~~~
Many different techniques in combination:
- projected blood decals
- screen-space blood splatter
- world-space particle effects
- surface-space shader fx
- fluid simulation
- morphing geometry
Improved decal features (see the sketch after this list)
- easier to spawn decals on anything
- supports any artist-created material (shader)
- three decal pools
- heuristic to replace the oldest decal
- proximity check to avoid multiple decals on top of each other
- AABB tree to find triangles within the decal frustum
- index buffers
- frustum clipping by the GPU
- decal visibility test
- decals on fractured meshes
- one drawcall per decal
  * allows for scissoring
  * each decal is fully featured (shaded, etc.)
- statically lit decals re-use lightmaps from the underlying geometry
- dynamically lit decals use additive multi-pass lighting
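
A small sketch of the decal-pool bookkeeping described above (fixed pool, replace the oldest
entry, skip spawns too close to an existing decal). Names and thresholds are invented for
illustration; Epic's actual system also handles materials, lightmaps, and per-mesh triangle
gathering.

    #include <cstdint>

    // Fixed-size decal pool: new decals replace the oldest one, and a proximity
    // check avoids stacking many decals on the same spot.
    struct Decal { float x, y, z; uint32_t spawnFrame; bool active; };

    template <int N>
    struct DecalPool {
        Decal decals[N] = {};

        bool spawn(float x, float y, float z, uint32_t frame, float minDistSq = 0.01f) {
            // Proximity check: don't stack a new decal on top of an existing one.
            for (int i = 0; i < N; ++i) {
                if (!decals[i].active) continue;
                float dx = decals[i].x - x, dy = decals[i].y - y, dz = decals[i].z - z;
                if (dx * dx + dy * dy + dz * dz < minDistSq) return false;
            }
            // Pick a free slot if there is one, otherwise replace the oldest decal.
            int slot = 0;
            for (int i = 0; i < N; ++i) {
                if (!decals[i].active) { slot = i; break; }
                if (decals[i].spawnFrame < decals[slot].spawnFrame) slot = i;
            }
            decals[slot] = Decal{x, y, z, frame, true};
            return true;
        }
    };
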
Eg. blood smears
- project decals on the ground and on cover walls when you're hurt
- project a new decal every 0.5s while moving
- use a fading "paintbrush" effect along the movement direction
- mix with standard blood decals to break up the pattern
Screen-space blood effects
- 3D particle effects that use camera-facing particles
- allows for more dynamic "screen-space" effects (the artist can define the material)
World-space blood effects
- used world-space coords for fake lighting on sprites and gibs
- 4x6 "movie frames" in a sub-UV texture to get realistic motion
eg. surface-space
* the blood color has negative blue and green components
Fluid simulation
- using the Unreal Engine 3 fluid simulation feature
- ability to raise / lower the entire fluid
- simulated on a separate thread
- height field tessellated on the GPU
Morphing blood geometry (for "heart beats" etc.)
- animate individual vertices, not a skeleton
- blend between morph targets

Character lighting
~~~~~~~~~~~~~~~~~~~
Goals:
- integrate closely with the environment
- hundreds of lights
- cinematic look
2 types of lights:
- main dominant lights
- the ones used to fake GI (bounce)
  * evaluate Spherical Harmonics (SH) per pixel

Screen-space ambient occlusion optimizations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- noisy shadows -> spatial filter pass
  * 720p, 16 occlusion samples, full res, 20 filter samples -> too slow
    -> 8 samples, 12 filter samples; multi-frequency occlusion is lost
- downsized render target
  - AO is mostly low frequency -> artifacts around silhouettes
  - temporal filter, to hide artifacts when the camera moves
- reverse reprojection caching
  * doesn't help when nothing is moving
  gotchas:
  * accurate last-frame position
  * bilinear filtering
  * double buffer
  * world-space precision
- computation mask - to avoid doing work when it's not necessary

Star Ocean 4: Flexible Shader Management and Post-Processing
Yoshiharu Gotanda - R&D tri-Ace Inc. (for Square-Enix)
============================================================================
Aska: their 3rd gen engine
* multiple platforms: PC, 360, NDS
* fully synced production environment - results synchronized on the DevKit with Maya
* physically-based camera (focus, grain, exposure, film profile)
Flexible shader management
- artists can create shaders in Maya, without a programmer
- need to train artists; they need to know some physics
- shaders generated at run time
cons:
- explosion of the number of shader variations -> large shader binary
- must create all possible shader variations
Subdivide shader nodes
Post-processing effects
AHSL
- in-house shader language based on HLSL and Cg
- correspondence with Maya
- shader immediate constants
- shader cache; components of the cache: key, constant table, shader binary
  (see the sketch at the end of this section)
- shader profile data -> supports development
Problem:
- compiling shaders at run time
- the size of the cache -> decompress each shader binary at run time
  : separated the shader cache into L1 and L2
  : supported multiple shader cache files
  : created a tool to manage shader files
- the size increased... 50 MB, > 30,000 shader combinations
- Shader Adaptors dominated most of the shader combinations (80%)
Cache file creation -> done by the QA team - a tough problem...
The engine was being developed simultaneously with game development, as for earlier projects
- many shadow algorithms implemented -> artists were using several of them, and for several
  unexpected uses...
System also used for
- particle rendering
- post-processing
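
A minimal sketch of the shader-cache idea above: a key that identifies one shader variation
maps to its compiled binary plus its constant table. Types, the key, and the compile hook are
invented for illustration; tri-Ace's real cache is split into levels, compressed, and built
from QA playthroughs.

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Hypothetical cache entry: what the runtime needs to reuse a compiled shader.
    struct ShaderCacheEntry {
        std::vector<uint8_t> binary;                           // compiled shader microcode
        std::unordered_map<std::string, int> constantTable;    // constant name -> register slot
    };

    class ShaderCache {
    public:
        // The key describes the node graph + options that produced this variation.
        const ShaderCacheEntry* find(uint64_t key) const {
            auto it = entries_.find(key);
            return it == entries_.end() ? nullptr : &it->second;
        }
        void insert(uint64_t key, ShaderCacheEntry entry) {
            entries_.emplace(key, std::move(entry));
        }
    private:
        std::unordered_map<uint64_t, ShaderCacheEntry> entries_;
    };

    // Usage: look up first; only fall back to the (expensive) runtime compile on a miss.
    // ShaderCacheEntry compileShaderVariation(uint64_t key);   // hypothetical compiler hook
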
Flexible post-processing
Why physics? -> unnatural effects from cameras puzzle us (a shot that couldn't have been
taken with a real camera).
Rendering must be processed in linear color space.
* Bokeh is not DoF. (he means the blurriness)
* Gaussian blur is softer than real bokeh (which is flat)
* Blend vs. per-pixel blur (to me, blend looks quite OK) (~ Poisson DoF from ATI)
* Bleeding artifacts -> use a mask image
* The f-stop also decides the shape: more circular when open (f/2.8), hexagonal when
  closed (f/5.6)
* optical vignetting - the corners of the picture are darkened - "cat eye effect"
  (eclipse bokeh) -> compute the attenuation curve and eclipsing ratio
* regular motion blur
* glare filter
* No film profile (Reinhard adaptation) vs. film simulation -> more realistic to simulate
  negative film
* Film grain - multiply blending
  - CMOS noise: "additive" blending; Bayer pattern x white noise
Problems:
- artists couldn't handle so many parameters of a real camera -> they prepared templates
  in SO4 (and gave some seminars)
  * still, some of the simulations were never used
Summary
- real camera -> changing one parameter affects another (isn't that a problem for designers?)
Future:
- diffraction, aberration, ghosting, scatter-based effects

Impression:
- The shader framework may give more flexibility to artists, but it's a nightmare for
  programmers... Unreasonable combinations of shaders. Hard to optimize.
- The camera model is too complicated:
  * for both the designers and the game console...
  * better to fix those "templates" at the beginning, for the sake of both programmers
    and designers.