SSAO

My Speeches 2016. 8. 31. 13:54

An overview of SSAO, presented at an internal study group at work.



Posted by ozlael

Inferred lighting

My Speeches 2016. 8. 31. 13:49

Material on Inferred lighting, presented at an internal study group at work.




Z Sign Problem When Storing the G-buffer Normal in R16FG16F Format Deferred Rendering

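Following the approach in Deferred Rendering in Killzone 2, our engine stores only x and y of the G-buffer normal. On read, it reconstructs z as Normal.z = sqrt(1.0f - Normal.x*Normal.x - Normal.y*Normal.y).

The problem is that this value is always positive. At first I reasoned that, because it is a view-space normal, z would always be negative as long as back faces are not drawn. So I simply always used the negative value, but I was dead wrong: z can be positive too. When the camera is close to the ground and tilted up toward the sky, the ground's view-space normal can end up with a positive z. This happens because, under perspective projection, the view ray through an off-center pixel is not parallel to the view-space Z axis. A surface can therefore face that ray while its normal still has a positive Z.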
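So I solved it by storing the sign of the view-space normal's z in the G-buffer depth value. Depth is always positive anyway, so the sign of the stored value can serve as the sign of normal z. We never read the normal without also reading depth, so you can consider the extra cost to be essentially zero.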
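The depth-sign trick can be sketched on the CPU as below. This is a minimal illustration, not the engine's actual shader code. The helper names `pack_gbuffer` and `unpack_gbuffer` are hypothetical, and the sketch assumes view depth is strictly positive for every shaded pixel, so that its sign is free to carry the sign of normal z.

```python
import math

def pack_gbuffer(normal_vs, view_depth):
    """Hypothetical helper: returns (nx, ny, signed_depth).

    Only x and y of the unit view-space normal are kept; the sign of z is
    folded into the view depth, which is assumed to be strictly positive.
    """
    nx, ny, nz = normal_vs
    signed_depth = view_depth if nz >= 0.0 else -view_depth
    return nx, ny, signed_depth

def unpack_gbuffer(nx, ny, signed_depth):
    """Hypothetical helper: rebuilds (normal, depth) from the packed values."""
    # sqrt alone always yields a non-negative z; clamp guards tiny negative rounding
    nz = math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    if signed_depth < 0.0:
        nz = -nz  # the depth sign tells us z was negative
    return (nx, ny, nz), abs(signed_depth)
```

For example, `unpack_gbuffer(*pack_gbuffer((0.6, 0.0, -0.8), 12.5))` gives back a normal with z close to -0.8 and a depth of 12.5. With x and y alone, that z would have been lost.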
Tis's post reminded me of this, so I jotted down a few lines :-)



Backup of a post written in 2010

--------------------------------


Overview of Deferred Render Systems (Forward, Deferred, Light Prepass, Inferred) Deferred Rendering

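A while ago we held a brief review of our engine's render system. Posting the whole slide deck from that review felt a bit awkward, so I'm quietly sharing just the few slides that are fine to make public -ㅈ-; They give a rough overview of deferred render systems, with the Forward, Deferred, Light Prepass, and Inferred systems each sketched on a single slide.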



Compact Normal Storage for Small G-Buffers Deferred Rendering

Original: http://aras-p.info/texts/CompactNormalStorage.html

Compact Normal Storage for Small G-Buffers

Intro

Various deferred shading/lighting approaches or image postprocessing effects need to store normals as part of their G-buffer. Let’s figure out a compact storage method for view space normals. In my case, the main target is a minimalist G-buffer, where depth and normals are packed into a single 32-bit (8 bits/channel) render texture. I try to minimize both the error and the shader cycles needed to encode/decode.

Now of course, 8 bits/channel storage for normals may not be enough for shading, especially if you want specular (low precision and quantization lead to specular “wobble” when the camera or objects move). However, everything below should Just Work (tm) for 10 or 16 bits/channel integer formats. For 16 bits/channel half-float formats, some of the computations are not necessary (e.g. bringing normal values into the 0..1 range).

If you know other ways to store/encode normals, please let me know in the comments!

Various normal encoding methods and their comparison below. Notes:

  • Error images are: 1-pow(dot(n1,n2),1024) and abs(n1-n2)*30, where n1 is the actual normal, and n2 is the normal encoded into a texture, read back and decoded. MSE and PSNR are computed on the difference (abs(n1-n2)) image.
  • Shader code is HLSL. Compiled into ps_3_0 by d3dx9_42.dll (February 2010 SDK).
  • Radeon GPU performance numbers from AMD’s GPU ShaderAnalyzer 1.53, using Catalyst 9.12 driver.
  • GeForce GPU performance numbers from NVIDIA’s NVShaderPerf 2.0, using 174.74 driver.
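As a reference for reading the MSE/PSNR figures below, here is a small Python sketch of the standard metrics applied to per-component absolute differences. The function names are hypothetical. It assumes a peak signal value of 1.0, which is not stated in the article. Under that assumption, the PSNR figures quoted below match their printed MSE values up to the MSE's rounding.

```python
import math

def error_metrics(actual, decoded, peak=1.0):
    """MSE and PSNR (dB) over the per-component difference abs(n1 - n2).

    actual, decoded: equal-length sequences of (x, y, z) normals.
    peak: assumed maximum signal value used by PSNR (1.0 by default).
    """
    diffs = [abs(a - b)
             for n1, n2 in zip(actual, decoded)
             for a, b in zip(n1, n2)]
    mse = sum(d * d for d in diffs) / len(diffs)
    if mse == 0.0:
        return 0.0, math.inf  # identical inputs: PSNR is unbounded
    return mse, 10.0 * math.log10(peak * peak / mse)

def psnr_from_mse(mse, peak=1.0):
    """PSNR in dB for a given MSE, assuming the same peak value."""
    return 10.0 * math.log10(peak * peak / mse)
```

For instance, `psnr_from_mse(0.013514)` is about 18.69 dB, which agrees with the figure quoted for Method #1.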

Note: there was an error!

The original version of my article contained a blunder: the encoding shaders did not normalize the incoming per-vertex normal, so the quality evaluation results were somewhat wrong. Also, if the normal is assumed to be normalized, then three methods in the original article (Sphere Map, Cry Engine 3 and Lambert Azimuthal) are in fact completely equivalent. The old version is still available for the sake of integrity of the internets.

Test Playground Application

Here is a small Windows application I used to test everything below: NormalEncodingPlayground.zip (4.8MB, source included).

It requires a GPU with Shader Model 3.0 support. To write its fancy shader reports, it expects AMD’s GPUShaderAnalyzer and NVIDIA’s NVShaderPerf to be installed. The source code should build with Visual C++ 2008.

Baseline: store X&Y&Z

Just to set a baseline, store all three components of the normal. It’s not suitable for our quest, but I include it here to evaluate the “base” encoding error (which here comes only from quantization to 8 bits per component).

Encoding, Error to Power, Error * 30 images below. MSE: 0.000008; PSNR: 51.081 dB.
  

Encoding (HLSL):
    half4 encode (half3 n, float3 view)
    {
        return half4(n.xyz*0.5+0.5,0);
    }

Decoding (HLSL):
    half3 decode (half4 enc, float3 view)
    {
        return enc.xyz*2-1;
    }

Encoding (compiled ps_3_0):
    ps_3_0
    def c0, 0.5, 0, 0, 0
    dcl_texcoord_pp v0.xyz
    mad_pp oC0, v0.xyzx, c0.xxxy, c0.xxxy

Decoding (compiled ps_3_0):
    ps_3_0
    def c0, 2, -1, 0, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp oC0.xyz, r0, c0.x, c0.y
    mov_pp oC0.w, c0.z

Encoding cost: 1 ALU
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 1.00 clk
    GeForce 7800GT: 1 GPR, 1.00 clk
    GeForce 8800GTX: 6 GPR, 8.00 clk

Decoding cost: 2 ALU, 1 TEX
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 1.00 clk
    GeForce 7800GT: 1 GPR, 1.00 clk
    GeForce 8800GTX: 6 GPR, 10.00 clk
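To make the “quantization only” error of the baseline concrete, here is a Python sketch that mirrors the HLSL above and simulates an 8-bit UNORM render target. The rounding helper `quantize_unorm8` is an assumption about how the hardware converts floats to 8-bit values (round to nearest). Under that assumption, each decoded component is off by at most 1/255.

```python
def quantize_unorm8(x):
    """Assumed model of an 8-bit UNORM channel: clamp to [0,1], round to nearest step."""
    x = min(max(x, 0.0), 1.0)
    return round(x * 255.0) / 255.0

def encode_xyz(n):
    """Baseline: n*0.5+0.5 maps [-1,1] to [0,1]; each channel is then quantized."""
    return tuple(quantize_unorm8(c * 0.5 + 0.5) for c in n)

def decode_xyz(enc):
    """Baseline decode: enc*2-1 maps back to [-1,1]."""
    return tuple(c * 2.0 - 1.0 for c in enc)
```

A quantization step of 1/255 in the encoded value becomes 2/255 after `*2-1`. Since rounding to nearest is off by at most half a step, the per-component error is at most 1/255.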

Method #1: store X&Y, reconstruct Z

Used by Killzone 2 among others (PDF link).

Encoding, Error to Power, Error * 30 images below. MSE: 0.013514; PSNR: 18.692 dB.
  

Pros:
  • Very simple to encode/decode
Cons:
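  • The normal can point away from the camera, but Z is reconstructed with a square root and so always comes back non-negative. My test scene setup actually has such normals. See the Resistance 2 Prelighting paper (PDF link) for an explanation.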
Encoding (HLSL):
    half4 encode (half3 n, float3 view)
    {
        return half4(n.xy*0.5+0.5,0,0);
    }

Decoding (HLSL):
    half3 decode (half2 enc, float3 view)
    {
        half3 n;
        n.xy = enc*2-1;
        n.z = sqrt(1-dot(n.xy, n.xy));
        return n;
    }

Encoding (compiled ps_3_0):
    ps_3_0
    def c0, 0.5, 0, 0, 0
    dcl_texcoord_pp v0.xy
    mad_pp oC0, v0.xyxx, c0.xxyy, c0.xxyy

Decoding (compiled ps_3_0):
    ps_3_0
    def c0, 2, -1, 1, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xy, r0, c0.x, c0.y
    dp2add_pp r0.z, r0, -r0, c0.z
    mov_pp oC0.xy, r0
    rsq_pp r0.x, r0.z
    rcp_pp oC0.z, r0.x
    mov_pp oC0.w, c0.w

Encoding cost: 1 ALU
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 1.00 clk
    GeForce 7800GT: 1 GPR, 1.00 clk
    GeForce 8800GTX: 5 GPR, 7.00 clk

Decoding cost: 7 ALU, 1 TEX
    Radeon HD 2400: 1 GPR, 1.00 clk
    Radeon HD 3870: 1 GPR, 1.00 clk
    Radeon HD 5870: 1 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 4.00 clk
    GeForce 7800GT: 1 GPR, 3.00 clk
    GeForce 8800GTX: 5 GPR, 15.00 clk
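The following Python sketch mirrors the Method #1 HLSL and shows the sign loss listed under Cons. The function names are illustrative. The `max(0.0, ...)` clamp is an addition to keep Python's `math.sqrt` from raising on tiny negative rounding errors; the HLSL has no such clamp.

```python
import math

def encode_xy(n):
    """Method #1: keep only x and y, remapped from [-1,1] to [0,1]."""
    return (n[0] * 0.5 + 0.5, n[1] * 0.5 + 0.5)

def decode_xy(enc):
    """Rebuilds z = sqrt(1 - x^2 - y^2), which is never negative."""
    x = enc[0] * 2.0 - 1.0
    y = enc[1] * 2.0 - 1.0
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```

Decoding `encode_xy((0.6, 0.0, -0.8))` yields roughly (0.6, 0.0, 0.8). The normal that pointed away from the camera comes back facing it.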

Method #3: Spherical Coordinates

It is possible to use spherical coordinates to encode the normal. Since we know it’s unit length, we can just store the two angles.

Suggested by Pat Wilson of Garage Games: GG blog post. Other mentions: MJP’s blog, GarageGames thread, Wolf Engel’s blog, gamedev.net forum thread.

Encoding, Error to Power, Error * 30 images below. MSE: 0.000062; PSNR: 42.042 dB.
  

Pros:
  • Suitable for normals in general (not necessarily view space)
Cons:
  • Uses trig instructions (quite heavy on ALU). Possible to replace some of that with texture lookups though.
Encoding (HLSL):
    #define kPI 3.1415926536f
    half4 encode (half3 n, float3 view)
    {
        return half4(
          (half2(atan2(n.y,n.x)/kPI, n.z)+1.0)*0.5,
          0,0);
    }

Decoding (HLSL):
    half3 decode (half2 enc, float3 view)
    {
        half2 ang = enc*2-1;
        half2 scth;
        sincos(ang.x * kPI, scth.x, scth.y);
        half2 scphi = half2(sqrt(1.0 - ang.y*ang.y), ang.y);
        return half3(scth.y*scphi.x, scth.x*scphi.x, scphi.y);
    }

Encoding (compiled ps_3_0):
    ps_3_0
    def c0, 0.999866009, 0, 1, 3.14159274
    def c1, 0.0208350997, -0.0851330012, 0.180141002, -0.330299497
    def c2, -2, 1.57079637, 0.318309873, 0.5
    dcl_texcoord_pp v0.xyz
    add_pp r0.xy, -v0_abs, v0_abs.yxzw
    cmp_pp r0.xz, r0.x, v0_abs.xyyw, v0_abs.yyxw
    cmp_pp r0.y, r0.y, c0.y, c0.z
    rcp_pp r0.z, r0.z
    mul_pp r0.x, r0.x, r0.z
    mul_pp r0.z, r0.x, r0.x
    mad_pp r0.w, r0.z, c1.x, c1.y
    mad_pp r0.w, r0.z, r0.w, c1.z
    mad_pp r0.w, r0.z, r0.w, c1.w
    mad_pp r0.z, r0.z, r0.w, c0.x
    mul_pp r0.x, r0.x, r0.z
    mad_pp r0.z, r0.x, c2.x, c2.y
    mad_pp r0.x, r0.z, r0.y, r0.x
    cmp_pp r0.y, v0.x, -c0.y, -c0.w
    add_pp r0.x, r0.x, r0.y
    add_pp r0.y, r0.x, r0.x
    add_pp r0.z, -v0.x, v0.y
    cmp_pp r0.zw, r0.z, v0.xyxy, v0.xyyx
    cmp_pp r0.zw, r0, c0.xyyz, c0.xyzy
    mul_pp r0.z, r0.w, r0.z
    mad_pp r0.x, r0.z, -r0.y, r0.x
    mul_pp r0.x, r0.x, c2.z
    mov_pp r0.y, v0.z
    add_pp r0.xy, r0, c0.z
    mul_pp oC0.xy, r0, c2.w
    mov_pp oC0.zw, c0.y

Decoding (compiled ps_3_0):
    ps_3_0
    def c0, 2, -1, 0.5, 1
    def c1, 6.28318548, -3.14159274, 1, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xy, r0, c0.x, c0.y
    mad r0.x, r0.x, c0.z, c0.z
    frc r0.x, r0.x
    mad r0.x, r0.x, c1.x, c1.y
    sincos_pp r1.xy, r0.x
    mad_pp r0.x, r0.y, -r0.y, c0.w
    mul_pp oC0.zw, r0.y, c1
    rsq_pp r0.x, r0.x
    rcp_pp r0.x, r0.x
    mul_pp oC0.xy, r1, r0.x

Encoding cost: 26 ALU
    Radeon HD 2400: 1 GPR, 17.00 clk
    Radeon HD 3870: 1 GPR, 4.25 clk
    Radeon HD 5870: 2 GPR, 0.95 clk
    GeForce 6200: 2 GPR, 12.00 clk
    GeForce 7800GT: 2 GPR, 9.00 clk
    GeForce 8800GTX: 9 GPR, 43.00 clk

Decoding cost: 17 ALU, 1 TEX
    Radeon HD 2400: 1 GPR, 17.00 clk
    Radeon HD 3870: 1 GPR, 4.25 clk
    Radeon HD 5870: 2 GPR, 0.95 clk
    GeForce 6200: 2 GPR, 7.00 clk
    GeForce 7800GT: 1 GPR, 5.00 clk
    GeForce 8800GTX: 6 GPR, 23.00 clk
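Here is a Python sketch mirroring the spherical-coordinate HLSL above (function names are illustrative). As in the shader, the first stored value is the azimuth atan2(y, x)/pi. The second value is z itself, i.e. cos of the polar angle, rather than the angle. Because z is stored with its sign, normals pointing away from the camera survive the round trip.

```python
import math

def encode_spherical(n):
    """Method #3: store atan2(y, x)/pi and z, both remapped from [-1,1] to [0,1]."""
    return ((math.atan2(n[1], n[0]) / math.pi + 1.0) * 0.5,
            (n[2] + 1.0) * 0.5)

def decode_spherical(enc):
    ang_x = enc[0] * 2.0 - 1.0          # azimuth / pi
    ang_y = enc[1] * 2.0 - 1.0          # z = cos(polar angle)
    sin_t = math.sin(ang_x * math.pi)   # matches sincos(...) -> scth.x
    cos_t = math.cos(ang_x * math.pi)   # matches scth.y
    sin_phi = math.sqrt(max(0.0, 1.0 - ang_y * ang_y))  # clamp added for float safety
    return (cos_t * sin_phi, sin_t * sin_phi, ang_y)
```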

Method #4: Spheremap Transform

Spherical environment mapping (indirectly) maps the reflection vector to a texture coordinate in the [0..1] range. The reflection vector can point away from the camera, just like our view space normals. Bingo! See the Siggraph 99 notes for the sphere map math. The normal we want to encode is R, and the resulting values are (s,t).

If we assume that the incoming normal is normalized, then methods derived elsewhere turn out to be exactly equivalent:

  • Used in Cry Engine 3, presented by Martin Mittring in “A bit more Deferred” presentation (PPT link, slide 13). For Unity, I had to negate Z component of view space normal to produce good results, I guess Unity’s and Cry Engine’s coordinate systems are different. The code would be:
    half2 encode (half3 n, float3 view)
    {
        half2 enc = normalize(n.xy) * (sqrt(-n.z*0.5+0.5));
        enc = enc*0.5+0.5;
        return enc;
    }
    half3 decode (half4 enc, float3 view)
    {
        half4 nn = enc*half4(2,2,0,0) + half4(-1,-1,1,-1);
        half l = dot(nn.xyz,-nn.xyw);
        nn.z = l;
        nn.xy *= sqrt(l);
        return nn.xyz * 2 + half3(0,0,-1);
    }
  • Lambert Azimuthal Equal-Area projection (Wikipedia link). Suggested by Sean Barrett in comments for this article. The code would be:
    half2 encode (half3 n, float3 view)
    {
        half f = sqrt(8*n.z+8);
        return n.xy / f + 0.5;
    }
    half3 decode (half4 enc, float3 view)
    {
        half2 fenc = enc*4-2;
        half f = dot(fenc,fenc);
        half g = sqrt(1-f/4);
        half3 n;
        n.xy = fenc*g;
        n.z = 1-f/2;
        return n;
    }
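The equivalence claim can be checked numerically. For a unit normal, |n.xy| = sqrt(1 - z^2). So the Lambert encoder's n.xy / sqrt(8z+8) equals normalize(n.xy) * sqrt((1-z)/8). The Cry Engine 3 encoder's normalize(n.xy) * sqrt(-z*0.5+0.5) * 0.5 reduces to the same value. The Python sketch below (hypothetical function names) transcribes both encoders as written above. Note that Cry Engine's `normalize` is undefined when n.xy is zero, and Lambert's divides by zero at z = -1.

```python
import math

def encode_cryengine3(n):
    """Encoder from the Cry Engine 3 bullet (Z already negated as described for Unity)."""
    length_xy = math.hypot(n[0], n[1])   # normalize(n.xy) divides by this
    s = math.sqrt(-n[2] * 0.5 + 0.5)
    return (n[0] / length_xy * s * 0.5 + 0.5,
            n[1] / length_xy * s * 0.5 + 0.5)

def encode_lambert(n):
    """Encoder from the Lambert Azimuthal Equal-Area bullet."""
    f = math.sqrt(8.0 * n[2] + 8.0)
    return (n[0] / f + 0.5, n[1] / f + 0.5)
```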

Encoding, Error to Power, Error * 30 images below. MSE: 0.000016; PSNR: 48.071 dB.
  

Pros:
  • Quality pretty good!
  • Quite cheap to encode/decode.
  • Similar derivation used by Cry Engine 3, so it must be good :)
Cons:
  • ???
Encoding (HLSL):
    half4 encode (half3 n, float3 view)
    {
        half p = sqrt(n.z*8+8);
        return half4(n.xy/p + 0.5,0,0);
    }

Decoding (HLSL):
    half3 decode (half2 enc, float3 view)
    {
        half2 fenc = enc*4-2;
        half f = dot(fenc,fenc);
        half g = sqrt(1-f/4);
        half3 n;
        n.xy = fenc*g;
        n.z = 1-f/2;
        return n;
    }

Encoding (compiled ps_3_0):
    ps_3_0
    def c0, 8, 0.5, 0, 0
    dcl_texcoord_pp v0.xyz
    mad_pp r0.x, v0.z, c0.x, c0.x
    rsq_pp r0.x, r0.x
    mad_pp oC0.xy, v0, r0.x, c0.y
    mov_pp oC0.zw, c0.z

Decoding (compiled ps_3_0):
    ps_3_0
    def c0, 4, -2, 0, 1
    def c1, 0.25, 0.5, 1, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xy, r0, c0.x, c0.y
    dp2add_pp r0.z, r0, r0, c0.z
    mad_pp r0.zw, r0.z, -c1.xyxy, c1.z
    rsq_pp r0.z, r0.z
    mul_pp oC0.zw, r0.w, c0.xywz
    rcp_pp r0.z, r0.z
    mul_pp oC0.xy, r0, r0.z

Encoding cost: 4 ALU
    Radeon HD 2400: 2 GPR, 3.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 4.00 clk
    GeForce 7800GT: 1 GPR, 2.00 clk
    GeForce 8800GTX: 5 GPR, 12.00 clk

Decoding cost: 8 ALU, 1 TEX
    Radeon HD 2400: 2 GPR, 3.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 6.00 clk
    GeForce 7800GT: 1 GPR, 3.00 clk
    GeForce 8800GTX: 6 GPR, 15.00 clk
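Here is a CPU-side Python mirror of the spheremap (Lambert) encode/decode above, with illustrative function names. The algebra behind the round trip is short. With fenc = 4*n.xy/sqrt(8(1+z)), f = dot(fenc, fenc) works out to 2(1-z), so g = sqrt((1+z)/2). Then fenc*g gives back n.xy, and 1 - f/2 gives back z. The `max(0.0, ...)` clamp is an addition for out-of-range inputs, which would produce NaN on the GPU.

```python
import math

def encode_spheremap(n):
    """Method #4 encode; singular only at n.z == -1."""
    p = math.sqrt(n[2] * 8.0 + 8.0)
    return (n[0] / p + 0.5, n[1] / p + 0.5)

def decode_spheremap(enc):
    """Method #4 decode."""
    fx, fy = enc[0] * 4.0 - 2.0, enc[1] * 4.0 - 2.0
    f = fx * fx + fy * fy                 # equals 2*(1 - z) for a valid encoding
    g = math.sqrt(max(0.0, 1.0 - f / 4.0))
    return (fx * g, fy * g, 1.0 - f / 2.0)
```

Unlike Method #1, a normal such as (0.6, 0, -0.8) that points away from the camera decodes with its negative Z intact.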

Method #7: Stereographic Projection

What the title says: use Stereographic Projection (Wikipedia link), plus rescaling so that the “practically visible” range of normals maps into the unit circle (regular stereographic projection maps the sphere to a circle of infinite size). In my tests, a scaling factor of 1.7777 produced the best results; in practice it depends on the FOV used and how much you care about normals that point away from the camera.

Suggested by Sean Barrett and Ignacio Castano in comments for this article.

Encoding, Error to Power, Error * 30 images below. MSE: 0.000038; PSNR: 44.147 dB.
  

Pros:
  • Quality pretty good!
  • Quite cheap to encode/decode.
Cons:
  • ???
Encoding (HLSL):
    half4 encode (half3 n, float3 view)
    {
        half scale = 1.7777;
        half2 enc = n.xy / (n.z+1);
        enc /= scale;
        enc = enc*0.5+0.5;
        return half4(enc,0,0);
    }

Decoding (HLSL):
    half3 decode (half4 enc, float3 view)
    {
        half scale = 1.7777;
        half3 nn =
            enc.xyz*half3(2*scale,2*scale,0) +
            half3(-scale,-scale,1);
        half g = 2.0 / dot(nn.xyz,nn.xyz);
        half3 n;
        n.xy = g*nn.xy;
        n.z = g-1;
        return n;
    }

Encoding (compiled ps_3_0):
    ps_3_0
    def c0, 1, 0.281262308, 0.5, 0
    dcl_texcoord_pp v0.xyz
    add_pp r0.x, c0.x, v0.z
    rcp r0.x, r0.x
    mul_pp r0.xy, r0.x, v0
    mad_pp oC0.xy, r0, c0.y, c0.z
    mov_pp oC0.zw, c0.w

Decoding (compiled ps_3_0):
    ps_3_0
    def c0, 3.55539989, 0, -1.77769995, 1
    def c1, 2, -1, 0, 0
    dcl_texcoord2 v0.xy
    dcl_2d s0
    texld_pp r0, v0, s0
    mad_pp r0.xyz, r0, c0.xxyw, c0.zzww
    dp3_pp r0.z, r0, r0
    rcp r0.z, r0.z
    add_pp r0.w, r0.z, r0.z
    mad_pp oC0.z, r0.z, c1.x, c1.y
    mul_pp oC0.xy, r0, r0.w
    mov_pp oC0.w, c0.y

Encoding cost: 5 ALU
    Radeon HD 2400: 2 GPR, 4.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 2.00 clk
    GeForce 7800GT: 1 GPR, 2.00 clk
    GeForce 8800GTX: 5 GPR, 12.00 clk

Decoding cost: 7 ALU, 1 TEX
    Radeon HD 2400: 2 GPR, 4.00 clk
    Radeon HD 3870: 2 GPR, 1.00 clk
    Radeon HD 5870: 2 GPR, 0.50 clk
    GeForce 6200: 1 GPR, 4.00 clk
    GeForce 7800GT: 1 GPR, 4.00 clk
    GeForce 8800GTX: 6 GPR, 12.00 clk
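Here is a Python sketch mirroring the stereographic HLSL above, with the article's scale of 1.7777 (function names are illustrative). The decoder is the exact inverse of the projection. With X = n.xy/(n.z+1), |X|^2 + 1 = 2/(1+z), so g = 2/(|X|^2 + 1) = 1+z. Multiplying X by g then recovers n.xy.

```python
SCALE = 1.7777  # rescaling factor reported in the article

def encode_stereo(n, scale=SCALE):
    """Method #7: stereographic projection, rescaled, remapped to [0,1]."""
    ex = n[0] / (n[2] + 1.0) / scale
    ey = n[1] / (n[2] + 1.0) / scale
    return (ex * 0.5 + 0.5, ey * 0.5 + 0.5)

def decode_stereo(enc, scale=SCALE):
    """Method #7 decode: undo the remap and scale, then invert the projection."""
    nx = enc[0] * 2.0 * scale - scale
    ny = enc[1] * 2.0 * scale - scale
    g = 2.0 / (nx * nx + ny * ny + 1.0)   # equals 1 + z
    return (g * nx, g * ny, g - 1.0)
```

The scale factor bounds what fits into the [0,1] texture range. A normal is guaranteed to stay in range if its projected length |n.xy/(n.z+1)| is at most 1.7777, which holds for roughly n.z >= -0.52. A normal such as (0.6, 0, -0.8) encodes outside [0,1] and would be clamped by an 8-bit target.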

Method #8: Per-pixel View Space

If we compute the view space per pixel, with its Z axis pointing back along that pixel's view ray, then the Z component of a visible surface's normal can never be negative. Then just store X&Y, and compute Z.

Suggested by Yuriy O’Donnell on Twitter.

Encoding, Error to Power, Error * 30 images below. MSE: 0.000134; PSNR: 38.730 dB.
  

Pros:
  • ???
Cons:
  • Quite heavy on ALU
Encoding and decoding (HLSL):
    float3x3 make_view_mat (float3 view)
    {
        view = normalize(view);
        float3 x,y,z;
        z = -view;
        x = normalize (float3(z.z, 0, -z.x));
        y = cross (z,x);
        return float3x3 (x,y,z);
    }
    half4 encode (half3 n, float3 view)
    {
        return half4(mul (make_view_mat(view), n).xy*0.5+0.5,0,0);
    }
    half3 decode (half4 enc, float3 view)
    {
        half3 n;
        n.xy = enc*2-1;
        n.z = sqrt(1+dot(n.xy,-n.xy));
        n = mul(n, make_view_mat(view));
        return n;
    }

Encoding (compiled ps_3_0):
    ps_3_0
    def c0, 1, -1, 0, 0.5
    dcl_texcoord_pp v0.xyz
    dcl_texcoord1 v1.xyz
    mov r0.x, c0.z
    nrm r1.xyz, v1
    mov r1.w, -r1.z
    mul r0.yz, r1.xxzw, c0.xxyw
    dp2add r0.w, r1.wxzw, r0.zyzw, c0.z
    rsq r0.w, r0.w
    mul r0.xyz, r0, r0.w
    mul r2.xyz, -r1.zxyw, r0
    mad r1.xyz, -r1.yzxw, r0.yzxw, -r2
    dp2add r0.x, r0.zyzw, v0.xzzw, c0.z
    dp3 r0.y, r1, v0
    mad_pp oC0.xy, r0, c0.w, c0.w
    mov_pp oC0.zw, c0.z

Decoding (compiled ps_3_0):
    ps_3_0
    def c0, 2, -1, 1, 0
    dcl_texcoord1 v0.xyz
    dcl_texcoord2 v1.xy
    dcl_2d s0
    mov r0.y, c0.w
    nrm r1.xyz, v0
    mov r1.w, -r1.z
    mul r0.xz, r1.zyxw, c0.yyzw
    dp2add r0.w, r1.wxzw, r0.xzzw, c0.w
    rsq r0.w, r0.w
    mul r0.xyz, r0, r0.w
    mul r2.xyz, -r1.zxyw, r0.yzxw
    mad r2.xyz, -r1.yzxw, r0.zxyw, -r2
    texld_pp r3, v1, s0
    mad_pp r3.xy, r3, c0.x, c0.y
    mul r2.xyz, r2, r3.y
    mad r0.xyz, r3.x, r0, r2
    dp2add_pp r0.w, r3, -r3, c0.z
    rsq_pp r0.w, r0.w
    rcp_pp r0.w, r0.w
    mad_pp oC0.xyz, r0.w, -r1, r0
    mov_pp oC0.w, c0.w

Encoding cost: 17 ALU
    Radeon HD 2400: 3 GPR, 11.00 clk
    Radeon HD 3870: 3 GPR, 2.75 clk
    Radeon HD 5870: 2 GPR, 0.80 clk
    GeForce 6200: 4 GPR, 12.00 clk
    GeForce 7800GT: 4 GPR, 8.00 clk
    GeForce 8800GTX: 8 GPR, 24.00 clk

Decoding cost: 21 ALU, 1 TEX
    Radeon HD 2400: 3 GPR, 11.00 clk
    Radeon HD 3870: 3 GPR, 2.75 clk
    Radeon HD 5870: 2 GPR, 0.80 clk
    GeForce 6200: 3 GPR, 12.00 clk
    GeForce 7800GT: 3 GPR, 9.00 clk
    GeForce 8800GTX: 12 GPR, 29.00 clk
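Here is a Python mirror of the per-pixel view space HLSL above (helper names are illustrative). It assumes, as the shader does, that `view` is the direction from the camera toward the pixel. `make_view_basis` returns the rows x, y, z of `make_view_mat`, and the basis is orthonormal, so `mul(n, M)` in the decoder inverts `mul(M, n)`. The basis degenerates when the view ray is parallel to the Y axis, because x becomes a zero vector.

```python
import math

def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def make_view_basis(view):
    """Rows x, y, z of make_view_mat; z points back along the per-pixel view ray."""
    z = tuple(-c for c in _normalize(view))
    x = _normalize((z[2], 0.0, -z[0]))
    y = _cross(z, x)
    return x, y, z

def encode_ppview(n, view):
    """mul(M, n): express n in the per-pixel basis, keep x and y."""
    x, y, _ = make_view_basis(view)
    return (_dot(x, n) * 0.5 + 0.5, _dot(y, n) * 0.5 + 0.5)

def decode_ppview(enc, view):
    x, y, z = make_view_basis(view)
    lx = enc[0] * 2.0 - 1.0
    ly = enc[1] * 2.0 - 1.0
    # Non-negative for surfaces facing the ray; clamp added for float safety
    lz = math.sqrt(max(0.0, 1.0 - lx * lx - ly * ly))
    # mul(n, M) = lx*x + ly*y + lz*z (transpose of an orthonormal basis)
    return tuple(lx * x[i] + ly * y[i] + lz * z[i] for i in range(3))
```

With view = (1.0, 0.0, 0.2), the normal (-0.6, 0.0, 0.8) has a positive view-space Z, which is the case that breaks Method #1. It still faces the pixel's ray, so it decodes correctly here.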

Performance Comparison

GPU performance comparison in a single table:

GPU | #1: X & Y | #3: Spherical | #4: Spheremap | #7: Stereo | #8: PPView

Encoding, GPU cycles
Radeon HD 2400 | 1.00 | 17.00 | 3.00 | 4.00 | 11.00
Radeon HD 5870 | 0.50 | 0.95 | 0.50 | 0.50 | 0.80
GeForce 6200 | 1.00 | 12.00 | 4.00 | 2.00 | 12.00
GeForce 8800 | 7.00 | 43.00 | 12.00 | 12.00 | 24.00

Decoding, GPU cycles
Radeon HD 2400 | 1.00 | 17.00 | 3.00 | 4.00 | 11.00
Radeon HD 5870 | 0.50 | 0.95 | 0.50 | 1.00 | 0.80
GeForce 6200 | 4.00 | 7.00 | 6.00 | 4.00 | 12.00
GeForce 8800 | 15.00 | 23.00 | 15.00 | 12.00 | 29.00

Encoding, D3D ALU+TEX instruction slots
SM3.0 | 1 | 26 | 4 | 5 | 17

Decoding, D3D ALU+TEX instruction slots
SM3.0 | 8 | 18 | 9 | 8 | 22

Quality Comparison

Quality comparison in a single table. PSNR based, higher numbers are better.

Method | PSNR, dB
#1: X & Y | 18.692
#3: Spherical | 42.042
#4: Spheremap | 48.071
#7: Stereographic | 44.147
#8: Per pixel view | 38.730

