17 September 2016

Performance Tests BGFX vs. GLSL (Update September 2016)

Standardized performance testing from the command line explained (so you keep your MAME configuration fully intact - again many thanks to Headrush69 for this insight):

Open Terminal and change into your MAME directory, then execute (example for BGFX/CRT-GEOM-DELUXE testing):
./mame64 -video bgfx -str 60 -noafs -bgfx_screen_chains crt-geom-deluxe -bgfx_backend opengl -artcrop -noreadconfig -nosleep -nothrottle rtype

What this example does: it launches R-Type for 60 seconds with the BGFX/CRT-GEOM-DELUXE shader and the OpenGL backend, unthrottled (at full speed). Because of -noreadconfig, the other options in your mame.ini are ignored. After MAME exits, the Terminal window shows the performance result:
Average speed: 634.69% (60 seconds)
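
If you want to log several runs, it helps to pull just the number out of that line. The following is a minimal sketch, not part of MAME: parse_average_speed is a made-up helper name, and it assumes the result line has exactly the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads MAME's terminal output on stdin and prints
# only the percentage from a line such as
#   Average speed: 634.69% (60 seconds)
parse_average_speed() {
    sed -n 's/^Average speed: \([0-9.]*\)%.*/\1/p'
}

# Possible usage with the example command from above:
# ./mame64 -video bgfx -str 60 -noafs -bgfx_screen_chains crt-geom-deluxe \
#     -bgfx_backend opengl -artcrop -noreadconfig -nosleep -nothrottle rtype 2>&1 \
#     | parse_average_speed
```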

If you want to test the GLSL shaders that you have installed manually, be sure that you provide the full path. Example with bezels disabled:
./mame64 -video opengl -nofilter -gl_glsl -glsl_shader_mame0 /Users/xxxxxxxxxxxx/Documents/mame/CRT-GEOM-DELUXE/CRT-geom-halation -str 60 -noafs -artcrop -noreadconfig -nosleep -nothrottle -nobezel elevator
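
To repeat the same GLSL measurement for several games, the command can be wrapped in a small loop. This is only a sketch under some assumptions: MAME_BIN, GLSL_SHADER, run_glsl_benchmark and benchmark_all are names I made up for illustration, and the shader path is the placeholder from the example above, so replace it with your own.

```shell
#!/bin/sh
# Hypothetical variables: adjust both to your own installation.
MAME_BIN="${MAME_BIN:-./mame64}"
GLSL_SHADER="${GLSL_SHADER:-/Users/xxxxxxxxxxxx/Documents/mame/CRT-GEOM-DELUXE/CRT-geom-halation}"

# Run one 60-second GLSL benchmark with the same options as above.
# $1 = ROM set name, e.g. elevator
run_glsl_benchmark() {
    "$MAME_BIN" -video opengl -nofilter -gl_glsl -glsl_shader_mame0 "$GLSL_SHADER" \
        -str 60 -noafs -artcrop -noreadconfig -nosleep -nothrottle -nobezel "$1"
}

# Benchmark every ROM set given as an argument and print one result line each.
benchmark_all() {
    for game in "$@"; do
        printf '%s: ' "$game"
        run_glsl_benchmark "$game" 2>&1 | grep 'Average speed' || echo 'no result'
    done
}

# Possible usage:
# benchmark_all elevator rtype
```

Each game still takes the full 60 seconds, so a list of ten games needs about ten minutes.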

Conclusions from my performance tests: 
  • CRT-GEOM rules both in OpenGL/GLSL and BGFX performance-wise - but take the new OpenGL/GLSL-Halation shaders (discussed here), since they perform much better! You can use Glow both in OpenGL and BGFX without risking a performance hit. 
  • If your machine can handle BGFX, the main advantages are that (i) you can fine-tune with sliders in the game and immediately see the result and (ii) you can even switch shaders in the game in real-time, which is extremely cool. Details are here.
  • The BGFX/HLSL shaders crawl on weaker machines, while the BGFX/CRT-GEOM shader performs decently on them as well (in my case it works very well on a MacBook Pro 13'' Mid 2012, whereas HLSL crawls at 40%).
  • As for BGFX, I would really like to see a slimmed down version of the default MAME HLSL shader that does not take away that much performance.
  • GLSL in MAME 0.163 vs 0.177: Elevator Action: 730% vs 840%, Gyruss 287.15% vs 429.85% -> very nice performance increases also under OpenGL! (However, it seems that the 0.178 release loses most of these gains again.)
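
To put the 0.163 vs 0.177 numbers into perspective, the relative gain can be computed from the two readings. A small sketch (relative_gain is a made-up helper name; the inputs are the figures from the last bullet):

```shell
#!/bin/sh
# Relative change between two average-speed readings, in percent.
# $1 = old speed, $2 = new speed (both as printed by MAME, without the % sign)
relative_gain() {
    awk -v old_speed="$1" -v new_speed="$2" \
        'BEGIN { printf "%.1f\n", (new_speed - old_speed) / old_speed * 100 }'
}

relative_gain 730 840         # Elevator Action: prints 15.1
relative_gain 287.15 429.85   # Gyruss: prints 49.7
```

So Elevator Action gained roughly 15% and Gyruss roughly 50% between the two releases.
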
Here are some details


My current setup
  • Hardware: iMac 27 (Late 2013), 3.5 GHz i7 with 32 GB RAM, 3 TB Fusion Drive and Nvidia GTX 780M 4 GB
  • Operating System: OS X El Capitan 10.11.6
  • MAME: 0.177 (self-compiled Binary), preliminary testing done with 0.178 self-compiled
  • Frontend: none; as explained above, you run from the command line and retain your configuration
Shaders tested:
  • GLSL/CRT-GEOM Halation: with own settings, as discussed here
  • GLSL/CRT-GEOM old shaders without Halations: as discussed here (only some testing)
  • BGFX HLSL Shader: with own settings as discussed here
  • BGFX CRT-GEOM Shader (now in MAME)
The insights of these tests are highly interesting:
  1. GLSL is the performance king - however, its advantage over BGFX is shrinking (CRT-GEOM). Even on a modern machine, BGFX requires more resources than OpenGL/GLSL - but the BGFX/CRT-GEOM implementation is very promising (it also performs nicely on an aging MacBook Pro 13'' Mid 2012 and on a Mac mini Mid 2011).
  2. With MAME 0.177, the gap between GLSL and BGFX (both CRT-GEOM-Deluxe) narrows further.
  3. In BGFX, there is no difference between the Metal and the OpenGL backend (in MAME 0.177, the Metal backend is broken)
  4. BGFX/CRT-GEOM (now part of MAME) has significantly higher performance than the BGFX HLSL shader (probably due to fewer effects).
  5. Glow causes no performance hit, so you can use the CRT-GEOM Deluxe shader; the new CRT-GEOM shaders also perform significantly better than the old ones!
  6. On a huge screen like the 27’’ iMac, the Window mode makes a difference, but in a strange way: In OpenGL/GLSL Fullscreen mode is better, whereas in BGFX, Window mode is better for performance (Update: In MAME 0.177, this effect seems to be gone, and now Fullscreen is faster than Windowed mode).
  7. Horizontal and vertical games can differ a lot. OpenGL is king in horizontal games, but BGFX holds up better in vertical games, so the performance gap between OpenGL and BGFX seems to be much larger in horizontal games than in vertical ones. Another strange thing to follow up on more closely.
  8. BGFX/HLSL crawls on weaker machines. It would be interesting to have a MAME BGFX shader with fewer effects available; it seems to be the sheer number of effects in the standard BGFX shader that drives performance down so significantly. Generally, I would not recommend BGFX/HLSL as the standard shader for the time being.
  9. There can be big surprises, for example Defender (Red Label): BGFX-CRT-GEOM rocks the house and even surpasses GLSL! For this game, virtually all of the conclusions above are irrelevant - except that CRT-GEOM rules. Or 1942 (Revision B), where GLSL fullscreen blows away everything else.
Here are the numbers (60 sec without user interaction, best performing shader first) - I am stopping testing here because the picture is already quite consistent - red numbers do not fit into the overall picture.


1942 (Revision B)
GLSL/CRT-GEOM Halation, Fullscreen - 1044,74%
GLSL/CRT-GEOM Halation, Window - 720,36%
BGFX/MAME-HLSL, Metal-own settings - 209,18%
BGFX/MAME-HLSL, OpenGL-own settings - 209,32%
BGFX/Metal-CRT-GEOM-Glow, Window - 212,08%
BGFX/Metal-CRT-GEOM-Glow, Fullscreen - 203,57%


Defender (Red Label)
BGFX/Metal-CRT-GEOM-Deluxe, Fullscreen - 750,06%
BGFX/Metal-CRT-GEOM-Deluxe, Window - 742,88%
GLSL/CRT-GEOM Halation, Fullscreen - 628,28%
GLSL/CRT-GEOM Halation, Window - 618,24%
BGFX/MAME-HLSL, Metal-own settings - 208,90%
BGFX/MAME-HLSL, OpenGL-own settings - 209,08%
BGFX/MAME-HLSL, GLES-own settings - 208,90%

Elevator Action
GLSL/CRT-GEOM Halation, Fullscreen - 723,53%
GLSL/CRT-GEOM Halation, Window - 659,42%
GLSL/CRT-GEOM older shaders without Halation, Fullscreen - 667,59%
GLSL/CRT-GEOM older shaders without Halation, Window - 589,93%
BGFX/Metal-CRT-GEOM-Glow, Window - 200,70%
BGFX/Metal-CRT-GEOM-Glow, Fullscreen - 176,44%
BGFX/Metal-CRT-GEOM (no glow), Window - 194,79%
BGFX/Metal-CRT-GEOM (no glow), Fullscreen - 165,58%
BGFX/MAME shader, Metal-own settings - 138,65%
BGFX/MAME shader, OpenGL-own settings - 137,31%
BGFX/MAME shader, GLES-own settings - 137,62%

Gyruss
GLSL/CRT-GEOM Halation, Fullscreen - 435,58%
GLSL/CRT-GEOM Halation, Window - 397,92%
GLSL/CRT-GEOM older shaders without Halation, Fullscreen - 435,08%
GLSL/CRT-GEOM older shaders without Halation, Window - 421,04%
BGFX/Metal-CRT-GEOM-Glow, Window - 191,21%
BGFX/Metal-CRT-GEOM-Glow, Fullscreen - 181,03%
BGFX/Metal-CRT-GEOM (no glow), Window - 196,12%
BGFX/Metal-CRT-GEOM (no glow), Fullscreen - 183,83%
BGFX/MAME shader, Metal-own settings - 176,74%
BGFX/MAME shader, OpenGL-own settings - 182,78%
BGFX/MAME shader, GLES-own settings - 175,19%

Off Road Challenge (v.1.63)
GLSL/CRT-GEOM Halation, Window - 124,03%
GLSL/CRT-GEOM Halation, Fullscreen - 123,60%
BGFX/Metal-CRT-GEOM-Glow, Window - 125,97%
BGFX/MAME-HLSL shader, Metal-own settings - 125,36%
BGFX/MAME-HLSL shader, OpenGL-own settings - 124,86%


Out Run (sitdown/upright, Rev A)
GLSL/CRT-GEOM Halation, Fullscreen - 665,09%
GLSL/CRT-GEOM Halation, Window - 633,56%
BGFX/Metal-CRT-GEOM-Glow, Window - 627,28%
BGFX/Metal-CRT-GEOM-Glow, Fullscreen - 582,52%
BGFX/MAME-HLSL shader, Metal-own settings - 201,04%
BGFX/MAME-HLSL shader, OpenGL-own settings - 201,07%

R-Type (World)
GLSL/CRT-GEOM Halation, Window - 737,64%
GLSL/CRT-GEOM Halation, Fullscreen - 845,61%
BGFX/Metal-CRT-GEOM-Glow, Window - 633,44%
BGFX/MAME-HLSL shader, Metal-own settings - 224,57%
BGFX/MAME-HLSL shader, OpenGL-own settings - 224,46%

Three Wonders (USA 919520)
GLSL/CRT-GEOM Halation, Fullscreen - 790,39%
GLSL/CRT-GEOM Halation, Window - 676,34%
BGFX/Metal-CRT-GEOM-Glow, Window - 627,39%
BGFX/MAME-HLSL shader, Metal-own settings - 205,08%
BGFX/MAME-HLSL shader, OpenGL-own settings - 205,27%


3 comments:

  1. The HLSL shader port uses oversampling by default, which results in 4 times more samples than without oversampling. Take a look into mame\bgfx\chains\hlsl.json, search for "scale": 2 and remove these lines or replace 2 by 1. After that the performance should at least be doubled and the comparison with the other shaders would be more meaningful.

    1. Thanks so much Jeeze, indeed! This is exactly what I was wondering all the time. I can confirm that changing 'scale' in lines 177 and 182 brings the hlsl shader in range of crt-geom-deluxe. That leaves me with two questions: Is this a bug? In line 163, the default value for scale is commented as 1, however the actual value is then 2. Moreover, it is hard for me to see any difference in quality between 2 and 1. So finally I can bring the hlsl shader to the same performance level as crt-geom-deluxe. So again thank you, exactly the hint I was waiting for.

    2. It is not a bug, but maybe oversampling should not be used as default. However oversampling helps to improve the appearance of scan-lines - especially if they are thinner than 2px - and also reduces the moiré-effect when the image is distorted.

      The crt-geom shader also uses oversampling for scan-lines, but only for scan-lines. The scaling of 2 activates oversampling for all shader passes that use this render target, which costs much more performance.
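
For anyone who wants to try the change described in this thread without hand-editing the file, here is a minimal sketch. disable_oversampling is a made-up helper name; it assumes the entries appear literally as "scale": 2 in the chain file, as quoted in the first comment, and it changes every such entry in the file while keeping a .bak copy of the original.

```shell
#!/bin/sh
# Hypothetical helper: set every "scale": 2 entry in a BGFX chain file to 1.
# $1 = path to the chain file (the comment above names mame\bgfx\chains\hlsl.json)
# The original file is kept next to it with a .bak suffix.
disable_oversampling() {
    sed -i.bak 's/"scale":[[:space:]]*2/"scale": 1/' "$1"
}

# Possible usage from the MAME directory:
# disable_oversampling bgfx/chains/hlsl.json
```

A scale of 2 doubles the render target in both directions, which is where the factor of 4 in samples mentioned above comes from. To undo the change, copy the .bak file back over the edited one.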


Any comments are welcome!