GameBoy emulator accuracy for Speedrunning

Goal: Determine the accuracy (in terms of runtime speed) of emulators and different hardware platforms to see if they are suitable for speedrunning.

Approach: Run many black-box tests (i.e. assume nothing about the implementation) on reproducible segments of real games and measure how closely each emulator/platform matches the timing of the original hardware.

Results:

Notes:

Tests

Rationale

  1. Tests must be consistent/deterministic. Repeated tests must show exactly the same results, and all original hardware must behave the same (these tests are not meant to track quirks specific to a hardware generation).
  2. Tests must be easy to execute. The segment to record must be fast to reach, and the test should require only consistent pressing of buttons throughout (no buffered inputs or similar) to avoid input errors.
  3. The segment to record must be of reasonable length (> 1 minute) to mitigate recording issues like frame fading.

Current test suite

Currently the following games are used for testing (each preceded by its column header in the spreadsheet):

  1. SML: Super Mario Land (World) (Rev 1) (sha1: 418203621b887caa090215d97e3f509b79affd3e)
  2. KDL: Kirby's Dream Land (USA, Europe) (sha1: 90979baa1d0e24b41b5c304c5ddaf77450692d5a)
  3. DT: Dick Tracy (USA) (sha1: 906361b2066c2b48500b9b709f7b4ed1018309c0)
  4. BK: Balloon Kid (USA, Europe) (sha1: 0cb5adc9bdef5320f3f156efed0d47a618e2299f)
  5. BB: Buster Bros. (USA) (sha1: 0d1692ff60ef1f6a97bbfd2bf8c1548f1f7439ed)
  6. SML 2: Super Mario Land 2: 6 Golden Coins (USA, Europe) (Rev 2) (sha1: d11d94fa3c36b9f72e925070b66bb4f16d31001e)
  7. TS: TaleSpin (USA) (sha1: 155eb5458c971a9a84e3be19711994ca2a4c24d8)
  8. AS: Altered Space - A 3-D Alien Adventure (USA) (sha1: ca586c59b2473a8bec2f0f37504cb74e3a1e4d11)
  9. DK: Donkey Kong (Japan, USA) (SGB Enhanced) (sha1: 6ed661bd1d6d8cdd48e1c10f8ca4e8dcba49128e)
  10. WL2: Wario Land II (USA, Europe) (sha1: c65820b2e52d00e6ce60e0a432fab002fec4386f)
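Since results are only comparable when everyone tests the exact same ROM revision, it can help to check a dump against the SHA-1 values listed above before recording. The following is a minimal sketch of such a check; the function names and the example file path are hypothetical, only the hashes come from the list above.

```python
import hashlib

# SHA-1 values copied from the test suite list above, keyed by column header.
EXPECTED_SHA1 = {
    "SML": "418203621b887caa090215d97e3f509b79affd3e",
    "KDL": "90979baa1d0e24b41b5c304c5ddaf77450692d5a",
    "DT": "906361b2066c2b48500b9b709f7b4ed1018309c0",
    "BK": "0cb5adc9bdef5320f3f156efed0d47a618e2299f",
    "BB": "0d1692ff60ef1f6a97bbfd2bf8c1548f1f7439ed",
    "SML 2": "d11d94fa3c36b9f72e925070b66bb4f16d31001e",
    "TS": "155eb5458c971a9a84e3be19711994ca2a4c24d8",
    "AS": "ca586c59b2473a8bec2f0f37504cb74e3a1e4d11",
    "DK": "6ed661bd1d6d8cdd48e1c10f8ca4e8dcba49128e",
    "WL2": "c65820b2e52d00e6ce60e0a432fab002fec4386f",
}


def sha1_of_file(path, chunk_size=65536):
    """Return the lowercase hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as rom:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: rom.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rom_matches(path, column_header):
    """Check a ROM file against the expected hash for a column header."""
    return sha1_of_file(path) == EXPECTED_SHA1[column_header]


# Hypothetical usage (the file name is an assumption):
# rom_matches("Super Mario Land (World) (Rev 1).gb", "SML")
```

A mismatch usually means a different revision or region of the game, which should not be used for these tests.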

Each test is tracked in a separate sheet (accessible at the bottom of the spreadsheet). More details regarding frame number, execution date etc. can be found there.

Test execution instructions

Marker frames

Additional tests

Besides the black-box tests, a minimal set of white-box tests, blargg's tests, is executed. This is to ensure a minimal level of confidence in the emulator's implementation of CPU instructions, CPU timing and memory timing.
blargg's tests can be found here: http://slack.net/~ant/old/gb-tests/. Sound tests are not required to pass.

Criteria for passing

To be considered "suitable for speedrunning", an emulator/platform has to fulfill the following criteria:

Recordings

The following hardware and software were used for recording (unless noted otherwise):

After recording, ffmpeg was used to add frame numbers to each frame. The marker frames, as defined in the document above, are then entered directly into the spreadsheet, and the number of frames and the difference to the baseline (SGB2) are calculated.
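The spreadsheet calculation described above can be sketched roughly as follows. This is an illustration only: the `Recording` type, the function name and all frame numbers are made up, and it assumes the frame count is simply the end marker frame minus the start marker frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    """Marker frames read off a frame-numbered recording (hypothetical type)."""
    platform: str
    start_frame: int
    end_frame: int

    @property
    def frame_count(self):
        # Assumption: segment length is end marker minus start marker.
        return self.end_frame - self.start_frame


def compare_to_baseline(baseline, candidate):
    """Return (difference in frames, difference in percent of the baseline)."""
    diff = candidate.frame_count - baseline.frame_count
    percent = 100.0 * diff / baseline.frame_count
    return diff, percent


# Made-up example values, not measured results.
baseline = Recording("SGB2", start_frame=1200, end_frame=5400)
candidate = Recording("example emulator", start_frame=950, end_frame=5171)

diff, percent = compare_to_baseline(baseline, candidate)
print(f"{candidate.platform}: {candidate.frame_count} frames, "
      f"{diff:+d} frames vs. {baseline.platform} ({percent:+.2f} %)")
```

A positive difference means the candidate needed more frames than the baseline for the same segment, i.e. it ran slower; a negative one means it ran faster.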

Contact

Discord: adrianus#9213
Email: @gmail.com

All video footage is available for verification. Please contact me if you are interested.