YM2612 DAC decoding

Post by ZeroByte »

First of all, I'd like to thank all of you guys for the amazing work being done with these chips. I love chip music, and have been diving into a lot of low level stuff with them lately.

I'm doing a project that converts VGM into something native on the Commander X16. The FM stuff and PSG style chips that I've written conversions for seem to work pretty well, but now I'm onto PCM. One of the PCM sources I'd like to support is the YM2612 DAC.

Messing around with the raw PCM data from the VGM for Sonic The Hedgehog 1, Green Hill Zone, I'm coming across some peculiar stuff. The pop snare sample actually seems to sound higher quality when I play it back with Audacity, sox, etc. than it does on the 2612. I'm trying to figure out whether I just haven't found the correct sample rate to use, or whether there's a quirk in how the DAC playback works such that a straightforward playback of the sample data will never reproduce the sound correctly.

I tried counting the DAC writes and noting the delta-t between the first and last DAC writes and using that as the sample rate, but it still doesn't sound like it does on the Sega. Here's an example of the steps I've taken to play back the samples in a test:

  1. I used vgm2txt to get the byte offset/size of the sample bank in the VGM.
  2. I then used dd with those values to extract the sample data from the VGM into a raw file (e.g. sonic.raw).
  3. Then I can play it using sox at various sample rates, e.g.:
    play -t raw -e unsigned-integer -b 8 -r 8000 sonic.raw

Trying different sample rates doesn't lead to anything that sounds correct to me, and this includes also importing raw into Audacity, which sounds the same as sox does.

Post by ValleyBell »

There are a few things to consider when trying to replicate YM2612 DAC playback.
  1. When you assume a rate of e.g. 8000 Hz, the playback software resamples the data to the rate of your audio output.
    This is usually done by a resampler that interpolates the 8000 Hz up to e.g. 48000 Hz in a way that "sounds good".
    However, the YM2612 doesn't do any interpolation. This is equivalent to "nearest neighbor resampling" and sounds a fair bit harsher and noisier (due to aliasing) than high-quality resamplers.
  2. The sample rate of the samples is not consistent on the MegaDrive in most games.
    It is more like: play at a certain sample rate for 15 ms, then pause for 1 ms, then play for 15 ms, stop for 1 ms, etc.
    This makes the sample a bit noisier as well.
Point 1 is probably what makes the main difference though.
Doing high-quality resampling from a low sample rate to a high one makes the sound a lot more muffled than what you hear on the MegaDrive.
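
To make point 1 concrete, here is a minimal Python sketch of resampling without interpolation. It models the DAC as a simple hold (each output sample repeats the most recent source sample); the function name `resample_hold` and its parameters are made up for illustration and do not come from any existing tool.

```python
def resample_hold(samples: bytes, src_rate: float, dst_rate: int) -> bytes:
    """Resample 8-bit unsigned PCM without any interpolation.

    Each output sample is a copy of whichever source sample is "current"
    at that point in time, so the waveform becomes a staircase.
    """
    if not samples:
        return b""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = bytearray(n_out)
    last = len(samples) - 1
    for i in range(n_out):
        # Source index that is active at output time i / dst_rate.
        j = int(i * src_rate / dst_rate)
        out[i] = samples[min(j, last)]
    return bytes(out)
```

For example, `resample_hold(raw, 23784, 44100)` would turn the extracted snare data into 44100 Hz output while keeping the hard steps (and therefore the aliasing) that a high-quality resampler smooths away.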

btw, here are the formulae for calculating the intended sample rate (based on my SMPS driver research):
  • bass drum, DAC driver rate 0x17
    3579545 * 2 / (301 + 26 * (0x17-1)) = 7159090 / 873 = 8201 Hz
  • snare drum, DAC driver rate 0x01
    3579545 * 2 / (301 + 26 * (0x01-1)) = 7159090 / 301 = 23784 Hz

Post by ZeroByte »

Thanks for the information, ValleyBell. :beer:
I'll go see if sox has a "nearest neighbor" resampling mode and see what the results sound like at the sample rates you provided.

EDIT: adding "downsample" to the sox command and using the sample rate of 23784 sounds pretty much right.
ValleyBell wrote:btw, here are the formulae for calculating the intended sample rate (based on my SMPS driver research):
  • bass drum, DAC driver rate 0x17
    3579545 * 2 / (301 + 26 * (0x17-1)) = 7159090 / 873 = 8201 Hz
  • snare drum, DAC driver rate 0x01
    3579545 * 2 / (301 + 26 * (0x01-1)) = 7159090 / 301 = 23784 Hz
So where does the DAC driver rate come from? I.e. what should I be looking for in order to determine this value?
Also - I know a lot of "magic numbers" come into play in this kind of stuff... Is the 301+26* portion of the above formula constant or does it also vary based on some factor? (If constant for YM2612 at phi=3579545, that's good enough for me)
ValleyBell wrote: The sample rate of the samples is not consistent on the MegaDrive in most games.
It is more like: play at a certain sample rate for 15 ms, then pause for 1 ms, then play for 15 ms, stop for 1 ms, etc.
This makes the sample a bit noisier as well.
This one is definitely making me "lose sleep" figuring out what I'm going to do to best approximate this. My goal is not perfect accuracy but "good enough" reproduction so that does give me some latitude. The biggest challenge for me is figuring out exactly what to consider as a "trigger" for start/end of a particular clip because simply looking for DAC pointer moves doesn't necessarily mean "new sound," depending on what the original sound engine was doing....

I guess I'll just have to start generating some output and tweaking, etc.

Post by ValleyBell »

ZeroByte wrote:
So where does the DAC driver rate come from? I.e. what should I be looking for in order to determine this value?
Also - I know a lot of "magic numbers" come into play in this kind of stuff... Is the 301+26* portion of the above formula constant or does it also vary based on some factor? (If constant for YM2612 at phi=3579545, that's good enough for me)
Those magic values are all the result of some in-depth research I did on the SMPS sound and DAC drivers some years ago.
I'll just do a short explanation of what the numbers mean:
  • 3579545 = Z80 CPU clock rate (assumes NTSC MegaDrive)
  • 301 = minimum number of CPU cycles that the Z80 sound code takes to execute one PCM loop iteration (-> this determines the highest possible sample rate of the DAC driver)
  • *2 = one PCM loop iteration outputs 2 samples (samples are stored as 4-bit DPCM, so one byte = 8 bits results in 2 samples)
  • 26*(n-1) = the "n" is a delay counter (the "DAC driver rate" in the formulae above). Higher values of "n" result in slower sample playback; n = 1 adds no delay.
    The values for "n" are specific to each PCM sample. (The DAC driver has a table storing sample offset, sample size and this "n" value.)
For Sonic 1, all values are constant, except for "n", which changes per-sample. (Additionally, the sound driver can adjust this "n" in realtime to play the timpani at different pitches.)
For other MegaDrive games, the formula is most likely different.
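
The terms explained above can be put into a small Python function. This is only a sketch of the Sonic 1 formula as given in this thread; the constant and function names are invented for illustration, and other games would need different constants.

```python
Z80_CLOCK_NTSC = 3579545   # Z80 clock in Hz (NTSC MegaDrive)
BASE_CYCLES = 301          # fastest possible PCM loop iteration, in Z80 cycles
CYCLES_PER_DELAY = 26      # extra cycles per step of the delay counter
SAMPLES_PER_LOOP = 2       # one DPCM byte yields two output samples


def sonic1_dac_rate(n: int) -> float:
    """Intended sample rate in Hz for Sonic 1 DAC driver rate value n (n >= 1)."""
    loop_cycles = BASE_CYCLES + CYCLES_PER_DELAY * (n - 1)
    return Z80_CLOCK_NTSC * SAMPLES_PER_LOOP / loop_cycles


print(round(sonic1_dac_rate(0x17)))  # bass drum
print(round(sonic1_dac_rate(0x01)))  # snare drum
```

The two calls reproduce the 8201 Hz and 23784 Hz values from the earlier post.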

Post by ZeroByte »

ValleyBell wrote:For other MegaDrive games, the formula is most likely different.
That's what I thought, given how the DAC works and how it's encoded in VGM.
That's why I was surprised when you had actual values and a formula to compute what the "expected base sample rate" was.

Going through the vgm2txt logs on Green Hill Zone, another interesting quirk I see is that sometimes samples are written to the DAC as $52 $2A $XX and don't use $8x - i.e. those samples aren't even in the PCM data block. Since Sonic is just the tune I'm working with while developing the X16 player routine and the conversion script, I think I'll just use these values you provided to manually generate some samples to work with and worry about automating the process later. (i.e. get the player done, then make the conversion script work later)
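
To illustrate the two kinds of DAC writes mentioned above, here is a rough Python sketch that walks a VGM command stream and collects every DAC value together with its timestamp (in 44100 Hz VGM samples). It only understands the handful of commands I'd expect in a simple MegaDrive log and raises an error on anything else; the function name `collect_dac_writes` is made up, and a real converter would need the full command table from the VGM specification.

```python
def collect_dac_writes(data: bytes, pos: int = 0):
    """Return a list of (time_in_samples, dac_value) from a VGM command stream."""
    events = []
    bank = b""     # concatenated YM2612 PCM data blocks (block type 0x00)
    bank_pos = 0   # read pointer into the PCM bank
    t = 0          # current time in 44100 Hz samples
    while pos < len(data):
        cmd = data[pos]
        if cmd == 0x66:                      # end of sound data
            break
        elif cmd == 0x67:                    # data block: 67 66 tt ss ss ss ss ...
            block_type = data[pos + 2]
            size = int.from_bytes(data[pos + 3:pos + 7], "little")
            if block_type == 0x00:
                bank += data[pos + 7:pos + 7 + size]
            pos += 7 + size
        elif cmd == 0xE0:                    # seek within the PCM bank
            bank_pos = int.from_bytes(data[pos + 1:pos + 5], "little")
            pos += 5
        elif 0x80 <= cmd <= 0x8F:            # DAC write from bank, then wait n
            events.append((t, bank[bank_pos]))
            bank_pos += 1
            t += cmd & 0x0F
            pos += 1
        elif cmd == 0x52:                    # YM2612 port 0 write: 52 aa dd
            if data[pos + 1] == 0x2A:        # register 0x2A is the DAC data
                events.append((t, data[pos + 2]))
            pos += 3
        elif cmd == 0x53:                    # YM2612 port 1 write (ignored here)
            pos += 3
        elif cmd == 0x50:                    # SN76489 PSG write (ignored here)
            pos += 2
        elif cmd == 0x61:                    # wait nn nn samples
            t += int.from_bytes(data[pos + 1:pos + 3], "little")
            pos += 3
        elif cmd == 0x62:                    # wait 735 samples (1/60 s)
            t += 735
            pos += 1
        elif cmd == 0x63:                    # wait 882 samples (1/50 s)
            t += 882
            pos += 1
        elif 0x70 <= cmd <= 0x7F:            # wait n+1 samples
            t += (cmd & 0x0F) + 1
            pos += 1
        else:
            raise ValueError(f"unhandled VGM command 0x{cmd:02X} at {pos}")
    return events
```

Both the $8x writes and the direct $52 $2A $XX writes end up in the same event list, so later stages don't have to care where a value came from.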

Probably the best all-purpose solution for the converter will be to just render the DAC output as new 44.1 kHz PCM logs (which would conveniently record the sample jitter and aliasing effects), compute indexes into those, and resample the clips to whatever rate is desired.
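
A minimal Python sketch of that rendering idea, assuming the DAC writes have already been collected as (time, value) pairs with time measured in 44100 Hz VGM samples and sorted by time. The function name `render_dac` and the idle level of 0x80 (the midpoint of unsigned 8-bit data) are my own assumptions.

```python
def render_dac(events, total_samples: int, idle: int = 0x80) -> bytes:
    """Render timestamped DAC writes into 8-bit unsigned PCM at the VGM rate.

    The output simply holds the most recently written value, so timing
    jitter between writes is preserved in the rendered log.
    """
    out = bytearray(total_samples)
    level = idle
    k = 0
    for t in range(total_samples):
        # Apply every write that has happened up to and including time t;
        # if several fall on the same sample, the last one wins.
        while k < len(events) and events[k][0] <= t:
            level = events[k][1]
            k += 1
        out[t] = level
    return bytes(out)
```

The result is raw unsigned 8-bit data at 44100 Hz, which can then be cut into clips and resampled to the target rate.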