
Understanding timing in VGMPlay

Technical discussion about the VGM format, and all the software you need to handle VGM files.

Moderator: Staff

  • yngwie87
  • Posts: 3
  • Joined: 2023-12-01, 16:07:25

Understanding timing in VGMPlay

Post by yngwie87 »

Howdy! I just discovered this fantastic community and VGM files. I'm interested in old-school sound chips and I'm involved in many projects to bring them to life. Being a hardware guy, I'm more familiar with firmware-side C, so my question stems from my lack of experience in software development.

I was reverse engineering the (legacy) VGMPlay software to better understand the core of the player. What I cannot figure out at all is how the 44100 Hz tick signal (period 1/44100 s ≈ 22.676 µs) is generated. This signal should be used by players as the trigger to fetch and parse individual commands (correct me if I'm wrong).

What is the strategy to generate such a small timing signal in VGMPlay?

I would expect a timer thread that triggers (every 1/44100 s) another thread (or the main program) to parse the current VGM command. However, a different strategy seems to be adopted. Could you please elaborate on this? What exactly is the purpose of the FillBuffer() function?

Another related question: assuming we are writing to real hardware, how would the InterpretFile() function change, considering that writes require a non-zero amount of time?

I hope I've been clear enough.
Thank you for your support.

Cheers ;) :beer:
  • ValleyBell
  • Posts: 4786
  • Joined: 2011-12-01, 20:20:07
  • Location: Germany

Re: Understanding timing in VGMPlay

Post by ValleyBell »

Sorry for the late reply.

There are two ways in which VGMPlay does timing.

Method 1 - when using the "wave output" APIs:
  • When software emulation is active, VGMPlay sets up the operating system's audio API (e.g. Windows waveOut) so that it outputs data to the speakers at a specific sample rate - e.g. 48000 Hz.
  • It then starts a thread that checks whether or not the OS needs more wave data.
  • If that is the case, it calls FillBuffer() with the respective number of samples (at the output rate, 48 kHz in this example).
  • For each output sample, FillBuffer() then calls InterpretFile() / InterpretVGM() and renders the emulated sound chips for one output sample.
  • InterpretVGM() itself uses the current absolute playback timestamp to calculate up to which VGM tick it has to process the file, see VGMPlay.c, line 4744.
  • It processes the VGM file until the calculated tick is reached. (See the sketch right after this list.)
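To illustrate the per-sample catch-up described above, here is a minimal sketch of the Method 1 render path in C. The names, constants, and stub bodies (InterpretVGM, RenderChips, PlaySmpl) are simplified stand-ins of my own, not the actual VGMPlay source:

Code: Select all

/* Minimal sketch of the Method 1 render path (simplified names, not the
   actual VGMPlay source). Each output sample advances the VGM state by
   44100 / SAMPLE_RATE ticks on average. */
#include <stdint.h>

#define SAMPLE_RATE 48000       /* output rate chosen by the user */

static uint32_t PlaySmpl = 0;   /* absolute output-sample position */

/* stubs standing in for the real VGMPlay internals */
static void InterpretVGM(uint32_t vgmTick) { (void)vgmTick; /* parse all VGM events up to vgmTick */ }
static void RenderChips(int16_t *buf)      { buf[0] = buf[1] = 0; /* render one stereo sample */ }

void FillBuffer(int16_t *buffer, uint32_t numSamples)
{
    for (uint32_t i = 0; i < numSamples; i++)
    {
        /* convert the output-sample position to the 44100 Hz VGM tick scale */
        uint32_t vgmTick = (uint32_t)((uint64_t)PlaySmpl * 44100 / SAMPLE_RATE);
        InterpretVGM(vgmTick);       /* catch up: process all commands that are due */
        RenderChips(&buffer[i * 2]); /* then render one stereo output sample */
        PlaySmpl++;
    }
}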
Method 2 - hardware-only playback (self-timed, no audio API):
  • When only hardware playback is enabled, the timing works slightly differently. VGMPlay then starts a separate "playback thread" with a playback loop that runs at up to 1000 Hz. (Basically: as fast as possible without hogging the CPU.)
  • In this playback loop, it measures "current system time - last system time at which rendering occurred".
  • From that, it can calculate how many "samples" need to be rendered. (This uses the "output sample rate" from Method 1 as the time unit.)
  • It then calls FillBuffer(), just like in Method 1, and from there the timing works the same way. (A rough sketch of this loop follows below.)
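A rough sketch of that playback loop, assuming a POSIX clock (clock_gettime / nanosleep) and the FillBuffer() from the previous sketch; the chunking constant and helper names are mine, not VGMPlay's:

Code: Select all

/* Rough sketch of the Method 2 playback thread. The rendered samples are
   thrown away here; only the chip writes matter for hardware playback. */
#include <stdint.h>
#include <time.h>

#define SAMPLE_RATE 48000
#define MAX_CHUNK   256      /* render in chunks so the scratch buffer suffices */

extern void FillBuffer(int16_t *buffer, uint32_t numSamples);

static uint32_t GetSystemTimeMs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint32_t)(ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

static void SleepMs(uint32_t ms)
{
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

void PlaybackThread(void)
{
    int16_t scratch[2 * MAX_CHUNK];
    uint32_t lastTime = GetSystemTimeMs();

    for (;;)    /* ~1000 Hz loop: sleep 1 ms per pass instead of busy-waiting */
    {
        uint32_t now = GetSystemTimeMs();
        /* elapsed wall time -> number of output samples to catch up on */
        uint32_t smpCount = (now - lastTime) * SAMPLE_RATE / 1000;
        if (smpCount > 0)
            lastTime = now;
        while (smpCount > 0)
        {
            uint32_t n = (smpCount > MAX_CHUNK) ? MAX_CHUNK : smpCount;
            FillBuffer(scratch, n);   /* same catch-up path as Method 1 */
            smpCount -= n;
        }
        SleepMs(1);
    }
}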
---

Summary:
  • It counts the "output playback time position". For the time scale, it uses the output sample rate that the user wants.
  • [VGM playback time position] = [output playback time position] * [VGM tick rate = 44100 Hz] / [output sample rate]
  • It processes all VGM events from [last VGM tick processed] to [VGM playback time position].
Whether or not an event takes non-zero "real" time doesn't matter: it just calculates how much time has passed since it last processed data and "catches up" to the current time.
Considering that an OS like Windows gives absolutely no guarantee on how long an actual "sleep" call takes, this is, in my opinion, also the only way to get stable timing.
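As a concrete worked example of the formula (my numbers, assuming a 48000 Hz output rate): after one second of playback, i.e. 48000 output samples, the player must have processed 48000 * 44100 / 48000 = 44100 VGM ticks - exactly one second of VGM time, no matter how long any individual step took. In C, the conversion is just:

Code: Select all

#include <stdint.h>

/* [VGM playback time position] = [output position] * 44100 / [output rate] */
uint32_t OutputPosToVgmTick(uint32_t outputSmpl, uint32_t outputRate)
{
    /* 64-bit intermediate so long VGMs don't overflow the multiplication */
    return (uint32_t)((uint64_t)outputSmpl * 44100 / outputRate);
}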
  • yngwie87
  • Posts: 3
  • Joined: 2023-12-01, 16:07:25

Re: Understanding timing in VGMPlay

Post by yngwie87 »

Hi ValleyBell! A heartfelt thank you for your reply. I didn't expect such a detailed answer, so wow!

Sorry for the late reply but I wanted to experiment a bit with the source code and information you provided.

I was wondering why there are two different mechanisms for emulated sound cores and real hardware...

Anyway, Method 1 seems like an optimal solution for emulated playback, since it is almost impossible to generate accurate and precise ticks on general-purpose operating systems. I experimented with the source code by replacing the chip_reg_write() calls in the 0x50, 0x52, and 0x53 VGM commands with writes to a USB CDC (Full Speed) device. I also put a USB write in 0x80..0x8F, since I'm trying to communicate with real YM2612/PSG hardware. The USB writes, made of 2 (PSG) or 3 bytes (YM2612), are received by an embedded system that also controls the sound chips. No buffering has been implemented here.
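To make the setup concrete, here is a minimal sketch of the kind of hook I mean (the device path, helper names, and framing bytes are illustrative; my actual code differs in details, and serial-port configuration is skipped for brevity):

Code: Select all

/* Illustrative USB CDC write hook (POSIX serial I/O; the 2/3-byte framing
   below is an ad-hoc example, not a standard protocol). */
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int usb_fd = -1;

int usb_open(const char *path)          /* e.g. "/dev/ttyACM0" */
{
    usb_fd = open(path, O_WRONLY | O_NOCTTY);
    return usb_fd;
}

/* YM2612 write: 3 bytes = port select, register, data */
void usb_ym2612_write(uint8_t port, uint8_t reg, uint8_t data)
{
    uint8_t pkt[3] = { port, reg, data };
    (void)write(usb_fd, pkt, sizeof(pkt));   /* blocking, unbuffered */
}

/* PSG write: 2 bytes = command tag, data */
void usb_psg_write(uint8_t data)
{
    uint8_t pkt[2] = { 0x50, data };         /* 0x50 reused here as a PSG tag */
    (void)write(usb_fd, pkt, sizeof(pkt));
}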

What I find interesting is that streaming PSG and FM commands works quite well (aside from rare communication errors unrelated to software) as long as no 8x commands are involved. When PCM samples with waits are present, the VGMs sound corrupted, with crackling and/or popping, and the entire playback slows down. It appears that the delays in the wait commands affect the timing.

What could be wrong with 8x commands? Do non-zero write times to USB affect the playback timing in any way?

Again, thanks for your kind support.

Regards :)
  • ValleyBell
  • Posts: 4786
  • Joined: 2011-12-01, 20:20:07
  • Location: Germany

Re: Understanding timing in VGMPlay

Post by ValleyBell »

The only thing I can imagine here is that the YM2612 PCM commands happen faster than the USB CDC device can handle them.

YM2612 PCM streaming requires a command rate of up to 30 kHz. If you need 3 bytes per command, you have to transfer 90 000 bytes per second. Maybe that is a bit too much?
It could also be the USB driver itself that is the bottleneck, due to the huge number of system calls, which can be very CPU-heavy.

Things can also go wrong on the YM2612 side. Do you ensure that the chip receives commands no faster than 54 kHz?
I doubt the issue happens in this part, though. If things were too fast on the YM2612 side, non-PCM VGMs would likely break sometimes as well.
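If the system calls turn out to be the problem, batching several commands into one transfer usually helps. A sketch of the idea (buffer size and helper names are arbitrary; VGMPlay does not do this out of the box):

Code: Select all

/* Sketch: batch chip commands into one USB transfer to cut the syscall count. */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BATCH_SIZE 64

extern int usb_fd;                 /* opened elsewhere, e.g. via usb_open() */

static uint8_t batch_buf[BATCH_SIZE];
static size_t  batch_len = 0;

void usb_flush(void)
{
    if (batch_len > 0)
    {
        (void)write(usb_fd, batch_buf, batch_len);  /* one syscall, many commands */
        batch_len = 0;
    }
}

void usb_queue(const uint8_t *cmd, size_t len)
{
    if (batch_len + len > BATCH_SIZE)
        usb_flush();
    memcpy(&batch_buf[batch_len], cmd, len);
    batch_len += len;
}

The trade-off is added latency per command, so the embedded receiver would still have to pace the actual chip writes itself.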
  • yngwie87
  • Posts: 3
  • Joined: 2023-12-01, 16:07:25

Re: Understanding timing in VGMPlay

Post by yngwie87 »

Thank you for your reply, ValleyBell, and Merry Christmas!! :xmas:

Just to let you know: I'm trying to verify the requirements you mentioned through experimental measurements.
ValleyBell wrote: 2023-12-19, 11:02:19
The only thing I can imagine here is that the YM2612 PCM commands happen faster than the USB CDC device can handle them.

YM2612 PCM streaming requires a command rate of up to 30 kHz. If you need 3 bytes per command, you have to transfer 90 000 bytes per second. Maybe that is a bit too much?
It could also be the USB driver itself that is the bottleneck, due to the huge number of system calls, which can be very CPU-heavy.
The 90 kB/s rate is theoretically achievable with USB bulk transfers, but I'd like to estimate the actual throughput during streaming with VGMPlay.
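My plan for measuring it is something simple like the following (a quick sketch; the counter would be called from each USB write hook):

Code: Select all

/* Quick-and-dirty throughput estimate: count bytes sent over USB and
   report the rate roughly once per second. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t bytes_sent  = 0;
static time_t   last_report = 0;

void count_bytes(size_t n)      /* call after every successful USB write */
{
    bytes_sent += n;
    time_t now = time(NULL);
    if (now != last_report)
    {
        printf("USB throughput: ~%llu bytes/s\n",
               (unsigned long long)bytes_sent);
        bytes_sent  = 0;
        last_report = now;
    }
}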
ValleyBell wrote: 2023-12-19, 11:02:19
Things can also go wrong on the YM2612 side. Do you ensure that the chip receives commands no faster than 54 kHz?
I doubt the issue happens in this part, though. If things were too fast on the YM2612 side, non-PCM VGMs would likely break sometimes as well.
Good point. I'll try to capture the signals with a logic analyzer to make sure the timing requirements are met.

More details in the coming days, if work and family allow...

If you think this is off-topic, I can start another thread.

Happy holidays :beer: