VGM 2.0 suggestions / ideas

orig. title: Multiple AY chips with stereo

Technical discussion about the VGM format, and all the software you need to handle VGM files.

Moderator: Staff

  • grauw Offline
  • Posts: 150
  • Joined: 2015-02-22, 3:40:22

Post by grauw »

Some comments / thoughts about stream commands.

Firstly, the stream commands currently seem to support one stream per chip, and do not specify which register to write to. I think each stream should be assigned an ID, which the playback commands then refer to.

Secondly, for $30 (Set sample format) the term “channel” is pretty vague to me. IMO it should be really generic, specifying a VGM command prefix for the data, sample data length and sample data increment (to support interleaving). Then I can just forward it directly to the player’s processing loop which will take care of the rest, no matter what chip it is for.

Third, it would be good if command $30 was required to be specified in the “initialisation section” block (previously known as the memory data section). No need to change stream IDs / format at runtime I think.

Lastly, I wonder what’s the idea for YM2612-like direct PCM writes?

They can be played with stream commands, however this is maybe too exact, I would imagine on e.g. Z80 there are deviations (how much?). On the one hand, the increased precision will sound better, and the commands are compact. On the other hand imperfections in the original aren’t captured, and the VGM preparation is more complicated. Is this ok?

The obvious alternative is no special processing, which would be the cleanest solution if it weren’t for memory use: four bytes per PCM data byte (incl. wait) where it takes one now. A “manual advance” could be hacked onto chips to make it three, by allowing the data value to be omitted (implying a read from the stream). But I’m not sure that’s worth the trouble. To be honest I’m not thrilled by the huge memory footprint of the PJ2612 VGMs, much less three times as much.

What are your thoughts?

Post by vampirefrog »

From what I've seen, ADPCM streaming happens at a fixed rate, which is a division of a clock frequency. So for the OKI 6258, you have 4MHz or 8MHz divided by 512, 768 or 1024, which gives 5 different sampling frequencies, though the data bytes contain two samples each, so you'd write at half that rate. Either way, if you store the stream frequency as a fraction of integers, so I guess two 32-bit ints, it would be quite exact. Then, in the player code, you could find the greatest common factor and simplify that fraction, and you'd have to add the player's sampling frequency to the equation, and you'd end up with something like a rate of 15625 / 44100, 31250 / 132300, 15625 / 88200 and so on, if my calculations are correct, and even those can be further simplified. You can then use a combination of numerator, denominator and remainder to simulate the stream's clock without any worries about integer overflow. This is similar to fixed point math, but instead of dividing by a power of two, you're dividing by something more relevant and precise.

Anyway, this method works fine and is perfectly precise for ADPCM streaming on the 6258; I don't know about other chips. I suppose, ideally, you could store yet another field that dictates how the data in the stream frequency field is stored: as an integer ratio, as a floating point number, as a double, fixed-point (which is still just a fraction, but the denominator is a power of two) etc. A double precision floating point value would take the same space as two 32-bit ints btw.
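
As a rough illustration of the numerator / denominator / remainder idea described above, here is a minimal Python sketch. The class name `StreamClock` and its parameters are made up for this example; it is not taken from any existing player, and a real implementation would use fixed-width integers.

```python
from math import gcd


class StreamClock:
    """Integer-only stream clock: how many stream steps are due per output sample."""

    def __init__(self, stream_num, stream_den, output_rate):
        # The stream rate is stream_num / stream_den Hz (e.g. 8 MHz / 512).
        # Steps per output sample = stream_num / (stream_den * output_rate).
        num = stream_num
        den = stream_den * output_rate
        g = gcd(num, den)
        self.num = num // g  # simplified numerator
        self.den = den // g  # simplified denominator
        self.rem = 0         # running remainder, always kept below den

    def tick(self):
        """Advance one output sample and return the number of stream steps due."""
        self.rem += self.num
        steps, self.rem = divmod(self.rem, self.den)
        return steps


if __name__ == "__main__":
    clock = StreamClock(8_000_000, 512, 44100)
    print(clock.num, clock.den)
    print(sum(clock.tick() for _ in range(44100)))
```

Because the remainder never reaches `num + den`, the counters stay small after simplification, which is the point about not worrying about integer overflow.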

Post by grauw »

Well, ideally PCM streaming would happen at an exact rate. But in practice there will be some jitter: a DMA controller needs to wait for bus access, and especially for CPU-controlled streaming like on Z80, it’s hard to stream at a totally perfect rate if you’re also controlling the FM chip and buffering.

So maybe the stream control commands are too exact? Is it important to capture the imprecisions in e.g. Sega Mega Drive games’ PCM? OTOH I guess the alternative is pretty heavy on the data size (3-4x as much as it is in VGM 1.0, which is already pretty hefty), so maybe we simply do not have the luxury to care about that. Or maybe we shouldn’t even care, since the difference is either barely audible or the exact version is more pleasant on the ears.

Btw, IMO the streaming commands should specify rate as a period in system clock cycles (as pretty much every sound chip does as well). Currently the 2.0 proposal allows rate to be specified as a Hz frequency or a period in sound chip cycles, but I think the frequency option should go, and the period should use system clock cycles (“chip 0”) rather than sound chip cycles. Since the streaming rate is driven by the CPU clock, it is the most natural unit to use.

Additionally it’s easier to process in the player: with a system clock period value I need to do zero processing. I can just subtract it from the running counter like I do for regular commands, with no division needed to determine the period every time the pitch is changed. A numerator/denominator pair is even worse, requiring two divisions at runtime, so they occur frequently. And 32-bit divisions are pretty rough on a Z80.
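
To make the "just subtract it from the running counter" point concrete, here is a small Python sketch of a period-driven stream. The name `PeriodStream` and the `advance` method are hypothetical; this only shows that no division is involved, it is not code from an actual player.

```python
class PeriodStream:
    """Stream whose rate is given as a period in system clock cycles."""

    def __init__(self, period_cycles):
        self.period = period_cycles   # cycles between two stream writes
        self.counter = period_cycles  # cycles left until the next write

    def advance(self, cycles):
        """Advance by `cycles` system clock cycles; return how many writes are due."""
        writes = 0
        self.counter -= cycles
        # Only additions and comparisons here, no division.
        while self.counter <= 0:
            self.counter += self.period
            writes += 1
        return writes


if __name__ == "__main__":
    stream = PeriodStream(512)
    print(stream.advance(511), stream.advance(1), stream.advance(1024))
```

Changing the pitch is then just a matter of storing a new `period` value.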
vampirefrog wrote: I suppose, ideally, you could store yet another field that dictates how the data in the stream frequency field is stored: as an integer ratio, as a floating point number, as a double, fixed-point (which is still just a fraction, but the denominator is a power of two) etc. A double precision floating point value would take the same space as two 32-bit ints btw.
Please no… :) Let’s just pick the best option instead of being indecisive and making me implement a plethora of ways, half of which take forever to calculate. Also, having more than one way to specify the rate is pretty useless anyway; as a user, how do I know which one I’m supposed to use? If one really must, a fixed point fractional field could be added to the period, which can easily be ignored by the player. But given how streaming is tied to the CPU clock, I doubt it would add actual precision, unless the original streaming code also used fixed point.
  • ctr Offline
  • Posts: 492
  • Joined: 2013-07-17, 23:32:39

Post by ctr »

grauw wrote:Btw, IMO the streaming commands should specify rate as a period in system clock cycles (as pretty much every sound chip does as well). Currently the 2.0 proposal allows rate to be specified as a Hz frequency or a period in sound chip cycles, but I think the frequency option should go, and the period should use system clock cycles (“chip 0”) rather than sound chip cycles. Since the streaming rate is driven by the CPU clock, it is the most natural unit to use.
This will break compatibility with existing VGMs when converted to the new format. We use Hz because the generic tools that create the PCM streams simply don't care about the main CPU frequency or dividers. We just look at the average write frequency and make a Hz assumption from that.

The way data is streamed varies a bit between systems. Some systems use DMA, some use FIFO buffers (PWM), and some just have the CPU time everything manually (YM2612).

MD is generally not suitable for DAC streams unless the YM2612 timer was used and the write frequency is constant (with some jitter). Examples are Echo and GEMS. SMPS games usually use a tight Z80 loop to time the samples, and for those the actual frequency will vary depending on CPU load (and even the VDP load for the MD). The frequency averaging algorithm used in dacopt does not work well for these games, so the old way of seeking and single-byte read commands works better (although at a size cost).

Post by grauw »

I see, thanks for the information.
ctr wrote:
grauw wrote:Btw, IMO the streaming commands should specify rate as a period in system clock cycles
This will break compatibility with existing VGMs when converted to the new format. We use Hz because the generic tools that create the PCM streams simply don't care about the main CPU frequency or dividers. We just look at the average write frequency and make a Hz assumption from that.
Hm, ok… I think if you use period with a fixed point fraction, it should be precise enough to convert from Hz.
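
A quick sketch of what such a conversion could look like, in Python for readability. The 16-bit fraction width and both function names are assumptions made for this example; nothing in the proposal fixes them.

```python
FRACTION_BITS = 16  # assumed width of the fractional part of the period


def hz_to_period_fixed(system_clock_hz, rate_hz):
    """Period in system clock cycles as a fixed-point integer, rounded to nearest."""
    scaled = system_clock_hz << FRACTION_BITS
    return (scaled + rate_hz // 2) // rate_hz


def period_fixed_to_hz(system_clock_hz, period_fixed):
    """Inverse conversion, to check how much precision is lost."""
    return (system_clock_hz << FRACTION_BITS) / period_fixed


if __name__ == "__main__":
    # Illustrative clock and rate values only.
    period = hz_to_period_fixed(3_579_545, 8000)
    print(period >> FRACTION_BITS, period & 0xFFFF)
    print(period_fixed_to_hz(3_579_545, period))
```

A player that does not care about the fraction can shift it away and use only the integer part of the period.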

Post by NewRisingSun »

I will post a new draft during the weekend that incorporates some of the recent points.

I will insist, however, on allowing the sample rate to be specified both in Hz and chip clocks, as that allows for maximum precision for different applications at the bearable cost of one additional division operation. I will also insist that both data block and stream commands, however they end up looking in the final version, are simpler than in the VGM v1 specification. The "stream ID" in particular is one aspect that I would like to completely get rid of, if at all possible.

I also tend to come down on the side of making VGM2 authoring easier while being precise at the expense of increased player complexity as long as the format does not demand impossible things from hardware players.

Post by grauw »

Looking forward!

Post by vampirefrog »

Might I suggest that "VGM" or "Video Game Music" does not necessarily imply sound chips, and does not limit itself to chiptune era music. Current day full orchestra sound tracks are also video game music, and they have nothing to do with sound chips, except, perhaps, if they used a DX7 or something.

Technically, the VGM format is a list of commands to be written to sound chips, plus timing info. It also does not imply that it is a log. A VGM file is not necessarily a log of commands captured from an emulator or from hardware; it is simply a list of commands and timing. Nor does it have to be a video game specific format: it can apply to any hardware, real or emulated, where there is writing to sound synthesis chips. You could, however, make the argument that it's specific to the 8 bit and 16 bit era.

In fact, now that I think about it, VGM files are not specific to Video, Games, Video Games or Music. You could make the argument that usually it's music, but it can also be sound fx or just chip initialization.

But either way, I might make a few nerdy suggestions:

.SCL Sound Chip Log,
.SCM Sound Chip Music,
.CM Chip Music,
.CT Chip Tune (technically incorrect though),
.SCB Sound Chip Bus,
.SCC Sound Chip Capture or Sound Chip Commands,
.CC Chip Capture or Chip Commands,
.SCW Sound Chip Writes,
.CW Chip Writes,
.CWL Chip Write Log,
.CL Chip Log,
.SWL Sound (chip) Write Log,
.AL Audio Log,
.AWL Audio Write Log,
.CAL Cpu Audio Log,
.CCL Chip Command List,
.ACC Audio Chip Commands,
.SL Sound Log,
.SC Sound Commands,
.SOC SOund Commands,
.CHIPC CHIP Commands,
.CHI All pervasive energy, part of every living thing,
.COC COmmands for Chips,
.MCC Music Chip Commands,
.MUC MUsic Commands,
.CUM Chip Unified Music
.CHM CHip Music

Post by vampirefrog »

I don't know if this has been mentioned, but here's an idea:

make it so the format is future proof: commands have byte length defined in the file somewhere. This way players can just ignore commands for chips they don't support, but still parse the file correctly.

Post by ctr »

VGM 1 already has this, the command number itself defines the length of the command. No reason to remove this feature in VGM 2 I'd say.

Post by vampirefrog »

ctr wrote: VGM 1 already has this, the command number itself defines the length of the command. No reason to remove this feature in VGM 2 I'd say.
I don't think it's properly implemented in VGM 1, I think it should be better generalised. The player shouldn't need to know what each command length is, rather, the file header should specify this somehow.

Post by ctr »

That would just make the file more complicated to parse. The only commands that need to be backward compatible across players are chip writes anyway. We already have a fixed set of formats for chip writes (depending on whether the address or data bus is 8-bit or 16-bit, whether there is a 'port' bank, and so on in various combinations). Besides, VGM commands (and certainly their length) should not change between files; even if you have a table of every command in the header, it will just add to the complexity of parsing a VGM file. The only variable length commands we have in VGM 1 files, data blocks, will be moved outside the command stream, so that won't be an issue. Other special features of VGM files, like DAC stream commands, will probably be turned into a virtual "chip" of their own. This way backwards compatibility can be ensured while no fluff ends up in the actual VGM command structure.

Post by grauw »

The command length is already encoded in the command itself (bits 4-6), so it’s upwards compatible.

Post by vampirefrog »

how about:

- a marker command, that does nothing but mark a spot in the file
- a way to choose or prefer the chip core to render a particular chip's sound

Post by grauw »

Here are two ideas which may affect the “purity” of the specification, but do decrease the data size.

I was thinking, there are several chips with 2 ports, e.g. OPNA, OPL3, etc. It would save quite a bit of space if two of the track command’s bit 6-4 values would be used to denote 2 data bytes, so that the port number doesn’t need to be encoded. E.g.:

$80-$FF                 Write to chip
                        Bit 7:          1
                        Bit 6-4:        Number of data bytes:
                                        0: 1 data byte
                                        1: 2 data bytes
                                        2: 2 data bytes (secondary port)
                                        3: 3 data bytes
                                        4: 4 data bytes
                                        5: 5 data bytes
                                        6: Command is followed by uint8 value
                                           denoting the number of data bytes-1
                                        7: Command is followed by uint32 value
                                           denoting the number of data bytes-1
                        Bit 3-0:        Chip number (0-15)
In other words, for a chip with two ports a value of 1 would be interpreted as data bytes 00 nn nn, and a value of 2 would be interpreted as data bytes 01 nn nn. It can also still accept value 3 to explicitly indicate the port number in the first data byte.
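
For what it’s worth, decoding the table above could look roughly like the following Python sketch. The function name `decode_chip_write`, the returned tuple, and the little-endian byte order of the uint32 length are all assumptions made for this example; the table itself does not specify them.

```python
import struct

# Payload lengths for the fixed-size codes in bits 6-4.
FIXED_LENGTHS = {0: 1, 1: 2, 2: 2, 3: 3, 4: 4, 5: 5}


def decode_chip_write(buf, pos):
    """Decode one $80-$FF command at buf[pos].

    Returns (chip, port, data, next_pos). `port` is 0 for code 1 and 1 for
    code 2 (only meaningful for two-port chips), otherwise None.
    """
    cmd = buf[pos]
    if cmd < 0x80:
        raise ValueError("not a chip write command")
    code = (cmd >> 4) & 0x07
    chip = cmd & 0x0F
    pos += 1
    if code == 6:
        length = buf[pos] + 1  # uint8 holds number of data bytes - 1
        pos += 1
    elif code == 7:
        (n,) = struct.unpack_from("<I", buf, pos)  # byte order is an assumption
        length = n + 1
        pos += 4
    else:
        length = FIXED_LENGTHS[code]
    port = {1: 0, 2: 1}.get(code)
    data = bytes(buf[pos:pos + length])
    return chip, port, data, pos + length


if __name__ == "__main__":
    print(decode_chip_write(bytes([0xA3, 0x22, 0x0F]), 0))
```

A player that does not support a given chip can ignore `data` and continue parsing at `next_pos`.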

Also:

$00-$7C                 Wait 1-125 ticks
$7D ll                  Wait ll ticks
$7E ll mm               Wait mmll ticks
$7F ll mm hh            Wait hhmmll ticks
I feel like if the CPU clock speed is going to be used as the player tick time (something which I do think is a pretty nice idea), there are going to be a lot more 2-byte or 3-byte wait commands than there are now. Maybe someone has an idea for a good solution?
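
To get a feel for the sizes involved, here is a small Python sketch of an encoder for the wait commands listed above. The function name `encode_wait` is made up, and the 8 MHz / 60 Hz figures in the usage line are only illustrative.

```python
def encode_wait(ticks):
    """Encode a wait of `ticks` ticks using the shortest command from the table."""
    if ticks < 1:
        raise ValueError("wait must be at least 1 tick")
    if ticks <= 125:
        return bytes([ticks - 1])                 # $00-$7C: wait 1-125 ticks
    if ticks <= 0xFF:
        return bytes([0x7D, ticks])               # $7D ll
    if ticks <= 0xFFFF:
        return bytes([0x7E, ticks & 0xFF, ticks >> 8])  # $7E ll mm
    if ticks <= 0xFFFFFF:
        return bytes([0x7F, ticks & 0xFF, (ticks >> 8) & 0xFF, ticks >> 16])
    raise ValueError("wait too long for one command; split it")


if __name__ == "__main__":
    # One 60 Hz frame with an 8 MHz tick clock is about 133333 ticks.
    print(len(encode_wait(133333)))
```

With a tick clock in the MHz range, a per-frame wait lands in the 4-byte `$7F` form, which is the size concern raised above.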

One suggestion I could make is to add an additional “divider” value to the CPU clock information (also useful for chips I reckon), which would mean that the timing commands are quantised and can thus cover a longer duration of time. E.g. at 8 MHz a single-byte wait could last at most 0.015625 ms; with a divider of 8 it can last up to 0.125 ms. The benefit is that the VGM still contains the real CPU frequency so the UI can display it, as compared to lowering the CPU frequency directly to achieve the same effect. The downside, obviously, is that you lose precision.
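
The arithmetic behind those numbers, as a small Python sketch. Both function names are hypothetical, and rounding down when quantising is just one possible choice.

```python
def max_single_byte_wait_ms(clock_hz, divider=1, max_ticks=125):
    """Longest wait (in ms) that a single-byte wait command can express."""
    return max_ticks * divider / clock_hz * 1000.0


def quantise(cycles, divider):
    """Round a wait of `cycles` CPU cycles down to whole ticks.

    Returns (ticks, lost_cycles), where lost_cycles is the precision given up.
    """
    return divmod(cycles, divider)


if __name__ == "__main__":
    print(max_single_byte_wait_ms(8_000_000))
    print(max_single_byte_wait_ms(8_000_000, divider=8))
    print(quantise(133333, 8))
```

The `lost_cycles` value shows the trade-off directly: a larger divider gives longer single-byte waits but a larger rounding error per command.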

But maybe we should first analyze some actual VGM2 files to see how big of an issue this is before adding such a thing.