VGM 2.0 suggestions / ideas

orig. title: Multiple AY chips with stereo

Technical discussion about the VGM format, and all the software you need to handle VGM files.

Moderator: Staff

  • User avatar
  • grauw Offline
  • Posts: 150
  • Joined: 2015-02-22, 3:40:22

Post by grauw »

Nice update! Many nice changes!

Some comments:
$6 uint16 File flags
Bit 0: 1=.VGM2 file uses external Memory Data Library
file (see Section 3). The VGM2 file's Memory
Data section only contains the name of the
Memory Data Library file.
I don’t think embedding a file name in the data is good, it won’t survive file renames, or 8.3 truncation as happens on MSX.

I still think it would be better to not have this feature at all, at least not in the first version (can be added later if needed). It just adds complications and complicated user interaction (like a manual “select data lib” prompt) for a gain that is doubtful.

Additionally, I think this flag should not be in the header, but in the memory data section.
$10 uint32 Size of Metadata section
$14 uint32 Size of Memory Data section
$18 uint32 Size of Chip Info section
$1C uint32 Size of Track Data section
I think the chip info section should come before the memory data, so that:
1. I can display chip info along with the metadata without first having to load and decompress the potentially huge data section.
2. I can process the data section with knowledge of what chips it is used for.
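To illustrate the ordering argument, here is a rough sketch of a reader that never seeks. It assumes the four size fields sit at $10 to $1C as quoted above, that they are little-endian, that the header is $20 bytes long, and that the sections follow the header in the order I propose (metadata, chip info, memory data, track data). The function names are made up for the example.

```python
import io
import struct

HEADER_SIZE = 0x20  # assumption: header ends at $20, as in the quoted draft


def read_exact(stream, n):
    """Read exactly n bytes from a forward-only stream or fail."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended inside a section")
    return data


def read_sections(stream):
    """Read all sections strictly front to back, without seeking."""
    header = read_exact(stream, HEADER_SIZE)
    # Field order in the header as quoted: metadata, memory data, chip info, track.
    meta_size, mem_size, chip_size, track_size = struct.unpack_from("<4I", header, 0x10)
    metadata = read_exact(stream, meta_size)
    # Proposed section order: chip info comes before the (possibly huge) memory data,
    # so it can be displayed or used before the data is loaded.
    chip_info = read_exact(stream, chip_size)
    memory_data = read_exact(stream, mem_size)
    track_data = read_exact(stream, track_size)
    return metadata, chip_info, memory_data, track_data


# Tiny synthetic file: 16 opaque header bytes, four sizes, then the sections.
blob = bytes(16) + struct.pack("<4I", 2, 3, 1, 4) + b"MM" + b"C" + b"DDD" + b"TTTT"
sections = read_sections(io.BytesIO(blob))
```

The same function would work on `sys.stdin.buffer`, since it only ever reads forward.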
$20 Total VGM header size
I think it would be good if the header contained a header length value, for future extension. Otherwise, I can foresee that the metadata would be abused for non-metadata purposes, because it is the only place to compatibly add global extra information.
The metadata contains a number of "Vorbis comments" [2] attributes.
Why not simply a sequence of key-length-values? key: byte, length: word or doubleword, value: UTF-8 string. It would be easier to parse, since it would just need to do a byte comparison rather than a string comparison. It also meshes better with how chips are specified. There is no particular value in Vorbis reuse here imo.
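Something like the following is what I have in mind; a minimal sketch only. The key numbers, the 16-bit little-endian length, and the function name are assumptions for illustration, none of them come from the draft.

```python
import struct

# Hypothetical key assignments; the draft does not define any.
KEY_TITLE = 0x01
KEY_AUTHOR = 0x02


def parse_klv(data):
    """Parse records of (key: uint8, length: uint16 LE, value: UTF-8 string)."""
    fields = {}
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError("truncated record header")
        key, length = struct.unpack_from("<BH", data, pos)
        pos += 3
        if pos + length > len(data):
            raise ValueError("truncated record value")
        fields[key] = data[pos:pos + length].decode("utf-8")
        pos += length
    return fields


example = b"\x01\x05\x00Title\x02\x02\x00me"
parsed = parse_klv(example)
```

Looking up a field is then a single byte comparison on the key, and unknown keys can be skipped using the length alone.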
The precise meaning of the bytes in the Memory Data section therefore depends entirely on the memory write and stream control commands in the Track Data.
I like that the data is all guaranteed to be up-front, if I’m doing on-the-fly decompression it’s good to be assured that there is not going to be any huge data block in the middle of the stream that will add a pause in the playback.

However as I mentioned earlier, I would prefer this data to be less opaque, more structured. I understand that it complicates the spec, however it would open up many possibilities for me to preload and convert the data without having to do a pre-processing pass on the track data. Processing it in real-time would add pauses to the playback, and effectively prevents me from using e.g. the OPL4 to play PCM data. The way this worked in VGM 1.0 was more suitable for my purposes, and I would rather that be expanded than simplified. I’ll write a little proposal later.
$2 uint32 Chip clock
Ok, now I think it’s a bit weird that the header specifies the clock in numerator / denominator format and the chips do not :D. I would say pick either the one or the other. I’ve had my say about which I think is simpler, but really, both are fine for me.
needs, can be uint8, uint16 or uint32, as denoted by
I wouldn’t specify these explicitly; some implementations could interpret this as meaning that the length will always be 1, 2 or 4, while I think a length of 3, or of 5 or more, would be equally valid.
$11 uint8 order, Global low-pass filter of nth order with specified
uint32 cutoff -3 dB cutoff frequency in Hz. Default is no filter.
$12 uint8 order, Global high-pass filter of nth order with specified
uint32 cutoff -3 dB cutoff frequency in Hz. Default is no filter.
Low-pass filter has purpose, since this is usually present in the sound circuitry. However, I’ve never heard about a high pass filter in the signal path?

Btw, on e.g. the OPLL in MSX there is a chain of first-order low pass filters, each at different cut-off frequencies (also varying per machine actually), so this would be an approximation. I think it’s good enough though.

I also note that you have per-channel versions of these commands as well. Maybe have one version, and use a channel no. of FFH to indicate global? That would reduce the amount of duplication, esp. if more of these get added.
0 K005289 (Bubble System) 3579545 ???
1 K051649 (SCC) 1500000 aa dd
2 K052539 (SCC+) ??? aa dd
Typical clock is 3579545 for both SCC and SCC+ btw.
2 Yamaha YMF278B (OPL4, MoonSound) 33868800 pp aa dd
MoonSound is just the name given by Sunrise to their OPL4 cartridge, and Yamaha’s name OPL4 is equally well known in the MSX community, so I don’t think it is explicitly worth mentioning the former :).
$44 Yamaha OPN family, subtypes:
0 YM2203 (OPN) 3000000 aa dd
1 YM2608 (OPNA) 8000000 pp aa dd
Would it maybe be worth defining 2-byte commands for all the OPN* chips, which write to port 0? Could save a bunch of bytes.
Can take 8-bit, 16-bit, 24-bit, or 32-bit float samples as data.
I assume this means that if 4 bytes are passed in, it is interpreted as float? Could maybe phrase that case a bit more explicitly.
$10 (varies) Remap chip channels, used for files logged from sound
drivers that allocate channels dynamically.
First data byte: chip number (1-15). Further bytes:
remapped channel numbers. All channels of the chip must
be specified; no "partial remaps" are allowed.
Why are partial remaps not allowed?

Alternate option: $10 chipid aa bb Redirect commands for channel A to channel B and vice versa.

With this one would specify a series of swap commands rather than one big map. Advantage would be that the length is no longer variable, and processing is a bit easier (each command just swaps two pointers, rather than more complex processing of an inner data structure).

I think eventually this should get a more precise specification of what this means for every chip, how the channel numbers are assigned (e.g. for those with FM + PCM) and which registers are redirected how, but for now we can leave it be since that would have to come from practical implementation experience. Could be an appendix (possibly separate document) published later, or it could be specified in reference source code form (meh).
$21 uint8 chip, Short forms of command $21; distinguished by the "Number
uint32 to, of data bytes" field. "from" (and "to" in the shortest
uint32 len/ form) retain the values they had after the previous $21
uint8 chip, command was executed. The short forms may appear without
uint32 len a preceding long-form command; "from" and "to" are
initialized to zero when playback starts from the
beginning of the file. "from" and "to" are maintained
per-chip, not per-track nor per-file.
I don’t think the short forms are needed. I don’t really see how they would be useful, and it just complicates the implementation for a few bytes saved for (hopefully infrequently used) commands. Also, if “from” and “to” retain their previous values, is this their value with length added, or not?
"cctt": compression type word, decompressed by the VGM2
player, followed by compression-type-specific data bytes
("varies").
Meh… Just extra playback overhead for me, while the files are already compressed by gzip…
$31 uint8 chip, Set playback rate in Hz for stream playback on chip
uint16/32 rate "chip". The command accepts either 3 or 5 data bytes,
Why have this command? The ticks version ($32) is better. This one has bad precision in the low Hz range, and just requires me to do an unnecessary 16/32-bit division to determine the period that $32 would provide me directly.

Post by vampirefrog »

it would be nice if VGM files could be played by streaming from stdin, ie without any seeking.
  • ctr Offline
  • Posts: 492
  • Joined: 2013-07-17, 23:32:39

Post by ctr »

Current VGM files would actually already be streamable, if it weren't for the GD3 data at the end. If we move that to the beginning then it would be possible to stream from stdin (with some buffering to read data blocks and stuff first of course).
  • kirishima Offline
  • Posts: 82
  • Joined: 2015-06-18, 22:26:41

Post by kirishima »

I'm just curious. Why add AICA support? Most emulators of the chip don't sound very good, and the ones that do either have issues or are closed source. I can only think of 2 games that would benefit from VGM support, since they can't be ripped with the current dsf tools due to not using the usual formats.

Post by Kaminari »

I'm not sure if we're taking requests at this stage, but I hope MSM5205 will get supported in 2.0.

Post by NewRisingSun »

I am going to post a new draft in time, but here are my replies to some points.
grauw wrote:Why not simply a sequence of key-length-values?
In my book, metadata should be the one part of a binary file that is self-explanatory. Binary keys would not be self-explanatory.
grauw wrote:I’ll write a little proposal later.
Looking forward to it. ;)
grauw wrote:Ok, now I think it’s a bit weird that the header specifies the clock in numerator / denominator format and the chips do not :D. I would say pick either the one or the other.
No, because there is a good reason for treating them differently. As I explained earlier, the VGM timing clock must be specified precisely, otherwise rounding errors will accumulate over time. By contrast, chip clocks can be safely rounded to 1 Hz because the officially-specified tolerance for NTSC's subcarrier frequency, from which most odd chip clocks are derived, is much larger than that.
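To put rough numbers on the two cases, here is a small sketch. It assumes the NTSC colour subcarrier of 315/88 MHz as the exact chip clock, and a 60000/1001 Hz frame rate as an example of a time base that does not round nicely; the latter is only an illustrative assumption, not a value from the draft.

```python
from fractions import Fraction


def drift_seconds(duration_s, exact_hz, assumed_hz):
    """Timing error after duration_s when ticks generated at exact_hz
    are replayed by a player that assumes assumed_hz."""
    ticks = Fraction(exact_hz) * duration_s
    return float(ticks / Fraction(assumed_hz) - duration_s)


NTSC_SUBCARRIER = Fraction(315_000_000, 88)  # 3579545.4545... Hz

# Chip clock rounded to 1 Hz: well under a millisecond per hour.
chip_drift = drift_seconds(3600, NTSC_SUBCARRIER, 3_579_545)

# Time base rounded from 60000/1001 Hz to 60 Hz: several seconds per hour.
frame_drift = drift_seconds(3600, Fraction(60000, 1001), 60)
```

The first error stays below half a millisecond over an hour, while the second grows to more than three seconds, which is the kind of accumulation the exact numerator/denominator form is meant to avoid.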
grauw wrote:However, I’ve never heard about a high pass filter in the signal path?
High-pass filters are used for DC offset removal, among other things. The NES for example has two high-pass filters and one low-pass filter in its signal path.
grauw wrote:in MSX there is a chain of first-order low pass filters,
I will include a sentence stating that the filter attribute can be specified multiple times to indicate multiple filtering stages.
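For what it is worth, a player could treat repeated filter attributes as a simple cascade. The sketch below uses a basic one-pole low-pass per stage; the coefficient formula is a common approximation rather than an exact match for the -3 dB point, and the cutoff values are invented for the example.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def apply_filter_chain(samples, cutoffs_hz, sample_rate):
    """Run the signal through one first-order stage per listed cutoff."""
    for cutoff in cutoffs_hz:
        samples = one_pole_lowpass(samples, cutoff, sample_rate)
    return samples


# Hypothetical two-stage chain applied to a unit step.
step_response = apply_filter_chain([1.0] * 5000, [8000.0, 12000.0], 44100)
```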
grauw wrote:Why are partial remaps not allowed? Alternate option: $10 chipid aa bb Redirect commands for channel A to channel B and vice versa.
Bad idea. Remember that remapping will only work across multiple iterations of a loop if the target channel of a remap is not interpreted as the actual chip channel number, but as the index into the current remap table, which contains the actual chip channel numbers only at the beginning. This implies that remapping a chip's channels must be an atomic operation, otherwise things become ambiguous. For example, suppose the nine OPL channels are to be remapped this way:

0 1 2 3 4 5 6 7 8
4 7 1 3 0 8 2 5 6
If you remapped one channel at a time, you would get this:

0->4
1->7
2->1 ; BUG! Channel 1 has just been remapped to 7!
3->3
4->0 ; BUG! Channel 0 has just been remapped to 4!
5->8
6->2 ; BUG! Channel 2 has just been remapped to 1!
7->5 ; BUG! Channel 5 has just been remapped to 8!
8->6 ; BUG! Channel 6 has just been remapped to 2!
Partial remaps would only work if each remap command modified not the actual remap table, but a copy of it, a "new remap" table so to speak. You would then have a second command, one that you would specify after all the partial remap commands, that copies the new remap into the actual remap table. This would make things much more complicated than simply insisting that the remap command must include a complete remap table that includes every channel.
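A small sketch of the difference, assuming the remap table starts out as the identity and that each entry of the command is an index into the current table, as described above. The function names are made up for the example.

```python
def remap_atomic(table, mapping):
    """Full remap: every new entry is read from the old table (uses a temporary)."""
    return [table[m] for m in mapping]


def remap_one_at_a_time(table, mapping):
    """Naive per-channel variant: later entries read slots that were already overwritten."""
    table = list(table)
    for i, m in enumerate(mapping):
        table[i] = table[m]
    return table


identity = list(range(9))
wanted = [4, 7, 1, 3, 0, 8, 2, 5, 6]

atomic = remap_atomic(identity, wanted)        # matches the wanted table
naive = remap_one_at_a_time(identity, wanted)  # channels get duplicated and lost
```

Applying `remap_atomic` again on the next loop iteration composes the mapping with the current table, which is why the target has to be an index rather than a physical channel number.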
grauw wrote:I think eventually this should get further precise specification what this means for every chip
I will include that hopefully in the next draft version.
grauw wrote:Also, if “from” and “to” retain their previous values, is this their value with length added, or not?
With the length added.
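In other words, roughly the following bookkeeping per chip. This is only a sketch of how I read the quoted draft text: the class and parameter names are invented, and "from" is called `src` because `from` is a reserved word in Python.

```python
from dataclasses import dataclass


@dataclass
class CopyState:
    src: int = 0  # "from": offset into the Memory Data section
    dst: int = 0  # "to": address in the chip's memory


class MemoryCopier:
    """Tracks the running "from"/"to" values per chip, starting at zero."""

    def __init__(self):
        self.state = {}  # chip number -> CopyState

    def execute(self, chip, length, src=None, dst=None):
        """Return the (src, dst, length) actually used; omitted fields keep
        their running values, and both advance by length afterwards."""
        st = self.state.setdefault(chip, CopyState())
        if src is not None:
            st.src = src
        if dst is not None:
            st.dst = dst
        used = (st.src, st.dst, length)
        st.src += length
        st.dst += length
        return used


copier = MemoryCopier()
first = copier.execute(1, 0x100, src=0x1000, dst=0x0000)  # long form
second = copier.execute(1, 0x80)                          # shortest form
third = copier.execute(1, 0x10, dst=0x4000)               # "to" + length form
```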
grauw wrote:Why have this command? The ticks version ($32) is better. This one has bad precision in the low Hz range, and just requires me to do an unnecessary 16/32-bit division to determine the period that $32 would provide me directly.
It's mostly for the "Raw DAC" chip, so you don't have to make up a fictitious chip clock for that just to get the sampling rate you want.
vampirefrog wrote:it would be nice if VGM files could be played by streaming from stdin, ie without any seeking.
Yes, that will be a byproduct of ordering the file's sections in the manner that grauw described.
kirishima wrote:Why add AICA support?
I just went ahead and added any chip to the chip table that I know of. I am not actually assuming that each one will be supported, or that VGM2 players must add support for them.
Kaminari wrote:I'm not sure if we're taking requests at this stage
The purpose of this whole thread is for taking requests, so request away! :)
Kaminari wrote:but I hope MSM5205 will get supported in 2.0.
I'll add it to the chip table.

Post by grauw »

NewRisingSun wrote:I am going to post a new draft in time, but here are my replies to some points.
grauw wrote:Why not simply a sequence of key-length-values?
In my book, metadata should be the one part of a binary file that is self-explanatory. Binary keys would not be self-explanatory.
Well, ok, not a really good reason to make my parsing life harder in my book :). People are looking at a spec to implement these tools anyway, it’s already full of coded values, are they self-explanatory? No. There’s no reason metadata should be different, so let’s be consistent.
NewRisingSun wrote:No, because there is a good reason for treating them differently. As I explained earlier, the VGM timing clock must be specified precisely, otherwise rounding errors will accumulate over time. By contrast, chip clocks can be safely rounded to 1 Hz because the officially-specified tolerance for NTSC's subcarrier frequency, from which most odd chip clocks are derived, is much larger than that.
Note that this difference in duration can again also occur on real hardware. Record the same song on two different machines, and it will have a slightly different length.

Additionally, in real hardware, if both CPU and chip are fed by the same 3579545 clock, they will be exactly in sync. Whereas if we specify the CPU time exactly and the chip time rounded, they will drift out of sync in VGM.

I understand that being exact is maybe more relevant for the one value than for the other, but I think it’s weird to not be consistent, for me the choice is either do the exact thing or the simpler thing. If you’re not fine with small rounding error from the ideal frequency, then let’s just do the exact thing throughout.
NewRisingSun wrote:
grauw wrote:Why are partial remaps not allowed? Alternate option: $10 chipid aa bb Redirect commands for channel A to channel B and vice versa.
Bad idea. Remember that remapping will only work across multiple iterations of a loop if the target channel of a remap is not interpreted as the actual chip channel number, but as the index into the current remap table, which contains the actual chip channel numbers only at the beginning. This implies that remapping a chip's channels must be an atomic operation, otherwise things become ambiguous.

Partial remaps would only work if each remap command modified not the actual remap table, but a copy of it, a "new remap" table so to speak. You would then have a second command, one that you would specify after all the partial remap commands, that copies the new remap into the actual remap table. This would make things much more complicated than simply insisting that the remap command must include a complete remap table that includes every channel.
Nope, you can express this with the swaps, and it will also operate correctly for the next iterations. With swaps you can avoid needing a temporary copy of the table during the remapping entirely, because you just exchange two values in the table. And it doesn’t need a variable length command with special parsing. So it’s actually simpler to implement.

        ; 0 1 2 3 4 5 6 7 8
0 <> 4  ; 4 1 2 3 0 5 6 7 8
1 <> 7  ; 4 7 2 3 0 5 6 1 8
2 <> 7  ; 4 7 1 3 0 5 6 2 8
5 <> 8  ; 4 7 1 3 0 8 6 2 5
6 <> 7  ; 4 7 1 3 0 8 2 6 5
7 <> 8  ; 4 7 1 3 0 8 2 5 6
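The same sequence as a quick sketch, to show how little the swap command needs on the player side. The function name is just for illustration.

```python
def apply_swaps(table, swaps):
    """Apply fixed-length swap commands; each one exchanges two table entries."""
    table = list(table)
    for a, b in swaps:
        table[a], table[b] = table[b], table[a]
    return table


# The six swaps listed above, starting from the identity table.
swaps = [(0, 4), (1, 7), (2, 7), (5, 8), (6, 7), (7, 8)]
result = apply_swaps(range(9), swaps)
```

No temporary copy of the table is needed, and `result` ends up as the target table 4 7 1 3 0 8 2 5 6.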

Post by NewRisingSun »

grauw wrote:Whereas if we specify the CPU time exactly and the chip time rounded, they will drift out of sync in VGM.
Which is inconsequential.
grauw wrote:Nope, you can express this with the swaps, and it will also operate correctly for the next iterations.
I'm not sure I understand your example correctly. You want to go from 0 1 2 3 4 5 6 7 8 to 4 7 1 3 0 8 2 5 6, and you specify six swap commands 0->4, 1->7, 2->7, 5->8, 6->7, 7->8? That notation is completely unintuitive, exactly because of the path dependence of your swap parameters that I mentioned, and will be a nightmare to follow in a vgm2txt-like output.

Post by grauw »

NewRisingSun wrote:
grauw wrote:Whereas if we specify the CPU time exactly and the chip time rounded, they will drift out of sync in VGM.
Which is inconsequential.
Except that it does. I made a pulse wave once by timing volume changes with the CPU exactly to the AY3 PSG, while leaving the tone free running. On the emulator there was a 1-cycle timing difference bug, and as a result the pulse wave drifted, causing a PWM effect rather than a steady pulse.

I don’t understand your resistance, I just suggest you go back to your initial proposal, which was precise and consistent, instead of this half-baked compromise.
NewRisingSun wrote:
grauw wrote:Nope, you can express this with the swaps, and it will also operate correctly for the next iterations.
I'm not sure I understand your example correctly. You want to go from 0 1 2 3 4 5 6 7 8 to 4 7 1 3 0 8 2 5 6, and you specify six swap commands 0->4, 1->7, 2->7, 5->8, 6->7, 7->8? That notation is completely unintuitive, exactly because of the path dependence of your swap parameters that I mentioned, and will be a nightmare to follow in a vgm2txt-like output.
vgm2txt can format the output however we want for improved readability, it already does so in a lot of cases. It can just show the current channel allocation after each swap command like I did here.

And it’s no different from what happens after multiple channel reallocations, which is also just one big swap command, except bigger and harder to implement and requiring a temporary buffer. But I’m repeating myself. I’m just the guy who has to implement it.

Post by NewRisingSun »

grauw wrote:I don’t understand your resistance, I just suggest you go back to your initial proposal, which was precise and consistent, instead of this half-baked compromise.
I took VB's response as rejecting the numerator/denominator format for chip clocks.

Not having programmed on the Z80, I cannot judge the overhead needed for a temporary 32-byte-or-so table for a full channel remap. It certainly adds very little on the 8088. I accept, based on your example, that it is possible to use single-channel swap commands as well, and will wait to see whether others have an opinion on this.

Post by grauw »

NewRisingSun wrote:Not having programmed on the Z80, I cannot judge the overhead needed for a temporary 32-byte-or-so table for a full channel remap. It certainly adds very little on the 8088.
You’re right, I coded it out for myself and it is comparable in terms of code.

Primarily it just feels nicer to avoid the temporary if it’s not needed and keep the command as an atomic operation with fixed length. Just a clever way of doing it.

The idea stemmed from trying to avoid command lengths > 8, but currently we kind of need variable lengths for the data blocks anyway unfortunately.

Post by vampirefrog »

NewRisingSun wrote:
vampirefrog wrote:it would be nice if VGM files could be played by streaming from stdin, ie without any seeking.
Yes, that will be a byproduct of ordering the file's sections in the manner that grauw described.
It would be nice if writing was also very simple, ie write a header, then log the commands, then close file. I believe currently some back seeking is required after logging, to fill in some header fields. Perhaps make the 'Total samples' field optional.

Also, have you thought about separating a logging format and a playing format? The logging format can be bare bones and easy to write (very little code required), and the playing format can have more features, and would serve to compress the logged format and also have some homebrew-friendly features.

Edit: also, it would be nice to be able to decompile and recompile the files, it would add a useful tweaking method IMO. Currently vgm2txt only decompiles, but it would be nice to have a sister format that is text only and can be compiled. Perhaps a dialect of MML?

Post by grauw »

NewRisingSun wrote:
grauw wrote:I’ll write a little proposal later.
Looking forward to it. ;)
So a rough sketch;

Chips section:

$30 uint32 size
ROM size

$31 uint32 size
RAM size

(Actually something more complicated may be needed, for Y8950 the addressing differs based on ROM or RAM and its size which would be captured here I think, however e.g. OPL4 may need something more elaborate.)

Memory data section:

$00 uint8 chip, uint32 length, uint32 start, [data…]
Data block

I think this about covers the structure you already said you wanted to add, and brings it on par with the amount of structure in VGM 1.

---

So the additional part I was actually thinking about is, to add some sample structure information inside the data blocks as well, e.g.:

Memory data section:

$80 uint8 chip, uint8 bb, uint8 ch, uint32 start, uint32 loop, uint32 len, uint8 flags
Datablock sample info

Track section:

$33 uint8 chip, uint16 sample
Start stream playback for memory region

This way I could prepare the OPL4 instrument headers and upload the samples to the OPL4 without having to scan through all the track data prior to playback to determine these myself (which would take a while). These sample sections would then ideally not only be defined for stream playback commands, but also for chips with their own PCM playback capabilities.

Note that some chips already contain this sample data in the data itself (e.g. OPL4 and MultiPCM), but other chips allow the user to freely specify a start and stop address at any point during playback, so to use with OPL4 these would require a pre-scan.

However in the end, I think it’s something that would be useful to me, but thinking about it practically, maybe it puts too much burden on the VGM authoring side. Also this is enforced by the structure for stream commands, but not for chip-controlled playback. And if it’s not done consistently for all VGM2s (or worse, incorrectly), then I would need to implement the pre-scan pass anyway, so it doesn’t save any trouble.
NewRisingSun wrote:
grauw wrote:Why have this command? The ticks version ($32) is better. This one has bad precision in the low Hz range, and just requires me to do an unnecessary 16/32-bit division to determine the period that $32 would provide me directly.
It's mostly for the "Raw DAC" chip, so you don't have to make up a fictitious chip clock for that just to get the sampling rate you want.
I think Hz is too coarse a unit to be useful. For low note frequencies you will be off-key by many cents.

One option would be to add a fractional part, iow define frequency in 1/65536ths of a Hz. But IMO adding a second way to do the same thing is not adding value, just complexity, so I would still vote for scrapping this command. If you want to not completely leave the clock value up to arbitrary definition by pack authors, you could define a default frequency for the DAC.
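A quick way to check the size of the error, treating the rate as a pitch as I did above. The 32.703 Hz example value (roughly a low C) and the function name are my own choices for illustration.

```python
import math


def cents_error(target_hz, step_hz=1.0):
    """Pitch error in cents when target_hz is rounded to the nearest multiple of step_hz."""
    quantised = max(step_hz, round(target_hz / step_hz) * step_hz)
    return 1200.0 * math.log2(quantised / target_hz)


whole_hz = cents_error(32.703)                # rounded to 33 Hz
fractional = cents_error(32.703, 1 / 65536)   # 1/65536 Hz resolution
```

With whole Hz the example lands about 15 to 16 cents sharp, while the 1/65536 Hz variant is off by far less than a thousandth of a cent.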

(Maybe 3579545 or, the speed of sound at 40 ºC in 1/10000th metres per second units ;))

Post by grauw »

Something dawned on me;

The “VGM chip”, chip 0 represents the CPU, the global controlling entity.
  • The VGM chip should be defined in the chips section explicitly, rather than implicitly.
  • Time base should be specified on chip 0.
  • File flags should be specified on chip 0.
  • Global cutoff / volume / panning values can also be specified here.
  • Stream command periods are based on chip 0’s time base rather than the sound chip’s, because the CPU is what performs the DMA.
  • Chip type is 00 00. The subtype could be used for versioning if needed. Or CPU type (Z80, 68000, etc.).
This way it really clicks together, the “player” chip is really just another chip. It gets all the options chips get, and all flags, and gains the easily extensible attributes any chip has.

Post by vampirefrog »

Here are some ideas:

1. Support clipping parameters. Either per chip or per song or per vgmplay.ini. Some sound chips were meant to be clipped. See here.
2. Support clipping type: soft clipping / hard clipping / other.
3. Other clipping controls: per-chip clip level, per-chip clip enable, per-mix clip level, per-mix clip enable.
4. Support low pass filtering per chip/per song/per vgmplay.ini. Just support a few common filters. Someone would have to research the hardware to see what sort of filters were used (active, passive etc) and we'd have to find DSP equivalents. Shouldn't be too hard, though. We could ask on DSP forums.
5. A command to mark video frames (vertical interrupts). Might make it easier to compress some files. Plus it can be removed by the compression program.
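
For the clipping types, a minimal sketch of what "hard" and "soft" could mean in a player. The tanh curve is just one common choice of soft clipper, not something taken from any particular chip, and the function names and the `level` parameter are made up for the example.

```python
import math


def hard_clip(x, level=1.0):
    """Clamp the sample to the range [-level, +level]."""
    return max(-level, min(level, x))


def soft_clip(x, level=1.0):
    """tanh-based soft clipper: close to linear for small x, saturating towards +/-level."""
    return level * math.tanh(x / level)


samples = [0.3, 1.5, -2.0]
hard = [hard_clip(s) for s in samples]
soft = [soft_clip(s) for s in samples]
```

A per-chip or per-mix clip level would then just be the `level` argument, and clip enable a flag that bypasses the call.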