A couple of early answers; more may come later...
grauw wrote:
Please put GD3 data at the start
Tag placement has been an issue with MP3s as well: ID3v1 sits at the end of the file, ID3v2 at the start. At this point, we haven't even discussed whether to keep GD3 or switch to Vorbis comments, which I would prefer.
grauw wrote:
Please put the ROM/PCM block before the track data
Can be done. It will not make a difference for players that decompress the entire file anyway.
grauw wrote:
Multiple tracks, are they supposed to play back one by one or simultaneously? In the former case, why not have different VGMs then.
Simultaneously, for files with channel-specific loop points. Having to play different VGMs simultaneously would mean that one song is no longer represented by one file.
grauw wrote:
Specifying the frequency as numerator / denominator is nice in theory, however I think it unnecessarily complicates things.
It should be pretty simple to derive integer "counter add" and "tick period" values from a player's master timer clock and a file's numerator/denominator values. Maybe I'll add some code on that issue. It will require a 32-bit division in any case, though, and I don't know how to avoid that.
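A rough sketch of that derivation, assuming the file's tick rate is numerator/denominator in Hz and the player's timer interrupt fires at a fixed integer rate. The names derive_timing, ticks_elapsed and timer_hz are made up for illustration and are not part of any specification. The player adds "counter add" to an accumulator on every interrupt and advances one file tick each time the accumulator reaches "tick period":

```python
from math import gcd

def derive_timing(timer_hz, numerator, denominator):
    """Return (counter_add, tick_period) for an accumulator-based player.

    Assumption: the file's tick rate is numerator/denominator Hz and the
    player's interrupt runs at timer_hz Hz. File ticks per interrupt are
    then numerator / (denominator * timer_hz), reduced to lowest terms.
    """
    counter_add = numerator
    tick_period = denominator * timer_hz
    g = gcd(counter_add, tick_period)
    return counter_add // g, tick_period // g

def ticks_elapsed(interrupts, counter_add, tick_period):
    """Simulate the player loop and count how many file ticks have passed."""
    acc = 0
    ticks = 0
    for _ in range(interrupts):
        acc += counter_add
        # Several file ticks can fall inside one interrupt period.
        while acc >= tick_period:
            acc -= tick_period
            ticks += 1
    return ticks

# 44100 Hz file ticks on a 60 Hz interrupt: 735 ticks per interrupt.
add, period = derive_timing(60, 44100, 1)
print(add, period, ticks_elapsed(60, add, period))
```

The only division happens once at load time, inside the gcd reduction; the per-interrupt work is an addition plus compare-and-subtract. The reduction step could also be skipped at the cost of wider counters.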
grauw wrote:
As mentioned before please encode SCC commands as "aa dd"
Can do, after I research why it's currently listed as pp aa dd.
grauw wrote:
Why are wavetable chip command addresses not encoded in intel byte order (ll mm)?
I've just copied it from the current specification.
grauw wrote:
For me the point for a type / subtype would be that if a new subtype is added, there is a likelihood that it will be recognised by older players which have not added support for it and will still play back at least to some degree. 7a. In that vein, it is strange to me that within a type there are different command lengths. 7b. Also the OPLL does not belong in the OPL group since it is not register compatible with the others.
Well, my criterion is more about common emulation cores than compatibility with players which don't know a particular subtype.
grauw wrote:
I feel for parsing simplicity, rather than FF it would be better to use $7F for special commands, then the 16th chip can be used with data length 8.
Oh, ok.
grauw wrote:
Not a big fan of the MIDI style variable length, a length byte or encoded length byte would be easier to process, and not restrict the following values to 7 bits and the (slow on Z80) bit-shifting encoding schemes for >127 values that will undoubtedly follow.
If I can avoid putting data blocks into the code, then the length is unlikely to ever exceed 127 bytes anyway.
grauw wrote:
Additionally, maybe leave a few numbers free for future extension? $70-7A? Could define some fixed lengths for some / all of them for downwards-compatibility.
I think I'll just drop the fixed-length bytes above 16/17.
grauw wrote:
Instead of 7C-7F and the “special command”, maybe introduce a special “VGM control chip” and reuse the chip command? Seems like it would add a lot of flexibility. 00-7F could be all waits and 80-FF all commands.
That would reduce the number of usable chips by one, though. Then again, who uses even 15 chips?
grauw wrote:
Channel remaps could be specified as channel swaps to fit within the 8 data bytes and reduce the amount of state.
Mmmh. I'm not sure I would want to give up the flexibility of variable length bytes beyond a length of 8 though.
grauw wrote:
Another idea: the wait commands could use a UTF-8-like encoding where the two most significant bits indicate the length (0 = 1 byte, 1 = 2 bytes, 2 = 3 bytes, 3 = 4 bytes), in big-endian order.
That would result in a wait command consisting of 6 data bits per byte. Not sure whether others would like that.
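For comparison, here is a sketch of one possible reading of the quoted proposal, in which only the first byte carries the two length bits and any following bytes are plain 8-bit data. This is an assumption on my part, not an agreed format, and decode_wait is a hypothetical name. A scheme that marks every byte would instead leave 6 data bits per byte.

```python
def decode_wait(stream, pos):
    """Decode one wait value starting at stream[pos].

    Assumed layout (hypothetical): the top two bits of the first byte give
    the number of extra bytes (0..3), its low six bits are the most
    significant data bits, and the extra bytes follow in big-endian order.
    Returns (wait_value, position_of_next_command).
    """
    first = stream[pos]
    extra = first >> 6          # 0..3 additional bytes
    value = first & 0x3F        # six data bits from the first byte
    end = pos + 1 + extra
    if end > len(stream):
        raise ValueError("truncated wait command")
    for b in stream[pos + 1:end]:
        value = (value << 8) | b
    return value, end

print(decode_wait(bytes([0x05]), 0))
print(decode_wait(bytes([0x41, 0x00]), 0))
```

Under this layout a one-byte wait covers 0 to 63 and a four-byte wait reaches 30 bits, with no bit shifting beyond whole bytes.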
grauw wrote:
Please put the command data byte count in bits 0-2 for easy parsing by masking.
So basically swapping the nibbles? Sure, I can do that. That would be a switch from your UTF-8-like proposal for the wait commands though.
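To illustrate what parsing by masking could look like after the swap, here is a minimal sketch. The exact bit layout is an assumption for the example: bits 0-2 hold the data byte count minus one (so 1 to 8 bytes), and the next four bits hold the chip number. read_command is a hypothetical helper, not something from the draft.

```python
def read_command(stream, pos):
    """Split one chip command out of the stream.

    Assumed layout (hypothetical): bits 0-2 of the command byte are the
    data length minus one, bits 3-6 are the chip index.
    Returns (chip, data_bytes, position_of_next_command).
    """
    cmd = stream[pos]
    length = (cmd & 0x07) + 1   # a single mask yields the byte count
    chip = (cmd >> 3) & 0x0F
    start = pos + 1
    end = start + length
    if end > len(stream):
        raise ValueError("truncated command")
    return chip, bytes(stream[start:end]), end

# Two commands back to back: chip 1 with two data bytes, chip 0 with one.
stream = bytes([0x09, 0xAA, 0xBB, 0x00, 0x11])
chip, data, nxt = read_command(stream, 0)
print(chip, data.hex(), nxt)
```

A player that does not know a chip can still step over its commands, because the length never depends on the chip type.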
grauw wrote:
Channel remap will be very annoying to implement, and not all chips even define a channel that clearly,
I don't think it would ever be needed on the AY chip anyway. The feature is designed so that a player which ignores the remapping can still play the file, just with artifacts.
grauw wrote:
For future extensibility (upwards compatibility), I think it might be good to use a chunks-like architecture of sorts. Currently I think if a chip attribute is ever added, the file can’t be parsed by older players. E.g. in the chips table, prefixing each chip by a length byte would be a good start.
If no attribute ever has more than eight bytes, and we don't need more than 16 attribute types, I could use three bits of the attribute number as a length count similar to how the track commands work.
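A sketch of how an older player could skip attributes it does not know, under that idea. The layout here is assumed for the example only: each attribute starts with one header byte whose low three bits are the value length minus one and whose next four bits are the attribute type, and the total size of the attribute block is known from elsewhere. parse_attributes is a hypothetical name.

```python
def parse_attributes(block, known_types):
    """Collect known attributes from a chip's attribute block.

    Assumed layout (hypothetical): header byte = (type << 3) | (length - 1),
    followed by `length` value bytes (1..8). Unknown types are skipped,
    which is what keeps older players able to read newer files.
    """
    attrs = {}
    pos = 0
    while pos < len(block):
        header = block[pos]
        attr_type = (header >> 3) & 0x0F
        length = (header & 0x07) + 1
        value = block[pos + 1:pos + 1 + length]
        if len(value) < length:
            raise ValueError("truncated attribute")
        if attr_type in known_types:
            attrs[attr_type] = bytes(value)
        pos += 1 + length
    return attrs

# Type 1 (4 bytes) is unknown to this player and skipped; type 2 is kept.
block = bytes([0x0B, 1, 2, 3, 4, 0x10, 0x7F])
print(parse_attributes(block, {2}))
```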
grauw wrote:
Combine the chip type and subtype into one 16-bit value.
Maybe.

grauw wrote:
A common structure for all chip types would be easier to parse than the current ID-value based approach. At least the current attributes seem common to all.
A file should not have to specify fields that use the "default" values in my opinion, especially if we are going to add channel-specific volumes, which of course will vary between chips in size anyway.
grauw wrote:
Stream commands are just another generic DMA chip as far as I’m concerned. Could be the same as the earlier mentioned VGM control chip.
That's a good idea.
grauw wrote:
It would be good if there was support for multiple data blocks,
I should make it clearer that the ROM/PCM section doesn't represent ONE data block, but ALL data blocks.