VGM2 Format Proposal, by NewRisingSun, as of 2017-09-18
=======================================================

I. Deficiencies of the original VGM format
==========================================
1. Using two chips of the same type requires chip-specific hacks, e.g.
   adding $80 to the register number (SAA1099) or $50 to the command number
   (YM3812).
2. Using more than two chips of the exact same type is not possible and
   requires using a similar chip instead (e.g. 4xAY-3-8913 as 2xYM2203 +
   2xYM2149), if one is available.
3. Panning the output of individual chips in a multi-chip setup is only
   possible with chip-specific hacks, e.g. adding $80000000 to the chip
   clock (YM3812), or not at all (AY-3-8913).
4. Music cues that consist of several tracks with different loop points can
   only be logged by including as many loops as it takes until they are in
   sync again, which can take quite a while and may not happen at all.
5. Music from sound drivers that allocate channels dynamically cannot be
   logged at all without audible artifacts.
6. Including raw DAC output is not directly possible and requires
   repurposing a YM2612's DAC channel as a workaround.
7. The fixed time base of 44,100 Hz makes higher playback output rates
   cumbersome to implement and invariably creates rounding errors in timing,
   which matter for raw DAC output.

II. Outline of a new VGM2 format
================================
1. Include a "chip table" in the header that specifies the type of each chip
   and its attributes, including optional volume and panning attributes.
2. Use a chip-independent command structure, where the command itself only
   denotes its length and an index into the chip table.
3. Allow several tracks with individual loop points. This will allow music
   cues that consist of several tracks with different loop points to be
   logged accurately. The file's overall length equals the length of the
   track with the longest loop.
   The overwhelming majority of files will still only have a single track,
   preserving simplicity.
4. Allow variable time bases by specifying a time base in the header.
5. Include a raw DAC as a chip type to allow for raw DAC output without
   hacks.
6. Include a special command to remap a chip's channels. The command will be
   placed at the end of a loop, so that loops sound clean even when logged
   from sound drivers that allocate channels dynamically. A tool would take
   an untrimmed VGM2 file together with supplied loop start and end points,
   compare channel usage between the loop start and loop end points, and
   determine the channel remap table accordingly.

III. VGM2 specification
=======================
The normal file extension is .vgm, but files can also be GZip-compressed
into .vgz files. However, a VGM2 player should attempt to support compressed
and uncompressed files with either extension. (ZLib's GZIP library makes
this trivial to implement.)

All integer values are *unsigned* and written in "Intel" byte order (Little
Endian), so for example $12345678 is written as $78 $56 $34 $12.

1. VGM file header
------------------
Offset  Type     Description
$0      char[4]  Magic number: "VGM2"
$4      uint32   Version number (BCD), e.g. $00000200 = 2.00
$8      uint32   Time base in Hz, numerator
$C      uint32   Time base in Hz, denominator
$10     uint32   Size of chip table
$14     uint32   Size of track table and data
$18     uint32   Size of ROM/PCM data
$1C     uint32   Size of metadata (GD3 tag or Vorbis comments)
---
$20     Total VGM header size

In the file, the chip table must come before the track table, which must
come before the ROM/PCM data and the metadata. When reading GZIP-compressed
files, gzread() the first 32 bytes, then add $20 to the sum of the sizes of
the chip table [$10], the track table and data [$14], the ROM/PCM data [$18]
and the metadata [$1C] to obtain the total file size.

The time base defines how long one tick of the wait commands is.
It is specified as a fraction for precision, as all common master clocks are
actually ratios of positive integers. For example, NTSC's subcarrier
frequency of ~3579545 Hz is precisely 4500000/286*455/2 (see SMPTE 170M),
which reduces to 39375000/11. Use http://wims.unice.fr/wims/wims.cgi
(Factoris) for help with factorizing master clocks. Converted VGM files will
contain 44100/1 as the time base. Emulators that log directly to VGM2 files
will typically use the system's master clock as the time base, e.g.
39375000/11*6/4/3 = 19687500/11 (~1.79 MHz) for the NTSC NES.

2. Chip table
-------------
The chip table consists of a series of one-byte attribute numbers, each
followed by one or several bytes of data. Every file must define at least
one chip. The following chip types are defined:

Type                                    Default clock           Lengths
--------------------------------------------------------------------------
Programmable Sound Generators (PSG)
-----------------------------------
$00 SN76489 family, subtypes:
    0 Texas Instruments SN76489         39375000/11 ~3579545    ad
    1 Texas Instruments SN76496         39375000/11 ~3579545    ad
    2 Texas Instruments SN76489AN       39375000/11 ~3579545    ad
    3 NCR 8496                          39375000/11 ~3579545    ad
    4 PSSJ-3                            39375000/11 ~3579545    pp ad
    5 T6W28                             3072000/1               pp ad
$01 AY-3-8910 family, subtypes:
    0 General Instrument AY-3-8910      19687500/11 ~1789772    aa dd
    1 General Instrument AY-3-8912      19687500/11 ~1789772    aa dd
    2 General Instrument AY-3-8913      19687500/11 ~1789772    aa dd
    3 General Instrument AY-3-8930      19687500/11 ~1789772    aa dd
    4 Yamaha YM2149 (SSG)               19687500/11 ~1789772    aa dd
    5 Yamaha YM3439                     19687500/11 ~1789772    aa dd
    6 Yamaha YMZ284 (SSGL)              19687500/11 ~1789772    aa dd
    7 Yamaha YMZ294 (SSGLP)             19687500/11 ~1789772    aa dd
$02 Atari Pokey                         19687500/11 ~1789772    aa dd
$03 Philips SAA1099                     8000000/1               aa dd, aa
$04 Nintendo RP2A03, subtypes:
    0 Nintendo RP2A03                   19687500/11 ~1789772    aa dd
    1 Nintendo RP2A03E-H                19687500/11 ~1789772    aa dd
    2 Nintendo RP2A07                   53203425/32 ~1662607    aa dd
$05 Nintendo RP2C33                     19687500/11 ~1789772    aa dd
$06 GameBoy DMG                         4194304/1               aa dd
$07 MOS Technology SID, subtypes:
    0 MOS 6581                          17734475/18             aa dd
    1 MOS 8580                          17734475/18             aa dd

Wavetable chips
---------------
$20 Namco WSG, subtypes:
    0 Discrete                          96000/1                 ???
    1 Namco 15XX                        24000                   ???
    2 Namco CUS30                       12000                   ???
    3 Namco 106/163                     19687500/11 ~1789772    aa dd
$21 Konami SCC, subtypes:
    0 K005289 (Bubble System)           39375000/11 ~3579545    ???
    1 K051649 (SCC1)                    1500000/1               pp aa dd
    2 K052539 (SCC+)                    ???                     ???
$22 Hudson HuC6280                      39375000/11 ~3579545    aa dd
$23 Nintendo Virtual Boy VSU            5000000/1               mmll dd
$24 Bandai WonderSwan                   3072000/1               aa dd, mmll dd

FM chips
--------
$40 Yamaha OPL family, subtypes:
    0 Yamaha YM3526 (OPL)               39375000/11 ~3579545    aa dd
    1 Yamaha Y8950 (MSX-AUDIO)          39375000/11 ~3579545    aa dd
    2 Yamaha YM3812 (OPL2)              39375000/11 ~3579545    aa dd
    3 Yamaha YM2413 (OPLL)              39375000/11 ~3579545    aa dd
    4 Yamaha YMF262 (OPL3)              157500000/11 ~14318181  pp aa dd
    5 Yamaha YMF278B (OPL4, MoonSound)  33868800/1              pp aa dd
    6 Konami K053982 (VRC7)             39375000/11 ~3579545    aa dd
$41 Yamaha OPM family, subtypes:
    0 Yamaha YM2151 (OPM)               39375000/11 ~3579545    aa dd
    1 Yamaha YM2164 (OPP)               39375000/11 ~3579545    aa dd
$42 Yamaha OPN family, subtypes:
    0 YM2203 (OPN)                      3000000                 aa dd
    1 YM2608 (OPNA)                     8000000                 pp aa dd
    2 YM2610 (OPNB)                     8000000                 pp aa dd
    3 YM2610B (OPNB)                    8000000                 pp aa dd
    4 YM2612 (OPN2)                     84375000/11 ~7670454    pp aa dd
    5 YM3438 (OPN2C)                    ???                     ???
    6 YMF288 (OPN3)                     ???                     ???
$43 Yamaha YMF271 (OPX)                 16934400                pp aa dd

PCM/PWM chips
-------------
$60 Sega PCM                            4000000/1               bbaa dd
$61 Yamaha FA1005 (Sega MultiPCM)       8053975?                aa dd, cc bbaa
$62 Ricoh PCM, subtypes:
    0 RF5C164 (Sega Mega CD)            12500000/1              bbaa dd
    1 RF5C68                            12500000/1              bbaa dd
$63 Sega 32x PWM                        23011361?               ad dd
$64 Yamaha YMF292 (SCSP, Saturn)        22579200?               mmll dd
$65 Yamaha AICA (Dreamcast/Naomi)       ???                     ???
$66 Yamaha YMZ280B (PCMD8)              16934400                aa dd
$67 Yamaha YM6626 (Irem GA20)           39375000/11 ~3579545    aa dd
$68 Namco C140/C219, subtypes:
    0 Namco C140, System 2              8000000/1               pp aa dd
    1 Namco C140, System 22             8000000/1               pp aa dd
    2 Namco C219                        8000000/1               pp aa dd
$69 Namco C352                          24192000                aabb ddee
$6A Konami K007232                      39375000/11 ~3579545    ???
$6B Konami K053260                      39375000/11 ~3579545    aa dd
$6C Konami K054539                      18432000/1              pp aa dd
$6D Ensoniq ES5503                      78750000/11 ~7159090    pp aa dd
$6E Ensoniq ES5505/ES5506, subtypes:
    0 Ensoniq ES5505                    16000000/1              aa dd, aa ddee
    1 Ensoniq ES5506                    16000000/1              aa dd, aa ddee
$6F NEC µPD7759                         640000/1                aa dd
$70 Capcom QSound                       4000000/1               mmll rr
$71 Seta X1-010                         16000000/1              mmll dd
$72 OKIM6258                            4000000/1               aa dd
$73 OKIM6295                            8000000/1               aa dd

Special chip types
------------------
$C0 Raw DAC, subtypes:
    0 Signed linear PCM
    1 A-Law
    2 Mu-Law
    Lengths: dd, dd dd, dd dd dd, dd dd dd dd
    Use this for raw PCM bytes for which a specific chip is neither emulated
    nor necessary. Can take 8-bit, 16-bit, 24-bit, or 32-bit float samples
    as data.
$C1 Combine simultaneous writes to previously-defined chips as a new chip.
    For example, on the Sound Blaster Pro 1, writes to port $2x8 go to both
    the left and the right YM3812 simultaneously. The subtype defines how
    many of the previously-defined chips are being combined.

aa:   Address byte
ad:   Combined address/data byte
bbaa: Address byte MSB..LSB
cc:   Channel number
dd:   Data byte
mmll: Memory address MSB..LSB

List of attributes:
-------------------
Number  Meaning                         Data                  Default value
$00     Type                            dd                    (none)
        Starts a new chip definition of the specified type with default
        attributes that may be modified by attributes that follow. The first
        chip defined will be chip 0, the second will be chip 1, and so on.
$01     Subtype                         dd                    0
$02     Flags                           ddccbbaa              0
        Chip-specific flags (uint32) that are not conveyed by the subtype.
        (To be written)
$03     Master clock                    ddccbbaa hhggffee     (chip-specific)
        ddccbbaa: Numerator (uint32)
        hhggffee: Denominator (uint32)
$04     Overall chip volume             ddccbbaa hhggffee     FFFF0000 FFFF0000
        ddccbbaa: Left output channel volume (int32, linear).
        hhggffee: Right output channel volume (int32, linear).
        +65535: 100% (default), -65535: 100%, inverted.
$05     Individual chip channel volume  ii ddccbbaa hhggffee  (chip-specific)
        ii:       Channel number (chip-specific).
        ddccbbaa: Chip channel volume left (int32, linear)
                  +65535: 100% (default), -65535: 100%, inverted.
        hhggffee: Chip channel volume right (int32, linear)
                  +65535: 100% (default), -65535: 100%, inverted.

3. Track table
--------------
Offset  Type            Description
$0      uint32          Number of tracks
$4      uint32[tracks]  Size of each track

4. Track data
-------------
Offset  Type    Description
$0      uint32  Number of ticks in track
$4      uint32  Offset of loop start relative to $0 in this structure
$8      uint32  Number of ticks in loop
$C      ...     Track commands

5. Track commands
-----------------
Value
$00-$7B        Wait 1-124 ticks.
$7C            Wait custom interval number of ticks. The default custom
               interval is 735 ticks.
$7D ll         Wait ll ticks.
$7E ll mm      Wait mmll ticks.
$7F ll mm hh   Wait hhmmll ticks.
$80-$FE        Write to chip.
               Bit 7:   1
               Bit 6-4: Number of data bytes (0-7, meaning 1-8).
               Bit 3-0: Chip number (0-15), as defined in the chip table
                        (Section 2).
               It is up to each chip handler to split the data bytes into
               port, address and register data fields; see the chip table
               for details. A chip handler should ignore chip writes with an
               unrecognized number of data bytes.
$FF xx varlen  Special command number xx, with a (varlen) number of data
               bytes. (varlen) is similar to Standard MIDI File's variable
               length fields: for every byte, add the lower seven bits to
               the sum. If bit 7 is clear, you're done; if it is set, then
               shift the sum to the left by seven bits, get the next byte
               and do the same.

6. Special commands
-------------------
Value
$00            End of track.
$01 ddccbbaa   Set custom interval length (uint32) for command $7C.
$02 (varies)   Remap chip channels. First byte: chip number (0-15). Further
               bytes: remapped channel numbers. If the command is
               encountered a second time, the remapped channel numbers are
               indices into the previous remapped channel table.
$03 ddccbbaa   Write ddccbbaa bytes of data from the file's ROM/PCM section,
    hhggffee   starting at offset hhggffee, to chip ii's RAM at offset
    ii         mmllkkjj.
    mmllkkjj

7. ROM/PCM data
---------------
(to be written)

IV. Notes
---------
1. The demarcation of chip type and subtype is based on presumed similarity,
   in the sense that all subtypes of a type share a common emulation core.
2. I used size arrays instead of offset arrays so that when making small
   changes to a file, as few fields as possible need to be updated.
3. Chip type $C1 facilitates hardware playback of 2xYM3812 files, as
   separate writes with identical data to the left and right chip would
   cause a small amount of delay, audible as spurious time-of-arrival
   stereophony.
4. Chip attributes 4 and 5 combine volume, panning and phase information.

V. Open questions
-----------------
1. Keep GD3 for metadata, or switch to Vorbis comments? Vorbis comments
   would allow for ReplayGain fields, making a separate Volume Modifier
   field superfluous.
2. Are 16 chips enough? Is a maximum of 8 data bytes per chip write enough?
   One could go for 32 chips with a maximum of 4 data bytes instead.
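
VI. Example: decoding track commands
------------------------------------
The track command encoding of Section III.5 and the (varlen) field can be
illustrated with a short decoder. This is only a minimal sketch and not part
of the proposal: the function names read_varlen and decode_track_commands are
made up for illustration, the custom interval of special command $01 is
assumed to be stored as a little-endian uint32 like all other integers, and
"End of track" is assumed to carry a zero-length (varlen) field. Truncated
input is not handled.

```python
def read_varlen(data, pos):
    """Read a MIDI-style variable length field; return (value, new_pos)."""
    total = 0
    while True:
        byte = data[pos]
        pos += 1
        total += byte & 0x7F          # add the lower seven bits to the sum
        if byte & 0x80 == 0:          # bit 7 clear: this was the last byte
            return total, pos
        total <<= 7                   # bit 7 set: shift and read another byte


def decode_track_commands(data):
    """Decode track commands into a list of tuples (hypothetical helper).

    Produces ("wait", ticks), ("write", chip, data_bytes),
    ("special", number, payload) and ("end",).
    """
    events = []
    pos = 0
    custom_interval = 735             # default custom interval for $7C
    while pos < len(data):
        cmd = data[pos]
        pos += 1
        if cmd <= 0x7B:               # $00-$7B: wait 1-124 ticks
            events.append(("wait", cmd + 1))
        elif cmd == 0x7C:             # $7C: wait the custom interval
            events.append(("wait", custom_interval))
        elif cmd <= 0x7F:             # $7D-$7F: 1-3 operand bytes, LSB first
            size = cmd - 0x7C
            ticks = int.from_bytes(data[pos:pos + size], "little")
            pos += size
            events.append(("wait", ticks))
        elif cmd <= 0xFE:             # $80-$FE: write to chip
            size = ((cmd >> 4) & 0x07) + 1   # bits 6-4: 0-7 means 1-8 bytes
            chip = cmd & 0x0F                # bits 3-0: chip table index
            events.append(("write", chip, bytes(data[pos:pos + size])))
            pos += size
        else:                         # $FF xx varlen: special command
            number = data[pos]
            pos += 1
            length, pos = read_varlen(data, pos)
            payload = bytes(data[pos:pos + length])
            pos += length
            if number == 0x00:        # end of track
                events.append(("end",))
                break
            if number == 0x01 and length == 4:
                # Assumption: uint32 stored little-endian, as in the header.
                custom_interval = int.from_bytes(payload, "little")
            else:
                events.append(("special", number, payload))
    return events


if __name__ == "__main__":
    sample = bytes([0x00, 0x7C, 0x7E, 0x34, 0x12, 0x91, 0x20, 0x0F,
                    0xFF, 0x01, 0x04, 0x72, 0x03, 0x00, 0x00,
                    0x7C, 0xFF, 0x00, 0x00])
    for event in decode_track_commands(sample):
        print(event)
```

A player would hand each "write" tuple to the handler of the chip with that
index in the chip table, which then splits the data bytes into port, address
and data fields according to the "Lengths" column.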