Is an update for GD3 spec in order?

Technical discussion about the VGM format, and all the software you need to handle VGM files.

Moderator: Staff

  • neologix Offline
  • Posts: 211
  • Joined: 2012-04-22, 4:03:45
  • Location: New York, NY, USA

Post by neologix »

My update for VGMTool is going to be cross-platform; as such, GD3 reading/writing may become a problem on non-Windows platforms, since GD3 stores its strings in what's apparently known as UCS-2 rather than the more common UTF-8. The app should handle this fine on Windows, as I've been told the toolkit converts Unicode to UCS-2 behind the scenes, but I don't know how viable that will be in the long run if we want to expand the selection of command-line tagging tools.

The GD3 tag already stores a version, so what if we bump the version, store the values in UTF-8, and update GD3 handling accordingly? Current VGM apps should already be checking for GD3 v1.00 if they were written well, so I don't expect major breakage in current apps; the only two "older" apps I expect we'd need to worry about after such a change would be VGMTool (v3 update already in development and will be on GitHub) and VGM2MID (which I'll coordinate with ValleyBell if needed).
  • Tom Offline
  • Ragequit Member
  • Posts: 496
  • Joined: 2011-11-30, 17:26:44
  • Location: Italy

Post by Tom »

UTF-8? Bad idea. These tags also contain Japanese text, and "Japanese or Hindi could take more space in UTF-8 if there are more of these characters than there are ASCII characters", according to Wikipedia.

If we really need to move on since UCS-2 is indeed superseded, I'd suggest UTF-16, which seems to be the most natural choice.

But if you ask me, I'm fine with the way it is now. Of course I'd like to hear ValleyBell's opinion, though.
Also known as nineko.
  • ValleyBell Offline
  • Posts: 4768
  • Joined: 2011-12-01, 20:20:07
  • Location: Germany

Post by ValleyBell »

UTF-8 would break compatibility with all vgm programs that read and/or write tags (they either won't display any tags at all or would show garbage), so that's not an option for me. vgm2mid would behave the same way as all other tools, btw.

I doubt that the additional characters of UTF-16 will actually be used, but I'm fine with it, because it doesn't break anything.
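
As a rough illustration of why UTF-16 "doesn't break anything" for existing tags: for characters inside the Basic Multilingual Plane, little-endian UTF-16 produces the same 16-bit units a UCS-2 reader already expects, and only characters outside the BMP turn into a two-unit surrogate pair. This is a minimal sketch; the helper name `ucs2_units` and the sample strings are made up for the example.

```python
def ucs2_units(data):
    """Split raw bytes into 16-bit little-endian units, the way a UCS-2 reader would."""
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

# Four katakana characters, all inside the BMP.
bmp_text = "ソニック"
bmp_bytes = bmp_text.encode("utf-16-le")

# One character outside the BMP (U+1F3B5); UTF-16 needs a surrogate pair for it.
astral_text = "\U0001F3B5"
astral_bytes = astral_text.encode("utf-16-le")

# BMP text: one unit per character, each unit equal to the code point.
print(ucs2_units(bmp_bytes) == [ord(c) for c in bmp_text])

# Non-BMP text: two units for a single character.
print([hex(u) for u in ucs2_units(astral_bytes)])
```

A reader that only knows UCS-2 would still walk over the surrogate pair two bytes at a time; it would just not know how to display it as one character.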


btw: The only problem I see with command-line tools is that they don't support Unicode characters in their arguments. ATM I use VGMTool mainly to convert Japanese characters to HTML NCRs, which is the only way to use Unicode characters in vgm_tag.
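
For reference, the NCR conversion described above can be sketched in a few lines. This only shows the general idea of turning non-ASCII characters into numeric character references and back; whether vgm_tag expects decimal or hexadecimal references is not stated here, so the decimal form below is an assumption, and `to_ncr` / `from_ncr` are hypothetical helper names.

```python
import html

def to_ncr(text):
    """Replace every non-ASCII character with a decimal NCR such as &#12477;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def from_ncr(text):
    """Turn character references back into characters.

    Note: html.unescape also expands named entities such as &amp;.
    """
    return html.unescape(text)

tagged = to_ncr("Sonic ソニック")
print(tagged)
print(from_ncr(tagged))
```

The result is plain ASCII, so it can be passed as a command-line argument even when the shell or the tool cannot handle Unicode arguments directly.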
  • Tom Offline
  • Ragequit Member
  • Posts: 496
  • Joined: 2011-11-30, 17:26:44
  • Location: Italy

Post by Tom »

So you basically agree with me.
Things are fine the way they are, but if we really can't avoid a change, UTF-16 would be better than UTF-8. Once again, great minds think alike.

tl;dr let's just keep ucs-2 :P
Also known as nineko.
  • neologix Offline
  • Posts: 211
  • Joined: 2012-04-22, 4:03:45
  • Location: New York, NY, USA

Post by neologix »

When I first attempted to port VGMTool to Mac, I had no success using wchar functions to read GD3 UCS-2 strings, so I had to manually loop through the strings two bytes at a time. I'll ask byuu about UTF-16 functionality.
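
A likely reason the wchar route fails off Windows is that `wchar_t` is typically 4 bytes on macOS and Linux but 2 bytes on Windows, so wide-string functions read the 16-bit GD3 data with the wrong stride. The two-bytes-at-a-time loop can be sketched portably like this; it assumes null-terminated little-endian 16-bit strings, and `read_ucs2_string` is a hypothetical helper name.

```python
def read_ucs2_string(data, offset=0):
    """Read one null-terminated 16-bit little-endian string starting at offset.

    Returns (text, next_offset), where next_offset points just past the terminator.
    """
    end = offset
    # Step through the buffer one 16-bit unit (two bytes) at a time.
    while end + 2 <= len(data):
        if data[end] == 0 and data[end + 1] == 0:
            break  # found the 0x0000 terminator
        end += 2
    # Decoding as UTF-16LE gives the same result as UCS-2 for BMP characters.
    text = data[offset:end].decode("utf-16-le", errors="replace")
    next_offset = min(end + 2, len(data))
    return text, next_offset

# Two consecutive strings, as in an English/Japanese pair of tag fields.
buf = ("Green Hill".encode("utf-16-le") + b"\x00\x00"
       + "グリーン".encode("utf-16-le") + b"\x00\x00")
first, pos = read_ucs2_string(buf, 0)
second, pos = read_ucs2_string(buf, pos)
print(first, second)
```

Because it works on raw bytes with an explicit byte order, the same code behaves identically on Windows, macOS, and Linux.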

Once again, I'd like to point out that if such a change were implemented (whether UTF-8 or UTF-16), the version would be increased to accommodate it, and applicable programs would be updated as a result.
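
To make the version-bump idea concrete, here is a minimal sketch of a reader that picks the text encoding from the GD3 version field. It assumes the GD3 1.00 layout of a 4-byte `Gd3 ` identifier, a 32-bit little-endian version, a 32-bit data length, and then null-terminated strings. The `0x00000200` version and its UTF-8 payload are purely hypothetical, since no such version has been defined in this thread, and `decode_gd3` is a made-up name.

```python
import struct

GD3_V100 = 0x00000100               # existing version, 16-bit strings
GD3_V200_HYPOTHETICAL = 0x00000200  # NOT a real version; illustration only

def decode_gd3(blob):
    """Return (version, fields) for a GD3 block, choosing the codec by version."""
    magic, version, length = struct.unpack_from("<4sII", blob, 0)
    if magic != b"Gd3 ":
        raise ValueError("not a GD3 tag")
    body = blob[12:12 + length]
    if version == GD3_V100:
        text = body.decode("utf-16-le")
    elif version == GD3_V200_HYPOTHETICAL:
        text = body.decode("utf-8")  # assumed payload for the proposed bump
    else:
        raise ValueError("unsupported GD3 version 0x%08X" % version)
    fields = text.split("\x00")
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty piece after the final terminator
    return version, fields

def build_gd3(version, fields, codec):
    """Tiny writer used only to produce sample data for the reader above."""
    body = "".join(f + "\x00" for f in fields).encode(codec)
    return b"Gd3 " + struct.pack("<II", version, len(body)) + body

sample = build_gd3(GD3_V100, ["Title", "タイトル", ""], "utf-16-le")
print(decode_gd3(sample))
```

A tool written this way rejects versions it does not know instead of showing garbage, which is the behaviour the proposal relies on; tools that ignore the version field would still misread a changed encoding.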
  • Sik Offline
  • Not a musician
  • Posts: 75
  • Joined: 2011-12-12, 12:43:15

Post by Sik »

Tom wrote: UTF-8? Bad idea. These tags also contain Japanese text, and "Japanese or Hindi could take more space in UTF-8 if there are more of these characters than there are ASCII characters", according to Wikipedia.

If we really need to move on since UCS-2 is indeed superseded, I'd suggest UTF-16, which seems to be the most natural choice.
Problem being... both have issues. UTF-8 causes bloat when you have Japanese characters (an issue with Japanese tags), UTF-16 causes bloat when you have ASCII characters (an issue with English tags, which literally become 200% the size they could have been). Damned if you do, damned if you don't. Shift-JIS is a good balance (ASCII characters are one byte, Japanese characters are two bytes), but it doesn't work with anything that isn't ASCII or Japanese, and even then many kanji are missing.
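
The trade-off is easy to check with actual byte counts. The sketch below measures one English and one Japanese sample string in each of the three encodings mentioned; the sample strings and the helper name `encoded_sizes` are arbitrary choices for the example.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of text for each candidate encoding."""
    codecs = (("utf-8", "utf-8"), ("utf-16", "utf-16-le"), ("shift-jis", "shift_jis"))
    return {name: len(text.encode(codec)) for name, codec in codecs}

english = "Green Hill Zone"     # 15 ASCII characters
japanese = "グリーンヒルゾーン"  # 9 katakana characters

print(encoded_sizes(english))
print(encoded_sizes(japanese))
```

For these samples the ASCII string takes 15 bytes in UTF-8 and Shift-JIS but 30 in UTF-16, while the katakana string takes 27 bytes in UTF-8 and 18 in both UTF-16 and Shift-JIS.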

To be fair, tags make up a tiny part of the file size, unless you have hundreds of tags (in which case we have another problem), and we aren't in an era where even a KB is too much. We should probably focus more on optimizing the VGM data itself (which can easily become huge if you're not careful), not the tags.
Sik is pronounced like "seek", not like "sick".
http://www.mdscene.net/
  • neologix Offline
  • Posts: 211
  • Joined: 2012-04-22, 4:03:45
  • Location: New York, NY, USA

Post by neologix »

My proposal isn't about "optimizing" the tags, it's about future extensibility of what tags get stored and how to store them. Maybe we want some games to have Arabic or Indian names stored? Maybe we want to only have Japanese tags stored and allow leaving out English tags entirely, or vice versa? Maybe we want to add multiple release dates for games by region? And also, maybe, JUST MAYBE, we want to be able to parse the thing in a standardized cross-platform way that allows usage on non-Windows systems like iPhones or Android phones?

(edit - for reference, here's the thread i made on byuu forums)
Last edited by neologix on 2012-06-01, 22:27:10, edited 1 time in total.