vgmrips

The forum about vgm files
It is currently 2017-07-27, 16:37:46

All times are UTC + 1 hour [ DST ]

Posted: 2012-05-31, 0:51:32

Staff
Programmers

Joined: 2012-04-22, 4:03:45
Posts: 203
Location: New York, NY, USA
My update for VGMTool is going to be cross-platform; as such, GD3 reading/writing may become a problem on non-Windows platforms, since GD3 stores its strings in what's apparently known as UCS-2 rather than UTF-8. The app should handle this fine on Windows, as I've been told the toolkit converts Unicode to UCS-2 behind the scenes, but I don't know how viable that will be in the long run if we want to expand the selection of command-line tagging tools.

The GD3 tag already stores a version, so what if we bump the version, store the values in UTF-8, and update GD3 handling accordingly? Current VGM apps should handle the GD3 v1.00 check fine if they were written well, so I don't expect major breakage; the only two "older" apps I expect we'd need to worry about after such a change would be VGMTool (v3 update already in development and will be on GitHub) and VGM2MID (which I'll coordinate with ValleyBell if needed).
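
To make the idea concrete, here is a rough Python sketch of a version-gated reader. It assumes the usual GD3 layout as I understand it (a 4-byte "Gd3 " marker, a 32-bit little-endian version, a 32-bit data length, then null-terminated strings). The name `read_gd3` and the constant `GD3_UTF8_VERSION` are made up for illustration; no such version exists in any published spec.

```python
import struct

GD3_MAGIC = b"Gd3 "
GD3_V100 = 0x00000100          # version 1.00, as written by current tools
GD3_UTF8_VERSION = 0x00000200  # HYPOTHETICAL value for the proposed UTF-8 variant

def read_gd3(blob: bytes) -> list[str]:
    """Decode the strings of a GD3 tag, picking the codec from the version field."""
    magic, version, length = struct.unpack_from("<4sII", blob, 0)
    if magic != GD3_MAGIC:
        raise ValueError("not a GD3 tag")
    payload = blob[12:12 + length]
    if version == GD3_V100:
        # v1.00: 16-bit little-endian code units, each string ends with 0x0000
        text = payload.decode("utf-16-le")
    elif version == GD3_UTF8_VERSION:
        # proposed variant (assumption): UTF-8 with a single 0x00 terminator
        text = payload.decode("utf-8")
    else:
        # a well-behaved reader refuses versions it does not know
        raise ValueError(f"unsupported GD3 version {version:#x}")
    fields = text.split("\x00")
    if fields and fields[-1] == "":
        fields.pop()  # the final terminator leaves one empty trailing item
    return fields
```

A reader written like this keeps working on every existing v1.00 file and rejects a newer tag cleanly instead of showing garbage, which is the behaviour the version field is meant to allow.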


Posted: 2012-05-31, 13:11:19

Joined: 2011-11-30, 17:26:44
Posts: 454
Location: Italy
UTF-8? Bad idea. These tags also contain Japanese text, and "Japanese or Hindi could take more space in UTF-8 if there are more of these characters than there are ASCII characters", according to Wikipedia.

If we really need to move on since UCS-2 is indeed superseded, I'd suggest UTF-16, which seems to be the most natural choice.

But if you ask me, I'm fine with the way it is now. Of course I'd like to hear ValleyBell's opinion, though.

_________________
My webhost decided to shut down most of my webspace without a warning. If you find any broken Digilander link in any of my posts (basically all of them should be), please inform me.


Posted: 2012-05-31, 19:12:11

Staff
Programmers
Musicians
Contributors

Joined: 2011-12-01, 20:20:07
Posts: 2730
Location: Germany
UTF-8 would break compatibility with all VGM programs that read and/or write tags (they would either display no tags at all or show garbage), so that's not an option for me. vgm2mid would behave the same way as all other tools, btw.

I doubt that the additional characters of UTF-16 will actually be used, but I'm fine with it, because it doesn't break anything.
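
For reference, a small Python sketch of why UTF-16 is backwards-compatible here: every character UCS-2 can represent is a single 16-bit unit in UTF-16 as well, and only characters outside the Basic Multilingual Plane need a second unit (a surrogate pair). The helper name `utf16_units` is just for illustration.

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-le")) // 2

# "A", a common kanji (U+97F3) and a character outside the BMP (U+20BB7)
for ch in ("A", "\u97f3", "\U00020bb7"):
    print(f"U+{ord(ch):04X} -> {utf16_units(ch)} code unit(s)")
# U+0041 -> 1 code unit(s)
# U+97F3 -> 1 code unit(s)
# U+20BB7 -> 2 code unit(s)
```

So an existing tag reads the same under either interpretation; only a tag that actually uses the extra characters would look different to an old UCS-2 reader.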


btw: The only problem I see with command-line tools is that they don't support Unicode characters in their arguments. ATM I use VGMTool mainly to convert Japanese characters to HTML NCRs (numeric character references), which is the only way to use Unicode characters in vgm_tag.
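
The NCR conversion itself is easy to script. A minimal Python sketch, assuming decimal references of the form `&#NNNNN;` are what the tool wants (I have not checked which forms vgm_tag accepts); `to_ncr` is a made-up helper name.

```python
import html

def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # "xmlcharrefreplace" is a standard codec error handler that emits &#NNNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

encoded = to_ncr("BGM \u97f3")
print(encoded)                 # BGM &#38899;
print(html.unescape(encoded))  # round-trips back to the original text
```

The result is pure ASCII, so it survives a shell that cannot pass Unicode arguments. Note that `html.unescape` also expands named entities such as `&amp;`, so it is only a convenient check here, not an exact inverse for arbitrary input.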


Posted: 2012-05-31, 19:15:09

Joined: 2011-11-30, 17:26:44
Posts: 454
Location: Italy
So you basically agree with me.
Things are fine the way they are, but if we really can't avoid changing, UTF-16 would be better than UTF-8. Once again, great minds think alike.

tl;dr let's just keep UCS-2 :P

Posted: 2012-05-31, 23:39:25

Staff
Programmers

Joined: 2012-04-22, 4:03:45
Posts: 203
Location: New York, NY, USA
When I first attempted to port VGMTool to Mac, I had no success using the wchar functions to read GD3 UCS-2 strings, so I had to manually loop through the strings two bytes at a time. I'll ask byuu about UTF-16 functionality.

Once again, I'd like to point out that if such a change were implemented (whether UTF-8 or UTF-16), the version would be increased to accommodate it and the applicable programs would be updated as a result.
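
The two-bytes-at-a-time loop mentioned above looks roughly like this in Python. A likely reason the wchar route fails is that `wchar_t` is typically 4 bytes on Mac and Linux but 2 bytes on Windows, so it does not line up with GD3's 16-bit units. This is only a sketch; `read_ucs2_string` is an illustrative name, not code from VGMTool.

```python
def read_ucs2_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Read one 0x0000-terminated little-endian UCS-2 string from buf.

    Returns the text and the offset just past the terminator, so the
    next string can be read by calling the function again.
    """
    chars = []
    pos = offset
    while pos + 1 < len(buf):
        unit = buf[pos] | (buf[pos + 1] << 8)  # low byte comes first
        pos += 2
        if unit == 0:
            break  # terminator reached
        # UCS-2 assumption: each 16-bit unit is one whole character
        chars.append(chr(unit))
    return "".join(chars), pos

data = "Ab\x00\u97f3\x00".encode("utf-16-le")
first, nxt = read_ucs2_string(data)
second, end = read_ucs2_string(data, nxt)
```

Because it never touches the platform's wide-character type, the same loop behaves identically everywhere. It would need extra handling for surrogate pairs if the format ever moved to full UTF-16.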


Posted: 2012-06-01, 1:22:45

Programmers
Artists

Joined: 2011-12-12, 12:43:15
Posts: 75
Tom wrote:
UTF-8? Bad idea. These tags also contain Japanese text, and "Japanese or Hindi could take more space in UTF-8 if there are more of these characters than there are ASCII characters", according to Wikipedia.

If we really need to move on since UCS-2 is indeed superseded, I'd suggest UTF-16, which seems to be the most natural choice.

Problem being... both have issues. UTF-8 causes bloat when you have Japanese characters (an issue with Japanese tags), while UTF-16 causes bloat when you have ASCII characters (an issue with English tags, which literally become twice the size they could have been). Damned if you do, damned if you don't. Shift-JIS is a good balance (ASCII characters are one byte, Japanese characters are two bytes), but it doesn't work with anything that isn't ASCII or Japanese, and even then many kanji are missing.

To be fair, tags are a tiny fraction of the file size, unless you have something like hundreds of tags (in which case we have another problem), and we aren't in an era where even a KB is too much. We should probably focus more on optimizing the VGM data itself (which can easily become huge if you're not careful), not the tags.
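
The size trade-off is easy to check. A quick Python sketch that measures the same text in all three encodings (`encoded_sizes` is an illustrative helper; the sample strings are arbitrary):

```python
from typing import Optional

def encoded_sizes(text: str) -> dict[str, Optional[int]]:
    """Byte length of text in each candidate encoding, or None if unencodable."""
    sizes: dict[str, Optional[int]] = {}
    for codec in ("utf-8", "utf-16-le", "shift_jis"):
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None  # the codec has no mapping for some character
    return sizes

print(encoded_sizes("Sonic"))
# {'utf-8': 5, 'utf-16-le': 10, 'shift_jis': 5}
print(encoded_sizes("\u30bd\u30cb\u30c3\u30af"))  # "Sonic" in katakana
# {'utf-8': 12, 'utf-16-le': 8, 'shift_jis': 8}
print(encoded_sizes("\u0633"))  # an Arabic letter
# {'utf-8': 2, 'utf-16-le': 2, 'shift_jis': None}
```

The differences are a handful of bytes per string, which supports the point that the choice should not be made on size alone.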

_________________
Sik is pronounced like "seek", not like "sick".
http://www.mdscene.net/


Posted: 2012-06-01, 1:44:45

Staff
Programmers

Joined: 2012-04-22, 4:03:45
Posts: 203
Location: New York, NY, USA
My proposal isn't about "optimizing" the tags; it's about future extensibility of what tags get stored and how to store them. Maybe we want some games to have Arabic or Indian names stored? Maybe we want to have only Japanese tags stored and allow leaving out English tags entirely, or vice versa? Maybe we want to add multiple release dates for games by region? And also, maybe, JUST MAYBE, we want to be able to parse the thing in a standardized cross-platform way that allows usage on non-Windows systems like iPhones or Android phones?

(Edit: for reference, here's the thread I made on the byuu forums.)


Last edited by neologix on 2012-06-01, 22:27:10, edited 1 time in total.
