Let me begin by defining the concepts in the title.
Data torture: Testing a large amount of data for consistency, uniformity and correctness. I have been watching DEFCON talk videos lately (DEFCON is a yearly hacker and security convention) and came across this, so that's where I got the name.
Redundancy failure points: A pair of places in the data that should hold exactly the same value but differ. An example is a song whose name in the txt file's song list differs from the English name in the GD3 tag of the referenced vgm file.
Normalization: Optimizing the structure of a database so that the data has no redundancy. That means that for a song in a pack, its title is stored in one place and one place only (one field in a database table). It also means that when you edit that song's name (maybe you made a mistake and want to correct it), the change propagates everywhere (the web interface,
I've been torturing the vgmrips data recently. The data sources are as follows:
1. The phpbb database, where I grab every topic from the "Official Releases" forum. There, I parse the [table] code for an initial data set, which remains largely unused. All I use is the zip file link and the image URLs. I could also import the data in the table, but I didn't view it as necessary. It is still a valid torture point, though.
2. The text file in the zip. I read all of the info in the text file: Game name, System, Music hardware, Music author, Game developer and so on, the song list, with length and loop length, Notes, Package history and even size reductions.
3. The VGM files in the zip - the header and the GD3 tags.
4. The m3u files in the zip.
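To make source 3 a bit more concrete, here is a minimal sketch of pulling the GD3 tag out of a VGM file. It is not the code I actually run. It assumes the usual VGM layout as I understand it (a "Vgm " signature, a relative GD3 offset at 0x14, and a "Gd3 " block of null-terminated UTF-16LE strings), and it assumes the bytes are already uncompressed. The names read_gd3 and GD3_FIELDS are made up for this example.

```python
import struct

# Order of the strings inside a GD3 block (hypothetical key names).
GD3_FIELDS = (
    "track_en", "track_jp", "game_en", "game_jp",
    "system_en", "system_jp", "author_en", "author_jp",
    "date", "ripper", "notes",
)


def read_gd3(data: bytes) -> dict:
    """Return the GD3 strings of an uncompressed VGM file as a dict."""
    if data[:4] != b"Vgm ":
        raise ValueError("not a VGM file")

    # The GD3 offset is stored relative to its own position (0x14).
    (rel_offset,) = struct.unpack_from("<I", data, 0x14)
    if rel_offset == 0:
        return {}  # the file carries no GD3 tag

    start = 0x14 + rel_offset
    if data[start:start + 4] != b"Gd3 ":
        raise ValueError("GD3 signature not found")

    # After the signature: 4 bytes version, 4 bytes length of the string data.
    _version, length = struct.unpack_from("<II", data, start + 4)
    body = data[start + 12:start + 12 + length]

    # Strings are UTF-16LE, each terminated by a null character.
    strings = body.decode("utf-16-le").split("\x00")
    return dict(zip(GD3_FIELDS, strings))
```

With that, read_gd3(open(path, "rb").read())["track_en"] would give the English track name that gets compared against the txt file.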
In this data, several points contain redundant data. For example, the English name in the GD3 tag should be exactly the same as the song's name in the song list of the txt file.
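A check like the one just described could look roughly like this. It is only a sketch: it assumes the song names from the txt file and the English GD3 names have already been collected into two lists in track order, and find_title_mismatches is a name invented for this example. The comparison is deliberately exact, so even a trailing space counts as a redundancy failure point.

```python
from itertools import zip_longest


def find_title_mismatches(txt_titles, gd3_titles):
    """Return (track_number, txt_title, gd3_title) for every pair that differs.

    A track that exists in only one of the two lists is reported with
    None on the side where it is missing.
    """
    mismatches = []
    pairs = zip_longest(txt_titles, gd3_titles)
    for number, (txt, gd3) in enumerate(pairs, start=1):
        if txt != gd3:  # exact comparison, no trimming or case folding
            mismatches.append((number, txt, gd3))
    return mismatches


# Made-up example data, not taken from a real pack.
txt = ["Title Screen", "Stage 1", "Boss"]
gd3 = ["Title Screen", "Stage 1 ", "Boss Theme"]
for number, a, b in find_title_mismatches(txt, gd3):
    print(f"track {number}: txt={a!r} gd3={b!r}")
```

Printing the values with repr makes whitespace differences visible, which is handy because those are easy to miss by eye.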
Next up, I'll list where the inconsistencies take place.