Building a VGM file index

Post by **vampirefrog** » 2016-07-03, 10:56:00

I'm working on an indexing structure for VGM files, which will be used later on to display VGM files graphically, and other operations.

Let's begin by defining its functionality:

Input params:
- Command index (Nth command in the file)
- Sample number

Output params:
- Command index
- Sample number
- Command offset (relative to file start or VGM data start)

The way I see it, the commands are separated in chunks of equal number of commands. These are the leaf nodes of a tree (as opposed to branch nodes - leaf nodes are the furthest from the root, and they have no child nodes). The tree is built on the criteria of halving the number of commands for every node, and store the sample number and the command index at each node. So let's walk through a build of the tree:

Code: Select all


Samples:                     43000
|--------------------------------------------------------------|
Commands:                    1000

             20000                           23000
|---------------------------|----------------------------------|
              500                               500

       15000          5000            18000             5000
|-----------------|---------|----------------------|-----------|
         250          250              250              250

  7000     8000    2000 3000  5000       13000      1000  4000
|-------|---------|----|----|--------|-------------|----|------|
   125       125   125  125    125          125      125  125

(visual scale is in samples)

We can go on and on until we hit some limit parameters, such as maximum tree depth or minimum commands per leaf node. The way we choose these parameters is by doing a benchmark of tree depth vs max commands, run it on a large number of VGM files (raw and trimmed), and finding the combination with best compromise between performance and memory use.

The reason I've chosen to have a similar number of commands on each node is because we are trying to minimise the number of commands to send to a chip before reaching the desired state. So for example, if we are trying to get the chip to the state at sample 53200, and the closest leaf node to it is at 53000, we have to load the chip state from that leaf node, and run the remaining commands through the chip code. So we are also storing the chip state at every leaf node.

As you can see, we are halving the number of VGM commands with each subdivision. Every node in the tree stores its sample number and command id and file offset. To find the command at a given sample, we go through each node of the tree, and see whether we are left or right of the split point, then recurse on left or right children respectively.

Post by **vampirefrog** » 2016-07-03, 11:54:44

Here is a preliminary header file.

Code: Select all

#ifndef vgmindex_h_
#define vgmindex_h_

#include "vgmfile.h"
#include "vgmparser.h"

typedef struct _VGMIndex VGMIndex;
struct _VGMIndex {
   VGMFile *f;
   VGMParser *parser;
   int max_depth, max_cmds;
   VGMIndexNode *top;
};

VGMIndex *vgm_index_new();
void vgm_index_destroy(VGMIndex *);

void vgm_index_build(VGMIndex *, VGMFile *f, int max_depth, int max_cmds);

// will retrieve leaf node data, so the output sample/cmd_num might not be the same as input
void vgm_index_get_by_sample(VGMIndex *, uint64_t sample, uint64_t *cmd_num_out, uint64_t *sample_out, uint64_t *offset_out);
void vgm_index_get_by_cmd_num(VGMIndex *, uint64_t cmd_num, uint64_t *cmd_num_out, uint64_t *sample_out, uint64_t *offset_out);

typedef struct _VGMIndexNode VGMIndexNode;
struct _VGMIndexNode {
   uint64_t sample;
   uint64_t cmd_num;
   uint64_t offset;
   void *chip_state; // replace with proper chip state instance

   int depth;
   VGMIndexNode *parent, *top;
   VGMIndexNode *left, *right;
};

#endif /* vgmindex_h_ */