ctr wrote:
The zlib compression should do a good job at reducing the file size caused by repetitive wait / chip commands.
Mmyeah, as usual I’m thinking about my MSX player…

Currently I decompress the entire file into memory, so the uncompressed size still matters to me. More bytes to process also means less performance: a second 2-byte command would simply be caught by the jump table at no extra cost, while an extra port byte requires an extra buffer read, compare and branch to process. It’s not the end of the world, but it’s still something I think about.
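To make the difference concrete, here is a rough sketch in Python rather than Z80 assembly. The opcode values and handler names are made up for illustration and are not part of any format. Each port either gets its own opcode in the dispatch table, or a single opcode carries the port as an extra operand byte.

```python
# Hypothetical opcodes, invented for this sketch.
WRITE_PORT0 = 0xA0  # operands: register, value
WRITE_PORT1 = 0xA1  # operands: register, value
WRITE_ANY = 0xA2    # operands: port, register, value

def play(stream):
    """Run a command stream and return the (port, register, value) writes."""
    writes = []

    def write_port0(i):
        writes.append((0, stream[i], stream[i + 1]))
        return i + 2

    def write_port1(i):
        writes.append((1, stream[i], stream[i + 1]))
        return i + 2

    def write_any(i):
        port = stream[i]   # extra buffer read
        if port == 0:      # extra compare and branch
            return write_port0(i + 1)
        return write_port1(i + 1)

    # The dispatch table: one lookup per command, whatever the opcode.
    table = {
        WRITE_PORT0: write_port0,
        WRITE_PORT1: write_port1,
        WRITE_ANY: write_any,
    }
    i = 0
    while i < len(stream):
        i = table[stream[i]](i + 1)
    return writes

# The same two writes, encoded both ways.
two_opcodes = bytes([WRITE_PORT0, 0x01, 0x20, WRITE_PORT1, 0x05, 0x03])
port_byte = bytes([WRITE_ANY, 0, 0x01, 0x20, WRITE_ANY, 1, 0x05, 0x03])
```

Both streams produce the same writes, but the port-byte encoding is one byte longer per command and goes through the extra read and branch in `write_any` every time.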
(As for decompressing the entire file to memory: decompressing on the fly to a small buffer, and snapshotting the decompressor at the loop point, has been on my wishlist for a long time, but as you can imagine this is not trivial to implement.)
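On a PC the idea can be sketched with Python's `zlib` module, whose decompression objects have a `copy()` method that captures the full inflate state. This only illustrates the approach and says nothing about how it would be done on the MSX. The function name and its parameters are assumptions for this sketch, and it takes a plain zlib stream instead of a real music file.

```python
import zlib

def stream_with_loop(compressed, loop_offset, buf_size=64, plays=2):
    """Yield decompressed chunks of at most buf_size bytes.

    After the end of the stream, decompression resumes from a snapshot
    taken at loop_offset (an offset into the decompressed data), until
    the looped part has been played `plays` times in total.
    """
    d = zlib.decompressobj()
    pending = compressed   # compressed input not yet consumed
    pos = 0                # offset of the next decompressed byte
    snapshot = None        # (decompressor copy, pending input)
    while True:
        if snapshot is None and pos == loop_offset:
            snapshot = (d.copy(), pending)
        limit = buf_size
        if snapshot is None:
            # Never decompress past the loop point before the snapshot.
            limit = min(buf_size, loop_offset - pos)
        chunk = d.decompress(pending, limit)
        pending = d.unconsumed_tail
        pos += len(chunk)
        if chunk:
            yield chunk
        if d.eof or not chunk:
            plays -= 1
            if plays <= 0 or snapshot is None:
                return
            # Restore a fresh copy so the snapshot can be reused.
            saved, pending = snapshot
            d = saved.copy()
            pos = loop_offset

data = bytes(range(256)) * 4
chunks = list(stream_with_loop(zlib.compress(data), loop_offset=100))
```

The snapshot holds the whole inflate state including the sliding window (up to 32 KB for standard zlib streams), which is a large part of why this is harder on a machine with little RAM.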
ctr wrote:
Dividing the sample rate (if the CPU clock is used) is already assumed, I think. Consider that the 68K and Z80 (for example) have fixed memory cycles and can't hammer memory (and thus sound chips) every cycle anyway.
What do you mean by assumed?
The Z80 takes a variable number of cycles per instruction: indeed never fewer than 4, but usually not a multiple of 4, so it’s not quantisable.