матрёшка / matroska

English Deutsch Français 日本語 Español Português Русский Nederlands Italiano
Matroska Logo
Matroska Logo

Overhead

One of the way to compare containers is to analyze the overhead produced for the same raw data. The overhead is the amount of data added to these raw data, due to the internal structure of the format.

This document is only giving the overhead introduced by matroska. It is intended for the matroska team to better tune the format for efficiency. And also specify where matroska is best suited.

Example 1 - low bitrate audio

Low bitrate audio (like speex or vorbis) is usually using very small packets of data, all independent ones (no reference to previous or future frames). The bitrate can be as low as 8kbps (to 64kbps). For 8kbps you have 1kBps. If you cut that into 20ms parts, that makes approximately 100 bytes (to 1000 bytes) per part (packet). Low bitrate codec are usually tuned for real-time conferencing on limited bandwidth lines, so the VBR aspect is usually limited. That's why we will consider that bitrate to be constant, ie the size of packets.

bitratepkt sizeNo lacing2 in lace3 in lace4 in lace5 in lace6 in lace7 in lace
8kbps1001+1+4+0+100 (6%)1+2+4+2+200 (4.5%)1+2+4+3+300 (3.3%)1+2+4+4+400 (2.8%)1+2+4+5+500 (2.4%)1+2+4+6+600 (2.2%)1+2+4+7+700 (2%)
16kbps2001+2+4+0+200 (3.5%)1+2+4+2+400 (2.3%)1+2+4+3+600 (1.7%)1+2+4+4+800 (1.4%)1+2+4+5+1000 (1.2%)1+2+4+6+1200 (1.1%)1+2+4+7+1400 (1%)
24kbps3001+2+4+0+300 (2%)1+2+4+3+600 (1.8%)1+2+4+5+900 (1.3%)1+2+4+7+1200 (1.2%)1+2+4+9+1500 (1.1%)1+2+4+11+1800 (1%)1+2+4+13+2100 (0.95%)
32kbps4001+2+4+0+400 (1.8%)1+2+4+3+800 (1.3%)1+2+4+5+1200 (1%)1+2+4+7+1600 (0.88%)1+2+4+9+2000 (0.8%)1+2+4+11+2400 (0.67%)1+2+4+13+2800 (0.71%)
48kbps6001+2+4+0+600 (1%)1+2+4+4+1200 (0.92%)1+2+4+7+1800 (0.78%)1+2+4+10+2400 (0.71%)1+2+4+13+3000 (0.75%)1+2+4+16+3600 (0.64%)1+2+4+19+4200 (0.62%)
64kbps8001+2+4+0+800 (0.88%)1+2+4+5+1600 (0.75%)1+2+4+9+2400 (0.67%)1+2+4+13+3200 (0.63%)1+2+4+17+4000 (0.60%)1+2+4+21+4800 (0.58%)1+2+4+26+5600 (0.59%)

explanations : This is the overhead introduced by the Block level. There is more overhead in a matroska file, ie the Cluster head (that contains many Blocks) and the file header (containing various meta information).

the first number is the number is the size of the total Block+Data, ie element ID (1), size (1 to 8), Block head (4), lacing (0 to infinite), data

the second number is the pourcentage of overhead for each packet

In matroska, synchronisation and error recovery is done with a Cluster. A cluster contains 1 to many Blocks. So let's see the effect of having 1 Block per Cluster, 2 Blocks per Cluster and 3 Blocks per Cluster. These are the worst cases where synchronisation is required often, ie every 20, 40 and 60ms (with 1 block only). A value of 1s is a pretty agressive case and is the minimum supported here (just to save me from a few calculus).

1 Block per Cluster
bitratepkt sizeNo lacing2 in lace3 in lace4 in lace5 in lace (1s)6 in lace (1.2s)7 in lace (1.4s)
8kbps100    4+2+1+2+4+5+500 (3.6%)4+2+1+2+4+6+600 (3.2%)4+2+1+2+4+7+700 (2.9%)
16kbps200    4+2+1+2+4+5+1000 (1.8%)4+2+1+2+4+6+1200 (1.6%)4+2+1+2+4+7+1400 (1.4%)
24kbps300    4+2+1+2+4+9+1500 (1.5%)4+2+1+2+4+11+1800 (1.3%)4+2+1+2+4+13+2100 (1.2%)
32kbps400    4+2+1+2+4+9+2000 (1.1%)4+2+1+2+4+11+2400 (1%)4+2+1+2+4+13+2800 (0.93%)
48kbps600    4+2+1+2+4+13+3000 (0.87%)4+2+1+2+4+16+3600 (0.81%)4+2+1+2+4+19+4200 (0.76%)
64kbps800    4+2+1+2+4+17+4000 (0.75%)4+2+1+2+4+21+4800 (0.71%)4+2+1+2+4+26+5600 (0.70%)

 

2 Blocks per Cluster
bitratepkt sizeNo lacing2 in lace (0.8s)3 in lace (1.2s)4 in lace (1.6s)5 in lace (2s)6 in lace (2.4s)7 in lace (2.8s)
8kbps100 4+2+2*(1+2+4+2+200) (6%)4+2+2*(1+2+4+3+300) (4.3%)4+2+2*(1+2+4+4+400) (3.5%)4+2+2*(1+2+4+5+500) (3%)4+2+2*(1+2+4+6+600) (2.7%)4+2+2*(1+2+4+7+700) (2.4%)
16kbps200 4+2+2*(1+2+4+2+400) (3%)4+2+2*(1+2+4+3+600) (2.2%)4+2+2*(1+2+4+4+800) (1.8%)4+2+2*(1+2+4+5+1000) (1.5%)4+2+2*(1+2+4+6+1200) (1.1%)4+2+2*(1+2+4+7+1400) (1%)
24kbps300 4+2+2*(1+2+4+3+600) (2.2%)4+2+2*(1+2+4+5+900) (1.7%)4+2+2*(1+2+4+7+1200) (1.4%)4+2+2*(1+2+4+9+1500) (1.3%)4+2+2*(1+2+4+11+1800) (1.2%)4+2+2*(1+2+4+13+2100) (1.1%)
32kbps400 4+2+2*(1+2+4+3+800) (1.6%)4+2+2*(1+2+4+5+1200) (1.3%)4+2+2*(1+2+4+7+1600) (1.1%)4+2+2*(1+2+4+9+2000) (0.95%)4+2+2*(1+2+4+11+2400) (0.88%)4+2+2*(1+2+4+13+2800) (0.82%)
48kbps600 4+2+2*(1+2+4+4+1200) (1.2%)4+2+2*(1+2+4+7+1800) (0.94%)4+2+2*(1+2+4+10+2400) (0.83%)4+2+2*(1+2+4+13+3000) (0.77%)4+2+2*(1+2+4+16+3600) (0.72%)4+2+2*(1+2+4+19+4200) (0.69%)
64kbps800 4+2+2*(1+2+4+5+1600) (0.94%)4+2+2*(1+2+4+9+2400) (0.79%)4+2+2*(1+2+4+13+3200) (0.72%)4+2+2*(1+2+4+17+4000) (0.68%)4+2+2*(1+2+4+21+4800) (0.65%)4+2+2*(1+2+4+26+5600) (0.64%)

 

3 Blocks per Cluster
bitratepkt sizeNo lacing2 in lace (1.2s)3 in lace (1.8s)4 in lace (2.4s)5 in lace (3s)6 in lace (3.6s)7 in lace (4.2s)
8kbps100 4+2+3*(1+2+4+2+200) (5.5%)4+2+3*(1+2+4+3+300) (4%)4+2+3*(1+2+4+4+400) (3.3%)4+2+3*(1+2+4+5+500) (2.8%)4+2+3*(1+2+4+6+600) (2.5%)4+2+3*(1+2+4+7+700) (2.3%)
16kbps200 4+2+3*(1+2+4+2+400) (2.8%)4+2+3*(1+2+4+3+600) (2%)4+2+3*(1+2+4+4+800) (1.6%)4+2+3*(1+2+4+5+1000) (1.4%)4+2+3*(1+2+4+6+1200) (1.3%)4+2+3*(1+2+4+7+1400) (1.1%)
24kbps300 4+2+3*(1+2+4+3+600) (1.8%)4+2+3*(1+2+4+5+900) (1.3%)4+2+3*(1+2+4+7+1200) (1.2%)4+2+3*(1+2+4+9+1500) (1.1%)4+2+3*(1+2+4+11+1800) (1%)4+2+3*(1+2+4+13+2100) (0.95%)
32kbps400 4+2+3*(1+2+4+3+800) (1.5%)4+2+3*(1+2+4+5+1200) (1.2%)4+2+3*(1+2+4+7+1600) (1%)4+2+3*(1+2+4+9+2000) (0.9%)4+2+3*(1+2+4+11+2400) (0.83%)4+2+3*(1+2+4+13+2800) (0.79%)
48kbps600 4+2+3*(1+2+4+4+1200) (0.92%)4+2+3*(1+2+4+7+1800) (0.78%)4+2+3*(1+2+4+10+2400) (0.71%)4+2+3*(1+2+4+13+3000) (0.75%)4+2+3*(1+2+4+16+3600) (0.64%)4+2+3*(1+2+4+19+4200) (0.62%)
64kbps800 4+2+3*(1+2+4+5+1600) (0.88%)4+2+3*(1+2+4+9+2400) (0.75%)4+2+3*(1+2+4+13+3200) (0.69%)4+2+3*(1+2+4+17+4000) (0.65%)4+2+3*(1+2+4+21+4800) (0.63%)4+2+3*(1+2+4+26+5600) (0.63%)

explanations : the third number is the max time between 2 error recoveries (re-sync) in milliseconds. This is based on the example of all frames have a 20ms granularity.

Conclusions

Example 2 - medium bitrate audio

Example 3 - high bitrate audio

Example 4 - low bitrate video

Low bitrate video is, like low bitrate audio, the worst case for overhead. The range we will use here is 64 kbps to 256kbps. I think these are reasonable values for what can be considered as low video bitrate.

One of the major difference in video is that the number of frame per second is fixed (10 to 30 for low bitrates).

Another difference is that the frames usually depend on others to save some compression bits. One of the effect is that all frames don't have the same size. You have key frames (I) that don't depend on other frames, (P) frames that depend on an older frame and (B) frames that rely on an older and a future frame.

It is mandatory that all I frames have an actual timecode coming from the container, because the P and B frames will rely on that timecode for reference.

First example - all I frames (CBR)

With all I frames, the average data is easy to compute. For 64kbps=8kBps we have a frame=1/20s. That means each frame is 400 bytes big. So the results are similar to 32kbps audio.

Lacing should not be used for better seeking in the file. But it is still possible for use in this particular case.

1 I frame/Block per Cluster (5ms)
bitratepkt sizeNo lacing2 in lace3 in lace
64kbps4004+2+1+2+4+400 (3.3%)4+2+1+2+4+3+800 (2%)4+2+1+2+4+5+1200 (1.5%)
128kbps8004+2+1+2+4+800 (1.63%)4+2+1+2+4+5+1600 (1.1%)4+2+1+2+4+9+2400 (0.92%)
256kbps12004+2+1+2+4+1200 (1.08%)4+2+1+2+4+6+2400 (0.79%)4+2+1+2+4+11+3600 (0.67%)

 

2 I frame/Block per Cluster (10ms)
bitratepkt sizeNo lacing2 in lace3 in lace
64kbps4004+2+2*(1+2+4+400) (2.5%)4+2+2*(1+2+4+3+800) (1.6%)4+2+2*(1+2+4+5+1200) (1.3%)
128kbps8004+2+2*(1+2+4+800) (1.3%)4+2+2*(1+2+4+5+1600) (0.94%)4+2+2*(1+2+4+9+2400) (0.7%)
256kbps12004+2+2*(1+2+4+1200) (0.83%)4+2+2*(1+2+4+6+2400) (0.67%)4+2+2*(1+2+4+11+3600) (0.58%)

 

4 I frame/Block per Cluster (20ms)
bitratepkt sizeNo lacing2 in lace3 in lace
64kbps4004+2+4*(1+2+4+400) (2.1%)4+2+4*(1+2+4+3+800) (1.4%)4+2+4*(1+2+4+5+1200) (1.1%)
128kbps8004+2+4*(1+2+4+800) (1.06%)4+2+4*(1+2+4+5+1600) (0.84%)4+2+4*(1+2+4+9+2400) (0.73%)
256kbps12004+2+4*(1+2+4+1200) (0.71%)4+2+4*(1+2+4+6+2400) (0.60%)4+2+4*(1+2+4+11+3600) (0.54%)

 

8 I frame/Block per Cluster (40ms)
bitratepkt sizeNo lacing2 in lace3 in lace
64kbps4004+2+8*(1+2+4+400) (1.9%)4+2+8*(1+2+4+3+800) (1.3%)4+2+8*(1+2+4+5+1200) (1.06%)
128kbps8004+2+8*(1+2+4+800) (0.97%)4+2+8*(1+2+4+5+1600) (0.80%)4+2+8*(1+2+4+9+2400) (0.73%)
256kbps12004+2+8*(1+2+4+1200) (0.65%)4+2+8*(1+2+4+6+2400) (0.57%)4+2+8*(1+2+4+11+3600) (0.52%)

We clearly see that there is another more interresting table (lacing use is dropped) :

bitratepkt size1 Block/C (50ms)2 Blocks/C (100ms)4 Blocks/C (200ms)8 Blocks/C (400ms)16 Blocks/C (800ms)20 Blocks/C (1s)
64kbps4004+2+1*(1+2+4+400) (3.3%)4+2+2*(1+2+4+400) (2.5%)4+2+4*(1+2+4+400) (2.1%)4+2+8*(1+2+4+400) (1.9%)4+2+16*(1+2+4+400) (1.8%)4+2+20*(1+2+4+400) (1.8%)
128kbps8004+2+1*(1+2+4+800) (1.63%)4+2+2*(1+2+4+800) (1.3%)4+2+4*(1+2+4+800) (1.06%)4+2+8*(1+2+4+800) (0.97%)4+2+16*(1+2+4+800) (0.92%)4+2+20*(1+2+4+800) (0.91%)
256kbps12004+2+1*(1+2+4+1200) (1.08%)4+2+2*(1+2+4+1200) (0.83%)4+2+4*(1+2+4+1200) (1.06%)4+2+8*(1+2+4+1200) (0.65%)4+2+16*(1+2+4+1200) (0.61%)4+2+20*(1+2+4+1200) (0.61%)

Conslusions :

Second example - 1 I frame for 1 B frame

In this case, each frame has to be separated in a Block (no lacing possible). The reference timecode in matroska (of the previous frame) is written in a separate element ([BlockAddition][TimecodeReference]). So more overhead is introduced.

We will establish that the P frame is 3x smaller than the I frame (I hope this is a realistic case). That means that, at 20 frames per second, for 64 kbps the I frame is 600 bytes and the B frame is 200 bytes (2 frames are 2*400 bytes big).

bitratepkt size2 Blocks/C (100ms)4 Blocks/C (200ms)8 Blocks/C (400ms)16 Blocks/C (800ms)20 Blocks/C (1s)
64kbps600/2004+2+(1+2+4+600)+(1+2+4+200+1+1+2) (3%)4+2+2*((1+2+4+600)+(1+2+4+200+1+1+2)) (2.6%)4+2+4*((1+2+4+600)+(1+2+4+200+1+1+2)) (2.4%)4+2+8*((1+2+4+600)+(1+2+4+200+1+1+2)) (2.3%)4+2+10*((1+2+4+600)+(1+2+4+200+1+1+2)) (2.3%)
128kbps1200/4004+2+(1+2+4+1200)+(1+2+4+400+1+1+2) (1.5%)4+2+2*((1+2+4+1200)+(1+2+4+400+1+1+2)) (1.3%)4+2+4*((1+2+4+1200)+(1+2+4+400+1+1+2)) (1.2%)4+2+8*((1+2+4+1200)+(1+2+4+400+1+1+2)) (1.1%)4+2+10*((1+2+4+1200)+(1+2+4+400+1+1+2)) (1.1%)
256kbps1800/6004+2+(1+2+4+1800)+(1+2+4+600+1+1+2) (1.01%)4+2+2*((1+2+4+1800)+(1+2+4+600+1+1+2)) (0.88%)4+2+4*((1+2+4+1800)+(1+2+4+600+1+1+2)) (0.81%)4+2+8*((1+2+4+1800)+(1+2+4+600+1+1+2)) (0.78%)4+2+10*((1+2+4+1800)+(1+2+4+600+1+1+2)) (0.78%)

Conslusions :

Example 5 - medium bitrate video

Example 6 - high bitrate video

Example 7 - low bitrate video + low bitrate audio

Example 8 - medium bitrate video + medium bitrate audio

Example 9 - high bitrate video + high bitrate audio