A lengthy treatise on the relationship between filesize and perceived video quality: the real version

I posted a troll post on ggkthx.org since people had been hurfing durf about filesizes more than usual lately. Then I thought maybe there is someone out there who actually would like to be educated on this for real, and I’m very bored right now, so here goes nothing. (Note: unlike most of the other posts here, this one is not aimed at encoders; it’s intended for people with no prior encoding experience.)

When you are encoding video with x264 (or indeed with any lossy encoder, audio or video), there are two basic ways to choose how big the resulting file will be: either you pick an average bitrate (bits per second), or you pick an average quality (given by some arbitrary metric specified by the encoder). The former lets you predict how big the file will be, since you know how long the video is and how many bits per second the encoder can use on average, while the latter lets you predict the average quality but not the filesize, since there’s no easy way to know in advance how much space the encoder needs to achieve the desired quality.
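
If you want to see why the bitrate mode makes the size predictable while the quality mode doesn’t, here’s a back-of-the-envelope sketch in Python; the episode length, bitrates, and disc size below are made-up example numbers for illustration, not recommendations:

    # Filesize is just average bitrate times duration, so in average-bitrate mode
    # you can compute the size up front. In average-quality (CRF) mode there is
    # no such formula: the bitrate depends on the source, so the size is unknown
    # until the encode is done.

    def size_mb(bitrate_kbps, duration_s):
        """Predicted filesize in MB for an average-bitrate encode."""
        return bitrate_kbps * 1000 / 8 * duration_s / 1e6

    def bitrate_kbps(target_mb, duration_s):
        """Average bitrate (kbps) needed to hit a target filesize."""
        return target_mb * 1e6 * 8 / 1000 / duration_s

    episode = 24 * 60                  # a hypothetical 24-minute episode, in seconds
    print(size_mb(1500, episode))      # 1500 kbps -> 270.0 MB, known before encoding
    print(bitrate_kbps(233, episode))  # ~1294 kbps to fit three such episodes on a 700 MB CD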

Now, the thing here is that some source material is easier to compress than other source material. Let’s explain this by example: one of the simplest possible compression algorithms is a run-length encoder (RLE). Basically, it works by taking a string like AAAAAABBBCCCC and compressing it to A6 B3 C4; i.e. the character followed by a number that says how many times that character should be repeated. Thus, a long string of a hundred A’s will compress into A100, which is just four characters compared to the original 100, while a string that consists of the entire alphabet in sequence will not be compressed at all (assuming the encoder optimizes the “letter appears once” case by omitting the number afterwards; otherwise the output would actually be twice as big as the input).
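
For the curious, here’s a minimal Python sketch of the RLE scheme described above; the function name and the details (like dropping the count for single characters) are just my illustration, not anyone’s actual codec:

    def rle_encode(s):
        """Run-length encode a string: 'AAAAAABBBCCCC' -> 'A6B3C4'.
        Characters that appear only once are emitted without a count."""
        out = []
        i = 0
        while i < len(s):
            j = i
            while j < len(s) and s[j] == s[i]:
                j += 1                  # extend the run of identical characters
            run = j - i
            out.append(s[i] if run == 1 else s[i] + str(run))
            i = j
        return "".join(out)

    print(rle_encode("AAAAAABBBCCCC"))               # A6B3C4
    print(rle_encode("A" * 100))                     # A100 -- four characters instead of a hundred
    print(rle_encode("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))  # comes out unchanged -- nothing to compress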

All compression algorithms, lossy or not, have the same property: some inputs are easier to compress than others. For video compressors, static “talking heads” scenes without much motion will generally compress a lot better than high-motion fight scenes. Big flat areas of the same color will also compress a lot better than images full of small details.
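
You can see the same effect with any general-purpose compressor; here’s a tiny sketch using Python’s zlib module (lossless, unlike a video codec, but the principle carries over):

    import os
    import zlib

    flat = bytes(1_000_000)         # a megabyte of identical bytes: the "flat area" case
    noisy = os.urandom(1_000_000)   # a megabyte of random bytes: the "full of small details" case

    print(len(zlib.compress(flat)))    # roughly a kilobyte -- compresses about 1000:1
    print(len(zlib.compress(noisy)))   # slightly over a megabyte -- essentially incompressible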

Back in the day pretty much all fansub encoders used the average bitrate approach, picking sizes so that episodes would fit evenly onto a CD or DVD-R. These days not many people use that kind of backup anymore (and recordable media is so cheap anyway that you don’t really care about wasting a few hundred MB on a DVD-R), so most people have shifted over to the “average quality” mode of encoding. Thus, one episode of a given anime series may turn out at half the filesize of another, despite the average quality being the same (average quality as defined by x264’s metric; it does not necessarily match human perception perfectly).

In other words: in general, there is no relationship between filesize and perceived quality. Some series or episodes will look perfectly acceptable at less than 100MB per episode at 480p (see: Togainu no Chi). Others require >500MB per episode at 720p to achieve roughly the same perceived quality. Now shut the fuck up about filesizes and go get a better internet connection, fag (hint: Canada is a third world country).

Comments (37)

  1. brandon wrote:

    Hey, I learned something!!!

    Monday, March 7, 2011 at 08:47 #
  2. sukuiter wrote:

    The problem is that people now tend to over-encode.

    I’ve seen some encodes even using CRF 13 when there’s no need to go that far. (read below)

    The funny thing is, I’ve seen some fansubbers using CRF but still using hex mode. They rely on a big bitrate to make the quality look great.

    A good example would be Suzumiya Haruhi no Shoushitsu: the Mazui version and the Coalgirls version. The sizes differ a lot but the quality doesn’t.

    Shouldn’t more advanced technology mean smaller sizes?

    Lastly, please answer this question because I don’t understand: should the CRF value be lower the higher the resolution is? From reading about CRF mode and testing myself, 18 is enough (16 max) to get great results for 1080p.

    Monday, March 7, 2011 at 09:14 #
  3. Ataraxxia wrote:

    Sukuiter, one of the greatest things about CRF is that you don’t really have to use two different values when encoding at different resolutions.

    Monday, March 7, 2011 at 15:40 #
  4. Pacoup wrote:

    Well, Canada is only third world from a bandwidth perspective.

    We’ve got better and faster connections than the states.

    Monday, March 7, 2011 at 15:52 #
  5. Reply wrote:

    @Sukuiter: “Shouldn’t the more advanced the technology, the smaller the size?”
    No. Only in the case where you re-encode the same source file to the same quality using a better compression algorithm. Compare encoding a CD to MP3 with encoding the same CD to Vorbis or AAC at the same quality; the latter results in a smaller total filesize.

    The point of this article goes for all forms of lossy encoding, btw. For music, a one-minute orchestral piece would result in a larger file than a one-minute piece of dueling banjos at the same quality. Etc. Nice to see the article, Fluff. It’s always fun to see what sense you slap out of a keyboard.

    Monday, March 7, 2011 at 16:52 #
  6. wohdin wrote:

    I find this offensive.

    Canada is not a country.

    Monday, March 7, 2011 at 17:23 #
  7. qaz wrote:

    Ataraxxia, this is not true at all: if you intend to view both encodes scaled to the same size (fullscreen), then for the same quality a higher resolution needs a higher CRF value (you can discard more detail).

    Monday, March 7, 2011 at 17:53 #
  8. Zombiebeard wrote:

    Thanks for the explanation, seriously! I had always seen people bitch about encoding and I had no idea what in the hell they were bellyaching about.

    Monday, March 7, 2011 at 17:55 #
  9. psuedonymous wrote:

    Question: is there any time where using crf is NOT the best option (other than max streaming bitrate/target filesize constraints)? The whole “raise the quantiser in high motion scenes” thing still seems counter-intuitive to me.

    Monday, March 7, 2011 at 21:33 #
  10. Rune wrote:

    Thank you Fluff
    Now I know and knowing is half the battle

    Monday, March 7, 2011 at 22:31 #
  11. bria wrote:

    For someone who tends to present what he says as “the truth”, I was expecting a better explanation.

    For example, where is CBR? Nowhere? You know it’s widely used in the real world (streaming, hardware encoding, and so on)?

    You compare one 480p video with another at 720p, and nothing is held constant: not the codec, not the reference video, or …
    The conclusion can’t be sound!

    And sorry, but for exactly the same video, with the same parameters (quantization matrix, options, and so on) and the same codec and encoder, the bigger file will be the better one.
    And beyond that, there’s the problem of over-encoding (you add quality, but the source or the viewer is so “bad” that all you really add is size).

    But all in all, there are so many options and addenda in H.264 that there’s no basis for a comparison without a well-defined set of reference videos, encoders, options, and perceptual video quality metrics.

    Monday, March 7, 2011 at 23:14 #
  12. Sapo wrote:

    @psuedonymous: crf is not a constant quantizer.
    You may want to check the differences between crf and qp.
    Btw, 2-pass is still useful if you are targeting a specific size; other than that I’d say no, crf is always better.

    Monday, March 7, 2011 at 23:48 #
  13. psuedonymous wrote:

    >crf is not constant quantiser
    I know, crf varies the quantiser, but instead of lowering it for high-motion scenes, it RAISES it. This fills my brain with fuck.

    Tuesday, March 8, 2011 at 00:53 #
  14. sukuiter wrote:

    >7
    so it’s true? the higher the resolution, the lower the crf must be? I’ve tested it but noticed nothing different except the filesize.

    >13
    if there’s a high-motion scene, the crf will be lowered, the bitrate will increase, and the frames per second will drop (longer encode time). I don’t know what encoder you use; at least mine shows those stats.

    Tuesday, March 8, 2011 at 04:11 #
  15. penguin-fever wrote:

    You’ve picked up or intuited a few of the most basic tenets of a basic lossless algorithm (RLE), good.

    Explain the fail on Madoka 8. It’s not the encoding options themselves (or the crf used), but a fault of the source (TS) and the filtering done pre-encoding. Much of the hurfing and durfing pertaining to “compression” should instead be rightly directed toward the devastating effects of piss-poor, overgeneralized filtering. Depending on the filtering used, you can easily get shit at a large size or gold at a low size. Or anything in between. Even from a compressibility standpoint alone, different filters give vastly different compression results.

    Look at your own “detail vs flat color” generalization (a scenario which has been addressed by x264 and takes the form of the psy and aq options — options which themselves spawn quite a bit of hurfing and durfing). The interplay between the filters used and the encoder options also matters; adding fake noise is a ridiculous waste of bitrate unless you’re addressing this with higher psy-rd (hell, it’s still an ugly waste in my book). Deen and awarpsharp are the black marks on the very existence of avs filtration, as we all figured out. But people are still shooting themselves in the foot by abusing anti-banding filters, denoisers, etc. that they don’t understand and haven’t observed in a wide enough range of conditions, sources, bitrates, and encoder options. The result? Trash (still) abounds, and the only relationship people think they can cling to is that “700MB gg madoka = good, 40mb gg star driver = bad”, and they lop everything onto that continuum.

    Jesus.

    Tuesday, March 8, 2011 at 05:12 #
  16. penguin-fever wrote:

    @7: CRF doesn’t even guarantee a constant quality. It simply allows the encoder to adjust to fluctuations in temporal complexity (per macroblock in the case of mbtree). CRF19 can look fine on some sources and like trash on others. This isn’t rocket science. There is no HVS goldmine of an algorithm that can definitively gauge perceived quality, and of course none that can provide the same level in all cases. SSIM can come close to a decent predictor when psy is taken out of the equation, but even then, running SSIM on different videos encoded at the same crf will give you some pretty wild variation. For that matter, different scenes in the same video can and will do likewise.

    The general rule with CRF and resolution is that the quality depends on the playback resolution/screen size. For two videos at different resolutions stretched to the same screen size (say, full-screen in your media player on your full HD monitor), it’s quite obvious that you can get away with a slightly higher CRF on the higher-res encode from a detail standpoint only. But at their native resolutions, assuming a decent downscale for the SD content (or a relatively non-destructive station/source upscale for the HD content, grr), the same CRF would likely preserve the same amount of detail per unit of area. Of course, macroblock-size-dependent artifacts, the impact on mbtree (the strength of which can be raised via lower qcomp at higher resolutions due to accuracy considerations), and so forth all muddy the waters. Which of course bolsters the blogger’s point that the relationship between size and quality is far from definitive or even “real” between sources and resolutions.

    @14:
    No, crf with mbtree adaptively adjusts the quantizer on a per-macroblock basis depending on the pooled results of a lookahead pass. The longer a block can be accounted for via motion estimation, the lower the quantizer (and thus the greater the bitrate, given blocks of similar variance and psy energy) allotted to the macroblock as a whole.

    It used to be frame-based qcomp (and the option is still available) which would do something close to what you mentioned, but I think you’re mixing up the effects of lowering and raising quants.

    Tuesday, March 8, 2011 at 05:29 #
  17. Odin wrote:

    Thank you.

    Tuesday, March 8, 2011 at 06:41 #
  18. psuedonymous wrote:

    @16
    >It used to be frame-based qcomp (and the option is still available) which would do something close to what you mentioned, but I think you’re mixing up the effects of lowering and raising quants
    Ah, I was going by the description on the x264 wiki, which only mentions varying the frame quantiser.

    I must still be running on Full Retard though: if a macroblock can be accounted for over many frames, surely it makes sense to raise its quant (/drop its bitrate) and use those bits for blocks that come and go quickly and thus need the extra bits to not look like squares of blur? Wouldn’t you WANT to take bits from the ‘simple’ blocks (easily predicted) and give them to ‘complex’ blocks (short prediction duration)? But crf does the opposite…
    Feels like I’m overlooking something basic here.

    Tuesday, March 8, 2011 at 10:13 #
  19. ri wrote:

    720p XviD is bullshit

    Tuesday, March 8, 2011 at 13:45 #
  20. Sapo wrote:

    @18
    >I must still be running on Full Retard though:
    You’re just using bad logic there.
    You want to give bitrate to macroblocks which are heavily referenced.
    If macroblock A is referenced 10 times, a 5% increase in its quality will mean a 5% increase in the quality of 10 blocks (not only 1), so clearly that’s the superior choice, and it’s the reason I-frames have lower quantizers than the rest.
    Also, high-motion scenes need higher quantizers; you don’t want to waste bits where artifacts are less visible (and lowering the quantizer there would waste A LOT of bits, not really something you’d want).

    Tuesday, March 8, 2011 at 15:50 #
  21. TheFluff wrote:

    @11: what? I don’t think you understood the point at all.

    @14-15: I have no idea what point you think you’re trying to make exactly, but gg generally does not use any filtering at all on TV encodes. Madoka 08 intentionally looked like shit because I raised the CRF to contrast with matt’s previous 600mb CRF16 encode.

    Tuesday, March 8, 2011 at 16:07 #
  22. SyphilisForTheMasses wrote:

    More accurately :

    – Low perceived quality: weak relationship between filesize and video quality.
    – High perceived quality: strong relationship.

    Actually, there is no such thing as “big flat areas of the same color” in real-world videos or anime. There are always subtle gradients here and there on areas that look flat at first glance. If you end up with flat areas everywhere that compress nicely, that’s because you overused your denoising filter.

    When raising the video target quality, the most bit-consuming features become dithering/grain/residual noise, necessary to avoid colorbanding. Even with a high AQ strength setting, you’ll have to spend large amounts of bits in noise coding. And because of the stochastic nature of the noise, the bitrate dedicated to these features will not depend on the source.

    Wednesday, March 9, 2011 at 01:02 #
  23. TheFluff wrote:

    @22: oh hey look yet another person who completely missed the point
    also: “there is no such thing as big flat areas of the same color”, what? have you ever actually watched a cartoon?

    also also:
    “- Low perceived quality: weak relationship between filesize and video quality.
    – High perceived quality: strong relationship.”
    what does this even mean?

    Wednesday, March 9, 2011 at 11:03 #
  24. bria wrote:

    @21:
    >what? I don’t think you understood the point at all.

    It’s possible.
    What I wanted to say is this: there is a relationship between size and quality when we compare the same things.

    And to back this up with a source, see Annex G of the H.264 standard, “SVC” (Scalable Video Coding), where, starting from a base layer, you can add quality layers (and temporal/spatial resolution layers).

    Wednesday, March 9, 2011 at 23:06 #
  25. YukiS wrote:

    Another fantastic article, though I’d like to have seen more on how algorithm-related settings (reference frames, etc.) blow this correlation apart even more.

    I’m certain that a lot of people avoid downloading smaller releases when a larger one is available, regardless of the fact that said smaller release probably uses a more efficient choice of encoder settings and may provide equal or greater quality.

    However, the kind of people who do this are probably the same plebeians who use standalone players or GPU accelerated decoding which would choke on these streams anyway.

    Wednesday, March 9, 2011 at 23:19 #
  26. psuedonymous wrote:

    @20
    Ah, now THAT makes sense.

    @25
    >I’m certain that a lot of people avoid downloading smaller releases when a larger one is available, regardless of the fact that said smaller release probably uses a more efficient choice of encoder settings and may provide equal or greater quality.
    I’m often guilty of this, mainly because there’s often no other way of even coming close to comparing release quality, other than:
    1- Word of mouth – FUCK NO.
    2- Past experience – only valid for groups with Only One Encoder Who Does Everything And Never Screws Up Ever.
    3- Compare same-timecode png screenshots of everything taken in both low-motion and high-motion scenes – Like that ever happens.

    There’s also no relation between encoding quality and translation ability, but that’s another issue entirely.

    Thursday, March 10, 2011 at 09:56 #
  27. TheFluff wrote:

    @24: except that has absolutely nothing to do with what I was talking about. I’m not talking about the same inputs and the same settings, I’m trying to explain why two different inputs can be of completely different size even though the quality is comparable.

    Thursday, March 10, 2011 at 13:01 #
  28. YukiS wrote:

    @26
    >I’m often guilty of this, mainly because there’s often no other way of even coming close to comparing release quality.

    It’s fairly common for an excerpt from MediaInfo to be posted in the comments on a release blog, and this can be helpful to some extent. It’s not too useful in cases where a 2-pass encode was used, but if both releases are CRF then the one with the lower number will be better quality, assuming the same set of filters was used. Yes, this is a big assumption, so it probably applies more to BD-rips than TV-rips.

    Thursday, March 10, 2011 at 15:12 #
  29. Anon wrote:

    I’m canadian and what is this?

    Thursday, March 10, 2011 at 18:36 #
  30. fag wrote:

    I’m Canadian and I agree 100% (with respect to internet connections). Damned monopolistic service providers. Cap my fucking ass. Surprisingly, the average Canadian spends more time on the internet than the average American.

    Thursday, March 10, 2011 at 22:05 #
  31. Home_Despot wrote:

    For Canadians the point is that you can either determine the total number of Canadians to have or the quality of the average Canadian. Ultimately, the overall size of Canada has nothing to do with anything, as most of us already knew.

    A flag of monotone red and monotone white? We’re going to compress the hell out of you, you lossy Canadians!

    Seriously Fluff, I think a lot of people appreciated this. How about a weekly rant/tutorial? Think you can find a weekly rant to feature for a while or was that it?

    No Canadians allowed unless they get on topic.

    Friday, March 11, 2011 at 09:57 #
  32. LoMan wrote:

    The point of the article isn’t actually about correlation, but rather that a lower filesize doesn’t always equal bad visual quality.

    Saturday, March 12, 2011 at 10:44 #
  33. Bullxyzt wrote:

    > in general, there is no relationship between filesize and perceived quality.

    I think this is what this post is all about, and that is WRONG.
    There is ALWAYS a relationship between filesize and perceived quality. The same video would look better if it were given more bitrate.

    Saturday, March 12, 2011 at 17:38 #
  34. Positron wrote:

    @33
    I thought it was pretty obvious that this is about different files.

    Sunday, March 13, 2011 at 00:34 #
  35. Bullxyzt wrote:

    @34
    Then there is no reason to make such a risky generalization.
    The author wanted to point out that a bigger filesize doesn’t necessarily mean better perceived quality, and so on.
    But denying any relationship by saying “no relationship” is an ERROR.
    Yes, there isn’t the kind of relationship that many leechers think of;
    but as a blanket generalization, no. There are other factors: you mentioned the “different files” one, but there are also the encoding settings, the filtering chain, the codec…
    and they are NOT off topic and can’t be.

    Sunday, March 13, 2011 at 17:34 #
  36. Desbreko wrote:

    @28
    Even assuming the same filtering, though, it’s not as easy as just looking at the CRF value. There are a number of other x264 settings that heavily influence perceptual quality that you also have to take into account. And on top of that, good values for those are highly dependent on what the source is like. So for someone who has a good amount of experience with x264, a MediaInfo log will at best give a vague idea of the video quality; for others, it’ll probably be more misleading than anything.

    Tuesday, March 15, 2011 at 15:36 #
  37. talm wrote:

    Canada a third world country? lol~ We have internet speeds as low as 18 kbps and as high as 1 Gbps, possibly faster. I’m averaging 14 Mbps right now, but I could have 50 Mbps or even 100 Mbps if I thought I’d need it. I’m also an encoder, and I’d say he’s right about filesize and perceived quality. A large filesize is not an indication of great quality either. I could re-encode a craptacular 480p file, originally 80MB in size, and force the encoder to encode it to 700MB or even 10GB for all I care; it’s still going to be the same video unless I told it to go smaller than 80MB.

    Tuesday, April 19, 2011 at 18:02 #

Trackback/Pingback (1)

  1. […] here’s the real version in case you really really really want to read ~500 words on the […]