Sunday, October 03, 2010

About Google's new WebP photo format

Google has recently released a new lossy photo format called WebP. The claim is that it compresses photos more effectively than JPG, thus reducing file sizes, yet the result could be mistaken for a JPG file in terms of quality. Size gains of about 40% are sometimes claimed. How much of this is actually an advantage of WebP?

Generally speaking, there are several problems with this type of assertion. First, over time, it has become clear to me that several sources of JPG files do a terrible compression job. Usually the problem is that, after the lossy image representation is derived, it has to be packed losslessly to produce the JPG file, and the lossless compression method used is not very good. For example, take Photoshop. If I save an "optimized baseline" medium quality JPG file of the mentoring course book's cover, I get a 194kb file. If I inspect it with a hex editor, I see several sections that are obviously not compressed very well. To prove the point, if I compress Photoshop's 194kb JPG file with rar, I get a 108kb archive. Similarly, digital cameras typically produce huge JPG files that, upon inspection, look as if the firmware prioritized coding speed over coding efficiency. In short, some JPG files are compressible enough that WebP's file size advantage may be a result of poor internal JPG lossless compression. But then, why not fix JPG's lossless compression?
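To make that experiment concrete, here is a quick sketch of it using Python's standard library instead of rar (the file name is a placeholder for any Photoshop-produced JPG):

    import lzma

    # Read a finished JPG file and squeeze it with a general-purpose
    # lossless compressor. A well-packed file should be nearly
    # incompressible; any substantial gain here points at redundancy
    # left behind by the JPG's internal entropy coding.
    data = open('cover.jpg', 'rb').read()
    packed = lzma.compress(data, preset=9)
    print(f'JPG: {len(data)} bytes, recompressed: {len(packed)} bytes')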

Moreover, Google used specific JPG encoder libraries in their benchmarks. How do you know these encode images efficiently? For instance, I remember that an old program called Image Alchemy had a normal JPG mode and an "optimized Huffman" mode that regularly produced smaller files than the normal mode. The "Huffman optimization" was a secondary pass over the already lossy representation, so it did not cause additional information loss. In addition, JPG provides for arithmetic coding, which should result in smaller files. However, arithmetic coding is rarely used, for compatibility reasons. And even if it were, arithmetic coding's representation efficiency depends critically on the compression model driving the probability predictions. When Google compares effectively random JPG files from the web with their WebP counterparts, how much of the comparison is between a specific JPG encoder library and WebP, as opposed to between JPG's intrinsic efficiency and WebP's intrinsic efficiency? And in the case of JPG with arithmetic coding, how much of the comparison is between a (probably unsophisticated) probability model driving the arithmetic coder and WebP's compression format?
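The same kind of secondary pass is available through libjpeg's jpegtran tool, which re-entropy-codes the already quantized coefficients without touching the lossy representation. A minimal sketch, driven from Python (file names are made up, and the -arithmetic switch requires a libjpeg build with arithmetic coding enabled):

    import subprocess

    # Lossless repack: optimized Huffman tables only, no new quality loss.
    subprocess.run(['jpegtran', '-optimize', '-copy', 'none',
                    '-outfile', 'huffman.jpg', 'input.jpg'], check=True)

    # Same coefficients, entropy-coded arithmetically; usually smaller
    # still, but many decoders will not read the result.
    subprocess.run(['jpegtran', '-arithmetic', '-copy', 'none',
                    '-outfile', 'arith.jpg', 'input.jpg'], check=True)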

Sometimes, recompressing a JPG file with a more efficient JPG compressor makes a huge difference. I have personally seen a 7mb Photoshop JPG file go down to a 1mb JPG file that could not be distinguished from the original. Part of the problem with recompressing JPG files is, well, how do you know that the first JPG compression pass did not make it easier for the second compressor to produce a smaller file? Would different compressors produce significantly different files if they started from the same original photo? This matters because Google used WebP to recompress existing JPG files. It would have been more interesting to, say, obtain a significant sample of photos stored in raw format first, and then compare the results of packing the raw files with JPG and with WebP. As it stands, Google's comparison of WebP against JPG is not a true apples-to-apples comparison.
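A sketch of the raw-first methodology I have in mind, using the Pillow imaging library (a build with WebP support is assumed, and note that the two quality scales are not directly comparable, which is the next problem):

    import os
    from PIL import Image

    # Start from a lossless original so neither encoder inherits the
    # other's artifacts, then encode once with each format.
    img = Image.open('original.png').convert('RGB')
    img.save('out.jpg', quality=75, optimize=True)
    img.save('out.webp', quality=75)
    print('jpg :', os.path.getsize('out.jpg'))
    print('webp:', os.path.getsize('out.webp'))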

Finally, different JPG encoders have varying ideas of what "quality" means. For instance, if quality in encoder A is specified with a number between 0 and 100, is using 30 equivalent to encoder B's quality of 3 on a scale from 0 to 10? How do the "quality" settings of various JPG encoders compare to that of WebP? If this is not known, then what does it mean when Google claims WebP produces smaller files? Similarly, I do not think there is a clear notion of what "photo quality" means. How can you tell what JPG's PSNR is compared to WebP's PSNR if you do not have access to the original photo? And without PSNR information, how do you know that WebP is doing a better encoding job, and how do you assess the smaller file size claim? For instance, the chroma information in the sample photo with the guy against the blue background is obviously very different in WebP. Why? Also, note that PSNR is not the only way to measure picture quality. What about the psychovisual enhancements provided by, e.g., DivX and x264? Those can make photos and video (I get to say video because WebP is derived from VP8) look better even when the PSNR is lower.
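For what it's worth, PSNR itself is easy to compute once you do have the original; the gap is precisely that Google's comparison pages do not give it to us. A minimal sketch with NumPy and Pillow (file names continue the example above):

    import numpy as np
    from PIL import Image

    def psnr(original_path, encoded_path):
        # PSNR = 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit channels.
        a = np.asarray(Image.open(original_path).convert('RGB'), dtype=np.float64)
        b = np.asarray(Image.open(encoded_path).convert('RGB'), dtype=np.float64)
        mse = np.mean((a - b) ** 2)
        return 10 * np.log10(255.0 ** 2 / mse)

    print(psnr('original.png', 'out.jpg'), psnr('original.png', 'out.webp'))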

For these reasons, it's not clear to me that WebP is necessarily all that it's claimed to be. And if you cannot tell, then why is introducing another compression format preferable to providing improved JPG coders that produce results similar to WebP's? For example, I took one of Google's comparison JPG photos and repacked it with a more efficient JPG encoder. I got a file size essentially equal to that of WebP. Why bother with WebP if the same results can be achieved with an existing format?

I wish Google would provide a more in-depth analysis on its WebP page. I do not mean to imply that WebP is not a better encoding mechanism than JPG. Given that it has the benefit of roughly 20 years of research since JPG's early-1990s specification, it probably is better. However, WebP's main claim is that it produces smaller files, thus alleviating the problem of transmitting JPG files over the internet, and I have not seen evidence to support this claim beyond "we recompressed a random sample of photos already compressed with random JPG packers and we got smaller files". Sure, but you could have achieved that with a better JPG coder, or even gzip. Thus, I can't help wondering whether the format is really an attempt to further popularize their VP8 video codec, from which WebP is derived...

Speaking of video codecs, one criticism of JPG is that it introduces blockiness because it compresses the photo in 8x8 pixel blocks. WebP seems to handle blockiness better. But WebP is derived from a video codec, and many modern video codecs apply a deblocking filter when decompressing. If WebP is using a deblocker, then the comparison with JPG is further suspect. In other words, if you added a deblocker to your standard JPG decoder, would you achieve results comparable to WebP's? Without technical details, how do you know?
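To make the question concrete, here is a toy post-process deblocker: it simply blends the pixels straddling every 8x8 block boundary. Real codec deblockers are adaptive and far more careful, so this is only an illustration of the idea (and the file name is invented):

    import numpy as np
    from PIL import Image

    def deblock(path, strength=0.5):
        # Pull the two pixels on either side of each 8x8 boundary toward
        # their average; 'strength' controls how much edges are smoothed.
        img = np.asarray(Image.open(path).convert('L'), dtype=np.float64)
        out = img.copy()
        for x in range(8, img.shape[1], 8):   # vertical block boundaries
            avg = (img[:, x - 1] + img[:, x]) / 2
            out[:, x - 1] += strength * (avg - img[:, x - 1])
            out[:, x] += strength * (avg - img[:, x])
        for y in range(8, img.shape[0], 8):   # horizontal block boundaries
            avg = (img[y - 1, :] + img[y, :]) / 2
            out[y - 1, :] += strength * (avg - img[y - 1, :])
            out[y, :] += strength * (avg - img[y, :])
        return Image.fromarray(np.clip(out, 0, 255).astype(np.uint8))

    deblock('lowquality.jpg').save('deblocked.png')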

For more details, see for example here (and make sure to read the comments!). Also, for an analysis of WebP seen as a VP8 I-frame, see here. Ouch!

Meanwhile, if the goal is to speed up the web, could we please have an HTTP extension such that loading a website requires only one request? Opening numerous connections, one per page element, introduces round-trip latencies and other problems, such as suboptimal use of frames that never reach the MTU size. If we switched to a single connection, with the individual files streamed over it via some sort of tar transport, then we could run even a very simple compression scheme, such as V.42bis or V.44, on the stream so that all the easily compressible information is crunched on the fly for faster throughput. For compression examples, see here (although note this one does not seem to provide the tar capability) and Google's own research here.
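As a rough sketch of the idea, here is a tar-plus-stream-compression bundle in a few lines of Python; deflate stands in for V.42bis/V.44, and the file names are invented:

    import io
    import tarfile
    import zlib

    # Bundle a page's resources into one tar stream, then compress the
    # whole stream so the repetitive text (HTML, CSS, JS) is crunched
    # together instead of per-file, per-connection.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as bundle:
        for name in ('index.html', 'style.css', 'app.js', 'logo.png'):
            bundle.add(name)
    raw = buf.getvalue()
    packed = zlib.compress(raw, 9)
    print(f'tar: {len(raw)} bytes, deflated: {len(packed)} bytes')

Everything then travels over a single connection, so the per-element round trips disappear and the compressor gets to exploit redundancy across files.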

1 comment:

pancho said...

Thanks Andres - for neophytes like me, I found this site with a good explanation of video formats:

http://diveintomark.org/archives/2009/01/08/give-part-5-constraints

- Pancho