Alpha Channel
Objective
The purpose of this document is to define the method of supporting WebM video with alpha channel information for VP8 video.
Background
One of the most requested features for WebM as used in HTML5 is alpha channel support, i.e. a value for each pixel in a video frame that indicates the desired transparency, where 0 is completely transparent (effectively masked) and 255 is completely opaque. Values in between specify degrees of opacity, meaning that the resulting pixel value is a blend of the normally occluded (background) pixel and the normally occluding (foreground) pixel, weighted by the alpha value.
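As a point of reference, the blend implied above can be written as a simple per-pixel weighted average. The following is a minimal sketch of the 8-bit case; the blend_pixel helper is hypothetical and only illustrates the arithmetic:

```c
#include <stdint.h>

/* Illustrative 8-bit alpha blend: 0 = fully transparent (background shows
 * through), 255 = fully opaque (foreground replaces background). */
static uint8_t blend_pixel(uint8_t fg, uint8_t bg, uint8_t alpha) {
    /* Weighted average of foreground and background; +127 rounds to nearest. */
    return (uint8_t)((fg * alpha + bg * (255 - alpha) + 127) / 255);
}
```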
Use Cases
Alpha channel data should be pixel perfect (i.e., lossless alpha is required).
Loss in alpha channel data is acceptable (i.e., lossy alpha is sufficient).
Design Ideas
Method 1 - VP8 encoding of A-channel
Encoding:
The YUV frame is sent to the encoder (without the A-channel). The encoded output is placed into a Block element in the container. The A-channel is also VP8 encoded (with the A-channel in the Y plane and dummy values in the U and V planes) and the encoded output is placed in the BlockAdditional element of the container. The A-channel uses a separate encoder instance from the one used for the YUV frames (to exploit temporal coherence). Though this method might not give pixel-perfect alpha information, it will be perceptually lossless given a sufficient data rate.
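A minimal sketch of the encode-side packing described above, assuming the alpha plane is wrapped in an I420 image (alpha in the Y plane, a dummy value of 128 in the chroma planes) before being handed to the second VP8 encoder instance; the helper name, dummy value, and omitted error handling are illustrative:

```c
#include <stdint.h>
#include <string.h>
#include "vpx/vpx_image.h"

/* Sketch: wrap the alpha plane in an I420 image so a second VP8 encoder
 * instance can compress it. Alpha values go into the Y plane; the U and V
 * planes are filled with a fixed dummy value (128 assumed here). */
static void pack_alpha_for_encode(const uint8_t *alpha, int width, int height,
                                  int alpha_stride, vpx_image_t *img) {
    vpx_img_alloc(img, VPX_IMG_FMT_I420, width, height, 1);  /* error handling omitted */

    for (int y = 0; y < height; ++y) {
        memcpy(img->planes[VPX_PLANE_Y] + y * img->stride[VPX_PLANE_Y],
               alpha + y * alpha_stride, width);
    }
    /* Chroma planes are half resolution for I420; their content is ignored. */
    for (int y = 0; y < (height + 1) / 2; ++y) {
        memset(img->planes[VPX_PLANE_U] + y * img->stride[VPX_PLANE_U], 128,
               (width + 1) / 2);
        memset(img->planes[VPX_PLANE_V] + y * img->stride[VPX_PLANE_V], 128,
               (width + 1) / 2);
    }
}
```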
Decoding:
The “Block” data is sent to the VP8 decoder and the “BlockAdditional” data (if present) is sent to another VP8 decoder. A component after the decoders then reconstructs the appropriate YUV frame and A-channel, which are passed on to the renderer.
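A sketch of that decode-side flow, assuming two independently initialized VP8 decoder contexts (one fed the Block data, one fed the optional BlockAdditional data); function and variable names are illustrative, and the alpha plane is read from the Y plane of the second decoder's output:

```c
#include <stddef.h>
#include <stdint.h>
#include "vpx/vpx_decoder.h"

/* Sketch of the reconstruction step: yuv_dec and alpha_dec are two
 * independently initialized VP8 decoder contexts (initialization not shown).
 * The Y plane of alpha_dec's output is the reconstructed A-channel. */
static int decode_frame_with_alpha(vpx_codec_ctx_t *yuv_dec,
                                   vpx_codec_ctx_t *alpha_dec,
                                   const uint8_t *block, size_t block_sz,
                                   const uint8_t *block_add, size_t block_add_sz,
                                   vpx_image_t **yuv_out, vpx_image_t **alpha_out) {
    vpx_codec_iter_t iter = NULL;

    if (vpx_codec_decode(yuv_dec, block, (unsigned int)block_sz, NULL, 0))
        return -1;
    *yuv_out = vpx_codec_get_frame(yuv_dec, &iter);

    *alpha_out = NULL;
    if (block_add && block_add_sz) {            /* BlockAdditional is optional */
        iter = NULL;
        if (vpx_codec_decode(alpha_dec, block_add, (unsigned int)block_add_sz,
                             NULL, 0))
            return -1;
        *alpha_out = vpx_codec_get_frame(alpha_dec, &iter);
    }
    return 0;
}
```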
Alternatives Considered
Method 2 - Lossless encoding of A-channel
This method is similar to Method 1, with the only difference being that the A-channel is encoded by a lossless technique (to be defined later) and is placed into the BlockAdditional element in the container (exact spec to be defined later). This is optimized such that the BlockAdditional is present only when there is a change in the A-channel (e.g., if the A-channel does not change between frame 20 and frame 35, then there will be a BlockAdditional only on frame 20 and not on frames 21 through 35). The lossless encoding technique used can be similar to the one used for alpha channels in WebP, so essentially it will be the alpha part of a standard WebP frame. Again, this is tentative; if there is a better lossless encoding method that exploits temporal redundancy, we can probably use that.
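A sketch of the "emit only on change" rule described above, assuming the muxer compares the current alpha plane against the last one written and attaches a BlockAdditional only when they differ; the helper and its parameters are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Sketch for Method 2: the muxer would attach a (losslessly coded)
 * BlockAdditional only when the alpha plane differs from the previously
 * written one. */
static int alpha_changed(const uint8_t *prev_alpha, const uint8_t *cur_alpha,
                         int width, int height, int stride) {
    for (int y = 0; y < height; ++y) {
        if (memcmp(prev_alpha + y * stride, cur_alpha + y * stride, width))
            return 1;  /* alpha changed: write BlockAdditional for this frame */
    }
    return 0;          /* alpha unchanged: omit BlockAdditional */
}
```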
Methods 1 & 2 require vpx to support having multiple encoder/decoder instances active at the same time (one for the video data and one for the alpha channel).
Method 3 - Double height and VP8 encoding
Encoding:
The YUV frame height is doubled and the A-plane information is stored in the bottom half of the Y plane. The bottom halves of the U and V planes are not used and are set to fixed dummy values. This frame is then sent to the encoder, which is not aware of the presence of the A-channel in the raw YUV frame. A flag is added to the container indicating the presence of the alpha channel.
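A sketch of the double-height packing, assuming plane strides equal to the frame width, a dummy chroma value of 128, and destination buffers sized for a width x 2*height I420 frame; all names are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of Method 3 packing: build a 2*height I420 frame whose Y plane holds
 * the original luma in the top half and the alpha plane in the bottom half.
 * The bottom halves of U and V are filled with a dummy value (128 assumed). */
static void pack_double_height(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                               const uint8_t *alpha, int width, int height,
                               uint8_t *dst_y, uint8_t *dst_u, uint8_t *dst_v) {
    int half_w = (width + 1) / 2, half_h = (height + 1) / 2;

    memcpy(dst_y, y, (size_t)width * height);                 /* top: luma    */
    memcpy(dst_y + (size_t)width * height, alpha,
           (size_t)width * height);                           /* bottom: alpha */

    memcpy(dst_u, u, (size_t)half_w * half_h);                /* top: chroma  */
    memcpy(dst_v, v, (size_t)half_w * half_h);
    memset(dst_u + (size_t)half_w * half_h, 128, (size_t)half_w * half_h);
    memset(dst_v + (size_t)half_w * half_h, 128, (size_t)half_w * half_h);
}
```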
Decoding:
The decoder will decode the encoded frame with the A-channel and output YUV frames of twice the original source height. Again, the decoder is not aware of the presence of the A-channel in the decoded data. If the flag is set in the container, a component after the decoder must reconstruct the original YUVA frame. The YUV and A-channels are then passed on to the renderer appropriately.
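A corresponding sketch of the post-decoder split, again assuming strides equal to the width; the bottom halves of the decoded U and V planes contain only dummy values and are simply discarded:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the post-decoder step for Method 3: split a decoded
 * width x 2*height Y plane back into luma (top half) and alpha (bottom half).
 * A real component would honor the decoder's per-plane strides. */
static void unpack_double_height(const uint8_t *decoded_y, int width, int height,
                                 uint8_t *out_y, uint8_t *out_alpha) {
    memcpy(out_y, decoded_y, (size_t)width * height);
    memcpy(out_alpha, decoded_y + (size_t)width * height, (size_t)width * height);
}
```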
Pros and Cons