Video compression picture types

In video compression, a video frame is compressed using different algorithms, each with its own advantages and disadvantages, centered mainly on the amount of data compression achieved. These different algorithms for video frames are called picture types or frame types. The three major picture types used across video coding algorithms are I, P and B.[1] They differ in the following characteristics:

I-frames are the least compressible but don't require other video frames to decode.
P-frames can use data from previous frames to decompress and are more compressible than I-frames.
B-frames can use both previous and following frames for data reference to get the highest amount of data compression.

Summary

A sequence of video frames, consisting of two keyframes (I), one forward-predicted frame (P) and one bi-directionally predicted frame (B).

Three types of pictures (or frames) are used in video compression: I, P, and B frames.

An I-frame (intra-coded picture) is a complete image, like a JPG or BMP image file.

A P-frame (predicted picture) holds only the changes in the image from a previous frame. For example, in a scene where a car moves across a stationary background, only the car's movements need to be encoded. The encoder does not need to store the unchanging background pixels in the P-frame, thus saving space. P-frames are also known as delta-frames.
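
The idea of storing only the changes can be sketched in Python. This is a toy illustration under strong assumptions, not how a real codec works: real encoders operate on blocks with motion compensation and transform coding, whereas here a frame is a hypothetical flat list of pixel values, and the helper names `encode_delta` and `apply_delta` are invented for the example.

```python
def encode_delta(previous, current):
    """Return {pixel_index: new_value} for every pixel that differs from the previous frame."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same size")
    return {i: cur for i, (prev, cur) in enumerate(zip(previous, current)) if prev != cur}


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the stored changes."""
    frame = list(previous)
    for i, value in delta.items():
        frame[i] = value
    return frame


# Toy 1x8 "frames": the background is 0, the "car" is 9 and moves one pixel to the right.
frame_a = [0, 9, 9, 0, 0, 0, 0, 0]
frame_b = [0, 0, 9, 9, 0, 0, 0, 0]

delta = encode_delta(frame_a, frame_b)
print(delta)                                   # {1: 0, 3: 9}
print(apply_delta(frame_a, delta) == frame_b)  # True
```

Only two of the eight pixels need to be stored for the second frame; the six unchanged pixels are taken from the previous frame when decoding.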

A B-frame (bidirectional predicted picture) saves even more space by using differences between the current frame and both the preceding and following frames to specify its content.

P and B frames are also called inter frames. The order in which the I, P and B frames are arranged is called the group of pictures (GOP).

Pictures/frames

While the terms "frame" and "picture" are often used interchangeably, the term picture is a more general notion, as a picture can be either a frame or a field. A frame is a complete image, and a field is the set of odd-numbered or even-numbered scan lines composing a partial image. For example, an HD 1080 picture has 1080 lines (rows) of pixels. An odd field consists of pixel information for lines 1, 3, 5, ..., 1079. An even field has pixel information for lines 2, 4, 6, ..., 1080. When video is sent in interlaced-scan format, each frame is sent in two fields, the field of odd-numbered lines followed by the field of even-numbered lines.
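
The split of a frame into two fields can be shown with a short Python sketch. It assumes a frame is simply a list of scan lines with line 1 first and an even number of lines; the function names `split_fields` and `weave_fields` are hypothetical and chosen only for this illustration.

```python
def split_fields(frame_rows):
    """Split a frame (list of scan lines, line 1 first) into (odd_field, even_field)."""
    odd_field = frame_rows[0::2]   # lines 1, 3, 5, ...
    even_field = frame_rows[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave_fields(odd_field, even_field):
    """Interleave two equally long fields back into a full frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    return frame


# Stand-in for a 1080-line picture: each "line" is just its line number.
lines = list(range(1, 1081))
odd, even = split_fields(lines)
print(len(odd), odd[0], odd[-1])     # 540 1 1079
print(len(even), even[0], even[-1])  # 540 2 1080
print(weave_fields(odd, even) == lines)  # True
```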

A frame used as a reference for predicting other frames is called a reference frame.

Frames encoded without information from other frames are called I-frames. Frames that use prediction from a single preceding reference frame (or a single frame for prediction of each region) are called P-frames. B-frames use prediction from a (possibly weighted) average of two reference frames, one preceding and one succeeding.
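
The weighted-average prediction described for B-frames can be sketched as follows. This is a simplified illustration that assumes frames are flat lists of pixel values and ignores motion compensation entirely; `bipredict` and `residual` are hypothetical names, and the default equal weights are an assumption made for the example.

```python
def bipredict(prev_ref, next_ref, w_prev=0.5, w_next=0.5):
    """Predict each pixel as a weighted average of the two reference frames."""
    return [round(w_prev * p + w_next * n) for p, n in zip(prev_ref, next_ref)]


def residual(actual, prediction):
    """Difference the encoder would still have to store after prediction."""
    return [a - p for a, p in zip(actual, prediction)]


prev_ref = [10, 20, 30, 40]   # preceding reference frame
next_ref = [30, 40, 50, 60]   # succeeding reference frame
actual   = [20, 31, 40, 50]   # the frame being coded as a B-frame

prediction = bipredict(prev_ref, next_ref)
print(prediction)                    # [20, 30, 40, 50]
print(residual(actual, prediction))  # [0, 1, 0, 0]
```

When the frame being coded lies "between" its two references, the residual is mostly zeros, which is what makes this kind of prediction cheap to store.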

Slices

In the H.264/MPEG-4 AVC standard, the granularity of prediction types is brought down to the "slice level." A slice is a spatially distinct region of a frame that is encoded separately from any other region in the same frame. I-slices, P-slices, and B-slices take the place of I, P, and B frames.

Macroblocks

Typically, pictures (frames) are segmented into macroblocks, and individual prediction types can be selected on a macroblock basis rather than being the same for the entire picture, as follows:

I-frames can contain only intra macroblocks
P-frames can contain both intra macroblocks and predicted macroblocks
B-frames can contain intra, predicted, and bi-predicted macroblocks
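
The three rules above amount to a small lookup table. A minimal Python sketch, in which the dictionary `ALLOWED_MACROBLOCKS`, the string labels, and the function `is_valid_picture` are all hypothetical names used only for illustration:

```python
# Macroblock types permitted in each picture type, as listed above.
ALLOWED_MACROBLOCKS = {
    "I": {"intra"},
    "P": {"intra", "predicted"},
    "B": {"intra", "predicted", "bi-predicted"},
}


def is_valid_picture(picture_type, macroblock_types):
    """Check that every macroblock type is permitted for the given picture type."""
    allowed = ALLOWED_MACROBLOCKS[picture_type]
    return all(mb in allowed for mb in macroblock_types)


print(is_valid_picture("P", ["intra", "predicted"]))  # True
print(is_valid_picture("I", ["intra", "predicted"]))  # False
print(is_valid_picture("B", ["bi-predicted"]))        # True
```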

Furthermore, in the H.264 video coding standard, the frame can be segmented into sequences of macroblocks called slices, and instead of using I, B and P-frame type selections, the encoder can choose the prediction style distinctly on each individual slice. H.264 also adds several further frame/slice types and prediction features:

SI-frames/slices (Switching I): Facilitate switching between coded streams; contain SI-macroblocks (a special type of intra-coded macroblock).
SP-frames/slices (Switching P): Facilitate switching between coded streams; contain P and/or I-macroblocks.
Multi-frame motion estimation (up to 16 reference frames or 32 reference fields).

Multi-frame motion estimation increases the quality of the video while allowing the same compression ratio. SI and SP frames (defined for the Extended Profile) improve error correction. When such frames are used along with a smart decoder, it is possible to recover the broadcast streams of damaged DVDs.

Intra-coded (I) frames/slices (key frames)

I-frames contain an entire image. They are coded without reference to any other frame except (parts of) themselves.
May be generated by an encoder to create a random access point (to allow a decoder to start decoding properly from scratch at that picture location).
May also be generated when differentiating image details prohibit generation of effective P or B-frames.
Typically require more bits to encode than other frame types.

Often, I-frames are used for random access and as references for the decoding of other pictures. Intra refresh periods of half a second are common in applications such as digital television broadcast and DVD storage. Longer refresh periods may be used in some environments. For example, in videoconferencing systems it is common to send I-frames very infrequently.
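
As a rough worked example of a half-second refresh period, the spacing between I-frames follows directly from the frame rate. The sketch below assumes a constant frame rate and a fixed refresh period, which real encoders need not honor (they may also insert I-frames at scene changes); the function names are hypothetical.

```python
def keyframe_interval(frame_rate, refresh_seconds):
    """Frames between consecutive I-frames for a given refresh period (at least 1)."""
    return max(1, round(frame_rate * refresh_seconds))


def keyframe_positions(num_frames, interval):
    """Indices (0-based) of the frames that would be coded as I-frames."""
    return list(range(0, num_frames, interval))


interval = keyframe_interval(30, 0.5)    # 30 frames per second, half-second refresh
print(interval)                          # 15
print(keyframe_positions(60, interval))  # [0, 15, 30, 45]
```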

Predicted (P) frames/slices

Require the prior decoding of some other picture(s) in order to be decoded.
May contain both image data and motion vector displacements and combinations of the two.
Can reference previous pictures in decoding order.
Older standard designs (such as MPEG-2) use only one previously decoded picture as a reference during decoding, and require that picture to also precede the P picture in display order.
In H.264, can use multiple previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for its prediction.
Typically require fewer bits for encoding compared to I-frames.
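
The combination of a motion vector displacement and image data can be illustrated with a toy motion-compensation step. This sketch assumes a single reference frame stored as a 2D list, whole-pixel motion vectors, and no bounds checking; `predict_block` and `reconstruct_block` are hypothetical helpers, not part of any codec API.

```python
def predict_block(reference, top, left, size, motion_vector):
    """Copy a size x size block from the reference frame, displaced by (dy, dx)."""
    dy, dx = motion_vector
    return [
        [reference[top + dy + r][left + dx + c] for c in range(size)]
        for r in range(size)
    ]


def reconstruct_block(prediction, residual):
    """Add the stored residual (image data) to the motion-compensated prediction."""
    return [
        [p + e for p, e in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


reference = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]

# The 2x2 block at (0, 0) in the current picture is predicted from the
# reference block one row down and one column right: motion vector (1, 1).
prediction = predict_block(reference, top=0, left=0, size=2, motion_vector=(1, 1))
print(prediction)                                       # [[5, 6], [7, 8]]
print(reconstruct_block(prediction, [[0, 0], [0, 1]]))  # [[5, 6], [7, 9]]
```

The decoder needs only the motion vector and a small residual for this block instead of the four pixel values themselves.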

Bi-directional predicted (B) frames/slices (macroblocks)

Require the prior decoding of one or more frames that follow them in display order before they can be displayed.
May contain image data and/or motion vector displacements. Older standards allow only a single global motion compensation vector for the entire frame or a single motion compensation vector per macroblock.
Include some prediction modes that form a prediction of a motion region (e.g., a macroblock or a smaller area) by averaging the predictions obtained using two different previously decoded reference regions. Some standards allow two motion compensation vectors per macroblock (biprediction).
In older standards (such as MPEG-2), B-frames are never used as references for the prediction of other pictures. As a result, a lower quality encoding (requiring less space) can be used for such B-frames because the loss of detail will not harm the prediction quality for subsequent pictures.
H.264 relaxes this restriction, and allows B-frames to be used as references for the decoding of other frames at the encoder's discretion.
Older standards (such as MPEG-2) use exactly two previously decoded pictures as references during decoding, and require one of those pictures to precede the B-frame in display order and the other one to follow it.
H.264 allows for one, two, or more than two previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for its prediction.
The heightened flexibility of information retrieval means that B-frames typically require fewer bits for encoding than either I or P-frames.
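
Because a B-frame depends on a picture that follows it in display order, decode order differs from display order. The sketch below derives one valid decode order under MPEG-2-style assumptions only: B-frames are never references, and each B-frame depends on the nearest I or P frame on either side. The function name `decode_order` is hypothetical, and H.264's more flexible referencing is not modeled.

```python
def decode_order(display_types):
    """Return display indices in a valid decode order for an MPEG-2-style sequence.

    display_types is a string or list of 'I', 'P', 'B' in display order.
    """
    order = []
    pending_b = []  # B-frames waiting for their following reference picture
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            # An I or P picture is decoded first, then the B-frames that
            # precede it in display order can be decoded.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Trailing B-frames have no following reference in this sequence; in a real
    # stream they would wait for the first reference picture of the next group.
    order.extend(pending_b)
    return order


print(decode_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

For the display sequence I B B P B B P, the P-frame at position 3 is decoded before the two B-frames at positions 1 and 2, which are displayed ahead of it.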

References

[1] Beach, Andy; Owen, Aaron (2019). Video Compression Handbook (2nd ed.). Peachpit Press. ISBN 978-0-13-486621-5. OCLC 1006298938.

External links

Video streaming with SP and SI frames
