Object-Based Variable-Rate Encoders/Decoders
The longer-term interest in MPEG-4, however, is the development of object-based variable-rate encoders/decoders. The objective is to deliver variable-rate, constant-quality encoding/decoding. What do we mean by object coding? MPEG-4 Version II (December 1999, in parallel with 3GPP Release 99) described a standardized way of moving media objects within a coordinate system. You can have audio objects or video objects. Video images are split into component parts: for instance, a person, a chair, and a table. A table is a primitive object. A completely still person is a primitive object. A person dancing on a table is a complex object and will incur a higher coding rate.
Because MPEG-4 describes the coordinates within which an object moves, we can standardize motion estimation, motion prediction, and motion compensation techniques. An object moving across a background only changes if it deforms, moves into shadow, or rotates. We can predict the axis and direction of movement of an object and reconstruct the movement as a rendering instruction in the decoder. The direction of travel is known as the optic flow axis.
This means that objects can be manipulated on arrival: We can translate, warp, or zoom objects; we can use transforms (processing algorithms) to change the geometric or acoustical properties of objects; and we can turn audio objects into (three-dimensional) surround sound. (We may not want to do this, but it's nice to know that we can.) Thus, in the same way that we can render text in the decoder, we can render audio and video objects. The technique is sometimes described as mesh coding and borrows memory processing and algorithm prediction technology from the game console software development world.
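
To make the idea of rendering an object in the decoder more concrete, here is a minimal Python sketch. It assumes an object is represented as a list of 2D vertices (a crude stand-in for a mesh). The function name `transform_object` and its parameters are hypothetical and are not taken from the MPEG-4 specification; the point is only that the encoder can send a few transform parameters and let the decoder recompute where the object sits.

```python
import math


def transform_object(vertices, dx=0.0, dy=0.0, zoom=1.0, angle_deg=0.0):
    """Zoom, rotate about the origin, then translate a list of (x, y) vertices.

    Hypothetical helper for illustration only; not an MPEG-4 API.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rendered = []
    for x, y in vertices:
        # Zoom first, so the translation is expressed in scene coordinates.
        x, y = x * zoom, y * zoom
        # Standard 2D rotation about the origin.
        xr = x * cos_t - y * sin_t
        yr = x * sin_t + y * cos_t
        rendered.append((xr + dx, yr + dy))
    return rendered


# A "table" stored once in the decoder as four corner vertices.
table = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

# The encoder sends only the parameters; the decoder renders the result.
moved = transform_object(table, dx=3.0, dy=1.0, zoom=2.0)
print(moved)
```

In this sketch, four numbers (`dx`, `dy`, `zoom`, `angle_deg`) describe the movement of the whole object, however many vertices it has.
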
What we are trying to achieve is an increase in the apparent bandwidth available to us in the handset; we can send a small amount of information to and from the handset but turn it into an (apparently) large amount of information by using local processor bandwidth to render and post-process the content. For example, we might choose to store a generic face in the handset. The encoder has to encode a face, but in practice it only encodes the differences between the face it is seeing (the image stream from the CMOS imaging platform) and the reference face in the encoder (which is the same as the generic reference face in the decoder). The generic face will also be expressionless, so the encoder needs to send difference and animation parameters.
The ability to manage objects within a coordinate system also means we can provide motion compensation. Motion compensation can be used to code out camera shake. The problem with camera shake is that it increases entropy. The codec perceives a rapidly shaking image and tries to encode the movement. Motion compensation can cancel out the movement prior to encoding, which reduces the encoder rate and improves the quality of the video.
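
The camera-shake argument can be illustrated with a small Python sketch. It assumes frames are plain 2D lists of brightness values and that shake is a pure whole-frame translation, which is a simplification. The names `make_frame`, `mean_abs_diff`, and `estimate_shake` are hypothetical, and the exhaustive search shown here is one simple way to estimate global motion, not necessarily what any particular codec does.

```python
def make_frame(width, height, shift_x=0):
    """Synthetic test scene; shift_x moves it right to mimic camera shake."""
    return [[(x - shift_x) ** 2 + 3 * y for x in range(width)]
            for y in range(height)]


def mean_abs_diff(ref, cur, dx, dy):
    """Mean absolute difference between ref and cur read at offset (dx, dy).

    Only pixels where the offset position falls inside the frame are counted.
    """
    height, width = len(ref), len(ref[0])
    total, count = 0, 0
    for y in range(height):
        for x in range(width):
            sx, sy = x + dx, y + dy
            if 0 <= sx < width and 0 <= sy < height:
                total += abs(ref[y][x] - cur[sy][sx])
                count += 1
    return total / count


def estimate_shake(ref, cur, search=2):
    """Try every offset within +/-search pixels and keep the best match."""
    candidates = (
        (mean_abs_diff(ref, cur, dx, dy), dx, dy)
        for dy in range(-search, search + 1)
        for dx in range(-search, search + 1)
    )
    _, best_dx, best_dy = min(candidates)
    return best_dx, best_dy


reference = make_frame(8, 8)
shaken = make_frame(8, 8, shift_x=1)  # camera moved by one pixel

dx, dy = estimate_shake(reference, shaken)
uncompensated = mean_abs_diff(reference, shaken, 0, 0)
compensated = mean_abs_diff(reference, shaken, dx, dy)
print(dx, dy, uncompensated, compensated)
```

Without compensation the encoder would have to code a nonzero difference at almost every pixel. Once the estimated shift is cancelled, the difference over the overlapping area drops to zero in this synthetic case, which is the sense in which motion compensation reduces the encoder rate.
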