Data Compression
Data compression is a way of storing more data in a given amount of computer storage; it is like trying to fit a quart into a pint pot! This is possible in computer systems because most data contains redundancy, which can be exploited to reduce its size.
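The effect of redundancy can be seen with a small experiment. The sketch below assumes Python and its standard zlib module, neither of which this article depends on, and the sample phrase and sizes are made up for illustration. It compresses a highly repetitive input and an equally long input of pseudo-random bytes.

```python
import random
import zlib

# Highly redundant input: one short phrase repeated many times.
redundant = b"a quart in a pint pot " * 500

# Input with almost no redundancy: pseudo-random bytes of the same length.
# The fixed seed only makes the run repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

packed_redundant = zlib.compress(redundant)
packed_noisy = zlib.compress(noisy)

print("redundant:", len(redundant), "->", len(packed_redundant), "bytes")
print("noisy:    ", len(noisy), "->", len(packed_noisy), "bytes")
```

The repetitive input should shrink to a small fraction of its size, while the pseudo-random input should barely change, because it offers no repeated patterns to exploit.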
The principal disadvantage is that the computer must compress and decompress the data, which takes time and resources. Another disadvantage is that if part of a compressed file becomes damaged, it is often very difficult to recover any part of the file, whereas with an uncompressed file some portion can usually be recovered.
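The difference in how damage spreads can be illustrated with a short sketch, again assuming Python's standard zlib module. The helper flip_middle_byte and the sample text are hypothetical names and data chosen for this example. The same single-byte damage is applied to an uncompressed copy and to a compressed copy of the same data.

```python
import zlib

original = b"The quick brown fox jumps over the lazy dog. " * 40

def flip_middle_byte(data: bytes) -> bytes:
    """Return a copy of data with every bit of its middle byte inverted."""
    damaged = bytearray(data)
    damaged[len(damaged) // 2] ^= 0xFF
    return bytes(damaged)

# Damage to the uncompressed copy stays local: only one byte is wrong.
damaged_plain = flip_middle_byte(original)
wrong_bytes = sum(a != b for a, b in zip(original, damaged_plain))

# The same damage to the compressed copy affects everything decoded after it,
# and zlib normally refuses to return any data at all.
damaged_packed = flip_middle_byte(zlib.compress(original))
try:
    recovered = zlib.decompress(damaged_packed)
except zlib.error:
    recovered = None  # this call recovered nothing

print("wrong bytes in the uncompressed copy:", wrong_bytes)
print("compressed copy recovered intact:", recovered == original)
```

Specialised repair tools can sometimes salvage more from a damaged archive than this simple call does, so the sketch shows the general tendency rather than a hard limit.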
Lossy and Lossless Compression
Data compression techniques come in two forms: lossy and lossless.
Generally a lossy technique means that data is saved approximately rather than exactly. If two pieces of data look roughly alike they are treated as the same. They are then given the same code and therefore require less space to store. However, when they are recovered the original differences between them are lost. This type of technique is mainly employed for multimedia files. Examples of such standards are JPEG for pictures and MP3 for sound.
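The idea of giving similar values the same code can be sketched with simple quantization. This is only an illustration in Python, not the method used by JPEG or MP3; the functions quantize and restore, the sample values and the step size are all hypothetical.

```python
def quantize(samples, step):
    """Give each sample the code of the nearest multiple of step."""
    return [round(s / step) for s in samples]

def restore(codes, step):
    """Rebuild approximate samples from their codes."""
    return [c * step for c in codes]

samples = [100, 102, 101, 147, 150, 149, 203, 198]
step = 10  # hypothetical: a larger step saves more space but loses more detail

codes = quantize(samples, step)
restored = restore(codes, step)

print(codes)     # [10, 10, 10, 15, 15, 15, 20, 20]
print(restored)  # [100, 100, 100, 150, 150, 150, 200, 200]
```

Eight samples with seven distinct values are reduced to three distinct codes, which are cheaper to store, but the small differences between 100, 101 and 102 cannot be recovered.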
In contrast, lossless techniques save data exactly. They look for sequences that are identical and code these. This type of compression achieves a lower compression rate than a lossy technique, but when the file is recovered it is identical to the original. This type of technique is used for text and database files, which must be exact, although lossless techniques also exist for multimedia files. The most widely used lossless format for general data is ZIP.
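One of the simplest ways to code identical sequences is run-length encoding, where a run of repeated characters is stored as the character plus a count. The Python sketch below is an illustration only and is not the method used by the ZIP format; the function names and the sample string are hypothetical.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

text = "AAAABBBCCDAAAA"
pairs = rle_encode(text)

print(pairs)  # [('A', 4), ('B', 3), ('C', 2), ('D', 1), ('A', 4)]
print(rle_decode(pairs) == text)  # True
```

Decoding gives back exactly the original string, which is what makes the technique lossless. It only saves space when the data really does contain long runs.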
Lossy vs Lossless Compression
In an ideal world you would not wish to compress files at all, and with the plummeting cost of storage this may be the way forward, as you can keep the original data pristine. With uncompressed files, if you lose some part of the file the rest of it can still be read.
Lossless formats also preserve the original data, but anyone who has used the ZIP format to pack files will have observed that the levels of compression are generally modest. Lossless picture formats such as PNG generally produce large files.
Sending uncompressed and lossless files over the internet is very expensive. Even with fast broadband there is still a finite amount of bandwidth available. So although storage is cheap the cost of sending the files is very high.
Lossy techniques such as JPEG for pictures can achieve high levels of compression with virtually no noticeable loss of quality. Figures vary, but compression down to around 10% of the original size typically shows virtually no visible loss. However, there are two problems:
- If an image contains very high contrast, such as black lines against a white background, then JPEG artifacts (distortion) can be seen.
- If an image is to be edited, then each time it is saved in JPEG format more distortion will occur. It is better to edit the picture in a lossless format such as PNG and then save it as a JPEG once editing is complete.
JPEG Standards and Cameras
A familiar example of the tradeoffs between storage and performance is the digital camera. A camera's default method of saving pictures is JPEG. As there is little loss of quality this is a good method. It allows more pictures to be saved to the memory chips and shortens the recovery time between pictures, because transferring data to the chips is often the slowest part of the process.
Cameras offer a number of settings, such as "normal" or "fine", for the compression ratio. For most purposes the default selected by the camera manufacturer will work very well for family and holiday photography. If you are tempted to select a lower quality setting in order to store more pictures on your card, you may be disappointed. With the low cost of memory chips, is it worthwhile?
Summary
Both lossless and lossy techniques have their uses in computer systems:
- For archiving, editing or where professional standards demand the highest quality obtainable, either no compression or lossless techniques are used.
- Where data transmission or computer performance is important, lossy techniques are useful.
Understanding How The Techniques Work
To understand how the techniques work, there is a series of articles on both techniques:
- Simple Lossless Compression describes the basics for understanding data compression, including finding patterns and creating dictionaries, using a text-based example.
- Complex Lossless Compression part 1 and part 2 cover how to create smaller lossless files, again using the same text example.
- Lossy Compression part 1 and part 2 deal with creating smaller files using the lossy technique resulting in some alterations in the file.