

Data Compression

Data compression is a way of storing more data in a given amount of computer storage; it is like trying to fit a quart into a pint pot! This is possible in computer systems because most data contains redundancy, which can be exploited to reduce its size.
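As a rough illustration of how redundancy makes compression possible, the sketch below uses Python's standard zlib module to compare a highly repetitive piece of text with random bytes, which have almost no redundancy. The helper name compression_ratio and the sample data are made up for this example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant data: the same phrase repeated many times.
redundant = b"the quick brown fox " * 500

# Random bytes contain almost no redundancy to exploit.
random_data = os.urandom(len(redundant))

print(f"redundant text: {compression_ratio(redundant):.3f}")
print(f"random bytes:   {compression_ratio(random_data):.3f}")
```

The repetitive text should shrink to a small fraction of its size, while the random bytes should stay at roughly their original size or grow slightly.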

The principal disadvantage is that the computer must compress and decompress the data, which takes time and resources. Another disadvantage is that if part of a compressed file becomes damaged, it can be virtually impossible to recover any part of the file, whereas with an uncompressed file some portion can usually be recovered.
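The effect of damage can be sketched with Python's standard zlib module. The example flips a single byte in the middle of both an uncompressed and a compressed copy of the same data. The helper name try_recover and the sample record are assumptions made for this illustration, and the exact failure depends on where the damage falls.

```python
import zlib


def try_recover(damaged: bytes):
    """Try to decompress; return None if zlib rejects the damaged stream."""
    try:
        return zlib.decompress(damaged)
    except zlib.error:
        return None


original = b"Important record number 12345. " * 200

# Damage one byte in the middle of the uncompressed data.
damaged_plain = bytearray(original)
damaged_plain[len(damaged_plain) // 2] ^= 0xFF
wrong_bytes = sum(a != b for a, b in zip(original, damaged_plain))

# Damage one byte in the middle of the compressed data.
damaged_packed = bytearray(zlib.compress(original))
damaged_packed[len(damaged_packed) // 2] ^= 0xFF
recovered = try_recover(bytes(damaged_packed))

print("uncompressed copy: bytes that differ =", wrong_bytes)
print("compressed copy: recovered intact =", recovered == original)
```

In the uncompressed copy only the damaged byte is wrong and the rest can still be read. In the compressed copy the decompressor will normally reject the stream outright.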



Lossy and Lossless Compression

Data compression techniques come in two forms: lossy and lossless.

Generally, a lossy technique saves data approximately rather than exactly. If two pieces of data look roughly alike, they are treated as the same: they are given the same code and therefore require less space to store. However, when they are recovered, the original differences between them are lost. This type of technique is mainly employed for multimedia files. Examples of such standards are JPEG for pictures and MP3 for sound.
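The idea of giving similar values the same code can be sketched with simple quantisation. This is a toy illustration only, not how JPEG or MP3 work internally; the function names and the step size of 8 are assumptions chosen for the example.

```python
STEP = 8  # assumed step size: values within the same band of 8 share a code


def quantise(value: int) -> int:
    """Map a sample (e.g. a brightness from 0 to 255) to a smaller code."""
    return value // STEP


def reconstruct(code: int) -> int:
    """Recover an approximate value: the middle of the band for that code."""
    return code * STEP + STEP // 2


samples = [100, 101, 103, 104, 200]
codes = [quantise(v) for v in samples]
restored = [reconstruct(c) for c in codes]

print("codes:   ", codes)
print("restored:", restored)
```

The values 100, 101 and 103 all receive the same code, so fewer distinct codes need to be stored, but after reconstruction they can no longer be told apart.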

In contrast, lossless techniques save data exactly. They look for sequences that are identical and code these. This type of compression achieves less compression than a lossy technique, but when the file is recovered it is identical to the original. It is used for text and database files, which must be exact, although lossless techniques also exist for multimedia files. The most widely used lossless standard for data is the ZIP format.
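One of the simplest lossless schemes is run-length encoding, sketched below in Python. It is used here only as an illustration of coding repeated data and recovering it exactly; ZIP itself relies on more sophisticated methods. The function names are made up for this example.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of identical characters with (character, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original text exactly from the (character, length) pairs."""
    return "".join(ch * count for ch, count in pairs)


original = "AAAABBBCCD"
encoded = rle_encode(original)
decoded = rle_decode(encoded)

print(encoded)
print(decoded == original)
```

Unlike the lossy case, decoding gives back exactly the text that was encoded.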



Lossy vs Lossless Compression

In an ideal world you would not wish to compress files at all, and with the plummeting cost of storage this may be the way forward, as you can keep the original data pristine. With uncompressed files, if you lose some part of the file, the rest of it can still be read.

Lossless compression also preserves the original data, but anyone who has used the ZIP format to pack files will have observed that the levels of compression are generally modest. Lossless formats such as PNG for pictures generally produce large files.

Sending uncompressed and lossless files over the internet is very expensive. Even with fast broadband there is only a finite amount of bandwidth available. So although storage is cheap, the cost of sending the files is very high.

Lossy techniques such as JPEG for pictures can achieve high levels of compression with virtually no noticeable loss of quality. Figures vary, but compression down to about 10% of the original size typically causes virtually no visible loss. However, there are two problems:



JPEG Standards and Cameras

A familiar example of the tradeoff between storage and performance is the digital camera. A camera's default method of saving pictures is JPEG. As there is little loss of quality, this is a good method. It allows more pictures to be saved to the memory chips and shortens the recovery time between pictures, because transferring data to the chips is often the slowest part of the process.

Cameras offer a number of settings, such as "normal" or "fine", for the compression ratio, and for most purposes the default selected by the camera manufacturer will work very well for family and holiday photography. If you are tempted to select a lower quality setting in order to store more pictures on your card, you may be disappointed, and with the low cost of memory chips, is it worthwhile?

Summary

Both lossless and lossy techniques have their uses in computer systems:

Understanding How The Techniques Work

To understand how the techniques work there are a series of articles on both techniques: