Data Compression


Data compression is the coding of data to save storage space or transmission time. For example, run-length encoding replaces each run of repeated characters (or other units of data) with a single character and a count. There are many compression algorithms and utilities. The standard Unix compression utility is called compress, though GNU's superior gzip has largely replaced it. Other compression utilities include pack, zip, and PKZIP. The following session shows how to compress a file named Sequential.txt with gzip and restore it with the matching decompression tool, gunzip. Note that gzip replaces the original file with Sequential.txt.gz, that the file shrinks from 665 to 457 bytes, and that the compressed contents are no longer readable text.

shell> cat Sequential.txt
   File Structures:     An Object-Oriented Approach with C++ |0201874016|94.80|360|
    Learning WML   &   WMLScript |1565929470|17.48|12|
         XML in a Nutshell,     2nd Edition|0596002920|39.95|39|
     Java and XSLT |0596001436|26.37|890|
   WAP Servlets:   Developing Dynamic Web Content With Java and WML|047139307|32.99|4|
      WAP Development with WML and WMLScript|0672319462|18.99|56|
   Advances in Security and Payment Methods for Mobile Commerce|1591403456|89.95|182|
   M Commerce:   Technologies, Services, and Business Models |0471135852|23.09|5|
    Mobile       Commerce                 |0521797561|29.51|93|
   Dynamic WAP Application Development     |1930110081|34.59|18|

shell> ls -l Sequential.txt
-rw-r--r--   1 wenchen  faculty     665 Feb 12 14:19 Sequential.txt

shell> gzip  Sequential.txt
shell> ls -l Sequential.txt.gz
-rw-r--r--   1 wenchen  faculty     457 Feb 12 14:19 Sequential.txt.gz

shell> cat Sequential.txt.gz
   ØBSequential.txt]ËnÛ0E÷þYuTàKvç8í¢°  lej3)C¢]ïq´H ¹wæ\êÚ87köúÂZ7ùceUZeM-HÖ
6c  iH*Öäzå~ _§8ïqîógzøÞûèÇ@¢´Ê*f [ò;vç:î{m×/píFW¤ªB×ÔX^l»|§ó1c>âñ6}¼
    îàlq«yC mâ¼3  SKmµ¨I«ÂZ2yñd{µ:$]Î'¡_eà$ªZiiM¥H6I]V´Xöç.8vî4ùxɺ
  çîÍ6÷c?Ãïq͸Kw9rV¡ ÿG®dm벤lQJ²·X9"ïºtÿÄÕSRF6EiÙ.¯Èשõi

shell> gunzip Sequential.txt.gz
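
The run-length encoding mentioned in the introduction can be illustrated with a few lines of shell and awk. This is only a minimal sketch of the idea, not the algorithm gzip uses: the function name rle_encode and the count-then-character output format are assumptions made for illustration, and the output is ambiguous if the input itself contains digits.

```shell
# Hypothetical helper: run-length encode its first argument,
# writing each run as <count><character>.
rle_encode() {
    printf '%s\n' "$1" | awk '{
        out = ""; prev = ""; count = 0
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            if (ch == prev) {
                count++                      # still inside the same run
            } else {
                # A new run starts: flush the previous one, if any.
                if (count > 0) out = out count prev
                prev = ch
                count = 1
            }
        }
        if (count > 0) out = out count prev  # flush the final run
        print out
    }'
}

rle_encode 'aaaabbbcca'   # prints 4a3b2c1a
```

Run-length encoding only pays off when the data contains long runs. On ordinary text such as Sequential.txt most runs have length one, so this naive format would make the file larger rather than smaller.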

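Compressed data must be decompressed before it can be used. When compressing several similar files, it is usually better to join the files together into an archive of some kind (using tar, for example) and then compress the archive, rather than to join together individually compressed files. Compressing the archive as a single stream lets the compressor exploit the redundancy shared between the files, whereas compressing each file separately starts from scratch every time.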
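
The difference between the two approaches can be measured with a small experiment. The following is a sketch under stated assumptions: the file names (part1.txt and so on) and their generated contents are made up for illustration, and the exact byte counts will vary from system to system, so none are shown here.

```shell
# Scratch directory; every file name below is hypothetical.
work=$(mktemp -d)

# Create five similar files: one distinguishing line, then 300 shared lines.
for n in 1 2 3 4 5; do
    {
        echo "file $n"
        awk 'BEGIN { for (i = 1; i <= 300; i++) print "record", i }'
    } > "$work/part$n.txt"
done

# Option A: compress each file on its own and add up the compressed sizes.
separate=0
for f in "$work"/part*.txt; do
    separate=$((separate + $(gzip -c "$f" | wc -c)))
done

# Option B: archive the directory with tar, then compress the single stream.
together=$(( $(tar -cf - -C "$work" . | gzip -c | wc -c) ))

echo "gzip each file, then total: $separate bytes"
echo "tar, then gzip:             $together bytes"

rm -rf "$work"
```

With files this similar, the second figure should come out noticeably smaller, even though tar adds a header block for every file. The advantage shrinks when the files have little in common or are large enough that their shared content falls outside the window the compressor looks back over.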