Chapter 7: File Compression — zip, tar, gzip, bzip2, xz


Two operations are often confused: compression reduces the size of a file; archiving bundles multiple files into one. Most Linux workflows do both at once — tar handles the bundling, and a compressor like gzip or xz shrinks the result. The zip format is an exception: it archives and compresses in a single step.
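To see the two operations separately, the sketch below first archives a small directory with tar, then compresses the result with gzip, and finally does both in a single tar command. It runs in a throwaway directory from mktemp. The names demo/, notes.txt, and data.csv are hypothetical placeholders.

```shell
set -eu

# Work in a scratch directory so nothing else is touched
work=$(mktemp -d)
cd "$work"

# Hypothetical sample content
mkdir demo
echo "first file" > demo/notes.txt
seq 1 1000 > demo/data.csv

# Archiving: bundle the directory into one uncompressed demo.tar
tar -cf demo.tar demo/

# Compression: gzip replaces demo.tar with demo.tar.gz
gzip demo.tar

# Both at once: tar runs gzip itself
tar -czf demo-onestep.tar.gz demo/

# List each archive's contents
tar -tzf demo.tar.gz
tar -tzf demo-onestep.tar.gz
```

Both archives list the same files; the one-step form simply skips the intermediate .tar file.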


zip and unzip

zip is the most portable format — Windows, macOS, and Linux all handle it natively.

# Create a zip archive
zip archive.zip file1.txt file2.txt
zip -r archive.zip directory/          # -r: include subdirectories recursively

# Extract
unzip archive.zip                      # extract to current directory
unzip archive.zip -d /target/dir/      # extract to a specific directory

# List contents without extracting
unzip -l archive.zip

# Extract a single file
unzip archive.zip specific-file.txt

gzip

gzip compresses individual files; it does not bundle multiple files or directories into one. By default it replaces the original file with a compressed .gz version.

gzip file.txt                          # creates file.txt.gz, deletes file.txt
gzip -k file.txt                       # -k: keep the original
gunzip file.txt.gz                     # decompress
gzip -d file.txt.gz                    # same as gunzip

# Read a gzip file without writing a decompressed copy to disk
zcat file.txt.gz
zless file.txt.gz                      # page through it

gzip is fast and widely supported, and it is the compressor most commonly paired with tar (producing .tar.gz files).
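Because gzip -c writes to standard output, gzip also works as a stage in a pipeline, compressing data as it is produced. A minimal sketch, using seq and sed to fake a hypothetical log file named app.log:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical log output, compressed as it is produced (no temporary file)
seq 1 5000 | sed 's/^/log line /' | gzip > app.log.gz

# -t tests integrity; -l reports compressed and uncompressed sizes
gzip -t app.log.gz
gzip -l app.log.gz

# Search without writing a decompressed copy to disk
zcat app.log.gz | grep -c 'line 42'

# -c sends output to stdout, so the original file stays in place (like -k)
seq 1 100 > numbers.txt
gzip -c numbers.txt > numbers.txt.gz
```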


bzip2

bzip2 provides better compression than gzip at the cost of speed. Usage is nearly identical:

bzip2 file.txt                         # creates file.txt.bz2
bzip2 -k file.txt                      # keep original
bunzip2 file.txt.bz2                   # decompress
bzcat file.txt.bz2                     # print contents without writing a decompressed file

When to use it: bzip2 archives are common in Linux source distributions and older package files. For new work, xz generally compresses better and decompresses much faster, although compressing with xz at its default level is often slower than bzip2.
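A quick way to confirm that a bzip2 file is intact is to test it and then compare the decompressed stream against the original. A minimal sketch with a hypothetical sample.txt:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical input file
seq 1 20000 > sample.txt

# -k keeps sample.txt alongside sample.txt.bz2
bzip2 -k sample.txt

# -t verifies the compressed file without writing anything
bzip2 -t sample.txt.bz2

# Decompress to stdout and compare byte-for-byte with the original
bzcat sample.txt.bz2 | cmp - sample.txt && echo "round trip OK"
```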


xz

xz offers the highest compression ratio of the three, at the cost of more CPU time and memory. It is the format used by Linux kernel source releases and many package distributions.

xz file.txt                            # creates file.txt.xz
xz -k file.txt                         # keep original
xz -d file.txt.xz                      # decompress (same as unxz)
unxz file.txt.xz                       # same
xzcat file.txt.xz                      # print contents without writing a decompressed file

Compression level: xz accepts presets from -0 (fastest) to -9 (maximum compression). The default, -6, is a good balance.
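The sketch below compresses the same hypothetical input at three presets so you can compare the sizes on your own machine. Actual numbers depend on the data and the xz version, so none are quoted here.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical, fairly repetitive input
seq 1 200000 > big.txt

# -c writes to stdout and leaves big.txt untouched
xz -0 -c big.txt > fast.xz      # fastest preset
xz -c big.txt > default.xz      # default preset (-6)
xz -9 -c big.txt > max.xz       # maximum preset; needs the most memory

# Compare the resulting sizes in bytes
wc -c fast.xz default.xz max.xz

# xz -l prints sizes and ratio for an existing .xz file
xz -l max.xz
```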


tar: Archiving and Combining with Compression

tar (tape archive) bundles files and directories into a single .tar file. On its own it does not compress; adding a compression flag (-z, -j, or -J) makes tar pass the archive through the matching compressor (gzip, bzip2, or xz).
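The -z flag is roughly shorthand for piping tar's output through gzip yourself. The sketch below builds an archive of a hypothetical site/ directory both ways. The two files are not guaranteed to be byte-for-byte identical, but they list the same members.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical sample directory
mkdir -p site/css
echo "<h1>hi</h1>" > site/index.html
echo "body {}" > site/css/main.css

# With the flag: tar starts gzip itself
tar -czf flag.tar.gz site/

# Spelled out: "-f -" sends the uncompressed archive to stdout
tar -cf - site/ | gzip > pipe.tar.gz

# Reading works the same way in reverse
gzip -dc pipe.tar.gz | tar -tf -
```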

Creating archives

# .tar.gz (most common, fast)
tar -czf archive.tar.gz directory/

# .tar.bz2 (better compression, slower)
tar -cjf archive.tar.bz2 directory/

# .tar.xz (best compression, slowest)
tar -cJf archive.tar.xz directory/

# .tar (no compression)
tar -cf archive.tar directory/

Flag mnemonic: c = create, z = gzip, j = bzip2, J = xz, f = file. Because f takes the next argument as the archive name, it goes last in the group.

Extracting archives

tar -xzf archive.tar.gz               # extract gzip
tar -xjf archive.tar.bz2              # extract bzip2
tar -xJf archive.tar.xz               # extract xz
tar -xf archive.tar.gz                # auto-detect compression (GNU tar, bsdtar)

# Extract to a specific directory
tar -xzf archive.tar.gz -C /target/dir/

# List contents without extracting
tar -tzf archive.tar.gz

Flag mnemonic: x = extract, t = list (table of contents). The z, j, J, and f flags mean the same as when creating.
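The create, list, and extract flags combine into a full round trip. A minimal sketch with a hypothetical project/ directory; the listing and full-extract steps rely on GNU tar detecting xz compression on its own when reading:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical sample directory
mkdir -p project/src
echo 'print("hi")' > project/src/main.py
echo "notes" > project/README.txt

# c = create, J = xz, f = archive name follows
tar -cJf project.tar.xz project/

# t = list; compression is detected automatically when reading
tar -tf project.tar.xz

# -C changes into the target directory first; it must already exist
mkdir restored
tar -xf project.tar.xz -C restored/

# Extract a single member by the path shown in the listing
mkdir one
tar -xJf project.tar.xz -C one project/README.txt

diff -r project restored/project && echo "round trip OK"
```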

Useful extras

# Extract a single file from an archive
tar -xzf archive.tar.gz path/to/file.txt

# Add verbose output to see each file as it is processed
tar -xzvf archive.tar.gz

# Create an archive excluding certain files
# (GNU tar applies --exclude only to operands that follow it, so put it first)
tar -czf archive.tar.gz --exclude='*.log' --exclude='.git' directory/
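To check that exclusions work, archive a directory that contains matching files and inspect the listing. A minimal sketch with a hypothetical repo/ holding a log file and a .git directory; the patterns are placed before the directory because GNU tar applies --exclude to the operands that follow it:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical sample directory with files that should be skipped
mkdir -p repo/.git repo/logs
echo "code" > repo/app.sh
echo "debug" > repo/logs/run.log
echo "ref" > repo/.git/HEAD

# Exclusion patterns first, then the directory to archive
tar -czf repo.tar.gz --exclude='*.log' --exclude='.git' repo/

# The listing should show repo/app.sh but no .log file and nothing under .git
tar -tzf repo.tar.gz
```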

Quick Reference Table

Format    | Extension | Create                    | Extract              | List                 | Best for
----------|-----------|---------------------------|----------------------|----------------------|-----------------------
zip       | .zip      | zip -r out.zip dir/       | unzip out.zip        | unzip -l out.zip     | Cross-platform sharing
gzip      | .gz       | gzip -k file              | gunzip file.gz       |                      | Single files, pipelines
bzip2     | .bz2      | bzip2 -k file             | bunzip2 file.bz2     |                      | Legacy archives
xz        | .xz       | xz -k file                | unxz file.xz         |                      | Maximum compression
tar+gzip  | .tar.gz   | tar -czf out.tar.gz dir/  | tar -xzf out.tar.gz  | tar -tzf out.tar.gz  | General archiving
tar+bzip2 | .tar.bz2  | tar -cjf out.tar.bz2 dir/ | tar -xjf out.tar.bz2 | tar -tjf out.tar.bz2 | Legacy source archives
tar+xz    | .tar.xz   | tar -cJf out.tar.xz dir/  | tar -xJf out.tar.xz  | tar -tJf out.tar.xz  | Distribution packages

Choosing a Format
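Compression ratios depend heavily on the data, so measuring a representative sample is more reliable than any rule of thumb. A minimal sketch, assuming gzip, bzip2, and xz are all installed and using seq output as a stand-in for real data:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical sample input; substitute a file typical of your workload
seq 1 500000 > sample.txt
printf '%-10s %12s\n' format bytes
printf '%-10s %12s\n' original "$(wc -c < sample.txt)"

# Compress the same file with each tool at its default level and report the size
for tool in gzip bzip2 xz; do
    size=$("$tool" -c sample.txt | wc -c)
    printf '%-10s %12s\n' "$tool" "$size"
done
```

Wrapping each command in the shell's time keyword would add speed to the comparison, which matters as much as size when choosing between gzip and xz.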


Key Takeaways

