Scripting : useful compression commands

  • mdo  DigitalBox
  •   Scripting
  •   March 20, 2025

When working with files in Haiku, compression and decompression utilities are essential for managing archives efficiently.

Haiku provides a variety of command-line tools for handling compressed files, whether you're reducing storage space, extracting archives, or inspecting their contents.

This article covers some of the most useful compression-related commands available.

Commands overview

Command                  Description
bzip2                    Compresses files into .bz2 format
bunzip2                  Decompresses .bz2 files
gzip                     Compresses files into .gz format
gunzip                   Decompresses .gz files
xz                       Compresses files into .xz format
unxz                     Decompresses .xz files
zip                      Creates .zip archives
unzip                    Extracts files from .zip archives
funzip                   Extracts the first file of a .zip archive to standard output
zipinfo                  Displays details about a .zip archive
lha                      Creates and extracts archives in .lha (LZH) format
7z                       Creates and extracts archives in 7z and other supported formats
unrar                    Extracts .rar archives
tar                      Archives files and directories, often used with gzip/bzip2/xz
zcat / bzcat / xzcat     Displays contents of a compressed file without extracting
zdiff / bzdiff / xzdiff  Compares the contents of two compressed files

Sample data file

Let's work with a big CSV file from the Feed Grains Database.

If you want to run the tests yourself, download the CSV file and extract it to your Desktop.

The file is big enough for meaningful compression tests.

We will review how to compress this file, inspect the content of the archive for each format and its compression ratio, and extract the archive to retrieve the original file.

gzip

gzip compresses files using the DEFLATE algorithm, which combines LZ77 (Lempel-Ziv) and Huffman coding. It compresses quickly and gives a reasonable compression ratio.

We will combine it with the tar command, which can call gzip to compress the archive it creates.

If you display the help for the tar utility, you will find a create mode selected with the "-c" parameter.

To create a tar archive and compress it in the "gz" format, type the line below in a Terminal:

tar -czf FeedGrains.tar.gz FeedGrains.csv

You can also proceed in two steps for the compression :

tar -cf FeedGrains.tar FeedGrains.csv
gzip FeedGrains.tar

Once completed, let's check the compressed size, the original size and the ratio :

gzip -l FeedGrains.tar.gz

The compression ratio obtained for this file is 94%, which is good given how little time the compression takes.
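The percentage quoted here is the space saved, that is 1 minus the compressed size divided by the original size. Here is a minimal sketch of that arithmetic. It assumes a generated stand-in file named sample.csv (hypothetical, so the script runs anywhere); substitute your own file and archive names.

```shell
# Work in a scratch directory so nothing on the Desktop is touched
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data; replace with your own file
awk 'BEGIN { for (i = 1; i <= 20000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv

# -c writes to standard output, so the original file is kept
gzip -c sample.csv > sample.csv.gz

orig=$(wc -c < sample.csv)
comp=$(wc -c < sample.csv.gz)

# Space saved = (1 - compressed / original) * 100
saved=$(awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.1f", (1 - c / o) * 100 }')
echo "original: $orig bytes, compressed: $comp bytes, saved: $saved%"
```

The same calculation works for any of the formats in this article, which is handy for tools that have no listing option of their own.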

Later in this article, I will provide a quick comparison between the various compression commands.

How about decompressing the file?

It's very simple, you just need to type:

tar -xvf FeedGrains.tar.gz
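Before extracting, it can be useful to look inside the archive, or to extract it somewhere other than the current folder. The sketch below shows both, again on a small generated sample.csv (a hypothetical stand-in for the real data) and a folder name, restored, chosen only for the example.

```shell
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv
tar -czf sample.tar.gz sample.csv

# -t lists the members of the archive without extracting anything
tar -tzf sample.tar.gz

# -C extracts into another directory, leaving the current one untouched
mkdir restored
tar -xzf sample.tar.gz -C restored

# cmp is silent and returns 0 when the two files are identical
cmp sample.csv restored/sample.csv && echo "round trip OK"
```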

In case you have used gzip without the tar command, you can decompress the archive with gunzip.
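For a single file, the gzip / gunzip pair is enough on its own. A minimal sketch of the round trip, using a tiny hypothetical sample.csv and a copy named original.csv kept only for the comparison:

```shell
workdir=$(mktemp -d) && cd "$workdir"

printf 'year,commodity,amount\n2024,corn,42\n' > sample.csv
cp sample.csv original.csv      # keep a copy for the comparison below

gzip sample.csv                 # replaces sample.csv with sample.csv.gz
gzip -t sample.csv.gz && echo "archive is intact"   # -t tests the integrity
gunzip sample.csv.gz            # replaces sample.csv.gz with sample.csv

cmp original.csv sample.csv && echo "identical to the original"
```

Note that both commands replace their input file, which is why the copy is made first.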

If you need finer control over the compression, you can use the various options of the gzip command:

gzip --help

xz

This compression command can also be used in combination with the tar utility.

Type the line below to create a tar archive and compress it in the "xz" format:

tar -cJf FeedGrains.tar.xz FeedGrains.csv

You can also proceed in two steps :

tar -cf FeedGrains.tar FeedGrains.csv
xz FeedGrains.tar

Let's check the compressed size and the ratio :

xz -l FeedGrains.tar.xz

As you can see, the compression ratio is 95.1% (100% - 4.9%), which is better than gzip.

However, the compression takes longer.
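The balance between size and time can be adjusted with the xz presets, from -0 (fastest) to -9 (strongest). The sketch below compares the two extremes on a generated sample.csv; the file names fast.xz and small.xz are made up for the example, and the gap between them will depend on your data.

```shell
# Skip quietly when xz is not installed
command -v xz >/dev/null 2>&1 || { echo "xz not installed" >&2; exit 0; }

workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data
awk 'BEGIN { for (i = 1; i <= 20000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv

# -c writes to standard output and keeps the input file
xz -0 -c sample.csv > fast.xz
xz -9 -c sample.csv > small.xz

# Compare the resulting sizes in bytes
wc -c fast.xz small.xz

# Both archives must decompress to the same original data
xz -dc fast.xz | cmp - sample.csv && xz -dc small.xz | cmp - sample.csv && echo "both OK"
```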

To decompress the file, just type:

tar -xvf FeedGrains.tar.xz

In case you have used xz without the tar command, you can decompress the archive with unxz.

All the details of the xz features are available with:

xz --help

bzip2

bzip2 compresses files using the Burrows-Wheeler algorithm, typically achieving better compression rates than gzip.

However the time needed is longer :)

Tar can be used again in combination with the "bz2" format:

tar -cjf FeedGrains.tar.bz2 FeedGrains.csv

It can also be done in two steps :

tar -cf FeedGrains.tar FeedGrains.csv
bzip2 FeedGrains.tar

Let's check the compressed size and the ratio :

ls -l 

As you can see, the "bz2" archive is the best compressed file: 96.1%!
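Unlike gzip and xz, bzip2 has no listing option, but its verbose mode prints the ratio while compressing. A small sketch, on a generated sample.csv used as a hypothetical stand-in:

```shell
# Skip quietly when bzip2 is not installed
command -v bzip2 >/dev/null 2>&1 || { echo "bzip2 not installed" >&2; exit 0; }

workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data
awk 'BEGIN { for (i = 1; i <= 20000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv

# -k keeps the input file, -v prints the ratio and the percentage saved
bzip2 -kv sample.csv

# Sizes side by side
ls -l sample.csv sample.csv.bz2
```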

To decompress the file, just type :

tar -xvf FeedGrains.tar.bz2

In case you have used bzip2 without the tar command, you can decompress the archive with bunzip2.

As with the previous commands, you can also check the available features:

bzip2 --help

7z

7z uses LZMA2 for compression.

It can't be selected with a tar option like the previous formats, so you will need to type the commands below if you need a compressed archive:

tar -cf FeedGrains.tar FeedGrains.csv
7z a FeedGrains.tar.7z FeedGrains.tar

Let's check the compression ratio.

The ratio is 94.9%, which is close to the xz result.
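One way to get those figures is the list command of 7z, which shows the original and compressed sizes side by side. A minimal sketch, assuming 7z is installed and using a generated sample.csv as a hypothetical stand-in:

```shell
# Skip quietly when 7z is not installed
command -v 7z >/dev/null 2>&1 || { echo "7z not installed" >&2; exit 0; }

workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data
awk 'BEGIN { for (i = 1; i <= 20000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv
tar -cf sample.tar sample.csv
7z a sample.tar.7z sample.tar > /dev/null

# "l" lists the archive; the Size and Compressed columns give both figures
7z l sample.tar.7z

# "t" tests the integrity of the archive without extracting it
7z t sample.tar.7z > /dev/null && echo "archive is intact"
```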

If you need to extract the archive, it's easy :

7z e FeedGrains.tar.7z

Then :

tar -xvf FeedGrains.tar

You can display the full help for 7z :

7z --help

zip

zip creates compressed archives in ".zip" format.

You can combine tar and zip as shown below:

tar -cf FeedGrains.tar FeedGrains.csv
zip FeedGrains.tar.zip FeedGrains.tar



The compression ratio is OK, but at 94% it is one of the lowest.

You can unzip the archive with :

unzip FeedGrains.tar.zip
tar -xvf FeedGrains.tar

What about the features?

They are listed here:

zip --help

lha

lha creates compressed archives in the ".lha" format, an old format that can still be useful sometimes.

To create an lha archive, type:

tar -cf FeedGrains.tar FeedGrains.csv
lha a FeedGrains.tar.lha FeedGrains.tar

The compression ratio is 94.3%, which is quite good and, for this test case, better than the zip and gz formats.

Decompressing a file or an archive is quite simple :

lha x FeedGrains.tar.lha
tar -xvf FeedGrains.tar

You can also get all the details of the lha command:

lha --help

Other commands

Let's review a few additional commands which can be useful.

zcat

zcat displays the contents of a .gz file without creating a file.

Equivalents exist for the bz2 and xz formats: the bzcat and xzcat commands.
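Since zcat writes to standard output, it combines nicely with other commands in a pipe. A small sketch, using a generated sample.csv (hypothetical stand-in) with a header row and 1000 data rows:

```shell
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data: one header line plus 1000 rows
awk 'BEGIN { print "id,commodity,amount"; for (i = 1; i <= 1000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv
gzip sample.csv                       # leaves only sample.csv.gz

zcat sample.csv.gz | head -n 3        # peek at the header and the first rows
lines=$(zcat sample.csv.gz | wc -l)   # count the lines without extracting
echo "lines: $lines"
```

The compressed file stays as it is on disk; only the pipe sees the decompressed data.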

zdiff

zdiff compares the contents of two compressed files.

Equivalents exist for the bz2 and xz formats: the bzdiff and xzdiff commands.
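In a script, the exit status of zdiff is often more useful than its output. The sketch below compares two tiny compressed files; old.csv and new.csv are made-up names for the example.

```shell
# Skip quietly when zdiff is not installed
command -v zdiff >/dev/null 2>&1 || { echo "zdiff not installed" >&2; exit 0; }

workdir=$(mktemp -d) && cd "$workdir"

printf 'a,1\nb,2\n' > old.csv
printf 'a,1\nb,3\n' > new.csv
gzip old.csv new.csv

# zdiff decompresses both files on the fly and hands them to diff;
# like diff, it returns 0 when the contents match and 1 when they differ
status=0
zdiff old.csv.gz new.csv.gz || status=$?
echo "zdiff exit status: $status"
```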


funzip

funzip extracts the first file of a .zip archive to standard output, without creating a file.

It's a kind of zcat :)
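A short sketch of funzip in a pipe, assuming zip and funzip are both installed and using a tiny hypothetical sample.csv:

```shell
# Skip quietly when zip or funzip is not installed
command -v zip >/dev/null 2>&1 && command -v funzip >/dev/null 2>&1 \
    || { echo "zip or funzip not installed" >&2; exit 0; }

workdir=$(mktemp -d) && cd "$workdir"

printf 'id,commodity\n1,corn\n2,barley\n' > sample.csv
zip -q sample.zip sample.csv

# funzip writes the first member of the archive to standard output
first=$(funzip sample.zip | head -n 1)
echo "first line: $first"
```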

zipinfo

zipinfo provides detailed information about a .zip archive.

The details include the compression ratio and the size.
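Here is a sketch of a few zipinfo listings, assuming zip and zipinfo are installed; sample.csv is a generated, hypothetical stand-in for the real data.

```shell
# Skip quietly when zip or zipinfo is not installed
command -v zip >/dev/null 2>&1 && command -v zipinfo >/dev/null 2>&1 \
    || { echo "zip or zipinfo not installed" >&2; exit 0; }

workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data
awk 'BEGIN { for (i = 1; i <= 20000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv
zip -q sample.zip sample.csv

# Default listing: one line per member, plus totals for the whole archive
zipinfo sample.zip

# -m adds the compression factor of each member
zipinfo -m sample.zip

# -1 prints the member names only, which is handy in scripts
zipinfo -1 sample.zip
```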

unrar

In some specific cases, you might need to extract .rar archives.

For that, the unrar command can be useful :

unrar x archive.rar

Compression ratio and time

Let's finish this article with a few metrics.

Please note, this is an empirical approach and the compression ratio will depend on the data themselves.

In this example, it's relative to a big CSV file, so the results might differ if the compressed data are not CSV.

The methodology is quite simple: the time command was used to measure the real (elapsed) time of each compression.
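If you want to repeat this kind of measurement in a script, the loop below is one possible sketch. It is not the exact procedure used for the figures in this article: instead of the time command it takes a time stamp before and after each run, and it assumes GNU date, whose %N format gives nanoseconds. The sample data and the output file names are made up for the example.

```shell
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical stand-in data, larger so that the timings are measurable
awk 'BEGIN { for (i = 1; i <= 200000; i++) printf "%d,corn,%d\n", i, i % 97 }' > sample.csv
tar -cf sample.tar sample.csv

# Assumption: GNU date, where %s%N is the time in nanoseconds
for tool in gzip bzip2 xz; do
    command -v "$tool" >/dev/null 2>&1 || continue   # skip tools that are not installed
    start=$(date +%s%N)
    "$tool" -c sample.tar > "sample.tar.$tool"       # output name is only a label here
    end=$(date +%s%N)
    size=$(wc -c < "sample.tar.$tool")
    echo "$tool: $(( (end - start) / 1000000 )) ms, $size bytes"
done
```

In a Terminal, prefixing a single command with time gives the same information for one tool at a time.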

First criterion: the compression ratio.

This is quite good for "bz2" and "xz", judging by the size of the archives.

Second criterion: the time needed.

"bz2" requires a lot of processing time.

That's where "gz" and "zip" are good: they compress files without taking too much time.

Now let's do a last comparison.

Suppose that I give:

  • 50% of the score to an algorithm which compresses my file to a size near 10 bytes of data (which is impossible)
  • another 50% of the score to an algorithm which compresses my file in less than 0.1 sec

The total gives 100% of the score when both criteria are met, and less when the time and/or the compressed size are worse.

Here are the results.

The "gz", "zip" and "lha" formats are good candidates when you mix time needed and compression ratio :)

I hope you have found this article interesting.

There will be other scripting articles in the coming weeks, so stay tuned !


© 2025 Haiku Insider