When working with files in Haiku, compression and decompression utilities are essential for managing archives efficiently.
Haiku provides a variety of command-line tools for handling compressed files, whether you're reducing storage space, extracting archives, or inspecting their contents.
This article covers some of the most useful compression-related commands available.
| Command | Description |
| --- | --- |
| bzip2 | Compresses files into .bz2 format |
| bunzip2 | Decompresses .bz2 files |
| gzip | Compresses files into .gz format |
| gunzip | Decompresses .gz files |
| xz | Compresses files into .xz format |
| unxz | Decompresses .xz files |
| zip | Creates .zip archives |
| unzip | Extracts files from .zip archives |
| funzip | Extracts a single file from a .zip archive |
| zipinfo | Displays details about a .zip archive |
| lha | Creates and extracts archives in .lha (LZH) format |
| 7z | Creates and extracts archives in 7z and other supported formats |
| unrar | Extracts .rar archives |
| tar | Archives files and directories, often used with gzip/bzip2/xz |
| zcat / bzcat / xzcat | Displays contents of a compressed file without extracting |
| zdiff / bzdiff / xzdiff | Compares the contents of two compressed files |
Let's work with a big CSV file from the Feed Grains Database.
If you want to run the tests yourself, download the CSV file and extract it to your Desktop:
As you can see the file is big enough to do some compression tests :
We will review how to compress this file, inspect the content of the archive for each format and its compression ratio, and extract the archive to retrieve the original file.
gzip compresses files using the DEFLATE algorithm, which combines LZ77 (Lempel-Ziv) and Huffman coding. It is quick and gives a decent compression ratio.
gzip only compresses a single stream of data, so we will use it together with tar to produce a ".tar.gz" archive.
If you display the help for the tar utility, there's a create mode defined with the "-c" parameter:
In order to create a tar archive and compress it under the "gz" format, type the below line in a Terminal :
tar -czf FeedGrains.tar.gz FeedGrains.csv
You can also proceed in two steps for the compression :
tar -cf FeedGrains.tar FeedGrains.csv
gzip FeedGrains.tar
Once completed, let's check the compressed size, the original size and the ratio :
gzip -l FeedGrains.tar.gz
The compression ratio obtained for this file is 94%, which is good given how little time it takes.
Later in this article, I will provide a quick comparison between the various compression commands.
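To inspect what an archive contains without extracting it, tar has a list mode ("-t"). Here is a minimal sketch; it works in a scratch directory on a small made-up file, sample.csv, which is only a stand-in for FeedGrains.csv:

```shell
# Scratch directory and a small stand-in for FeedGrains.csv
# (sample.csv and its contents are invented for this illustration).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5000 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > sample.csv

# Create the compressed archive, as shown above.
tar -czf sample.tar.gz sample.csv

# -t lists the members instead of extracting them;
# -v adds permissions, sizes and dates to the listing.
tar -tzvf sample.tar.gz
```

The same "-t" mode should work for the other tar-based archives in this article by swapping "-z" for the matching compression flag.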
How about decompressing the file ?
For that it's very simple, you just need to type :
tar -xvf FeedGrains.tar.gz
In case you have used gzip without the tar command, you can decompress the archive with gunzip.
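For a single file compressed with gzip alone, a round trip could look like the sketch below. The file sample.csv is a made-up stand-in for FeedGrains.csv, and the "-k" (keep) option assumes a reasonably recent gzip:

```shell
# Scratch directory and a small stand-in file (invented for this example).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5000 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > sample.csv

# -k keeps sample.csv next to the new sample.csv.gz
# (without it, gzip replaces the original file).
gzip -k sample.csv

# -c writes the decompressed data to standard output,
# so the original file is not overwritten.
gunzip -c sample.csv.gz > restored.csv

# cmp prints nothing and returns 0 when both files are identical.
cmp sample.csv restored.csv && echo "round trip OK"
```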
If you need finer control over the compression, you can use the various options of the gzip command:
gzip --help
xz can also be used in combination with the tar utility.
Type the below line to create a tar archive and compress it under the "xz" format:
tar -cJf FeedGrains.tar.xz FeedGrains.csv
You can also proceed in two steps :
tar -cf FeedGrains.tar FeedGrains.csv
xz FeedGrains.tar
Let's check the compressed size and the ratio :
xz -l FeedGrains.tar.xz
As you can see, the compression ratio is 95.1% (100% - 4.9%, since xz -l reports the compressed size as a fraction of the original size), which is better than gzip.
However, the compression takes longer.
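The ratio column of xz -l is the compressed size divided by the original size, so the space saved is one minus that value. The sketch below does the conversion from the machine-readable listing. It assumes the "--robot --list" field order described in the xz manual (field 4 is the compressed size, field 5 the uncompressed size) and uses a made-up sample file:

```shell
# Scratch directory and a small stand-in file (invented for this example).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5000 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > sample.csv

tar -cJf sample.tar.xz sample.csv

# --robot prints tab-separated fields; on the "file" line, field 4 is the
# compressed size and field 5 the uncompressed size, both in bytes.
xz --robot --list sample.tar.xz |
  awk -F '\t' '$1 == "file" {
    ratio = $4 / $5
    printf "ratio as shown by xz -l: %.3f\n", ratio
    printf "space saved: %.1f%%\n", (1 - ratio) * 100
  }'
```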
To decompress the file, just type:
tar -xvf FeedGrains.tar.xz
In case you have used xz without the tar command, you can decompress the archive with unxz.
All the details of the xz features are available :
xz --help
bzip2 compresses files using the Burrows-Wheeler algorithm, typically achieving better compression rates than gzip.
However the time needed is longer :)
Tar can be used again in combination with the "bz2" format:
tar -cjf FeedGrains.tar.bz2 FeedGrains.csv
It can also be done in two steps :
tar -cf FeedGrains.tar FeedGrains.csv
bzip2 FeedGrains.tar
Let's check the compressed size and the ratio :
ls -l
As you can see, the "bz2" archive is the best compressed file: 96.1%!
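Since ls -l only shows sizes, the percentage has to be computed by hand: space saved = (1 - compressed size / original size) * 100. Here is a small sketch of that arithmetic, on a made-up sample file rather than the real FeedGrains data:

```shell
# Scratch directory and a small stand-in file (invented for this example).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5000 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > sample.csv

tar -cf sample.tar sample.csv
bzip2 -k sample.tar   # -k keeps sample.tar so both sizes can be read

# wc -c prints the size in bytes.
original=$(wc -c < sample.tar)
compressed=$(wc -c < sample.tar.bz2)

# Space saved, in percent: (1 - compressed / original) * 100
awk -v o="$original" -v c="$compressed" \
  'BEGIN { printf "saved: %.1f%%\n", (1 - c / o) * 100 }'
```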
To decompress the file, just type :
tar -xvf FeedGrains.tar.bz2
In case you have used bzip2 without the tar command, you can decompress the archive with bunzip2.
As for the previous commands, you can check also the features available :
bzip2 --help
7z uses LZMA2 for compression.
tar has no option to call it directly, so if you need a compressed tar archive, type the commands below:
tar -cf FeedGrains.tar FeedGrains.csv
7z a FeedGrains.tar.7z FeedGrains.tar
Let's check the compression ratio :
The ratio is 94.9% which is near the xz result.
If you need to extract the archive, it's easy :
7z e FeedGrains.tar.7z
Then :
tar -xvf FeedGrains.tar
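If you would rather avoid the intermediate .tar file, tar and 7z can be joined with a pipe. This is a sketch assuming the "-si" (read from standard input) and "-so" (write to standard output) switches of the 7z command, on a made-up sample file; it skips itself when 7z is not installed:

```shell
if command -v 7z >/dev/null 2>&1; then
  # Scratch directory and a small stand-in file (invented for this example).
  workdir=$(mktemp -d)
  cd "$workdir"
  seq 1 5000 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > sample.csv

  # tar writes the archive to standard output ("-f -");
  # -si makes 7z read the data to store from standard input.
  tar -cf - sample.csv | 7z a -si sample.tar.7z

  # -so makes 7z write the extracted data to standard output,
  # which tar then unpacks into the "restored" directory.
  mkdir restored
  7z x -so sample.tar.7z | tar -xf - -C restored

  cmp sample.csv restored/sample.csv && echo "round trip OK"
else
  echo "7z is not installed; skipping"
fi
```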
You can display the full help for 7z :
7z --help
zip creates compressed archives in ".zip" format.
You can combine tar and zip as follows:
tar -cf FeedGrains.tar FeedGrains.csv
zip FeedGrains.tar.zip FeedGrains.tar
The compression ratio is OK, but one of the lowest with 94%.
You can unzip the archive with :
unzip FeedGrains.tar.zip
tar -xvf FeedGrains.tar
What about the features ?
It's there :
zip --help
lha creates compressed archives in ".lha" format which is an old format but which can be useful sometimes.
To create an lha archive type :
tar -cf FeedGrains.tar FeedGrains.csv
lha a FeedGrains.tar.lha FeedGrains.tar
The compression ratio is 94.3%, which is quite good and, for this test case, better than the zip and gz formats.
Decompressing a file or an archive is quite simple :
lha x FeedGrains.tar.lha
tar -xvf FeedGrains.tar
You can get also all the details of the lha command :
lha --help
Let's review a few additional commands which can be useful.
zcat displays the contents of a .gz file without creating a file.
The equivalent exists for bz2 and xz format : bzcat and xzcat commands.
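Because these commands write to standard output, they combine nicely with pipes. A short sketch with zcat, on a made-up sample file:

```shell
# Scratch directory and a small stand-in file (invented for this example).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5000 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > sample.csv
gzip -k sample.csv   # -k keeps the original next to sample.csv.gz

# Show the first three lines without writing a decompressed file.
zcat sample.csv.gz | head -n 3

# Count the lines the same way.
zcat sample.csv.gz | wc -l
```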
zdiff compares the contents of two compressed files.
The equivalent exists for bz2 and xz format : bzdiff and xzdiff commands.
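Here is a small sketch of zdiff on two made-up files that differ by one line. Like diff, it returns 0 when the contents match and 1 when they differ:

```shell
# Scratch directory and two small stand-in files (invented for this example).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 100 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > a.csv
cp a.csv b.csv
echo "101,wheat,bushels,0" >> b.csv   # make b.csv differ by one line
gzip a.csv b.csv

# zdiff decompresses both files on the fly and runs diff on the result.
if zdiff a.csv.gz b.csv.gz; then
  echo "contents are identical"
else
  echo "contents differ"
fi
```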
funzip extracts the first file of a .zip archive to standard output, without creating a file on disk.
It's a kind of zcat :)
zipinfo provides detailed information about a .zip archive.
The details include compression ratio and size :
In some specific cases, you might need to extract .rar archives.
For that, the unrar command can be useful :
unrar x archive.rar
Let's finish this article with a bit of metrics.
Please note, this is an empirical approach and the compression ratio will depend on the data themselves.
In this example, it's relative to a big CSV file, so the results might differ if the compressed data are not CSV.
The methodology is quite simple, the time command was used to retrieve the real time of each compression :
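Such a measurement could be reproduced along these lines. This is only a sketch: it assumes bash (for the time keyword and the TIMEFORMAT variable), covers just three of the tools, and uses a made-up sample file that is far too small to give meaningful timings:

```shell
# Scratch directory and a small stand-in file (invented for this example).
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5000 | awk '{ print $1 ",corn,bushels," $1 * 3 }' > sample.csv
tar -cf sample.tar sample.csv

# TIMEFORMAT is a bash variable; %R prints only the elapsed (real) time.
TIMEFORMAT='%R s'

for tool in gzip bzip2 xz; do
  printf '%s: ' "$tool"
  # -k keeps sample.tar for the next tool, -f overwrites older results.
  time "$tool" -k -f sample.tar
done

# Sizes of the resulting archives, for the ratio comparison.
ls -l sample.tar*
```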
First criterion: the compression ratio.
It is quite good for "bz2" and "xz", as the archive sizes below show:
Second criterion: the time needed.
You can see that "bz2" requires a lot of processing time:
That's where "gz" and "zip" are good: they compress files without taking too much time.
Now let's do a last comparison.
Suppose that I give :
The total gives a score of 100% when both criteria are fully met, and less when the time and/or the compression ratio are worse.
Below are the results :
As you can see, "gz", "zip" and "lha" are good candidates when you weigh both the time needed and the compression ratio :)
I hope you have found this article interesting.
There will be other scripting articles in the coming weeks, so stay tuned !