Unzipping The Mystery - How ZIP Files Work
If you've ever had to email, upload or download several large files or programs, you've most likely encountered ZIP files. Also known as compressed or archived files, ZIP files pack multiple files into a single file with the extension .zip or .ZIP, reducing the overall size and making them easier to transmit. Phillip Katz created the ZIP format in 1989, and it was first implemented in the PKZIP program from Katz's company, PKWARE, Inc. Eventually, Katz's compression method came into common use within popular operating systems. Microsoft Windows and Apple's Mac OS include built-in utilities to compress and unzip files, and programs like WinRAR, WinZip and StuffIt can expand them.
But how does it all work? What kind of technological magic makes your files smaller while keeping all of the information for later? That magic is actually a pretty straightforward algorithm that finds the redundant parts of a file and replaces them with shorter references.
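To see the basic idea in action before digging into how it works, here is a minimal sketch using the zipfile module from Python's standard library. The file names and contents are made up for illustration, and the archive is built in memory rather than on disk. It bundles two repetitive text "files" into one ZIP archive, compares the sizes, and checks that unzipping returns exactly what went in.

```python
import io
import zipfile

# Two hypothetical, highly repetitive text files (names and contents are invented).
files = {
    "a.txt": b"readers can help make Mashable smarter\n" * 200,
    "b.txt": b"Mashable can help make readers smarter\n" * 200,
}

# Write both files into a single in-memory ZIP archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

original_size = sum(len(data) for data in files.values())
archive_size = len(buffer.getvalue())
print("original bytes:", original_size)
print("zipped bytes:  ", archive_size)

# Unzip and confirm that nothing was lost along the way.
with zipfile.ZipFile(buffer) as archive:
    restored = {name: archive.read(name) for name in archive.namelist()}
print("identical after unzipping:", restored == files)
```

Because the sample text repeats the same line over and over, the archive comes out far smaller than the originals. Real files will not shrink nearly as much.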
For an easy-to-understand example, let's take the sentence "Mashable can help make readers smarter; readers can help make Mashable smarter" and pretend it's a file. Every word in the example sentence appears twice. If each character and space in this sentence took up one unit of memory, the whole thing would have a file size of 78 units. If we created a numbered code, or dictionary, for this sentence, it could go something like this:
1. Mashable
2. can
3. help
4. make
5. readers
6. smarter
1 2 3 4 5 6; 5 2 3 4 1 6
This new sentence has only 24 units. Therefore, the compressed file would take up only 24 units of memory, plus the numbered code itself, which the compression program needs in order to translate each number back into a word. This is called lossless compression: all of the original information is retained.
The way an actual compression program works is a little more complicated than the previous example: it would recognize patterns rather than whole words. One example is the letter "e" followed by a space, which appears after "Mashable" and "make." But since this particular pattern doesn't occur many times, the program would most likely replace it in its dictionary with a more frequent pattern. The actual program is able to find a much more efficient dictionary and compressed file than we could.
According to the educational website HowStuffWorks, it's common for languages to have redundant patterns, which is why text files compress easily. But the file reduction ratio depends on several factors, including the file's type and size and how the program chooses to compress it.
In contrast, images and MP3 files contain more unique information without many patterns. That's where lossy compression comes in: compression programs get rid of what they deem unnecessary information. If you had a scanned image with a blue sky, for example, a compression program could pick one shade of blue to use for every sky pixel. If the compression scheme works well, the change wouldn't be very noticeable, but the file size would be significantly smaller. The issue with lossy compression, though, is that you can't get the original file back from the compressed file, making it less suitable than lossless compression when you need to retain all of the original information, such as when you're downloading databases and certain applications.
Mashable composite image courtesy of iStockphoto, tose, Auris.
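For readers who want to try the numbered-dictionary example themselves, here is a minimal Python sketch of it. This is the article's toy word-level scheme, not the algorithm a real ZIP program uses, and the function names compress and decompress are made up for illustration. It also assumes the text contains no digits of its own, since digits are used as the codes.

```python
import re

def compress(text):
    """Replace each distinct word with its 1-based position in a dictionary."""
    dictionary = []  # words in order of first appearance
    numbers = {}     # word -> its number in the dictionary

    def encode(match):
        word = match.group(0)
        if word not in numbers:
            dictionary.append(word)
            numbers[word] = len(dictionary)
        return str(numbers[word])

    # Only runs of letters are replaced; spaces and punctuation stay as they are.
    encoded = re.sub(r"[A-Za-z]+", encode, text)
    return dictionary, encoded

def decompress(dictionary, encoded):
    """Swap every number back for the word it stands for."""
    return re.sub(r"\d+", lambda m: dictionary[int(m.group(0)) - 1], encoded)

sentence = ("Mashable can help make readers smarter; "
            "readers can help make Mashable smarter")
dictionary, encoded = compress(sentence)

print(len(sentence))   # 78
print(dictionary)      # ['Mashable', 'can', 'help', 'make', 'readers', 'smarter']
print(encoded)         # 1 2 3 4 5 6; 5 2 3 4 1 6
print(len(encoded))    # 24

# Lossless: the original sentence comes back exactly.
print(decompress(dictionary, encoded) == sentence)  # True
```

Note that the six dictionary words add up to 33 characters of their own, so the real saving is smaller than 78 versus 24 suggests. The approach pays off more as the same words repeat across a longer file.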