Sorry your browser is not supported!

You are using an outdated browser that does not support modern web technologies, in order to use this site please update to a new browser.

Browsers supported include Chrome, FireFox, Safari, Opera, Internet Explorer 10+ or Microsoft Edge.

DarkBASIC Professional Discussion / Huffman compression.

Author
Message
kkzgreen
21
Years of Service
User Offline
Joined: 12th Apr 2004
Location: Nashville, TN
Posted: 20th Jun 2005 13:50
Ok. I know how to compress using this algorithm. You just count up the frequency each character appears and replace it with a shorter character.

For example in

ABACABBAACD

A appears 5 times; B appears 3 times; C appears 2 times; and D appears once.

So instead of the 8 bits per character you could replace it with 2 bits each so that A=00 (or 0); B=01; C=10; D=11 so

ABACABBAACD=00101000101001011
so instead of 88 bits we have 17 bits.

Please correct me if I'm wrong so far.

If I'm correct Then I have 2 questions.

1. 17 bits cannot be changed into bytes so would you just add 7 additional bits?

2. How in the world would you go about decompressing it? Do you have to put some kind a key at the end of a compressed file?
JoelJ
21
Years of Service
User Offline
Joined: 8th Sep 2003
Location: UTAH
Posted: 20th Jun 2005 14:57
Quote: "2. How in the world would you go about decompressing it? Do you have to put some kind a key at the end of a compressed file?"

i believe zip files use that algorithm, and they put the key at the beginning of the file...i could be wrong
G Man
20
Years of Service
User Offline
Joined: 17th Nov 2004
Location:
Posted: 20th Jun 2005 16:03 Edited at: 21st Jun 2005 06:48
Actually, that's not how the huffman alg. works... At its simplest, the most frequently used character is assigned a single bit to indicate its occurrence. The next most frequently used character is assigned two bits... and so on... So the bit patterns for your example would look like:

a=0
b=10
c=110
d=1110

Then you need a "virtual end of file" character. Which in this case would be 11110. The table of characters needs to be written at the beginning of the file so that you know how to decode the data that follows.

Intel Pentium 4, 3.4GHz, 1280MB RAM, NVidia Quadro FX3000/256MB, 240GB HD, XP Pro
IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 20th Jun 2005 22:56
What?

Huffman doesn't assign codes like that, and it also doesn't need an EOF code - when you run out of data to decode, then you have hit EOF.

A reasonable example can be found here : http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node59.html

Ignore the 'proofs' at the tail end of the page - just look at how codes are assigned by assembling a tree, keeping the more frequently used characters at the top of the tree.

*** Coming soon - Network Plug-in - Check my site for info ***
For free Plug-ins and source code http://www.matrix1.demon.co.uk
Keaz
21
Years of Service
User Offline
Joined: 22nd Sep 2003
Location: Somewhere in south Texas
Posted: 21st Jun 2005 00:18
Thanks that help me out as I plan on compessing alot to deal with my game and adding some ecryption to protect it as well, but this will help with my research.

Breaking Stuff=Fun!,Bug Testing<>Fun!, Bug Testing=Breaking Stuff, so...
Bug Testing=Fun! Hmmmm....
DOES NOT COMPUTE! SYSTEM MALFUNTION!
kkzgreen
21
Years of Service
User Offline
Joined: 12th Apr 2004
Location: Nashville, TN
Posted: 21st Jun 2005 02:38
Thanks everyone for helping me out. Especially IanM for the link.
G Man
20
Years of Service
User Offline
Joined: 17th Nov 2004
Location:
Posted: 21st Jun 2005 04:10 Edited at: 21st Jun 2005 06:47
Here are a number of sites that detail the Huffman Algorithm and describe what I am saying:

http://www.geocities.com/SiliconValley/2151/huffman.htmlhttp://www.arturocampos.com/cp_ch3-1.html
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/huff-op.html
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Greedy/huffman.htm

You do need a virtual EOF character as the actual end of the file might not actually occur at the conclusion of a byte boundary. If each of your bit codes for characters are 5 bits long (pursuing your contention that the bit value lengths remain constant) and you have 7 bytes of data to compress... that means that your compressed data is going to occupy 35 bits. There will however be 5 bits remaining (since you can only write whole bytes to a file)... How do you know when you are decoding that these remaining empty bits are actually empty bits that are to be ignored without some coded EOF sequence...

Following your logic then further... if each character is assigned a fixed number of bits to encode it, how do you realize any compression on a file that contains all 256 characters of the ASCII table? Since in order to store these values you would need 8 bits (the same quantity as is already being used). Without varying the number of bits used to encode a character, the frequency of its occurrence in the data stream is irrelevant.

Intel Pentium 4, 3.4GHz, 1280MB RAM, NVidia Quadro FX3000/256MB, 240GB HD, XP Pro
IanM
Retired Moderator
22
Years of Service
User Offline
Joined: 11th Sep 2002
Location: In my moon base
Posted: 21st Jun 2005 07:45
You are right - ideally, there would be no padding needed for the bit stream, but of course in the real world, we are stuck with multiples of 8 bits. Adding padding bits to signal EOF is a little more efficient (storage-wise) than holding size of the expanded byte stream.

*** Coming soon - Network Plug-in - Check my site for info ***
For free Plug-ins and source code http://www.matrix1.demon.co.uk

Login to post a reply

Server time is: 2025-06-06 16:03:42
Your offset time is: 2025-06-06 16:03:42