Huffman compression. - GameCreators Forum

Author

Message

kkzgreen

21

Years of Service

User Offline

Joined: 12th Apr 2004

Location: Nashville, TN

Posted: 20th Jun 2005 13:50

Link

Ok. I know how to compress using this algorithm. You just count up the frequency each character appears and replace it with a shorter character.

For example in

ABACABBAACD

A appears 5 times; B appears 3 times; C appears 2 times; and D appears once.

So instead of the 8 bits per character you could replace it with 2 bits each so that A=00 (or 0); B=01; C=10; D=11 so

ABACABBAACD=00101000101001011
so instead of 88 bits we have 17 bits.

Please correct me if I'm wrong so far.

If I'm correct Then I have 2 questions.

1. 17 bits cannot be changed into bytes so would you just add 7 additional bits?

2. How in the world would you go about decompressing it? Do you have to put some kind a key at the end of a compressed file?

Back to top

Profile PM

JoelJ

21

Years of Service

User Offline

Joined: 8th Sep 2003

Location: UTAH

Posted: 20th Jun 2005 14:57

Link

Quote: "2. How in the world would you go about decompressing it? Do you have to put some kind a key at the end of a compressed file?"

i believe zip files use that algorithm, and they put the key at the beginning of the file...i could be wrong

Back to top

Profile PM Website

G Man

20

Years of Service

User Offline

Joined: 17th Nov 2004

Location:

Posted: 20th Jun 2005 16:03 Edited at: 21st Jun 2005 06:48

Link

Actually, that's not how the huffman alg. works... At its simplest, the most frequently used character is assigned a single bit to indicate its occurrence. The next most frequently used character is assigned two bits... and so on... So the bit patterns for your example would look like:

a=0
b=10
c=110
d=1110

Then you need a "virtual end of file" character. Which in this case would be 11110. The table of characters needs to be written at the beginning of the file so that you know how to decode the data that follows.

Intel Pentium 4, 3.4GHz, 1280MB RAM, NVidia Quadro FX3000/256MB, 240GB HD, XP Pro

Back to top

Profile PM

IanM

Retired Moderator

22

Years of Service

User Offline

Joined: 11th Sep 2002

Location: In my moon base

Posted: 20th Jun 2005 22:56

Link

What?

Huffman doesn't assign codes like that, and it also doesn't need an EOF code - when you run out of data to decode, then you have hit EOF.

A reasonable example can be found here : http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node59.html

Ignore the 'proofs' at the tail end of the page - just look at how codes are assigned by assembling a tree, keeping the more frequently used characters at the top of the tree.

*** Coming soon - Network Plug-in - Check my site for info ***
For free Plug-ins and source code http://www.matrix1.demon.co.uk

Back to top

Profile PM Email Website

Keaz

21

Years of Service

User Offline

Joined: 22nd Sep 2003

Location: Somewhere in south Texas

Posted: 21st Jun 2005 00:18

Link

Thanks that help me out as I plan on compessing alot to deal with my game and adding some ecryption to protect it as well, but this will help with my research.

Breaking Stuff=Fun!,Bug Testing<>Fun!, Bug Testing=Breaking Stuff, so...
Bug Testing=Fun! Hmmmm....
DOES NOT COMPUTE! SYSTEM MALFUNTION!

Back to top

Profile PM Email

kkzgreen

21

Years of Service

User Offline

Joined: 12th Apr 2004

Location: Nashville, TN

Posted: 21st Jun 2005 02:38

Link

Thanks everyone for helping me out. Especially IanM for the link.

Back to top

Profile PM

G Man

20

Years of Service

User Offline

Joined: 17th Nov 2004

Location:

Posted: 21st Jun 2005 04:10 Edited at: 21st Jun 2005 06:47

Link

Here are a number of sites that detail the Huffman Algorithm and describe what I am saying:

http://www.geocities.com/SiliconValley/2151/huffman.html http://www.arturocampos.com/cp_ch3-1.html
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/huff-op.html
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/Greedy/huffman.htm

You do need a virtual EOF character as the actual end of the file might not actually occur at the conclusion of a byte boundary. If each of your bit codes for characters are 5 bits long (pursuing your contention that the bit value lengths remain constant) and you have 7 bytes of data to compress... that means that your compressed data is going to occupy 35 bits. There will however be 5 bits remaining (since you can only write whole bytes to a file)... How do you know when you are decoding that these remaining empty bits are actually empty bits that are to be ignored without some coded EOF sequence...

Following your logic then further... if each character is assigned a fixed number of bits to encode it, how do you realize any compression on a file that contains all 256 characters of the ASCII table? Since in order to store these values you would need 8 bits (the same quantity as is already being used). Without varying the number of bits used to encode a character, the frequency of its occurrence in the data stream is irrelevant.

Intel Pentium 4, 3.4GHz, 1280MB RAM, NVidia Quadro FX3000/256MB, 240GB HD, XP Pro

Back to top

Profile PM

IanM

Retired Moderator

22

Years of Service

User Offline

Joined: 11th Sep 2002

Location: In my moon base

Posted: 21st Jun 2005 07:45

Link

You are right - ideally, there would be no padding needed for the bit stream, but of course in the real world, we are stuck with multiples of 8 bits. Adding padding bits to signal EOF is a little more efficient (storage-wise) than holding size of the expanded byte stream.

*** Coming soon - Network Plug-in - Check my site for info ***
For free Plug-ins and source code http://www.matrix1.demon.co.uk

Back to top

Profile PM Email Website

Sorry your browser is not supported!

DarkBASIC Professional Discussion / Huffman compression.