Sorry your browser is not supported!

You are using an outdated browser that does not support modern web technologies, in order to use this site please update to a new browser.

Browsers supported include Chrome, FireFox, Safari, Opera, Internet Explorer 10+ or Microsoft Edge.

Code Snippets / [DBP] Fuzzy string compare algorithm - Levenshtein distance

Author
Message
sladeiw
15
Years of Service
User Offline
Joined: 16th May 2009
Location: UK
Posted: 19th Jun 2009 14:47
I thought this might be useful to someone doing text parsing, I can't find anything similar in the forum. Basically It compares two strings and returns the Levenshtein distance between them. This is a measure of the differences between the two strings based on certain rules such as: added, missing, wrong & swapped letters within the string. Each difference adds 1 to the total difference `cost`.

It is not my original work, I translated the algorithm from VB/C++ at http://www.merriampark.com/ld.htm#VB

Public domain, free to use in any way. Please point out any bugs you find!

You need IanM's matrix utils because the routine uses his mid$() and min() commands.

Jeff Miller
19
Years of Service
User Offline
Joined: 22nd Mar 2005
Location: New Jersey, USA
Posted: 20th Jun 2009 02:08
This looks like it would be useful in a spell checker application that offers a recommendation of what the user should have intended to type if the user has typed a word that does not appear in a dictionary array. In that case I take it that the program would compare against words in the dictionary array, and if finding nothing that compares exactly would flag the error and offer a recommendation, or a list of several words inversely ordered by the "cost".

What other types of applications use this algorithm?
sladeiw
15
Years of Service
User Offline
Joined: 16th May 2009
Location: UK
Posted: 20th Jun 2009 12:25
I'm using it for searching an index and returning close matches if no exact ones are found. As it stands you can only test word against word, but you could easily modify it to search for a word or phrase within a long piece of text.

Wikipedia has an article on the subject, it mentions spam filtering, and dna-matching as other uses!

Login to post a reply

Server time is: 2024-05-18 14:47:28
Your offset time is: 2024-05-18 14:47:28