The Levenshtein distance is a measure of the difference between two strings. It indicates how many single-character operations are needed to transform one string into the other. The allowed operations are:
Insertion of a character
Deletion of a character
Substitution of one character with another
The Levenshtein distance between "house"
and "mouse"
is 1, since only one letter (h → m) needs to be changed.
Levenshtein distance is used in many areas, such as:
Spell checking (suggesting similar words)
DNA sequence comparison
Plagiarism detection
Fuzzy searching in databases or search engines
For two strings a
and b
of lengths i
and j
:
lev(a, b) = min(
lev(a-1, b) + 1, // deletion
lev(a, b-1) + 1, // insertion
lev(a-1, b-1) + cost // substitution (cost = 0 if characters are equal; else 1)
)
There are also more efficient dynamic programming algorithms to compute it.