Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I implemented the Levenshtein algorithm in Java and am now getting the corrections made by the algorithm, a.k.a. the cost. This does help a little but not much since I want the results as a percentage.

So I want to know how to calculate those similarity points.

I would also like to know how you people do it and why.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
193 views
Welcome To Ask or Share your Answers For Others

1 Answer

The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. (Wikipedia)

  • So a Levenshtein distance of 0 means: both strings are equal
  • The maximum Levenshtein distance (all chars are different) is max(string1.length, string2.length)

So if you need a percentage, you have to use this to points to scale. For example:

"Hallo", "Hello" -> Levenstein distance 1 Max Levenstein distance for this two strings is: 5. So the 20% of the characters do not match.

String s1 = "Hallo";
String s2 = "Hello";
int lfd = calculateLevensteinDistance(s1, s2);
double ratio = ((double) lfd) / (Math.max(s1.length, s2.length));

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...