Aquablue Posted July 5, 2014 Share Posted July 5, 2014 (edited) I'm using GetTextAtEx to read names from a client. Since the text is messy (never monochrome, a lot of shadow, color gradients etc.) and GetTextAtEx isn't fully foolproof, often a perfect match is not found. Therefore I'm using a routine which loops through a long list of strings, which has all known exact possible matches, and compares each entry to 'raw data' from GetTextAtEx. In the hope of improving performance somewhat, the routine has three stages: 1. first check if a perfect match is found (by simply comparing the result from GetTextAtEx with each entry from the list; if a match is found, break off searching) 2. if no match is found, try to find a partial match using CountStr (since often GetTextAtEx misses a single letter at the beginning or end of the result; if a match is found, break off searching) 3. if still no match is found, use a more rigorous approach with a subroutine which determines the Levenshtein distance (LD) between the two strings (see below), then assigns the match with the smallest LD to the result. If a LD of less than 2 is found, the result is regarded as a perfect match, and searching breaks off Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance) is a measure to determine the similarity between two (slightly) different strings. The routine I'm using is below: Function GetLevenshteinDistance(const Str1 : string; Str2 : string) : smallint; // measures similarity between two inexact strings using Levenshtein distance var Strlen1, Strlen2, i, j : smallint; var Result1, Result2, Result3 : smallint; var LDArray : Array of Array of smallint; begin Str1 := LowerCase(Str1); Str2 := LowerCase(Str2); Strlen1 := Length(Str1); Strlen2 := Length(Str2); SetLength(LDArray,Strlen1+2); for i := Low(LDArray) to High(LDArray) do SetLength(LDArray[i],Strlen2+1); for i := 0 to Strlen1 do LDArray[i][0] := i; for j := 0 to Strlen2 do LDArray[0][j] := j; for i := 1 to Strlen1 do begin for j := 1 to Strlen2 do begin if Copy(Str1,i,1) = Copy(Str2,j,1) then LDArray[i][j] := LDArray[i-1][j-1] else begin Result1 := LDArray[i-1][j] + 1; Result2 := LDArray[i][j-1] + 1; Result3 := LDArray[i-1][j-1] + 1; LDArray[i][j] := Min(Result1,Min(Result2,Result3)); end; end; end; Result := LDArray[strlen1][strlen2]; end; Now, my question is as follows: is the CountStr function efficient in terms of resources when compared to the GetLevenshteinDistance routine shown above? This question may be a long shot to answer, but I'm looking for input from experienced programmers whether the second step (using CountStr) is necessary to speed up the searching routine. The CountStr step has been causing a lot of false positives, and if it's not overly efficient with resources compared with the GetLevenshteinDistance routine, I want to skip it and proceed to the GetLevenshteinDistance routine right away. Sorry for the long story, any input is much appreciated. Edited July 5, 2014 by Aquablue Quote Link to comment Share on other sites More sharing options...