Jump to content
Aquablue

CountStr Question

Recommended Posts

I'm using GetTextAtEx to read names from a client. Since the text is messy (never monochrome, a lot of shadow, color gradients etc.) and GetTextAtEx isn't fully foolproof, often a perfect match is not found. Therefore I'm using a routine which loops through a long list of strings, which has all known exact possible matches, and compares each entry to 'raw data' from GetTextAtEx.

 

In the hope of improving performance somewhat, the routine has three stages:

1. first check if a perfect match is found (by simply comparing the result from GetTextAtEx with each entry from the list; if a match is found, break off searching)

2. if no match is found, try to find a partial match using CountStr (since often GetTextAtEx misses a single letter at the beginning or end of the result; if a match is found, break off searching)

3. if still no match is found, use a more rigorous approach with a subroutine which determines the Levenshtein distance (LD) between the two strings (see below), then assigns the match with the smallest LD to the result. If a LD of less than 2 is found, the result is regarded as a perfect match, and searching breaks off

 

Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance) is a measure to determine the similarity between two (slightly) different strings. The routine I'm using is below:

 

Function GetLevenshteinDistance(const Str1 : string; Str2 : string) : smallint; // measures similarity between two inexact strings using Levenshtein distance
var Strlen1, Strlen2, i, j : smallint;
var Result1, Result2, Result3 : smallint;
var LDArray : Array of Array of smallint;
begin
   Str1 := LowerCase(Str1);
   Str2 := LowerCase(Str2);
   Strlen1 := Length(Str1);
   Strlen2 := Length(Str2);
   SetLength(LDArray,Strlen1+2);
   for i := Low(LDArray) to High(LDArray) do SetLength(LDArray[i],Strlen2+1);
   for i := 0 to Strlen1 do LDArray[i][0] := i;
   for j := 0 to Strlen2 do LDArray[0][j] := j;

   for i := 1 to Strlen1 do
   begin
       for j := 1 to Strlen2 do
       begin
           if Copy(Str1,i,1) = Copy(Str2,j,1) then LDArray[i][j] := LDArray[i-1][j-1]
           else begin
               Result1 := LDArray[i-1][j] + 1;
               Result2 := LDArray[i][j-1] + 1;
               Result3 := LDArray[i-1][j-1] + 1;
               LDArray[i][j] := Min(Result1,Min(Result2,Result3));
           end;
       end;
  end;
   Result := LDArray[strlen1][strlen2];
end;

 

Now, my question is as follows: is the CountStr function efficient in terms of resources when compared to the GetLevenshteinDistance routine shown above? This question may be a long shot to answer, but I'm looking for input from experienced programmers whether the second step (using CountStr) is necessary to speed up the searching routine. The CountStr step has been causing a lot of false positives, and if it's not overly efficient with resources compared with the GetLevenshteinDistance routine, I want to skip it and proceed to the GetLevenshteinDistance routine right away.

 

Sorry for the long story, any input is much appreciated. :D

Edited by Aquablue
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.



×
  • Create New...