FHannes Posted September 5, 2012 Share Posted September 5, 2012 (edited) Brute-force The brute-force substring search algorithm checks if a string matches, by trying to match it at every position. This is the worst possible algorithm you could use to do this, but here is it anyway. [scar]function StrPos_Brute(const SubStr, Str: string): Integer; var I, J, M, N: Integer; begin Result := 0; M := Length(SubStr); N := Length(Str); for I := 1 to N - M + 1 do begin for J := 1 to M do if Str[i + J - 1] <> SubStr[J] then Break; if J = M + 1 then begin Result := I; Exit; end; end; end;[/scar] Knuth-Morris-Pratt The KMP substring search algorithm uses a deterministic finite-state automaton, represented by a 2 dimensional array of jumpback positions for each character at each position in the substring. You may wonder why this uses AnsiString, this is because the algorithm requires an array that is the size of the number of different characters you use, which is a lot easier for 256 ansi characters than 65536 or more unicode characters. [scar]function StrPos_KMP(const SubStr, Str: AnsiString): Integer; var C: Byte; I, J, N, M, X: Integer; DFA: array[0..255] of TIntArray; begin Result := 0; M := Length(SubStr); if M = 0 then Exit; N := Length(Str); for I := 0 to 255 do SetLength(DFA, M); DFA[Ord(SubStr[1])][0] := 1; X := 0; for I := 2 to M do begin for J := 0 to 255 do DFA[J][i - 1] := DFA[J][X]; C := Ord(SubStr); DFA[C][i - 1] := I; X := DFA[C][X]; end; J := 0; for I := 1 to N do begin if J >= M then Break; J := DFA[Ord(Str)][J]; end; if J = M then begin Result := I - M; Exit; end; end;[/scar] Boyer-Moore The BM substring search algorithm creates a shifting table prior to running the algorithm. The shifting table will indicate how much the substring has to be shifted to the right to potentially match up with the subject string. If for example the character on which your substring match failed is part of the substring, you will shift the substring until the character lines up with the subject string and try to match at that position. Keep in mind that this algorithm matches the substring from the right to the left. Also note that this also uses AnsiString because it builds a table containing data for each character. [scar]function StrPos_BM(const SubStr, Str: AnsiString): Integer; var I, J, N, M, Skip: Integer; Right: array[0..255] of Integer; begin Result := 0; M := Length(SubStr); if M = 0 then Exit; N := Length(Str); for I := 0 to 255 do Right := -1; for I := 1 to M do Right[Ord(SubStr)] := I - 1; I := 0; while I <= N - M + 1 do begin Skip := 0; for J := M downto 1 do if SubStr[J] <> Str[i + J - 1] then begin Skip := J - Right[Ord(Str[i + J - 1])] - 1; if Skip < 1 then Skip := 1; Break; end; if Skip = 0 then begin Result := I; Exit; end; IncEx(I, Skip); end; end;[/scar] Rabin-Karp The RB substring search algorithm searches for a string by creating a hash of the number of characters of the substring from the substring and the string. When these hashes match up, it will indicate the substring matches the string. To avoid having to recalculate the entire hash over and over again, the algorithm "subtracts" the first character from the hash and adds the next one, as such, moving the hash over the entire string until it finds a match or reaches the end. The number of available characters is important in the hashing process to avoid collisions, so this is once again for AnsiString. The 997 prime number used in the hash can be changed to any (preferably large) prime number, the reason for this is also to avoid collisions, but you should look into how hashing works to find out why. [scar]function Hash(const Text: AnsiString; const Len: Integer): Integer; var I: Integer; begin Result := 0; for I := 1 to Len do Result := (Result shl 8 + Ord(Text)) mod 997; end; function StrPos_RK(const SubStr, Str: AnsiString): Integer; var I, N, M, RM, StrHash, SubStrHash: Integer; begin Result := 0; M := Length(SubStr); if M = 0 then Exit; N := Length(Str); RM := 1; for I := 1 to M - 1 do RM := (256 * RM) mod 997; StrHash := Hash(Str, M); SubStrHash := Hash(SubStr, M); if StrHash = SubStrHash then begin Result := 1; Exit; end; for I := M + 1 to N do begin StrHash := (StrHash + 997 - RM * Ord(Str[i - M]) mod 997) mod 997; StrHash := (StrHash * 256 + Ord(Str)) mod 997; if StrHash = SubStrHash then begin Result := I - M + 1; Exit; end; end; end;[/scar] Edited September 6, 2012 by Freddy Quote Link to comment Share on other sites More sharing options...
LordJashin Posted September 6, 2012 Share Posted September 6, 2012 Interesting, always wondered about the methods, and now you showed how to use them in SCAR, Nice! another bookmark I can add . Quote Link to comment Share on other sites More sharing options...
Janilabo Posted September 6, 2012 Share Posted September 6, 2012 Brute-force is easy piece of a cake, but those other 2 are AWESOME functions! Thanks for sharing em, Freddy. Quote Link to comment Share on other sites More sharing options...
FHannes Posted September 6, 2012 Author Share Posted September 6, 2012 (edited) I'll also add the Rabin-Karp substring search algorithm in a bit. If any of them require more explanation btw, feel free to ask. Basically, you should look at the KMP like this... It builds that DFA, which you can visualize like this: It is a map that tells you, for example, if you're at position 5 in the string, and the string has the character A in that position, where you have to jump from there to continue matching. If you had to match an A at that position, it will jump you to the next position in the string, if you had to match something else, it will jump to wherever it has to start over, which isn't always from the start, depending on your search string. Edited September 6, 2012 by Freddy Quote Link to comment Share on other sites More sharing options...
shadowrecon Posted September 6, 2012 Share Posted September 6, 2012 This is awesome Freddy, i think having examples of functions like these can really help others out! You are a professional! lol Quote Link to comment Share on other sites More sharing options...
MarkD Posted September 7, 2012 Share Posted September 7, 2012 I've experimented with something similar in the past in VHDL. KMP was the big winner there. Quote Link to comment Share on other sites More sharing options...
FHannes Posted September 7, 2012 Author Share Posted September 7, 2012 Hmm, I'm not sure about that... It will depend on the entered strings, but I beleive that generally Boyer-Moore is faster than KMP. Quote Link to comment Share on other sites More sharing options...
MarkD Posted September 7, 2012 Share Posted September 7, 2012 I'm not sure we tried Boyer-Moore, so that could be. Also, FPGA is one big emulation. It could be one works quicker on FPGA and another on PC. I'm not that good at it, just glad I know how it works and how to deal with it. Quote Link to comment Share on other sites More sharing options...