Thursday, July 17, 2008

String matching

The naive brute force approach:

It is very simple to code but there are little things to look for (as least I have to) with respect to optimization. The outer loop runs for text, the inner loop runs for pattern with one more condition -  i+ len(pattern) <= len(text).

int bruteForce (text, pattern)
    for (i =0; i < len(text) - len(pattern) + 1; i++):
        for (j = 0; j < len(pattern); j++):
            if text[i+j] != pattern[j]:
                break;
        if (j==len(pattern)) return i; //match found
    return -1 //match not found

With i+j < len(text) (2+0 < 3), the inside loop runs this the last, even though not required i.e. i=2 is run.
 

012    i
aab    len(text)= 3
  ax   len(pattern)= 2
  01
   j

With
i+ len(pattern) <= len(text) (1+2 <= 3), the inside loop runs this the last i.e. i=2 is not run.
 

012   i
aab   len(text)= 3
 ax   len(pattern)= 2
 01   j
 


Update: A simpler way is to run the outer loop for condition: i <= len(text) - len(pattern)

[Hat tip to EE, L at topcoder]


No comments:

Post a Comment