Monday, October 20, 2008

superstring

Some code for superstring given two strings. It assumes that first string is longer than second string. If the lengths of the strings first and second are m and n, it takes O(mn) time (as m >= n, the other two loops run for O(n^2) time). There are three loops, first loop looks for second string within the first string, second loop looks for the second string before the first string and third loop looks for after. A list of results is maintained from which the string which is least lengthy and lexicographically smallest is returned as the superstring. I am not sure if this is the best way.

string getSuperString(char[] first, char[] second):
    list results
    int i, j
    
    // abcdefgh
    // abc -->|
    for i = 0 to first.length-1 :
        for j = 0 to second.length-1 && (i+second.length <= first.length) :
            if second[j] != first[i+j] :
                break
        j--
        if j == second.length-1 : //inner for did not break
            results.add(first) // best case

    //     abcdefgh
    // <--abc
    for i = 1 to second.length-1 :
        for j = i to second.length-1 :
            if second[j] != first[j-i] :
                break
        j--
        if j == second.length-1 : // inner for loop did not break
            results.add(
                second[0..i] + //before
                second [i..second.length-i] + //matched part
                first[j..first.length-j] //after
            )
    
    // abcdefgh
    //       abc -->
    for i = second.length-2 to 0 :
        for j = 0 to i :
            if second[j] != first[first.length-1 -i+j] :
                break
        j--
        if j == i : //inner for did not break
            results.add(
                first[0..first.length-2 -i+j] + //before
                second[0..j] + //matched part
                second[j..second.length-1] //after
            )
    
    results.add(first + second)
    results.add(second + first)
    
    //find string that is least lengthy and lexicographically smallest
    string superstring = results.get(0)
    for curr in results :
        if curr.length <= superstring.length :
            if curr < superstring :
                superstring = curr
    
    return superstring


The second and third loop explained below.

Second loop:
    abcdefgh
<--abcd


for i = 1 to 3
    for j = i to 3
        print i, j, (j-i)


i    j (second)     j-i (first)
-----------------------------------
1    1, 2, 3        0, 1, 2
2    2, 3           0, 1
3    3              0

    012
    abcdefgh
<--abcd
    123

     01
     abcdefgh
<--abcd
     23

      0
      abcdefgh
<--abcd
      3


Third loop:

abcdefgh
     abcd -->


for i = 4-2 to 0
    for j = 0 to i
        print i, j, (8-1 -i+j)


i    j (second)    8-1 -i+j (first)
----------------------------------------
2    0, 1, 2       5, 6, 7
1    0, 1          6, 7
0    0             7

     567
abcdefgh
     abcd -->
     012

      67
abcdefgh
      abcd -->
      01

       7
abcdefgh
       abcd -->
       0

No comments:

Post a Comment