**Longest common subsequence**:

xm: A

**B**

**C**

**B**D

**A**B

yn:

**B**D

**C**A

**B**

**A**

The LCS is

**B C B A**

For arbitrary inputs, it is NP-hard. For constant inputs, DP gives the solution in polynomial time ex: O(n^k). Also, there can be more than one LCSs and the solution for that is exponential time ex: O(k^n).

Here we maintain an LCS matrix of m*n size. Initially, for i= 0 or j = 0, LCSij = 0. Then we use the following optimal substructure.

LCSij = 0 for i = 0 or j = 0

= LCSi-1j-1 + 1 for xi = yj

= max(LCSi-1j, LCSij-1) for xi != yj

I thought of the following on my try which also works:

LCSij = 0 for i = 0 or j = 0

= max(LCSi-1j-1 + Aij, LCSi-1j, LCSij-1)

where

Aij = 1 for xi = yj

= 0 for xi != yj

It works as LCS

**i-1j-1**would be lesser than LCSi-1j or LCSij-1 by not more than 1. If xi = yj then we consume characters from both strings (LCS

**i-1j-1**+ 1) and if not then we consume character from either one (max(LCSi-1j, LCSij-1)).

*"The rationale for this recurrence is that, if the last characters of two sequences are equal, they must be part of the LCS. You could never get a larger LCS by matching x*

_{m}to y_{j}where j < n, and vice versa." (from wiki)incorrect:

` ... A`

. x x x

. 1 *0* 2

A 1 2 *1*

corrected:

` ... A OR ... A`

. x x x . x x x

. 1 *1* 2 . 1 *1* 2

A 1 2 __2__ B 1 __2__ **2**

Below is an example from wiki:

| 0 1 2 3 4 5 6 7

| *M* *Z* *J* *A* *W* *X* *U*

----|-----------------

0 | __0__ 0 0 0 0 0 0 0

1 *X* | __0__ 0 0 0 0 0 1 1

2 *M* | 0 __1__ __1__ 1 1 **1 1** 1

3 *J* | 0 1 1 __2__ 2 **2 2** 2

4 *Y* | 0 1 1 __2__ 2 2 2 2

5 *A* | 0 1 1 **2 **__3__ __3__ __3__ 3

6 *U* | 0 1 1 2 3 3 3 __4__

7 *Z* | 0 1 2 2 3 3 3 __4__

**getLongestCommonSubsequence**(x, y):
lcs[m,n]
for i = 0 to m:
lcs[i,0] = 0
for j = 0 to n:
lcs[0,j] = 0
for i = 1 to m:
for j = 1 to n:
if x[i] = y[j]:
lcs[i,j] = lcs[i-1,j-1] + 1
else:
lcs[i,j] = max(lcs[i-1,j], lcs[i,j-1])

After LCS matrix is calculated in O(n*m) time, we just need to trace back from LCS[n,m] location.

**printLCS**(lcs, x, y, i, j):
if i = 0 or j = 0:
return ""
else:
if x[i] = y[j]:
return printLCS(lcs, x, y, i-1, j-1) + x[i]
else:
if lcs[i,j-1] > lcs[i-1,j]:
return printLCS(lcs, x, y, i, j-1)
else:
return printLCS(lcs, x, y, i-1, j)

There can be more than one LCS of the same length. The above would print one of them in O(n+m) time. So by DP, we get one LCS in O(n*m) time.

**Edit-Distance / String Mutations**:

Distance between two strings is the

*minimum*mutations required to transform one string to another (aka Levenshtein distance). In below case, to transform x to y, it is 1 (delete first

*B*from x to get y). Applications are spell-checking, plagiarism detection etc.

xm:

**A**

*B*

**AB**

yn:

*A*ABd('' , '') = 0

d(s , '') = d('' , s) = len(s)

d( s1 + ch1, s2 + ch2 )

= min ( d(s1, s2) + d(ch1, ch2) , d(s1 + ch1) + 1, d(s2 + ch2) + 1) d(ch1, ch2) = 1 when ch1 != ch2 else 0

( from here )

Optimal substructure is below:

mutij

= i for i >= 0, j = 0 and j for j >= 0, i = 0

= min( muti-1j-1 + (if xi = yj then 0 else 1), muti-1j + 1 , mutij-1 + 1 )

And mut[m,n] gives the mutations / distance for xm and yn.

` | 0 1 2 3`

| *A* B *C* x

----|---------

0 | __0__ 1 2 3

1 A | __1__ 0 1 2

2 *A* | 2 __1__ 1 2

3 *B* | 3 2 __1__ 2

4 *B* | 4 3 2 __2__

` `*y*

**getDistance**(x, y):
mut[m,n]
for i = 0 to m:
mut[i,0] = i
for j = 0 to n:
mut[0,j] = j
for i = 1 to m:
for j = 1 to n:
mut[i,j] = min(
mut[i-1, j-1] + (x[i] = y[j] ? 0 : 1)
, mut[i-1, j] + 1
, mut[i, j-1] + 1 )

Once we have the mut matrix in O(n*m) time, below function prints the mutations by backtracking from mut[m,n] to mut[0,0] with respect to sequence x in O(m+n) time. There are 3 types of mutations (update in x y, delete in x, delete in y). When backtracking, we check which of the three values - muti-1j-1 (update in x y or no mutation if xi = yj), mutij-1 (delete in y), muti-1j (delete in x) - is same as mutij and print accordingly.

**printMutations**(mut, x, y, i, j):
if i = 0 and j = 0:
return ""
else:
if mut[i-1,j-1] + (x[i] = y[j] ? 0 : 1) = mut[i,j]:
if x[i] = y[j]:
return printMutations(mut, x, y, i-1, j-1)
else:
return printMutations(mut, x, y, i-1, j-1)
+ print update x[i] at i in x
else:
if mut[i,j-1] +1 = mut[i,j]:
return printMutations(mut, x, y, i, j-1)
+ print delete y[j] at j in y
else: //mut[i-1,j] +1 = mut[i,j]
return printMutations(mut, x, y, i-1, j)
+ print delete x[i] at i in x

For x = AABB and y = ABC, it would print:

`delete A at 1 in x`

update B at 4 in x

Again, there can be more than one ways to mutate the sequences.

## No comments:

## Post a Comment