Dynamic Programming
In this tutorial we will learn to find the Longest Common Subsequence for a given sequence using Dynamic Programming.
Given two sequence say "ABCD" and "ACF".
Longest Common Subsequence or LCS is a sequence that appears in the same relative order in both the given sequences but not necessarily in a continuous manner.
LCS for the given sequences is AC and length of the LCS is 2.
Given two sequence say "ABACCD" and "ACDF"
Find Longest Common Subsequence or LCS
Given two sequences:
ABACCD ACDF
^ ^
SAME (so we mark them and move to next letters)
[A]BACCD [A]CDF
^ ^
DIFFERENT (so move to next letter)
[A]BACCD [A]CDF
^ ^
DIFFERENT (so move to next letter)
[A]BACCD [A]CDF
^ ^
SAME (so we mark them and move to next letters)
[A]BA[C]CD [A][C]DF
^ ^
DIFFERENT (so move to next letter)
[A]BA[C]CD [A][C]DF
^ ^
SAME (so we mark them)
[A]BA[C]C[D] [A][C][D]F
So, LCS for the given sequences is ACD and length of the LCS is 3.
Consider following two sequences.
Length (number of characters) of sequence X is XLen = 4
And length of sequence Y is YLen = 3
Now create a Length array L. It will contain the length of the required longest common subsequence. L is a two dimensional array.
Having rows = XLen + 1 = 4+1 = 5
And columns = YLen + 1 = 3+1 = 4
Fill the row 0 with 0 and col 0 with 0.
if (X[r-1] == Y[c-1]) then
L[r][c] = L[r-1][c-1] + 1
else
L[r][c] = max { L[r-1][c], L[r][c-1] }
i.e., if current character in X and Y are same
then they are in the resulting LCS.
After filling up the L array it will look as follows.
To know the length of the longest common subsequence for X and Y we have to look at the value L[XLen][YLen],
i.e., L[4][3] = 3
So, Length of LCS = L[4][3] = 3
Create an array LCS of size 3, this will hold the characters in the LCS for the given two sequences X and Y.
Set r = XLen, c = YLen, i = L[r][c]
while ( r > 0 && c > 0) do
if (X[r-1] == Y[c-1]) then
LCS[i-1] = X[r-1]
i = i - 1, r = r - 1, c = c - 1
else if (L[r-1][c] > L[r][c-1]) then
r = r - 1
else
c = c - 1
endwhile
After filling up the LCS array it will look like the following.
Required Longest Common Subsequence [LCS] is ACF.
#include<stdio.h>
#include<string.h>
int max(int a, int b);
void findLCS(char *X, char *Y, int XLen, int YLen);
int max(int a, int b) {
return (a > b)? a : b;
}
void findLCS(char *X, char *Y, int XLen, int YLen) {
int L[XLen + 1][YLen + 1];
int r, c, i;
/*
* find the length of the LCS
*/
for(r = 0; r <= XLen; r++) {
for(c = 0; c <= YLen; c++) {
if(r == 0 || c == 0) {
L[r][c] = 0;
} else if(X[r - 1] == Y[c - 1]) {
L[r][c] = L[r - 1][c - 1] + 1;
} else {
L[r][c] = max(L[r - 1][c], L[r][c - 1]);
}
}
}
/*
* Print LCS
*/
r = XLen;
c = YLen;
i = L[r][c];
char LCS[i+1];
/*
* setting the NULL character at the end of LCS character array.
* as we know in C programming language, NULL character is used
* to denote the end of a string
*/
LCS[i] = '\0';
while(r > 0 && c > 0) {
if(X[r - 1] == Y[c - 1]) {
LCS[i - 1] = X[r - 1];
i--;
r--;
c--;
} else if(L[r - 1][c] > L[r][c - 1]) {
r--;
} else {
c--;
}
}
//print result
printf("Length of the LCS: %d\n", L[XLen][YLen]);
printf("LCS: %s\n", LCS);
}
int main(void) {
//the two sequences
char X[] = "ABCF";
char Y[] = "ACF";
//length of the sequences
int XLen = strlen(X);
int YLen = strlen(Y);
findLCS(X, Y, XLen, YLen);
return 0;
}
Output
Length of the LCS: 3
LCS: ACF
Time Complexity is O(XLen x YLen).
ADVERTISEMENT