Searching Algorithms

Sunday, April 5, 2015

String Matching Techniques: KMP Algorithm

Code (GeeksForGeeks):

import java.util.Arrays;

class KMPAlgo {
static public int[] getLPS(char[] pattern) {
int m = pattern.length;
int[] lps = new int[m];

int i = 1;
lps[0] = 0;

int len = 0;

while(i < m) {
if(pattern[i] == pattern[len]) {
len++;
lps[i] = len;
++i;
} else {
if(len == 0) {
lps[i] = 0;

++i;
} else
len = lps[len-1];
}
}

return lps;
}

static public void searchForPattern(char[] pattern, char[] text, int[] lps) {
int m = pattern.length, n = text.length;

int i = 0, j = 0;

while(i < n) {
if(pattern[j] == text[i]) {
++i;
++j;
}

if(j == m) {
System.out.println("Pattern found at: " + (i-j));

j = lps[j-1];
}

if(i<n && pattern[j] != text[i]) {
if(j != 0)
j = lps[j-1];
else
++i;
}
}
}

static public void main(String[] args) {
String pattern = "ABABCABAB", text = "ABABDABACDABABCABAB";

int[] lps = getLPS(pattern.toCharArray());

System.out.println(pattern);
System.out.println(text);
System.out.println(Arrays.toString(lps));

searchForPattern(pattern.toCharArray(), text.toCharArray(), lps);
}
}

---------------------------------------------------------

In this algorithm, we first build a Fail/Prefix function with the given pattern to be matched.

Algorithm to build the Fail function:
1. Initialize the initial indices of i and j as: i = 0, j = 1
2. Define the fail function integer array and initialize the the first two elements as -1 and 0.
F[0] = -1 and F[1] = 0
3. Now compare the two characters at indices i and j of Pattern P. Two cases are possible
Case 1: P[i] == P[j]
==>Then, increment fail function value by 1 of previous value, i.e. F[j+1] = F[j] + 1
and increment both i and j by 1
Case 2: P[i] != P[j]
==> Now reinitialize the value of i to 0 and compare the elements again
Case 2 i: P[i] == P[j]
==> Keep the fail function value as previous one, i.e. F[j+1] = F[j]
increment both i and j by 1
Case 2 ii: P[i] != P[j]
==> Keep the fail function value as zero, i.e. F[j+1] = 0
increment only j value.

Observations:
If characters are equal, then we increment both i and j, if not equal, then we increment only j
j value is always incremented

(Source: https://www.youtube.com/watch?v=1k2KDhcO_uo)

Java Program to build the Fail function:

import java.util.Arrays;

class KMPAlgo {
static public int[] getFailFunction(String pattern) {
int i = 0, j = 1;
int[] failFunc = new int[pattern.length()];

char[] P = pattern.toCharArray();

failFunc[0] = -1;
failFunc[1] = 0;

while(j<pattern.length()-1) {
if(P[i] == P[j]) {
//increment element by 1
failFunc[j+1] = failFunc[j] + 1;
++i;
++j;
} else {
//make i to 0 and do the comparision again
i = 0;

if(P[i] == P[j]) {
failFunc[j+1] = failFunc[j];
++i;
++j;
} else {
failFunc[j+1] = 0;
++j;
}
}
}

return failFunc;
}

static public void main(String[] args) {
String pattern = "abaababaabc";
//pattern = "abcabdaab";

int[] failFunc = getFailFunction(pattern);

System.out.println(Arrays.toString(failFunc));
}
}

eg:
Pattern: abaababaabc

> i=0, j=1, F[0] = -1, F[1] = 0
Compare P[i] and P[j]: P[0] and P[1], a != b
So falls into Case 2 and since i is already 0, so we keep F[j+1] = F[1+1] = F[2] = 0
and increment j by 1

> i=0, j=2, F[2] = 0
Compare P[i] and P[j]: P[0] and P[2], a == a
So falls into Case 1, so increment fail function value: F[j+1] = F[2+1] = F[3] = 1
and increment both i and j

> i=1, j=3, F[3] = 1
Compare P[i] and P[j]: P[1] and P[3]: b != a
So falls into Case 2. reinitialize i to 0 and Compare P[i] with P[j]
P[0] and P[3]: a == a
So keep fail function value same as previous: F[j+1] = F[j]: F[4] = F[3] = 1
and increment both i and j

> i=2, j=4, F[4] = 1

...

Saturday, September 6, 2014

Binary Search: Searching Algorithm

The prerequisite for Binary search is that the input array elements must be in Sorted order.

Binary search maintains a contiguous subsequence of the sorting sequence where the target value is surely located. This is called "Search space".
The search space is initially the entire sequence.

At each step, the algorithm compares the median value in the search space to the target value.
Based on the comparison, and because the sequence is sorted, it can then eliminate half of the search space. By doing this repeatedly, it will eventually be left with a search space consisting of a single element, the target value.

Java code:

import java.util.Arrays;

class BinarySearch {
static public int binarySearch(int[] in, int low, int high, int key) {
if(low > high)
return -1;

int mid = (low + high)/2;

if(in[mid] == key)
return mid;
else if(key < in[mid])
return binarySearch(in, low, mid-1, key);
else
return binarySearch(in, mid+1, high, key);
}

static public int nonRecBinarySearch(int[] in, int key) {
int low = 0, high = in.length-1;
int mid = -1;

while(low <= high) {
mid = (low + high)/2;

if(in[mid] == key)
break;
else if(key < in[mid])
high = mid - 1;
else
low = mid + 1;
}

return mid;
}

static public void main(String[] args) {
int[] in = {1, 2, 3, 5, 9, 12, 19, 25, 32};

int pos = nonRecBinarySearch(in, 25);

System.out.println(Arrays.toString(in));
System.out.println("Position of: " + 25 + " is: " + pos);
}
}

Notes:
If the array is not sorted, then Linear search is better compared to binary search.
In Binary search, after each iteration, the length of the array gets cut into half.

Efficiency:

Worst Case:
To get the worst case complexity of Binary search, we need to check the number of iterations that are performed before "low>high"
After 1 st iteration : n/2 items remaining
After 2nd iteration : n/4 items remaining
.
.
.
After kth iteration : n/2^k items remaining

The last iteration occurs when n/2^k >= 1 and n/2^(k+1) < 1 items remaining
n/2^k >=1
=> n >= 2^k
=> log(n) >= log(2^k)
k<=log(n)
Worst case running time = O(log n)

Best Case:
Best case in binary search happens when the search element is in the middle of the array, so that it can be found in the first attempt itself.

Number of comparisons = 1

Therefor, Worst case time Complexity = O(1)

Average Case:
//TODO:

Linear Search: Searching Algorithm

The aim of any search algorithm is to determine in a given value appears within a sequence of n values.

When the values are unsorted, the standard approach to solve this problem is to compare each element, with the given value until either find the element or you have examined all the elements.

Code:

class LinearSearch {
static public void linearUnordSearch(int[] in, int key) {
for(int i=0; i<in.length; i++) {
if(in[i] == key) {
System.out.println("Element: " + key + ", found at index: " + i);
return;
}
}

System.out.println("Element: " + key + " not found");
}

static public void linearOrdSearch(int[] in, int key) {
for(int i=0; i<in.length; i++) {
if(in[i] > key)
break;

if(in[i] == key) {
System.out.println("Element: " + key + ", found at index: " + i);
return;
}
}

System.out.println("Element: " + key + " not found");
}

static public void main(String[] args) {
int[] ar = {8, 2, 5, 1, 7, 4, 0};

// linearUnordSearch(ar, 7);
// linearUnordSearch(ar, 10);

linearOrdSearch(new int[]{1, 3, 4, 6, 7, 9}, 5);
linearOrdSearch(new int[]{1, 3, 4, 6, 7, 9}, 7);
}

}

Efficiency:

1. Worst Case:
For Linear Search, the worst case happens when the element to be searched is not present in the input array at all.
When element is not present, comparison is done on all elements of the array.
Therefore, for n number of input elements in the array,
Worst case time Complexity = O(n)

2. Average Case:
Possible cases = (at position 1), (at position 2), ..., (at position n-1), (at position n), not present
So total number of possible cases = n+1

Therefore,

3. Best Case:
Here the best case occurs when the required element is present at the first location of the given input array.
The number of operations in the best case is constant, i.e., independent of number of input elements.
Therefore Best case time Complexity = O(1)

Introduction to Searching Algorithms

An Algorithm is a step by step process to find the solution for a problem.

A Search algorithm is an algorithm to find an item with specified properties among a collection of objects(usually similar type of objects).

Measuring the efficiency of an algorithm:
3 types:

1. Worst Case:
This is the case that causes maximum number of operations to be executed.
Here we calculate "Upper Bound" on running time of the algorithm.

2. Average Case:
We need to take all possible inputs and calculate computing time for all of the inputs. Then sum all the calculated values and divide the sum by total number of inputs.
We must know or predict the distribution of all cases here.

3. Best Case:
Here we calculate "Lower Bound" on running time of an algorithm.
This is the case that causes minimum number of operations to be executed.