Implementation Of Diamond Search (DS) Algorithm For Motion Estimation Using MATLAB.


SITI HAJAR BINTI AHMAD

This report is submitted in partial fulfillment of the requirements for the award of Bachelor of Electronic Engineering (Telecommunication Electronics) With Honours

Faculty of Electronic and Computer Engineering Universiti Teknikal Malaysia Melaka



UNIVERSITI TEKNIKAL MALAYSIA MELAKA

FACULTY OF ELECTRONIC AND COMPUTER ENGINEERING

REPORT STATUS VERIFICATION FORM, BACHELOR'S DEGREE PROJECT II

Project Title :

……… ……… Academic Session : ………

I ………..

(BLOCK LETTERS)

agree to allow this Bachelor's Degree Project Report to be kept at the Library subject to the following conditions of use:

1. The report is the property of Universiti Teknikal Malaysia Melaka.
2. The Library is permitted to make copies for study purposes only.
3. The Library is permitted to make copies of this report as exchange material between institutions of higher learning.
4. Please tick ( ) :

CONFIDENTIAL*

(Contains information of security classification or of importance to Malaysia as set out in the OFFICIAL SECRETS ACT 1972)

RESTRICTED* (Contains restricted information as determined by the organisation/body where the research was carried out)

UNRESTRICTED

Verified by:

__________________________ ___________________________________

(AUTHOR'S SIGNATURE) (SUPERVISOR'S STAMP AND SIGNATURE)

Permanent Address: 49, TMN SERI WANGSA, GUAR CHEMPEDAK, 08800 GURUN, KEDAH

Date: 30 APRIL 2009

Date: 30 APRIL 2009

IMPLEMENTATION OF DIAMOND SEARCH (DS) ALGORITHM FOR MOTION ESTIMATION USING MATLAB

2005/2009



"I hereby declare that this report is the result of my own work except for quotes as cited in the references."

Signature :

Author : SITI HAJAR BINTI AHMAD Date : 30 APRIL 2009



"I hereby declare that I have read this report and in my opinion this report is sufficient in terms of the scope and quality for the award of Bachelor of Electronic Engineering (Telecommunication Electronics) With Honours."

Signature :

Supervisor's Name : REDZUAN BIN ABD. MANAP



To

My loving parents,

En. Ahmad bin Hj. Md. Noh and Pn. Eshah bt. Md. Arof

My brother and sisters



ACKNOWLEDGEMENT

First and foremost, I would like to express my gratitude to Allah for giving me wisdom and guidance throughout my life and providing me the blessings to complete this work.

This project would not have been possible without the support of many people. I would like to give sincere thanks to my supervisor, En. Redzuan bin Abd. Manap, who was abundantly helpful and offered invaluable assistance, support and guidance.

Special thanks also go to my friends, especially Hashela, Irwan, Elya and Ada, for sharing literature, for their similar research interests and for their invaluable assistance. I would also like to thank all my BENT classmates, especially Liyana and Wani, for supporting and encouraging me during this research.

Finally, I would like to thank my dearest family for their full support in daily care and finances, and for their understanding and endless love throughout the duration of my studies.



ABSTRACT

The aim of this project is to implement the Diamond Search (DS) algorithm, which is one type of Block Matching Algorithm (BMA) for block motion estimation in video compression, using MATLAB. In block motion estimation, search patterns with different shapes or sizes of motion vector distribution have a large impact on the searching speed and quality of performance. The DS algorithm employs two search patterns: the large diamond search pattern (LDSP) and the small diamond search pattern (SDSP). The DS algorithm finds small motion vectors with fewer search points. Simulation results demonstrate that the DS algorithm achieves close performance but requires less computational complexity compared to the Full Search (FS), New Three Step Search (NTSS), Four Step Search (4SS), Cross Search (CS) and Cross Diamond Search (CDS) algorithms. Experimental results also show that the DS algorithm is better than FS, NTSS, 4SS and CS in terms of the required number of search points.



ABSTRAK

This project aims to implement the diamond search algorithm, which is one type of block matching algorithm for block motion estimation in video compression, using MATLAB. In block motion estimation, search patterns with different shapes and sizes for the motion vector distribution affect both the speed and the quality of performance. The diamond search algorithm uses two types of search pattern, namely the large diamond search pattern and the small diamond search pattern. The diamond search algorithm finds small motion vectors with few search points. Simulation results show that the diamond search algorithm achieves nearly the same performance but requires less computational complexity compared with the full search (FS), new three step search (NTSS), four step search (4SS), cross search (CS) and cross diamond search (CDS) algorithms. The results also show that diamond search is better than the FS, NTSS, 4SS and CS algorithms in terms of the number of search points.



TABLE OF CONTENTS

CHAPTER TITLE

PROJECT TITLE
VERIFYING FORM
DECLARATION
SUPERVISOR APPROVAL
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLE
LIST OF FIGURE
LIST OF ABBREVIATIONS
LIST OF APPENDICES

I INTRODUCTION

1.1 Introduction
1.2 Objective
1.3 Problem Statement
1.4 Project Scope
1.5 Thesis Structure

II LITERATURE REVIEW

2.1 Video Compression and Coding Techniques
2.2 Hybrid Video Compression System
2.3 Motion Estimation
2.4 Motion Compensation
2.5 Motion Vector
2.6 Block Matching Algorithm
2.6.1 Full Search Algorithm
2.6.2 The New Three Step Search Algorithm
2.6.3 The Four Step Search Algorithm
2.6.4 Cross Search Algorithm
2.6.5 Cross Diamond Search Algorithm
2.7 MATLAB
2.7.1 M-Files
2.8 Video Sequence

III METHODOLOGY

3.1 Literature Review
3.2 Software Development
3.2.1 Upload the video sequence in MATLAB
3.2.2 Extraction of video into frames
3.2.3 Block Construction
3.2.4 Implementation of Block Matching Algorithm
3.2.5 Predicted frame construction
3.2.6 Performance Analysis

IV DIAMOND SEARCH ALGORITHM

4.1 Introduction
4.2 Diamond Search Algorithm

V RESULT AND DISCUSSION

5.1 First Stage Analysis
5.2 Second Stage Analysis
5.3 Image for original and predicted frames in different BMAs
5.4 Comments on DS algorithm

VI CONCLUSION AND SUGGESTIONS

6.1 Conclusion
6.2 Suggestion for future work

REFERENCES


LIST OF TABLE

NO TITLE

2.1 Types of video file
5.1 Comparison between different BMAs in terms of PSNR
5.2 Comparison between different BMAs in terms of search points
5.3 Comparison between different BMAs in terms of PSNR
5.4 Characteristics of video file
5.5 Comparison between different BMAs in terms of search points


LIST OF FIGURE

NO TITLE

2.1 Image and video compression for visual transmission and storage
2.2 Encoder Block Diagram of a Typical Block-Based Hybrid
2.3 Motion Estimation for Reference and Predicted Frame
2.4 Motion estimation and motion vector
2.5 The current and previous frames in a search window
2.6(a) Example of NTSS Algorithm
2.6(b) Block diagram of the NTSS algorithm
2.7 Search patterns of the 4SS
2.8 Flowchart of 4SS algorithm
2.9 An example of the CSA search for w = 8 pels/frame
2.10 Flowchart of CS algorithm
2.11 Search patterns used in the CDS algorithm
2.12 Flowchart of the CDS algorithm
3.2 Example of blocks in one frame which have 99 blocks
3.3 Example of one block taken out from the frame and perform block matching algorithm using motion estimation
4.1 An appropriate search pattern support
4.2 Two search pattern derived from Fig. 4.1
4.3 Three cases of checking-point overlapping in LDSP
4.4 Search path example
4.5 Flowchart of DS algorithm
5.1 The graph for PSNR vs the number of frame for 'Akiyo' sequence
5.2 The graph for PSNR vs the number of frame for 'Foreman' sequence
5.3 The graph for search points vs number of frame for 'Akiyo' sequence
5.4 The graph for search points vs number of frame for 'Foreman' sequence
5.5 The graph for PSNR vs the number of frame for 'Akiyo' sequence
5.6 The graph for PSNR vs the number of frame for 'Foreman' sequence
5.7 The graph for search points vs number of frame for 'Akiyo' sequence
5.8 The graph for search points vs number of frame for 'Foreman' sequence


LIST OF ABBREVIATIONS

4SS - Four Step Search
BBGDS - Block-Based Gradient Descent Search
BDM - Block Distortion Measure
BMA - Block Matching Algorithm
CCB - Cross-Center Biased
CDS - Cross Diamond Search
CS - Cross Search
CSP - Cross Shape Pattern
DCT - Discrete Cosine Transform
DS - Diamond Search
ES - Exhaustive Search
FS - Full Search
JPEG - Joint Photographic Experts Group
LDSP - Large Diamond Search Pattern
MAD - Mean Absolute Difference
MATLAB - Matrix Laboratory
MB - Macroblock
MBD - Minimum Block Distortion
ME - Motion Estimation
MSE - Mean Squared Error
MV - Motion Vector
NCCF - Normalized Cross-Correlation Function
NTSS - New Three Step Search
PSNR - Peak Signal-to-Noise Ratio
QCIF - Quarter Common Intermediate Format
SAD - Sum-of-Absolute Difference
SDSP - Small Diamond Search Pattern
SSD - Sum of square error
TSS - Three Step Search
VLC - Variable Length Coding



LIST OF APPENDIX

NO TITLE PAGE



CHAPTER 1

INTRODUCTION

1.1 Introduction

To achieve a high compression ratio in video coding, a technique known as Block Matching Motion Estimation has been widely adopted in various coding standards. This technique is conventionally implemented by exhaustively testing all the candidate blocks within the search window. This type of implementation, called the Full Search (FS) algorithm, gives the optimum solution. However, it requires a substantial computational workload. To overcome this drawback, many fast Block Matching Algorithms (BMAs) have been proposed and developed. These algorithms exploit different search patterns and strategies in order to find the optimum motion vector with a minimal number of search points. One of these fast BMAs, which is implemented in this project, is the Diamond Search (DS) algorithm.
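To make the cost of exhaustive testing concrete, the following is a minimal sketch of Full Search block matching. It is written in Python for illustration and is not the MATLAB code of this project. The function names (`sad`, `full_search`), the use of the sum of absolute differences as the matching criterion, the synthetic frames and the sign convention of the displacement are all assumptions made for this example. For a search range of ±p pixels, the search tests up to (2p + 1)² candidates per block, which is 225 candidates when p = 7.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] that stays inside `ref`.

    Returns (best_dx, best_dy, best_sad, points_checked)."""
    height, width = len(ref), len(ref[0])
    best = None
    checked = 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            checked += 1
            # Keep the first candidate with the lowest cost.
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best[0], best[1], best[2], checked


# Synthetic 16 x 16 frames (assumed data): the current frame is the
# reference frame shifted right by 2 pixels and down by 1 pixel.
ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(16)] for y in range(16)]
cur = [[ref[(y - 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]

dx, dy, cost, checked = full_search(cur, ref, bx=6, by=6, n=4, p=3)
print(dx, dy, cost, checked)
```

In this sketch the returned displacement points from the current block to its best match in the reference frame, so content that moved right and down is found at a negative displacement. Fast BMAs such as DS aim to reach the same or a nearby minimum while evaluating far fewer candidates than the `checked` count reported here.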


1.2 Objective

The objective of this project is to implement the DS algorithm in MATLAB and to compare its performance with that of the FS algorithm as well as other fast BMAs, namely New Three Step Search (NTSS), Four Step Search (4SS), Cross Search (CS) and Cross Diamond Search (CDS).

1.3 Problem Statement

A substantial computational workload is required to execute the Full Search algorithm. Many fast BMAs have been proposed and developed to overcome this drawback. These fast BMAs exploit different search patterns and strategies in order to find the optimum motion vector with a minimal number of search points. However, there is a need to determine which of these fast BMAs performs best, and to identify the characteristics that make each one suitable for different types of video sequences.

1.4 Project Scope

This project focuses on three main areas. The first is a literature review on video coding, BMAs and the DS algorithm. The second is the development and implementation of the DS algorithm on the MATLAB platform. The third is a performance analysis comparing the DS algorithm with the FS algorithm and with other BMAs.

1.5 Thesis Structure

Chapter 1 is the introduction. It provides the project background, objectives and scope.

Chapter 2 is the literature review on video compression and coding techniques. It also gives an overview of motion estimation and of other BMAs.

Chapter 3 discusses the methodology. It describes the method employed in this project, which uses MATLAB: loading the video sequence, extracting the video into frames, implementing the Block Matching Algorithm (BMA) and analysing the results.

Chapter 4 covers the DS algorithm in detail. The theory behind the algorithm, including its steps, is explained and described.

Chapter 5 presents the results and discussion. The DS algorithm is compared against five other BMAs (FS, 4SS, NTSS, CS and CDS) in terms of PSNR, search points and speed-up ratio.

Chapter 6 gives the conclusion and suggestions. In this chapter, the thesis is concluded with a critical review and recommendations for possible future work.



CHAPTER 2

LITERATURE REVIEW

2.1 Video Compression and Coding Techniques

Video compression coding is the enabling technology behind a new wave of communication applications. From streaming internet video to broadcast digital television and digital cinema, the video codec is a key building block for a host of new multimedia applications and services. Video data compression is a process in which the amount of data used to represent video and image is reduced to meet a bit rate requirement, while the quality of the reconstructed image or video satisfies a requirement for a certain application and the complexity of computation involved is affordable for the application.

The block diagram in Figure 2.1 below shows the functionality of image and video data compression in visual transmission and storage. Image and video data compression has been found to be necessary in these important applications because the huge amount of data involved in these and other applications usually greatly exceeds the capability of today's hardware, despite rapid advancements in the semiconductor, computer and other related industries.

Figure 2.1 Image and video compression for visual transmission and storage

It is noted that information and data are two closely related yet different concepts. Data represent information, and the quantity of data can be measured. In the context of digital image and video, data are usually measured by the number of binary units (bits). The required quality of the reconstructed image and video is application dependent. In medical diagnoses and some scientific measurements, the reconstructed image and video may be needed to mirror the original image and video. In other words, only reversible, information-preserving schemes are allowed. This type of compression is referred to as lossless compression. In applications such as motion pictures and television, a certain amount of information loss is allowed. This type of compression is called lossy compression.

From its definition, one can see that image and video data compression involves several fundamental concepts including information, data, visual quality of image and video, and computational complexity.

The key idea in video coding is to predict a new frame from a previous frame and to code only the prediction error; this is known as inter prediction. The prediction error is coded using an image coding method such as the Discrete Cosine Transform (DCT), as used in the Joint Photographic Experts Group (JPEG) standard. The prediction error has smaller energy than the original pixel values and can be coded with fewer bits. Regions that cannot be predicted well are coded directly using the DCT; this is known as intra coding. Predicting a current block from previously coded blocks in the same frame is called intra prediction. The most popular video coding method is known as the hybrid video compression system.

[Figure 2.1 block labels: Image and Video Compression, Transmission and Storage, Data Reconstruction or Data Retrieval]
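The statement that the prediction error has smaller energy than the original pixel values can be illustrated with a small numerical sketch. The example below is in Python and uses made-up 8 x 8 sample values. The names `energy`, `prev`, `cur` and `residual` are hypothetical, and energy is taken here to mean the sum of squared sample values, which is an assumption rather than a definition from this thesis.

```python
def energy(block):
    """Sum of squared sample values of a 2-D block."""
    return sum(v * v for row in block for v in row)


# Assumed data: a smooth 8 x 8 block in the previous frame.
prev = [[100 + 2 * x + 3 * y for x in range(8)] for y in range(8)]

# Assumed data: the current block has the same content, except that
# some pixels are brighter by one grey level.
cur = [[prev[y][x] + (1 if (x + y) % 4 == 0 else 0) for x in range(8)]
       for y in range(8)]

# Inter prediction with a zero motion vector: predict `cur` from `prev`
# and keep only the prediction error (residual).
residual = [[cur[y][x] - prev[y][x] for x in range(8)] for y in range(8)]

# The decoder rebuilds the block by adding the residual to the prediction.
rebuilt = [[prev[y][x] + residual[y][x] for x in range(8)] for y in range(8)]

print(energy(cur), energy(residual), rebuilt == cur)
```

When the prediction is good, the residual is close to zero almost everywhere, so its energy is a tiny fraction of the energy of the original samples. Motion estimation exists to keep the prediction good when the content moves between frames.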

2.2 Hybrid Video Compression System [1]

The most widely accepted video compression standards, such as MPEG-1, MPEG-2 and H.263, adopt a hybrid video compression approach. The approach is 'hybrid' because spatial redundancies in a video sequence are removed by a transform-based method and temporal redundancies are removed by motion-compensated prediction. Figure 2.2 shows a system overview of a video encoder employing the hybrid video compression approach.

The spatial redundancy of a video frame in the video sequence is exploited by transform-based coding. The input video frame is transformed using the DCT. To reduce the computational complexity, the video frame is divided into a grid of 8x8 blocks, each of which is transformed into an 8x8 block of transform coefficients. After the transformation, most of the video signal energy is concentrated in the coefficients corresponding to the lower frequencies. Quantization is then applied to the coefficients to reduce the number of bits required to represent them. Different quantization schemes are specified in the video compression standards.

Generally speaking, all schemes try to retain most of the information in the energy-concentrated lower-frequency coefficients by using small quantization steps, and to minimize the bits required to store the lower-energy higher-frequency coefficients by using large quantization steps. Quantization of the transform coefficients leads to a block of data with a few non-zero coefficients corresponding to the low-frequency components and long chains of zeros corresponding to the high-frequency components. The block of data is coded with a variable length coding (VLC) scheme to minimize the number of bits to be transmitted or stored.

[Figure 2.2 block labels: Motion Estimation, Motion Compensation, Previous Frame Memory, IDCT, Inverse Quantization, DCT, Quantization, Control (inter/intra), VLC, Input Block, Coded Bit Stream, Quantized Coefficient, Side Information, Motion Vector, Reconstructed Block]

Figure 2.2 Block Diagram of a Typical Block-Based Hybrid Encoder
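The transform and quantization steps described above can be sketched as follows. This is an illustrative Python example, not code from any standard. The direct form of the orthonormal 2-D DCT-II is used for readability. The step-size rule in `step_at`, which grows with the frequency index, is a hypothetical choice for this sketch and does not reproduce the quantization tables of MPEG-1, MPEG-2 or H.263. The input block is assumed synthetic data.

```python
import math

N = 8  # block size used for the transform


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def step_at(u, v):
    """Hypothetical quantizer step: small at low frequencies, larger at
    high frequencies."""
    return 8 + 4 * (u + v)


def quantize(coeffs):
    """Divide each coefficient by its step size and round to an integer."""
    return [[int(round(coeffs[u][v] / step_at(u, v))) for v in range(N)]
            for u in range(N)]


# Assumed data: a smooth 8 x 8 block (a gentle brightness gradient).
block = [[100 + 2 * x + 3 * y for x in range(N)] for y in range(N)]

q = quantize(dct_2d(block))
nonzero = sum(1 for row in q for value in row if value != 0)
print(nonzero, N * N - nonzero)  # non-zero count, zero count
```

For a smooth block like this one, only a handful of low-frequency coefficients survive quantization and the rest become zero. Those long runs of zeros are what makes the subsequent variable length coding effective.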

The encoder works on each macroblock (MB), which is 16 x 16 pixels, independently for reduced complexity. Motion compensation is done at the MB level, while DCT coding of the prediction error is done at the block level, which is 8 x 8 pixels. Finally, the encoded bit stream is sent to the video multiplex along with the coded motion vector information. As in JPEG, the quantizer's step sizes can be adjusted based on the desired picture quality and coding efficiency.
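The two processing levels can be made concrete with a short sketch that partitions a frame into macroblocks and each macroblock into transform blocks. The example is in Python and assumes a QCIF luminance frame of 176 x 144 pixels. The frame size and the helper name `block_origins` are assumptions for illustration. With this frame size the partition yields 99 macroblocks, matching the 99 blocks per frame mentioned in the List of Figures.

```python
MB = 16    # macroblock size in pixels (motion compensation level)
BLK = 8    # transform block size in pixels (DCT level)
WIDTH, HEIGHT = 176, 144  # assumed QCIF luminance frame size


def block_origins(width, height, size):
    """Top-left (x, y) corners of all non-overlapping size x size blocks,
    listed row by row."""
    return [(x, y)
            for y in range(0, height - size + 1, size)
            for x in range(0, width - size + 1, size)]


macroblocks = block_origins(WIDTH, HEIGHT, MB)
blocks_per_mb = block_origins(MB, MB, BLK)

print(len(macroblocks))    # macroblocks per frame
print(len(blocks_per_mb))  # 8 x 8 luminance blocks per macroblock
print(macroblocks[:3])     # first few macroblock origins
```

One motion vector is estimated per macroblock, while the prediction error inside that macroblock is transformed as four separate 8 x 8 luminance blocks.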

2.3 Motion Estimation

Motion Estimation (ME) is an important part of any video compression system, since it can achieve significant compression by exploiting the temporal redundancy existing in a video sequence. Unfortunately it is also the most

