Implementation Of Cross Search Algorithm For Motion Estimation Using Matlab.

IMPLEMENTATION OF CROSS SEARCH (CS) ALGORITHM FOR MOTION ESTIMATION USING MATLAB

RAUDZATUL ADAWIAH BINTI YUNOS

This report is submitted in partial fulfillment of the requirements for the award of Bachelor of Electronic Engineering (Telecommunication Electronics) With Honours

Faculty of Electronic and Computer Engineering Universiti Teknikal Malaysia Melaka


DECLARATION

I hereby declare that this thesis entitled “Implementation of Cross Search (CS) Algorithm for Motion Estimation using MATLAB” is the result of my own research except as cited in the references.

Signature : ……….

Author's Name : RAUDZATUL ADAWIAH BINTI YUNOS


ACKNOWLEDGEMENTS

Alhamdulillah, praise be to God. With the deepest sense of gratitude, I thank the Almighty Allah, who gave me the strength and ability to complete this project and thesis. First of all, I would like to thank my family, who have been constantly supportive throughout my studies. I would like to express my sincere appreciation to my project supervisor, Mr. Redzuan bin Abd. Manap, for his support, advice and guidance in completing this project. Finally, I would like to thank all my friends, who have given me a great deal of guidance and cooperation in completing this project.


ABSTRACT

This thesis presents a study of techniques for achieving a high compression ratio in video coding. One of these techniques, the Block Matching Algorithm (BMA) for motion estimation, has been widely adopted in various coding standards. It is conventionally implemented by exhaustively testing all the candidate blocks within the search window. This implementation, called the Full Search (FS) Algorithm, gives the optimum solution but requires a substantial computational workload. To overcome this drawback, many fast BMAs have been proposed and developed. These algorithms exploit different search patterns and strategies in order to find the optimum motion vector with a minimal number of search points. The objective of this project is to study one of these fast BMAs, the Cross Search (CS) Algorithm. CS takes less time than FS because it checks only some of the candidate positions in the search window around the reference point instead of every position. The algorithm is implemented in MATLAB, and its performance is compared against the FS algorithm as well as other fast BMAs in terms of the average peak signal-to-noise ratio (PSNR) produced, the number of search points required, the computational complexity and the elapsed processing time.


ABSTRAK

This project is a study of one video coding technique, known as the Block Matching Algorithm (BMA). The study focuses on several main aspects, namely BMA in general and the earliest technique used in BMA, the Full Search (FS) Algorithm. The FS matching technique searches every coordinate within each window of a video. This process takes a long time to produce a result before the video can be matched, because a large number of error calculations must be performed. The objective of this study is to examine the potential of the Cross Search (CS) Algorithm as a replacement for the FS technique. The complete working process of the CS algorithm has been studied. Unlike FS, the CS method involves only a few search areas to locate the error position within each window. The CS search method was implemented in MATLAB, and the performance of the algorithm is compared with FS and with the other algorithms listed in this report in terms of peak signal-to-noise ratio (PSNR), average search points, computational complexity and algorithm processing time.


CONTENTS

CHAPTER TITLE

PROJECT TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS

1 INTRODUCTION
1.1 Project Background
1.2 Project Objectives
1.3 Problem Statement
1.4 Scope of Project

2 LITERATURE REVIEW
2.1 Video Compression and Coding Technique
2.1.1 Introduction on Video Compression
2.1.2 Coding Technique
2.1.3 Video
2.2 Motion Estimation
2.2.1 Identifies the True Motion
2.2.2 Removing Temporal Redundancy
2.3 Block Matching Algorithm
2.4 Searching Method
2.4.1 Full Search Algorithm
2.4.2 New Three Step Search (NTSS)
2.4.3 Diamond Search (DS)
2.4.4 Cross Diamond Search (CDS)
2.4.5 Four Step Search (FSS)

3 METHODOLOGY
3.1 Project Planning
3.1.2 Data Acquisition on Literature Review
3.1.3 Development and Implementation
3.1.4 Performance Analysis
3.1.5 Presentation and Seminar Matlab
3.1.6 Thesis Writing Submission
3.2 Project Flow Chart

4 CROSS SEARCH ALGORITHM (CS)
4.1 Introduction to Cross Search Algorithm
4.2 CS Steps and Method of Search
4.4 CS Flowchart

5 RESULTS AND DISCUSSIONS
5.1 Performance of CS for single frame sequence
5.1.1 Akiyo sequence for frame no. 1 to no. 2
5.1.2 Claire sequence for frame no. 1 to no. 2
5.1.3 Coastguard sequence for frame no. 1 to no. 2
5.1.4 Foreman sequence for frame no. 1 to no. 2
5.1.5 News sequence for frame no. 1 to no. 2
5.1.6 Salesman sequence for frame no. 1 to no. 2
5.1.7 Tennis sequence for frame no. 1 to no. 2
5.2 Average Search Points and PSNR for 1 frame sequence
5.3 Comparison of CS Against all Algorithms
5.3.1 Average Search Points for all Algorithms
5.3.2 Average PSNR for all Algorithms
5.3.3 Elapsed Time for all Algorithms
5.3.4 Search Points Speed

6 CONCLUSION

7 REFERENCES


LIST OF TABLES

NO TITLE

5.1 Average PSNR and Search Points of CSA for 1 Frame
5.2 Average Search Points for 1st to 30th frame
5.3 Average PSNR for all Algorithms
5.4 Elapsed Time for 1-30 frames simulation (s)


LIST OF FIGURES

NO TITLE

2.1 Video Coding Layer
2.2 Predictive sources coding with motion compensation
2.3 Macro Block
2.4 NTSS Flowchart
2.5 Steps of DS
2.6 DS Flowchart
2.7 CDS steps
2.8 CDS Flowchart
2.9 Search patterns of the 4SS
2.10 Two different search paths of 4SS
2.11 4SS Flowchart
3.1 Flow of the Project
4.1 Illustration 1 for CSA steps
4.2 Illustration 2 for CSA steps
4.3 Illustration 3 for CSA steps
4.4 CSA Flowchart
5.1 (a) Original Image (b) Predicted Image
5.2 (a) Original Image (b) Predicted Image
5.3 (a) Original Image (b) Predicted Image
5.4 (a) Original Image (b) Predicted Image
5.5 (a) Original Image (b) Predicted Image
5.6 (a) Original Image (b) Predicted Image
5.7 (a) Original Image (b) Predicted Image
5.8 Average Search Points for Akiyo (1-30)
5.9 Average Search Points for Claire (1-30)
5.10 Average Search Points for Coastguard (1-30)
5.11 Average Search Points for Foreman (1-30)
5.12 Average Search Points for News (1-30)
5.13 Average Search Points for Salesman (1-30)
5.14 Average Search Points for Tennis (1-30)
5.15 Average PSNR (dB) for Akiyo sequence
5.16 Average PSNR (dB) for Claire sequence
5.17 Average PSNR (dB) for Coastguard sequence
5.18 Average PSNR (dB) for Foreman sequence
5.19 Average PSNR (dB) for Salesman sequence


LIST OF ACRONYMS

AVI – Audio Video Interleave
WMV – Windows Media Video
MPEG – Moving Picture Experts Group
BDM – Block Distortion Measure
BMA – Block Matching Algorithm
CCB – Cross Centre Biased
CCITT – International Telegraph & Telephone Consultative Committee
CDS – Cross Diamond Search
CS – Cross Search
DCT – Discrete Cosine Transform
DS – Diamond Search
FS – Full Search
FSS – Four Step Search
GOP – Group of Pictures
IDCT – Inverse Discrete Cosine Transform
JPEG – Joint Photographic Experts Group
LDSP – Large Diamond Search Pattern
LSI – Large Scale Integration
MAC – Media Access Control
MAD – Mean Absolute Difference
MAE – Mean Absolute Error
MBD – Minimum Block Distortion
ME – Motion Estimation
MSE – Mean Square Error
MV – Motion Vector
NTSS – New Three Step Search
PC – Personal Computer
PSNR – Peak Signal to Noise Ratio
SDSP – Small Diamond Search Pattern
VLC – Video LAN Client


CHAPTER 1

INTRODUCTION

1.1 Background

Motion Estimation (ME) is an important part of any video compression system, since it can achieve significant compression by exploiting the temporal redundancy that exists in a video sequence. Unfortunately, it is also the most computationally intensive function of the entire encoding process. In fast search algorithms, the ME process follows a special search pattern, such as a diamond pattern or a cross pattern, that checks fewer points. Smaller motion compensation block sizes can produce better ME results. However, a smaller block size leads to increased complexity (more search operations must be carried out) and to an increase in the number of MVs that need to be transmitted. Each MV requires bits to be sent, and the extra overhead for the vectors may outweigh the benefit of reduced residual energy. An effective compromise is to adapt the block size to the picture characteristics, for example by choosing a large block size in flat, homogeneous regions of a frame and a small block size around areas of high detail and complex motion.
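The overhead side of this trade-off can be illustrated with simple arithmetic. The sketch below is written in Python purely for illustration (it is not the MATLAB code of this project). It assumes a CIF luminance frame of 352 x 288 pixels and one MV per square block; the function name and the block sizes tried are illustrative assumptions.

```python
def motion_vectors_per_frame(width, height, block_size):
    """Count the blocks (and hence MVs) needed to cover one frame.

    Assumes one motion vector per square block and that partial blocks
    at the frame edges still need their own vector.
    """
    blocks_across = -(-width // block_size)   # ceiling division
    blocks_down = -(-height // block_size)
    return blocks_across * blocks_down


# Assumed frame size: CIF luminance, 352 x 288 pixels.
for size in (16, 8, 4):
    count = motion_vectors_per_frame(352, 288, size)
    print(f"{size}x{size} blocks: {count} motion vectors per frame")
```

Under these assumptions, each halving of the block side quadruples the number of vectors that must be transmitted (396, 1584 and 6336 for the three sizes above).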


1.2 Objectives

The main aim of this project is to implement the Cross Search (CS) Algorithm, which can overcome the problem faced when using the Full Search (FS) Algorithm in achieving a high compression ratio in video coding. To achieve this aim, the objectives of this project are as follows:

1. To study how the Block Matching Algorithm (BMA), the FS Algorithm and the CS Algorithm work when implemented in MATLAB.

2. To understand and observe the differences between FS and CS in terms of their process, the time taken and the quality of the output produced for various types of video.

3. To understand the basic functions of the other fast BMAs and compare their performance with CS in different aspects.

4. To conclude and justify which of the algorithms is best according to the assessment criteria.

1.3 Problem Statement

A substantial computational workload is required to execute the Full Search algorithm. This drawback can be overcome by the many fast BMAs that have been proposed and developed. These fast BMAs exploit different search patterns and strategies in order to find the optimum MV with a minimal number of search points.


1.4 Scope

This project focuses on three main areas: a literature review on video coding, BMAs and CS; the development and implementation of the CS algorithm on the MATLAB platform; and the performance analysis of CS against the FS algorithm and against other fast BMAs. The literature review on video coding, BMAs and CS is presented in Chapter 2. Chapter 3 describes the methodology of the project, including the development and implementation of the CS algorithm in MATLAB. The performance analysis and the results of the implementation are discussed in Chapter 5, Results and Discussions. Finally, the conclusion and justification of the project are stated in Chapter 6.


CHAPTER 2

LITERATURE REVIEW

This chapter presents the background study of the project. The important features of the project, such as the video and the details of the algorithms, are described further.

2.1 Video Compression and Coding Technique

This subchapter covers the need for video compression, the coding technique and a brief explanation of the selected videos.

2.1.1 Introduction on Video Compression

A video is made up of image data and video data, and compressing a video means compressing both. Image and video data compression is a process in which the amount of data used to represent an image or video is reduced to meet a bit rate requirement, that is, to a rate below or at most equal to the maximum available bit rate. Although the data are reduced, the quality of the reconstructed image or video must still satisfy the application, and the complexity of the computation involved must be affordable for the application.


Image and video data compression has been found to be necessary in several important applications such as visual transmission and storage. This is because the huge amount of data involved in these and other applications usually far exceeds the capability of existing hardware, even though the technologies in the related industries keep advancing.

Data represent the information carried, and the quantity of data can be measured exactly. In the context of digital image and video, data are usually measured by the number of binary units, or bits. The bit rate, also known as the coding rate, is an important parameter in image and video compression and is frequently expressed in bits per pixel (bpp). The term pixel is an abbreviation of picture element and is sometimes referred to as pel. In information source coding, the bit rate is sometimes expressed in bits per symbol.
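As a worked illustration of the bits-per-pixel measure, the following Python sketch computes the rate before and after compression. The frame size (one 352 x 288 greyscale frame at 8 bits per pixel) and the compressed size of 25,344 bytes are hypothetical values chosen only to make the arithmetic easy to follow; the function name is likewise an assumption.

```python
def bits_per_pixel(total_bits, width, height, frames=1):
    """Average number of bits spent on each pixel of a sequence."""
    return total_bits / (width * height * frames)


# Assumed example: one 352 x 288 greyscale frame stored with 8 bits per
# pixel, then compressed to a hypothetical 25,344 bytes.
raw_bits = 352 * 288 * 8
compressed_bits = 25_344 * 8

print(bits_per_pixel(raw_bits, 352, 288))         # uncompressed rate
print(bits_per_pixel(compressed_bits, 352, 288))  # compressed rate
print(raw_bits / compressed_bits)                 # compression ratio
```

With these made-up numbers the rate falls from 8 bpp to 2 bpp, which corresponds to a compression ratio of 4.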

2.1.2 Coding Technique

The video coding layer consists of a hybrid of temporal and spatial prediction, in conjunction with transform coding. Figure 2.1 shows a block diagram of the video coding layer for a macroblock. In summary, the picture is split into blocks. The first picture of a sequence or a random access point is typically “Intra” coded, i.e., without using information other than that contained in the picture itself.


Figure 2.1 Video Coding Layer [1]

2.1.3 Video

Many video formats have been developed. Some of the commonly used formats are as follows:

i. Audio Video Interleave (AVI) format. Videos stored in the AVI format have the extension .avi.

ii. Windows Media Video (WMV) format. Videos stored in the WMV format have the extension .wmv.

iii. Moving Picture Experts Group (MPEG) format. Videos stored in the MPEG format have the extension .mpg or .mpeg.

iv. QuickTime format. Videos stored in this format have the extension .mov.

v. RealVideo format. Videos stored in this format have the extension .rm or .ram.


For this project, the videos chosen for implementation are in AVI format. The standard Common Intermediate Format (CIF) video sequences used in this project are Akiyo.avi, Claire.avi, Coastguard.avi, Foreman.avi, Salesman.avi and Tennis.avi. All of these videos are used as standard reference videos in ME research.

2.2 Motion Estimation

ME is a process to estimate the pels or pixels of the current frame from reference frame(s). The temporal prediction technique used in video is based on ME. The basic premise of ME is that in most cases, consecutive video frames will be similar except for changes induced by objects moving within the frames.

ME is commonly carried out by block matching. Fast block matching techniques exploit different search patterns and search strategies to find the optimum MV with a reduced number of search points. In this way, BMA efficiently removes the temporal redundancy between successive frames.

Block-based ME is the most practical approach to obtaining motion-compensated prediction frames. It divides each frame into equally sized rectangular blocks and, for each block in the current frame, finds the displacement of the best-matched block in the previous frame within a search window. This displacement is the MV of the block.
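The block-matching idea described above can be sketched as an exhaustive search over the search window. The Python code below is a minimal illustration, not the MATLAB implementation used in this project. It assumes frames stored as lists of rows, the mean absolute difference (MAD) as the matching criterion, and an MV defined as the offset from the current block to its match in the reference frame; the function names, the block size and the synthetic test frames are all assumptions.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement within +/-p pixels and return
    (best_mv, best_cost, points_checked)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, checked = (0, 0), float("inf"), 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                continue
            cost = mad(cur, ref, bx, by, dx, dy, n)
            checked += 1
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, checked


# Synthetic test data (an assumption for illustration): the current frame
# is the reference frame shifted right by 2 pixels and down by 1 pixel.
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [[ref[(y - 1) % 32][(x - 2) % 32] for x in range(32)] for y in range(32)]

mv, cost, checked = full_search(cur, ref, bx=8, by=8, n=8, p=4)
print(mv, cost, checked)
```

For this synthetic pair the search recovers the vector (-2, -1) with zero distortion after testing all 81 candidates of the +/-4 window, which shows why the exhaustive approach is reliable but expensive.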

The benefits of long-term memory motion-compensated prediction (LTMCP) [2] have been emphasized in recent years. Consequently, these tools have been adopted by several recent standards such as H.263+ and H.264/MPEG-4 AVC [3]. As the cost of semiconductors continues to drop, notably higher prediction gain can be achieved by searching more reference frames held in the memory buffer. Nevertheless, an obvious drawback is that the complexity increases proportionally. Extra data are also needed to describe the reference indices.


In the early 1980s, several conventional fast algorithms were proposed, such as the Three Step Search (TSS) and the 2D logarithmic search [4]. Among these algorithms, TSS became the most popular for low bit-rate video applications, owing to its simplicity and effectiveness. However, TSS uses a uniformly allocated search pattern in its first step, which is not very efficient at catching the small motion that appears in stationary or quasi-stationary blocks.

To remedy this problem, several adaptive techniques have been suggested to make the search more adaptable to motion scale and uncertainty. The uncertainty is estimated from the differences in the block distortion measure (BDM) among the checked points. A smaller difference indicates a larger uncertainty, and hence the search scope is increased in the next step.
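For reference, the basic TSS mentioned above can be sketched as follows. This Python sketch assumes the usual formulation for a +/-7 window (step sizes 4, 2 and 1, with the search re-centred on the best point after each step) and uses the sum of absolute differences as the distortion measure; the function names and the synthetic frames are assumptions. The test shift is deliberately placed on the first-step grid so that the result is easy to verify; in general TSS is not guaranteed to find the global minimum.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for one candidate displacement."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def three_step_search(cur, ref, bx, by, n, step=4):
    """Coarse-to-fine search: 9 points at step 4, then 8 new points at
    step 2 and 8 at step 1, re-centring on the best point each time."""
    height, width = len(ref), len(ref[0])
    cx, cy = 0, 0
    best = sad(cur, ref, bx, by, 0, 0, n)
    checked = 1
    while step >= 1:
        best_dx, best_dy = cx, cy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # centre already evaluated
                dx, dy = cx + ox, cy + oy
                if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, n)
                checked += 1
                if cost < best:
                    best, best_dx, best_dy = cost, dx, dy
        cx, cy = best_dx, best_dy
        step //= 2
    return (cx, cy), best, checked


# Synthetic frames (assumed): the block's match lies at offset (4, -4).
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [[ref[(y - 4) % 32][(x + 4) % 32] for x in range(32)] for y in range(32)]

mv, cost, checked = three_step_search(cur, ref, bx=12, by=12, n=8)
print(mv, cost, checked)
```

Here TSS evaluates 25 points, compared with the 225 points an exhaustive search of the same +/-7 window would test. Because the first step samples only every fourth position, a small displacement close to the centre is examined only in the later steps, which is the weakness noted above.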

2.2.1 Identifies the True Motion

The first type of ME algorithm aims to accurately track the true motion of objects or features in video sequences. Video sequences are generated by projecting a 3D real world onto a series of 2D images. When objects in the 3D real world move, the brightness or pixel intensity of the 2D images changes correspondingly. The 2D motion projected from the movement of a point in the 3D real world is referred to as the “true motion” [5]. One of the many potential applications of true motion is in computer vision, the goal of which is to identify an unknown environment via a moving camera.

2.2.2 Removing Temporal Redundancy

The other type of ME algorithm aims to remove temporal redundancy in video compression. A natural way to exploit the redundancy between frames is to predict the current frame t from the frame (t-Δt) or from the frame (t+Δt). Motion estimation and compensation are used to predict the frame t to be coded from its neighbouring frames. Motion compensation works by estimating the motion between two image frames, and the motion is described by a motion field of motion vectors. Consequently, the prediction error is transmitted instead of the frame itself, as shown in Figure 2.2. Along with the prediction error, the motion information is also transmitted to the decoder so that it can rebuild the same prediction. Block-based motion representation offers a very good trade-off between motion overhead and prediction error. It uses one MV per macroblock.

Figure 2.2 Predictive sources coding with motion compensation [6]
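The predictive coding loop described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes frame dimensions that are multiples of the block size, MVs that stay inside the reference frame, and one MV per block stored in a dictionary keyed by the block origin. The pixel values and motion vectors are made up.

```python
def predict_frame(ref, vectors, n):
    """Build the predicted frame by copying, for each n x n block, the
    block of `ref` displaced by that block's motion vector (dx, dy)."""
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for by in range(0, height, n):
        for bx in range(0, width, n):
            dx, dy = vectors[(bx, by)]
            for y in range(n):
                for x in range(n):
                    pred[by + y][bx + x] = ref[by + dy + y][bx + dx + x]
    return pred


def residual(cur, pred):
    """Prediction error that the encoder would transmit."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def reconstruct(pred, res):
    """Decoder side: prediction plus received prediction error."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, res)]


# Tiny 4 x 4 example with 2 x 2 blocks (values are made up).
ref = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]
cur = [[20, 30, 31, 41],
       [60, 70, 70, 80],
       [90, 100, 70, 80],
       [130, 140, 110, 121]]
# Hypothetical motion vectors, one per block, keyed by block origin.
vectors = {(0, 0): (1, 0), (2, 0): (0, 0), (0, 2): (0, 0), (2, 2): (0, -1)}

pred = predict_frame(ref, vectors, n=2)
res = residual(cur, pred)
print(res)
```

In this example the residual is almost entirely zero, so it is far cheaper to code than the frame itself, and the decoder recovers the current frame exactly by adding the residual to the prediction it rebuilds from the same MVs.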

In this project, the second type of ME, which removes temporal redundancy, is used. The BMA technique is implemented for MV detection and calculation. The criteria of BMA are also described in this chapter.

