
  

Advances in Pattern Recognition

For other titles published in this series, go to

Branislav Kisačanin, Shuvra S. Bhattacharyya, Sek Chai

  Editors

Embedded Computer Vision

Editors:
Branislav Kisačanin, PhD, Texas Instruments, Dallas, TX, USA
Shuvra S. Bhattacharyya, PhD, University of Maryland, College Park, MD, USA
Sek Chai, PhD, Motorola, Schaumburg, IL, USA

Series editor:
Professor Sameer Singh, PhD, Research School of Informatics, Loughborough University, Loughborough, UK

Advances in Pattern Recognition, Series ISSN 1617-7916

ISBN 978-1-84800-303-3
e-ISBN 978-1-84800-304-0
DOI 10.1007/978-1-84800-304-0

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2008935617

© Springer-Verlag London Limited 2009

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer Science+Business Media
springer.com

To Saška, Milena, and Nikola
BK

To Milu, Arpan, and Diya

SSB

To Ying and Aaron
SC

  Foreword

As a graduate student at Ohio State in the mid-1970s, I inherited a unique computer vision laboratory from the doctoral research of previous students. They had designed and built an early frame-grabber to deliver digitized color video from a (very large) electronic video camera on a tripod to a mini-computer (sic) with a (huge!) disk drive, about the size of four washing machines. They had also designed a binary image array processor and programming language, complete with a user's guide, to facilitate designing software for this one-of-a-kind processor. The overall system enabled programmable real-time image processing at video rate for many operations.

I had the whole lab to myself. I designed software that detected an object in the field of view, tracked its movements in real time, and displayed a running description of the events in English. For example: "An object has appeared in the upper right corner . . . It is moving down and to the left . . . Now the object is getting closer . . . The object moved out of sight to the left." About like that. The algorithms were simple, relying on a sufficient image intensity difference to separate the object from the background (a plain wall). From computer vision papers I had read, I knew that vision in general imaging conditions is much more sophisticated. But it worked, it was great fun, and I was hooked.

A lot has changed since! Dissertation after dissertation, the computer vision research community has contributed many new techniques to expand the scope and reliability of real-time computer vision systems. Cameras changed from analog to digital and became incredibly small. At the same time, computers shrank from mini-computers to workstations to personal computers to microprocessors to digital signal processors to programmable digital media systems on a chip. Disk drives became very small and are starting to give way to multi-gigabyte flash memories.

Many computer vision systems are so small and embedded in other systems that we don't even call them "computers" anymore. We call them automotive vision sensors, such as lane departure and blind spot warning sensors. We call them smart cameras and digital video recorders for video surveillance. We call them mobile phones (which happen to have embedded cameras and 5+ million lines of wide-ranging software), and so on.

Today that entire computer vision laboratory of the 1970s is upstaged by a battery-powered camera phone in my pocket.

So we are entering the age of "embedded vision." Just as optical character recognition and industrial inspection (machine vision) applications previously became sufficiently useful and cost-effective to be economically important, diverse embedded vision applications are now emerging to make the world a safer and better place to live. We still have a lot of work to do!

In this book we look at some of the latest techniques from universities and companies poking outside the envelope of what we already knew how to build. We see emphasis on tackling important problems for society. We see engineers evaluating many of the trade-offs needed to design cost-effective systems for successful products. Should I use this processor or design my own? How many processors do I need? Which algorithm is sufficient for a given problem? Can I re-design my algorithm to use a fixed-point processor?

I see all of the chapters in this book as marking the embedded vision age. The lessons learned that the authors share will help many of us to build better vision systems, align new research with important needs, and deliver it all in extraordinarily small systems.

  May 2008

  Bruce Flinchbaugh

  Dallas, TX

Preface

We are witnessing a major shift in the way computer vision applications are implemented, even developed. The most obvious manifestation of this shift is in the platforms that computer vision algorithms are running on: from powerful workstations to embedded processors. As is often the case, this shift came about at the intersection of enabling technologies and market needs. In turn, a new discipline has emerged within the imaging/vision community to deal with the new challenges: embedded computer vision (ECV).

Building on synergistic advances over the past decades in computer vision algorithms, embedded processing architectures, integrated circuit technology, and electronic system design methodologies, ECV techniques are increasingly being deployed in a wide variety of important applications. They include high-volume, cost-centric consumer applications, as well as accuracy- and performance-centric, mission-critical systems. For example, in the multi-billion-dollar computer and video gaming industry, the Sony EyeToy™ camera, which includes processing to detect color and motion, enables gamers to play without any other interfaces. Very soon, new camera-based games will detect body gestures based on movements of the hands, arms, and legs, to enhance the user experience. These games are built upon computer vision research on articulated body pose estimation and other kinds of motion capture analysis. As a prominent example outside of the gaming industry, the rapidly expanding medical imaging industry makes extensive use of ECV techniques to improve the accuracy of medical diagnoses, and to greatly reduce the side effects of surgical and diagnostic procedures.

Furthermore, ECV techniques can help address some of society's basic needs for safety and security. They are well suited for automated surveillance applications, which help to protect against malicious or otherwise unwanted intruders and activities, as well as for automotive safety applications, which aim to assist the driver and improve road safety.

Some well-established products and highly publicized technologies may be seen as early examples of ECV. Two examples are the optical mouse (which uses a hardware implementation of an optical flow algorithm), and NASA's Martian rovers, Spirit and Opportunity (which used computer vision on a processor of very limited capabilities during the landing, and which have a capability for vision-based self-navigation).

In addition to the rapidly increasing importance and variety of ECV applications, this domain of embedded systems warrants specialized focus because ECV applications have a number of distinguishing requirements compared to general-purpose systems and other embedded domains. For example, in low- to middle-end general-purpose systems, and in domains of embedded computing outside of ECV, performance requirements are often significantly lower than what we encounter in ECV. Cost and power consumption considerations are important for some areas of ECV, as they are in other areas of consumer electronics. However, in some areas of ECV, such as medical imaging and surveillance, considerations of real-time performance and accuracy dominate. Performance in turn is strongly related to considerations of buffering efficiency and memory management due to the large volumes of pixel data that must be processed in ECV systems. This convergence of high-volume, multidimensional data processing; real-time performance requirements; and complex trade-offs between achievable accuracy and performance gives rise to some of the key distinguishing aspects in the design and implementation of ECV systems. These aspects have also helped to influence the evolution of some of the major classes of embedded processing devices and platforms relevant to the ECV domain, including field-programmable gate arrays (FPGAs), programmable digital signal processors (DSPs), graphics processing units (GPUs), and various kinds of heterogeneous embedded multiprocessor devices.
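To give a sense of the data volumes involved, the short C sketch below (our own back-of-the-envelope illustration, not drawn from any chapter) computes the raw bandwidth of a single, modest video stream; every stage of an ECV pipeline must sustain at least this rate:

```c
#include <stdio.h>

/* Back-of-the-envelope raw pixel bandwidth of a modest video stream. */
int main(void) {
    const long width = 640, height = 480;  /* VGA resolution */
    const long bytes_per_pixel = 3;        /* 24-bit RGB */
    const long frames_per_second = 30;

    long bytes_per_frame = width * height * bytes_per_pixel;  /* 921,600 bytes */
    double mb_per_second =
        (double)bytes_per_frame * frames_per_second / (1024.0 * 1024.0);

    /* About 26 MB/s before any processing; higher resolutions and
       multi-camera systems multiply this figure several times over. */
    printf("%.1f MB/s of raw pixel data\n", mb_per_second);
    return 0;
}
```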

  Target Audience

This book is written for researchers, practitioners, and managers of innovation in the field of ECV. The researchers are those interested in advancing theory and application conception. For this audience, we present the state of the art of the field today, and provide insight about where major applications may go in the near future. The practitioners are those involved in the implementation, development, and deployment of products. For this audience, we provide the latest approaches and methodologies for designing on the different processing platforms for ECV. Lastly, the managers are those tasked with leading product innovation in a corporation. For this audience, we provide an understanding of the technology so that the necessary resources and competencies can be put in place to effectively develop a product based on computer vision.

For designers starting in this field, we provide in this book a historical perspective on early work in ECV that is a necessary foundation for their work. For those in the midst of development, we have compiled a list of recent research from industry and academia. In either case, we hope to give a well-rounded discussion of future developments in ECV, from implementation methodology to applications.

The book can also be used to provide an integrated collection of readings for specialized graduate courses or professionally oriented short courses on ECV. The book could, for example, complement a project-oriented emphasis in such a course with readings that give a broader perspective on both the state of the art and the evolution of the field.

  Organization of the Book

Each chapter in this book is a stand-alone exposition of a particular topic. The chapters are grouped into three parts:

  

Part I: Introduction, which comprises three introductory chapters: one on hardware and architectures for ECV, another on design methodologies, and one that introduces the reader to video analytics, possibly the fastest growing area of application of ECV.

  

Part II: Advances in Embedded Computer Vision, which contains seven chapters on state-of-the-art developments in ECV. These chapters explore advantages of various architectures, develop high-level software frameworks, and develop algorithmic alternatives that are close in performance to standard approaches, yet computationally less expensive. We also learn about the issues of implementation on a fixed-point processor, presented through the example of an automotive safety application.

  

Part III: Looking Ahead, which consists of three forward-looking chapters describing challenges in mobile environments, video analytics, and automotive safety applications.

  Overview of Chapters

Each chapter mimics the organization of the book: all provide an introduction, results, and challenges, but to different degrees, depending on whether they were written for Part I, II, or III. Here is a summary of each chapter's contribution:

  Part I: Introduction

• Chapter 1: Hardware Considerations for Embedded Vision Systems by Mathias Kölsch and Steven Butner. This chapter is a gentle introduction to the complicated world of processing architectures suitable for vision: DSPs, FPGAs, SoCs, ASICs, GPUs, and GPPs. The authors argue that in order to better understand the trade-offs involved in choosing the right architecture for a particular application, one needs to understand the entire real-time vision pipeline. Following the pipeline, they discuss all of its parts, tracing the information flow from photons on the front end to the high-level output produced by the system at the back end.

• Chapter 2: Design Methodology for Embedded Computer Vision Systems by Sankalita Saha and Shuvra S. Bhattacharyya. In this chapter the authors provide a broad overview of the literature on design methodologies for embedded computer vision.

• Chapter 3: We Can Watch It for You Wholesale by Alan J. Lipton. In this chapter the reader is taken on a tour of one of the fastest growing application areas in embedded computer vision: video analytics. This chapter provides a rare insight into the commercial side of our field.

  Part II: Advances in Embedded Computer Vision

• Chapter 4: Using Robust Local Features on DSP-Based Embedded Systems by Clemens Arth, Christian Leistner, and Horst Bischof. In this chapter the authors present their work on robust local feature detectors and their suitability for embedded implementation. They also describe their embedded implementation on a DSP platform and their evaluation of feature detectors on camera calibration and object detection tasks.

• Chapter 5: Benchmarks of Low-Level Vision Algorithms for DSP, FPGA, and Mobile PC Processors by Daniel Baumgartner, Peter Roessler, Wilfried Kubinger, Christian Zinner, and Kristian Ambrosch. This chapter compares the performance of several low-level vision kernels on three fundamentally different processing platforms: DSPs, FPGAs, and GPPs. The authors show the optimization details for each platform and share their experiences and conclusions.

• Chapter 6: SAD-Based Stereo Matching Using FPGAs by Kristian Ambrosch, Martin Humenberger, Wilfried Kubinger, and Andreas Steininger. In this chapter we see an FPGA implementation of SAD-based stereo matching. The authors describe various trade-offs involved in their design and compare the performance to a desktop PC implementation based on OpenCV.

• Chapter 7: Motion History Histograms for Human Action Recognition by Hongying Meng, Nick Pears, Michael Freeman, and Chris Bailey. In this chapter we learn about the authors' work on human action recognition. In order to improve the performance of existing techniques and, at the same time, make these techniques more suitable for embedded implementation, the authors introduce novel features and demonstrate their advantages on a reconfigurable embedded system for gesture recognition.
• Chapter 8: Embedded Real-Time Surveillance Using Multimodal Mean Background Modeling by Senyo Apewokin, Brian Valentine, Dana Forsthoefel, Linda Wills, Scott Wills, and Antonio Gentile. In this chapter we learn about a new approach to background subtraction that approaches the performance of mixture-of-Gaussians modeling while being much more suitable for embedded implementation. To complete the picture, the authors provide a comparison of two different embedded PC implementations.

• Chapter 9: Implementation Considerations for Automotive Vision Systems on a Fixed-Point DSP by Zoran Nikolić. This chapter is an introduction to the issues that arise in the floating-point to fixed-point conversion process. A practical approach to this difficult problem is demonstrated on the case of an automotive safety application being implemented on a fixed-point DSP. (A generic sketch of the kind of fixed-point arithmetic such a conversion targets appears after this list.)

• Chapter 10: Towards OpenVL: Improving Real-Time Performance of Computer Vision Applications by Changsong Shen, James J. Little, and Sidney Fels. In this chapter the authors present their work on a unified software architecture, OpenVL, which addresses a variety of problems faced by designers of embedded vision systems, such as hardware acceleration, reusability, and scalability.
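Since fixed-point implementation recurs throughout Part II, and most directly in Chapter 9, the following minimal C sketch illustrates generic Q16.16 fixed-point arithmetic, the kind of representation a floating-point to fixed-point conversion typically targets. It is our own illustration, not the method of any particular chapter, and it assumes arithmetic right shift of negative values:

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal Q16.16 fixed-point sketch: 16 integer bits, 16 fractional bits.
   Illustrative only; assumes arithmetic right shift of negative values,
   which holds on virtually all modern compilers. */
typedef int32_t q16_16;

#define Q_FRAC_BITS 16

static q16_16 q_from_double(double x) { return (q16_16)(x * (1 << Q_FRAC_BITS)); }
static double q_to_double(q16_16 x)   { return (double)x / (1 << Q_FRAC_BITS); }

/* Multiplication widens to 64 bits to avoid overflow, then shifts
   to restore the radix point. */
static q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * (int64_t)b) >> Q_FRAC_BITS);
}

int main(void) {
    q16_16 a = q_from_double(3.25);
    q16_16 b = q_from_double(-1.5);
    printf("3.25 * -1.5 = %f\n", q_to_double(q_mul(a, b)));  /* prints -4.875000 */
    return 0;
}
```

Addition and subtraction operate directly on the raw representation; the widening and shifting are needed only for multiplication and division, and choosing how many fractional bits to keep is exactly the dynamic range question Chapter 9 addresses.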

  Part III: Looking Ahead

• Chapter 11: Mobile Challenges for Embedded Computer Vision by Sek Chai. In this chapter we learn about the usability and other requirements a new application idea must satisfy in order to become a "killer app." The author discusses these issues for a particularly resource-constrained class of mobile devices: camera phones. While being a great introduction to this emerging area, this chapter also provides many insights into the challenges to be solved in the future.
• Chapter 12: Challenges in Video Analytics by Nikhil Gagvani. This chapter is another rare insight into the area of video analytics, this one more on the forward-looking side. We learn what challenges, both technical and nontechnical, lie ahead for this fast-growing area.
• Chapter 13: Challenges of Embedded Computer Vision in Automotive Safety Systems by Yan Zhang, Arnab S. Dhua, Stephen J. Kiselewich, and William A. Bauson. This chapter provides a gentle introduction to the numerous techniques that will one day have to be implemented on an embedded platform in order to help improve automotive safety. The system described in this chapter sets the automotive performance standards and provides a number of challenges to all parts of the design process: algorithm developers may be able to find algorithmic alternatives that provide equal performance while being more suitable for embedded platforms; chip makers may find good pointers on what their future chips will have to deal with; and software developers may introduce new techniques for parallelization of multiple automotive applications sharing the same hardware resources.

All in all, this book offers the first comprehensive look into various issues facing developers of embedded vision systems. As Bruce Flinchbaugh declares in the Foreword to this book, "we are entering the age of embedded vision." This book is a very timely resource!

  How This Book Came About

As organizers of the 2007 IEEE Workshop on ECV (ECVW 2007), we were acutely aware of the gap in the available literature. While the workshop has established itself as an annual event happening in conjunction with IEEE CVPR conferences, there is very little focused coverage of this topic elsewhere. An occasional short course falls far short of satisfying the need for knowledge sharing in this area. That is why we decided to invite the contributors to the ECVW 2007 to expand their papers and turn them into the stand-alone chapters of Part II, and to invite our esteemed colleagues to share their experiences and visions for the future in Parts I and III.

  Outlook

While this book covers a good representative cross section of ECV applications and techniques, there are many more applications that are not covered here, some of which may have significant social and business impact, and some not even conceptually feasible with today's technology.

In the following chapters, readers will find experts in the ECV field encouraging others to find, build, and develop further in this area, because there are many application possibilities that have not yet been explored. For example, the recent successes in the DARPA Grand Challenge show the possibilities of autonomous vehicles, although the camera is currently supplemented with a myriad of other sensors such as radar and laser. In addition to the applications mentioned above, there are application areas such as image/video manipulation (e.g., editing and labeling an album collection) and visual search (search based on image shape and texture). In the near future, these applications may find their way into many camera devices, including the ubiquitous mobile handset. They are poised to make a significant impact on how users interact and communicate with one another and with different kinds of electronic devices. The contributions in this book are therefore intended not only to provide in-depth information on the state of the art in specific, existing areas of ECV, but also to help promote the use of ECV techniques in new directions.

May 2008

Branislav Kisačanin
Plano, TX

Shuvra S. Bhattacharyya
College Park, MD

Sek Chai
Schaumburg, IL

  Acknowledgements

The editors are grateful to the authors of the chapters in this book for their well-developed contributions and their dedicated cooperation in meeting the ambitious publishing schedule for the book. We are also grateful to the program committee members for ECVW 2007, who helped to review preliminary versions of some of these chapters and provided valuable feedback for their further development. We would also like to thank the several other experts who helped to provide a thorough peer-review process and ensure the quality and relevance of the chapters. The chapter authors themselves contributed significantly to this review process through an organization of cross-reviews among the different contributors.

  We are grateful also to our Springer editor, Wayne Wheeler, for his help in launching this book project, and to our Springer editorial assistant, Catherine Brett, for her valuable guidance throughout the production process.

Contents

Foreword
Preface
Acknowledgements
List of Contributors

Part I  Introduction

1  Hardware Considerations for Embedded Vision Systems
   Mathias Kölsch and Steven Butner
   1.1  The Real-Time Computer Vision Pipeline
   1.2  Sensors
        1.2.1  Sensor History
        1.2.2  The Charge-Coupled Device
        1.2.3  CMOS Sensors
        1.2.4  Readout and Control
   1.3  Interconnects to Sensors
   1.4  Image Operations
   1.5  Hardware Components
        1.5.1  Digital Signal Processors
        1.5.2  Field-Programmable Gate Arrays
        1.5.3  Graphics Processing Units
        1.5.4  Smart Camera Chips and Boards
        1.5.5  Memory and Mass Storage
        1.5.6  System on Chip
        1.5.7  CPU and Auxiliary Boards
        1.5.8  Component Interconnects
   1.6  Processing Board Organization
   1.7  Conclusions

2  Design Methodology for Embedded Computer Vision Systems
   Sankalita Saha and Shuvra S. Bhattacharyya
   2.1  Introduction
   2.2  Algorithms
   2.3  Architectures
   2.4  Interfaces
   2.5  Design Methodology
        2.5.1  Modeling and Specification
        2.5.2  Partitioning and Mapping
        2.5.3  Scheduling
        2.5.4  Design Space Exploration
        2.5.5  Code Generation and Verification
   2.6  Conclusions
   References

3  We Can Watch It for You Wholesale
   Alan J. Lipton
   3.1  Introduction to Embedded Video Analytics
   3.2  Video Analytics Goes Down-Market
        3.2.1  What Does Analytics Need to Do?
        3.2.2  The Video Ecosystem: Use-Cases for Video Analytics
   3.3  How Does Video Analytics Work?
        3.3.1  An Embedded Analytics Architecture
        3.3.2  Video Analytics Algorithmic Components
   3.4  An Embedded Video Analytics System: by the Numbers
        3.4.1  Putting It All Together
        3.4.2  Analysis of Embedded Video Analytics System
   3.5  Future Directions for Embedded Video Analytics
        3.5.1  Surveillance and Monitoring Applications
        3.5.2  Moving Camera Applications
        3.5.3  Imagery-Based Sensor Solutions
   3.6  Conclusion
   References

Part II  Advances in Embedded Computer Vision

4  Using Robust Local Features on DSP-Based Embedded Systems
   Clemens Arth, Christian Leistner, and Horst Bischof
   4.1  Introduction
   4.2  Related Work
   4.3  Algorithm Selection
        4.3.1  Hardware Constraints and Selection Criteria
        4.3.2  DoG Keypoints
        4.3.3  MSER
        4.3.5  Descriptor Matching
        4.3.6  Epipolar Geometry
   4.4  Experiments
        4.4.1  Camera Calibration
        4.4.2  Object Recognition
   4.5  Conclusion
   References

5  Benchmarks of Low-Level Vision Algorithms for DSP, FPGA, and Mobile PC Processors
   Daniel Baumgartner, Peter Roessler, Wilfried Kubinger, Christian Zinner, and Kristian Ambrosch
   5.1  Introduction
   5.2  Related Work
   5.3  Benchmark Metrics
   5.4  Implementation
        5.4.1  Low-Level Vision Algorithms
        5.4.2  FPGA Implementation
        5.4.3  DSP Implementation
        5.4.4  Mobile PC Implementation
   5.5  Results
   5.6  Conclusions
   References

6  SAD-Based Stereo Matching Using FPGAs
   Kristian Ambrosch, Martin Humenberger, Wilfried Kubinger, and Andreas Steininger
   6.1  Introduction
   6.2  Related Work
   6.3  Stereo Vision Algorithm
   6.4  Hardware Implementation
        6.4.1  Architecture
        6.4.2  Optimizing the SAD
        6.4.3  Tree-Based WTA
   6.5  Experimental Evaluation
        6.5.1  Test Configuration
        6.5.2  Results
        6.5.3  Comparison
   6.6  Conclusions
   References

7  Motion History Histograms for Human Action Recognition
   Hongying Meng, Nick Pears, Michael Freeman, and Chris Bailey
   7.1  Introduction
   7.2  Related Work
   7.3  SVM-Based Human Action Recognition System
   7.4  Motion Features
        7.4.1  Temporal Template Motion Features
        7.4.2  Limitations of the MHI
        7.4.3  Definition of MHH
        7.4.4  Binary Version of MHH
   7.5  Dimension Reduction and Feature Combination
        7.5.1  Histogram of MHI
        7.5.2  Subsampling
        7.5.3  Motion Geometric Distribution (MGD)
        7.5.4  Combining Features
   7.6  System Evaluation
        7.6.1  Experimental Setup
        7.6.2  Performance of Single Features
        7.6.3  Performance of Combined Features
   7.7  FPGA Implementation on Videoware
   7.8  Conclusions
   References

8  Embedded Real-Time Surveillance Using Multimodal Mean Background Modeling
   Senyo Apewokin, Brian Valentine, Dana Forsthoefel, Linda Wills, Scott Wills, and Antonio Gentile
   8.1  Introduction
   8.2  Related Work
   8.3  Multimodal Mean Background Technique
   8.4  Experiment
        8.4.1  Embedded Platform: eBox-2300 Thin Client
        8.4.2  Comparative Evaluation Platform: HP Pavilion Slimline
   8.5  Results and Evaluation
        8.5.1  eBox Performance Results and Storage Requirements
        8.5.2  HP Pavilion Slimline Performance Results
   8.6  Conclusion
   References

9  Implementation Considerations for Automotive Vision Systems on a Fixed-Point DSP
   Zoran Nikolić
   9.1  Introduction
        9.1.1  Fixed-Point vs. Floating-Point Arithmetic Design Process
        9.1.2  Code Conversion
   9.2  Fixed-Point Arithmetic
   9.3  Process of Dynamic Range Estimation
        9.3.1  Dynamic Range Estimation
        9.3.2  Bit-True Fixed-Point Simulation
        9.3.3  Customization of the Bit-True Fixed-Point Algorithm to a Fixed-Point DSP
   9.4  Implementation Considerations for Single-Camera Steering Assistance Systems on a Fixed-Point DSP
        9.4.1  System Considerations
   9.5  Results
   9.6  Conclusions
   References

10  Towards OpenVL: Improving Real-Time Performance of Computer Vision Applications
    Changsong Shen, James J. Little, and Sidney Fels
    10.1  Introduction
    10.2  Related Work
          10.2.1  OpenCV
          10.2.2  Pipes and Filters and Data-Flow Approaches
          10.2.3  OpenGL
          10.2.4  Hardware Architecture for Parallel Processing
    10.3  A Novel Software Architecture for OpenVL
          10.3.1  Logical Pipeline
          10.3.2  Stacks
          10.3.3  Event-Driven Mechanism
          10.3.4  Data Buffers
          10.3.5  Synchronization and Communication
          10.3.6  Iteration
          10.3.7  Isolating Layers to Mask Heterogeneity
    10.4  Example Application Designs
          10.4.1  Procedure for Implementing Applications
          10.4.2  Local Positioning System (LPS)
          10.4.3  Human Tracking and Attribute Calculation
    10.5  Conclusion and Future Work
    10.6  Acknowledgements
    References

Part III  Looking Ahead

11  Mobile Challenges for Embedded Computer Vision
    Sek Chai
    11.1  Introduction
    11.2  In Search of the Killer Applications
          11.2.1  Image Finishing
          11.2.2  Video Codec
          11.2.3  Computer Vision
          11.2.4  Example Applications
    11.3  Technology Constraints
          11.3.2  Computing Platform
          11.3.3  Memory
          11.3.4  Power Consumption
          11.3.5  Cost and Performance
          11.3.6  Image Sensor
          11.3.7  Illumination and Optics
    11.4  Intangible Obstacles
          11.4.1  User Perception and Attitudes Towards Computer Vision
          11.4.2  Measurability and Standardization
          11.4.3  Business Models
    11.5  Future Direction
    References

12  Challenges in Video Analytics
    Nikhil Gagvani
          12.3.1  Segmentation
          12.3.2  Classification and Recognition
          12.3.3  Tracking
          12.3.4  Behavior and Activity Recognition
    12.4  Embedded Implementations
    12.5  Future Applications and Challenges
          12.5.1  Moving Cameras
          12.5.2  Multi-Camera Tracking
          12.5.3  Smart Cameras
          12.5.4  Scene Understanding
          12.5.5  Search and Retrieval
          12.5.6  Vision for an Analytics-Powered Future
    12.6  Summary
    References

13  Challenges of Embedded Computer Vision in Automotive Safety Systems
    Yan Zhang, Arnab S. Dhua, Stephen J. Kiselewich, and William A. Bauson
    13.1  Computer Vision in Automotive Safety Applications
    13.2  Literature Review
    13.3  Vehicle Cueing
          13.3.1  Cueing Step 1: Edge Detection and Processing
          13.3.2  Cueing Step 2: Sized-Edge Detection