Design of Image Processing Embedded Systems Using Multidimensional Data Flow

  

Embedded Systems

Series Editors Nikil D. Dutt Peter Marwedel Grant Martin

  For further volumes: http://www.springer.com/series/8563

  Joachim Keinert · Jürgen Teich


  Joachim Keinert
  Michaelstraße 40
  D-90425 Nürnberg
  Germany
  joachim.keinert@yahoo.de

  Jürgen Teich
  Department of Computer Science 12
  University of Erlangen-Nuremberg
  Am Weichselgarten 3
  D-91058 Erlangen
  Germany
  teich@informatik.uni-erlangen.de

  ISBN 978-1-4419-7181-4 e-ISBN 978-1-4419-7182-1 DOI 10.1007/978-1-4419-7182-1 Springer New York Dordrecht Heidelberg London

  Library of Congress Control Number: 2010937183

  © Springer Science+Business Media, LLC 2011

  All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

  The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

  Printed on acid-free paper

  Springer is part of Springer Science+Business Media (www.springer.com)

  Preface

  Overview of This Book

  With the availability of chips offering constantly increasing computational performance and functionality, the design of more and more complex applications becomes possible. This is particularly true for the domain of image processing, which is characterized by huge computational effort. Unfortunately, this evolution risks being halted by the fact that the employed design methodologies remain on a rather low level of abstraction. The resulting design gap causes increasing development costs or even project failure and thus threatens technical progress.

  Consequently, new design methodologies are urgently required. A review of the state of the art reveals that different approaches compete to solve the above-mentioned challenges. The proposed techniques range from behavioral compilers accepting standard C or Matlab code as input, through block-based design methods such as Simulink and SystemC, to data flow specifications and polyhedral analysis. Each of them offers important benefits, such as quick and easy hardware prototyping, higher levels of abstraction, and enhanced system and algorithm analysis on different levels of granularity. However, a solution combining the advantages of all these approaches is still missing. As a consequence, system level design of image processing applications still poses various challenges. Examples are the inability to handle the resulting system complexity or to cover important algorithmic properties. Equally, the synthesis of high-performance hardware implementations is still difficult.

  Fortunately, recent research indicates that multidimensional data flow is a promising technique for overcoming these drawbacks, because it combines the advantages of block-based specification, data flow-related system analysis, and polyhedral optimization on different levels of granularity. These benefits enable, for instance, the verification of the application specification on a very high level of abstraction, the calculation of the memory sizes required for correct algorithm implementation under different design tradeoffs, and the synthesis of high-performance communication infrastructures and algorithm implementations.

  However, despite these advantages, multidimensional data flow still lives largely in the shadows and is rarely adopted in either commercial or academic systems. Consequently, this book aims to give an encompassing description of the related techniques in order to demonstrate how multidimensional data flow can boost system implementation. In particular, this book identifies some of the requirements for system level design of image processing algorithms and reviews to what extent they are met by the different approaches found in literature and industry. Next, a multidimensional data flow model of computation is introduced that is particularly adapted to image processing applications. Its ability to represent both static and data-dependent point, local, and global algorithms, as well as the possibility of seamless interaction with already existing one-dimensional models of computation, permits the description of complex systems. Based on these foundations, it is shown how system analysis and synthesis can be simplified by automatic tool support. In particular, it is explained in detail how the amount of memory required for correct implementation can be derived by means of polyhedral analysis and how communication primitives for high-speed multidimensional communication can be generated. Application to different examples such as a lifting-based wavelet transform, JPEG2000 encoding, JPEG decoding, or multi-resolution filtering illustrates the concrete use of these techniques and demonstrates their capability to deliver better results in shorter time compared to related approaches while offering more design flexibility.

  Target Audience

  Because it gives an encompassing description of a system level design methodology using multidimensional data flow, the book particularly addresses all those active or interested in the research, development, or deployment of new design methodologies for data-intensive embedded systems. Such systems are intended to process huge amounts of data organized in the form of array streams. Image processing applications are particularly prominent examples of this algorithm class and are thus the focus of this book.

  In addition to this primary target audience, the book is also useful for system design engineers, since it describes new technologies for inter-module communication as well as design tradeoffs that can be exploited in embedded systems. Finally, the book aims to promote multidimensional data flow and to make it more accessible for education and studies by giving an encompassing description of related techniques, use cases, and applications.

  Prerequisites

  Since this book bridges different technologies such as data flow modeling, polyhedral analysis, and hardware synthesis, important concepts necessary in the context of multidimensional data flow are briefly summarized before they are applied. This avoids unnecessarily complicating the understanding of the presented material. Nevertheless, it is assumed that the reader is familiar with fundamental mathematics such as vector spaces and matrix multiplication. Integer linear optimization is used in both memory analysis and communication synthesis. While the fundamentals are briefly summarized in this book, additional knowledge can deliver more detailed insights. Furthermore, familiarity with basic concepts of image processing helps in understanding the presented material. For a clearer picture of the overall concepts, some experience in software and hardware implementation is helpful, although not strictly necessary.

  How the Book Is Organized

  The book consists of four parts:

  • Introductory background information
  • Related techniques
  • Multidimensional modeling
  • Analysis and synthesis

  Fig. 1 Arrows depict concepts and information used in the corresponding target book parts. Bold arrows define information that is fundamental for understanding the chapter which the arrow points to

  The first part containing the introductory background information basically aims to clarify the purpose and the main focus of the described technologies. To this end, Chapter 1 (Introduction) explains the need for new design technologies and overviews the overall design flow described in this book. Chapter 2 (Design of Image Processing Applications) adds some general considerations about the design of image processing embedded systems and exemplifies a JPEG2000 encoder in order to clarify the type of algorithms that are addressed in this book. The insights gained during its manual development have been formulated into a correspond-

  The second part about related techniques summarizes concepts useful for system level design of image processing applications. To this end, Chapter 3 (Fundamentals and Related Work) reviews related approaches and evaluates their benefits for system level design of image processing applications. In particular, it investigates a large number of different specification techniques, ranging from sequential languages enriched by communicating sequential processes up to multidimensional data flow models, and evaluates their ability to describe complex image processing applications. Chapter 3 also summarizes the capabilities of several behavioral compilers, buffer analysis techniques, and communication synthesis approaches, and it discusses several system level design methodologies. Subsequently, Chapter 4 presents an overview of the ESL tool SystemCoDesigner, since it serves as an example of how to combine multidimensional system design with available ESL techniques. Furthermore, a case study in the form of a Motion-JPEG decoder demonstrates the potential of ESL design for image processing applications and discusses lessons learned for both application modeling and synthesis.

  Chapters 3 and 4 are thus both intended to provide further insights into system level design of embedded systems. In particular, they aim to clarify the benefits of multidimensional data flow and its interaction with existing technologies. Consequently, both chapters can be consulted as needed. The only exception is Section 3.1.3 (One-Dimensional Data Flow), which is recommended for all those not familiar with one-dimensional data flow models of computation.

  Detailed discussion of multidimensional system level design then starts with the third and central book part about multidimensional modeling, subdivided into two chapters. Chapter 5 (Windowed Data Flow (WDF)) introduces the windowed data flow (WDF) model of computation used for application modeling in the remainder of this monograph. This includes both a theoretical discussion and the application to two concrete examples, namely the binary morphological reconstruction and the JPEG2000 lifting-based wavelet transform. In particular, Sections 5.1 (Sliding Window Communication), 5.2 (Local WDF Balance Equation), 5.3 (Communication Order), and 5.4 (Communication Control) introduce fundamental concepts required in the remainder of this monograph. The same holds for Sections 6.1 (Problem Formulation), 6.2 (Hierarchical Iteration Vectors), and 6.3 (Memory Models). Chapter 6 discusses fundamental concepts of memory organization within multidimensional arrays as required in the remaining chapters. In particular, a study is performed that compares two different memory allocation functions in terms of memory efficiency.

  Based on those multidimensional modeling concepts, the fourth part of this book then addresses system analysis and synthesis. More precisely, Chapter 7 (Buffer Analysis for Complete Application Graphs) is dedicated to the automatic determination of the buffer sizes required for correct system implementation. Taking the results of Chapter 6 into account, Chapter 7 presents a method for polyhedral calculation of buffer size requirements for complex graph topologies. Application to several examples such as the lifting-based wavelet transform, JPEG2000 block building, and multi-resolution image filtering demonstrates that the resulting analysis times are suitable for system level design of complex applications and competitive with alternative approaches. Furthermore, it is shown that analytical methods deliver better solutions in shorter time compared to buffer analysis via simulation, while offering more design tradeoffs.

  The buffer sizes derived in this way can be used directly for efficient communication synthesis. To this end, Chapter 8 (Communication Synthesis) considers the derivation of high-speed hardware communication primitives from WDF specifications. This makes it possible to interconnect hardware modules by high-performance point-to-point communication. The corresponding [...] multidimensional data flow. To this end, Chapter 8 presents all analysis steps required to transform a WDF edge into an efficient hardware implementation. Application to different examples originating from a JPEG2000 encoder and a JPEG decoder demonstrates the benefits of the methodology. Furthermore, Chapter 8 illustrates how the hardware communication primitive can be combined with a behavioral synthesis tool in order to handle overlapping sliding windows.

  The book is concluded by Chapter 9. Appendix A (Buffer Analysis by Simulation) then delivers some supplementary information concerning a buffer analysis performed during simulation as applied in Chapter 6. Appendix B summarizes the abbreviations used within this book while Appendix C contains repeatedly used formula symbols.

  Distinct Features and Benefits of This Book

  Although it combines the benefits of various design methodologies such as block-based system design, high-level simulation, system analysis, and polyhedral optimization, multidimensional data flow is still not very widely known. Whereas several books discuss the one-dimensional counterparts, similar literature is not available for multidimensional modeling. Consequently, this book aims to provide a detailed insight into these design methodologies. Furthermore, it aims to provide an encompassing review of related work and techniques in order to show their relation to multidimensional data flow.

  By these means, the book is intended to contribute to the promotion of multidimensional data flow in both academic and industrial projects. Furthermore, it aims to render the subject more accessible for education. In more detail, this monograph provides the following contributions:

  • First encompassing book on multidimensional data flow covering different models of computation. In particular, modeling, synthesis, and analysis are all discussed in detail, demonstrating the potential of the underlying concepts.
  • The book bridges different technologies such as data flow modeling, polyhedral analysis, and hardware synthesis, which are normally only considered independently of each other in different manuscripts. Consequently, their combination poses significant difficulties, since even the terminology used in the different domains varies. By combining the above-mentioned technologies in one book and describing them in a consistent way, the book can unlock new potential in system design.
  • Analysis of the extent to which multidimensional data flow can better fit the designers’ requirements compared to alternative description techniques, such as well-known one-dimensional data flow or communicating sequential processes.
  • Description of how multidimensional data flow can coexist with classical one-dimensional models of computation.
  • Explanation of a novel architecture for efficient and flexible high-speed communication in hardware that can be used in both manual and automatic system design and that offers various design alternatives trading achievable throughput against required hardware sizes.
  • Detailed description of how to calculate required buffer sizes for implementation of static image processing applications. Various illustrations help to apply the method both in ESL tools and in manual system design.
  • Compared to books on geometric memory analysis, a significant extension assures that this method can be applied for data reordering and image subsampling in hardware implementations.
  • New concepts for embedded system design, such as trading communication buffer sizes against computational logic by different scheduling mechanisms.
  • Various experimental results in order to demonstrate the capabilities of the described architectures and design methods. In particular, several example applications such as JPEG2000 encoding, Motion-JPEG decoding, binary morphological reconstruction, and multi-resolution filtering are discussed.

  The content of this book has been created, edited, and verified with the highest possible care. Nevertheless, errors and mistakes of any kind cannot be excluded. This includes, but is not restricted to, missing information, wrong descriptions, erroneous results, possible algorithmic mistakes, or citation flaws, which may cause algorithms not to work as expected.

  Erlangen, Germany Joachim Keinert

  Jürgen Teich

  Acknowledgments

  This book is the result of a 5-year research activity that I was able to conduct at both the Fraunhofer Institute for Integrated Circuits IIS in Erlangen and the Chair for Hardware-Software-Co-Design at the University of Erlangen-Nuremberg. This constellation allowed me to combine the theory of system level design with the requirements for the design of high-performance image processing applications. In particular, the experience gained during the development of multiple embedded devices for image processing within the Fraunhofer research organization has been a valuable inspiration for the presented technologies. Therefore, I want to express my gratitude toward all those who supported me during this period.

  I especially want to thank Prof. Jürgen Teich for supervising the underlying research activity and for his motivation to tackle the right mathematical problems, in particular data flow graph models of computation and polyhedral analysis. My superior at the Fraunhofer Institute for Integrated Circuits, Dr. Siegfried Fößel, also merits special thanks for the support provided and for his help in making this book possible. Dr. Christian Haubelt from the University of Erlangen-Nuremberg contributed to this book by means of multiple reviews and by his coordination of the SystemCoDesigner tool. This enabled its extension with a multidimensional design methodology in order to demonstrate the underlying concepts and techniques. In this context I particularly profited from the work of Joachim Falk, designer of the SysteMoC library and of a large tool set for manipulation of the resulting graph topologies. Similarly, the cooperation with Hritam Dutta and Dr. Frank Hannig has been an important prerequisite for combining data flow-based system design with polyhedral analysis. In addition, the various discussions with Prof. Shuvra Bhattacharyya of the University of Maryland helped me to better understand and evaluate the advantages of multidimensional system design. And of course I also want to thank Mr. Charles Glaser from Springer for his assistance in completing this book.

  Erlangen, Germany, May 2010

  Joachim Keinert

  Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

  1

1.1 Motivation and Current Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2.5 Requirements for System Level Design of Image Processing Applications . 18

  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

  2.5.9 Tool-Supported Design of Memory Systems . . . . . . . . . . . . . . . . . . 20

  2.5.8 High-Level Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 20

  2.5.7 High-Level Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

  2.5.6 Fast Generation of RTL Implementations for Quick Feedback During Architecture Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

  2.5.5 Support of Data Reordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

  19

  2.5.4 Tight Interaction Between Static and Data-Dependent Algorithms

  

2.5.3 Capability to Represent Control Flow in Multidimensional

Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

  2.5.2 Representation of Task, Data, and Operation Parallelism . . . . . . . . 19

  2.5.1 Representation of Global, Local, and Point Algorithms . . . . . . . . . 19

  2.4.6 Inability to Precisely Predict Required Computational Effort for Both Hardware and Software . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

  2.4.5 Lack to Simulate the Overall System . . . . . . . . . . . . . . . . . . . . . . . . 18

  2.4.4 Manual Design of Memory System . . . . . . . . . . . . . . . . . . . . . . . . . . 18

  2.4.3 Missing Possibility to Explore Consequences of Implementation Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

  2.4.2 Lack of Architectural Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

  2.4.1 Design Gap Between Available Software Solution and Desired Hardware Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

  2.4 System Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

  2.3 Parallelism of Image Processing Applications . . . . . . . . . . . . . . . . . . . . . . . . . 15

  2.2 JPEG2000 Image Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

  2.1 Classification of Image Processing Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 10

  9

  4 2 Design of Image Processing Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

  1

1.2 Multidimensional System Level Design Overview . . . . . . . . . . . . . . . . . . . . .

2.6 Multidimensional System Level Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3 Fundamentals and Related Work

3.1 Behavioral Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

  xiv Contents

  4.1.3 Actor and Communication Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 83

  3.5.2 Model-Based Simulation and Design . . . . . . . . . . . . . . . . . . . . . . . . 71

  3.5.3 System Level Mapping and Exploration . . . . . . . . . . . . . . . . . . . . . . 77

  3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

  4 Electronic System Level Design of Image Processing Applications with S YSTEM C O D ESIGNER

  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

  4.1 Design Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

  4.1.1 Actor-Oriented Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

  4.1.2 Actor Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

  4.1.4 Automatic Design Space Exploration . . . . . . . . . . . . . . . . . . . . . . . . 84

  3.5 System Level Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

  4.1.5 System Building . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

  4.1.6 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

  4.2 Case Study for the Motion-JPEG Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

  4.2.1 Comparison Between VPC Estimates and Real Implementation . . 87

  4.2.2 Influence of the Input Motion-JPEG Stream . . . . . . . . . . . . . . . . . . 90

  4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

  

5 Windowed Data Flow (WDF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

  5.1 Sliding Window Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

  5.1.1 WDF Graph and Token Production . . . . . . . . . . . . . . . . . . . . . . . . . . 94

  3.5.1 Embedded Multi-processor Software Design . . . . . . . . . . . . . . . . . . 68

  3.4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

  3.1.2 Sequential Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

  3.2.6 MMAlpha . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

  3.1.3 One-Dimensional Data Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

  3.1.4 Multidimensional Data Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

  3.1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

  3.2 Behavioral Hardware Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

  3.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

  3.2.2 SA-C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

  3.2.3 ROCCC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

  3.2.4 DEFACTO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

  3.2.5 Synfora PICO Express . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

  3.2.7 PARO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

  3.4.4 Out-of-Order Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

  3.2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

  3.3 Memory Analysis and Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

  3.3.1 Memory Analysis for One-Dimensional Data Flow Graphs . . . . . . 57

  3.3.2 Array-Based Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

  3.3.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

  3.4 Communication and Memory Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

  3.4.1 Memory Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

  3.4.2 Parallel Data Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

  3.4.3 Data Reuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

  5.1.2 Virtual Border Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

  Contents xv

5.4.1 Multidimensional FIFO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

6 Memory Mapping Functions for Efficient Implementation of WDF Edges . . . . 133

  7.3.4 Lattice Shifting Based on Dependency Vectors . . . . . . . . . . . . . . . . 164

  7.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

  7.2 Buffer Analysis by Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

  7.3 Polyhedral Representation of WSDF Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

  7.3.1 WSDF Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

  7.3.2 Lattice Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

  7.3.3 Out-of-Order Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

  7.4.1 Principle of Lattice Wraparound . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

  7.3.5 Pipelined Actor Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

  5.1.4 Determination of Extended Border Values . . . . . . . . . . . . . . . . . . . . 99

  7.4.2 Formal Description of the Lattice Wraparound . . . . . . . . . . . . . . . . 176

  7.4.3 Lattice Shifting for Lattices with Wraparound . . . . . . . . . . . . . . . . . 177

  7.5.1 Lattice Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

  7.5.2 Lattice Shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

  7.6.1 ILP Formulation for Buffer Size Calculation . . . . . . . . . . . . . . . . . . 186

  7.6.2 Memory Channel Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

  6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

  6.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

  6.3.2 The Linearized Buffer Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

  5.6.2 Application to an Example Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

  5.1.5 WDF Delay Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

  5.2 Local WDF Balance Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

  5.3 Communication Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

  5.4 Communication Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

  

5.4.2 Communication Finite State Machine for Multidimensional

Actors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

  5.5 Windowed Synchronous Data Flow (WSDF) . . . . . . . . . . . . . . . . . . . . . . . . . . 108

  5.6 WSDF Balance Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

  6.3.1 The Rectangular Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

  5.6.1 Derivation of the WSDF Balance Equation . . . . . . . . . . . . . . . . . . . 112

  5.7 Integration into S YSTEM C O D ESIGNER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

  5.8 Application Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

  5.8.1 Binary Morphological Reconstruction . . . . . . . . . . . . . . . . . . . . . . . 118

  5.8.2 Lifting-Based Wavelet Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

  5.9 Limitations and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

  5.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

  6.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

  6.2 Hierarchical Iteration Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

  6.3 Memory Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

7 Buffer Analysis for Complete Application Graphs . . . . . . . . . . 151

  7.4 Lattice Wraparound . . . . . . . . . . 173

  7.5 Scheduling of Complete WSDF Graphs . . . . . . . . . . 179

  7.6 Buffer Size Calculation . . . . . . . . . . 185

  7.8 Solution Strategies . . . . . . . . . . 196

  7.9 Results . . . . . . . . . . 197

  7.9.1 Out-of-Order Communication . . . . . . . . . . 198

  7.9.2 Application to Complex Graph Topologies . . . . . . . . . . 199

  7.9.3 Memory Channel Splitting . . . . . . . . . . 202

  7.9.4 Multirate Analysis . . . . . . . . . . 204

  7.9.5 Limitations . . . . . . . . . . 204

  7.10 Conclusion . . . . . . . . . . 206

8 Communication Synthesis . . . . . . . . . . 209

  8.1 Problem Formulation . . . . . . . . . . 210

  8.2 Hardware Architecture . . . . . . . . . . 214

  8.2.1 Read and Write Order Control . . . . . . . . . . 215

  8.2.2 Memory Partitioning . . . . . . . . . . 218

  8.2.3 Source Address Generation . . . . . . . . . . 221

  8.2.4 Virtual Memory Channel Mapping . . . . . . . . . . 226

  8.2.5 Trading Throughput Against Resource Requirements . . . . . . . . . . 233

  8.2.6 Sink Address Generation . . . . . . . . . . 234

  8.2.7 Fill-Level Control . . . . . . . . . . 236

  8.2.8 Elimination of Modular Dependencies . . . . . . . . . . 244

  8.3 Determination of Channel Sizes . . . . . . . . . . 247

  8.4 Granularity of Scheduling . . . . . . . . . . 248

  8.4.1 Latency Impact of Coarse-Grained Scheduling . . . . . . . . . . 248

  8.4.2 Memory Size Impact of Coarse-Grained Scheduling . . . . . . . . . . 250

  8.4.3 Controlling the Scheduling Granularity . . . . . . . . . . 250

  8.5 Results . . . . . . . . . . 253

  8.5.1 Implementation Strategy for High Clock Frequencies . . . . . . . . . . 253

  8.5.2 Out-of-Order Communication . . . . . . . . . . 254

  8.5.3 Out-of-Order Communication with Parallel Data Access . . . . . . . . . . 255

  8.5.4 Influence of Different Memory Channel Sizes . . . . . . . . . . 258

  8.5.5 Combination with Data Reuse . . . . . . . . . . 259

  8.5.6 Impact of Scheduling Granularity . . . . . . . . . . 261

  8.6 Conclusion and Future Work . . . . . . . . . . 262

9 Conclusion . . . . . . . . . . 265

  9.1 Multidimensional System Design . . . . . . . . . . 265

  9.2 Discussed Design Steps and Their Major Benefits . . . . . . . . . . 266

A Buffer Analysis by Simulation . . . . . . . . . . 269

  A.1 Efficient Buffer Parameter Determination for the Rectangular Memory Model . . . . . . . . . . 269

  A.1.1 Monitoring of Live Data Elements . . . . . . . . . . 269

  A.1.2 Table-Based Buffer Parameter Determination . . . . . . . . . . 270

  A.1.3 Determination of the Minimum Tables . . . . . . . . . . 272

  A.1.4 Determination of the Maximum Tables . . . . . . . . . . 274

  A.1.5 Complexity . . . . . . . . . . 277

  A.2 Efficient Buffer Parameter Determination for the Linearized Buffer Model . . . . . . . . . . 277

  A.2.2 Determination of the Lexicographically Smallest Live Data Element . . . . . . . . . . 279

  A.2.3 Tree Update . . . . . . . . . . 280

  A.2.4 Complexity of the Algorithm . . . . . . . . . . 282

  A.3 Stimulation by Simulation . . . . . . . . . . 282

B Abbreviations . . . . . . . . . . 285

C Formula Symbols . . . . . . . . . . 287

References . . . . . . . . . . 289

Index . . . . . . . . . . 307

  List of Figures

1 Book organization . . . . . . . . . . vii

1.1 Overview of a multidimensional design methodology for image processing applications. . . . . . . . . . . 4

2.1 Block diagram of a JPEG2000 encoder . . . . . . . . . . 11

2.2 Wavelet transform . . . . . . . . . . 12

2.3 Sliding window for vertical filtering with downsampling . . . . . . . . . . 13

2.4 Lifting scheme for a one-dimensional wavelet transform. . . . . . . . . . . 14

2.5 Resource sharing between different wavelet decomposition levels . . . . . . . . . . 15

2.6 Pixel production and consumption order for the JPEG2000 block builder (BB). . . . . . . . . . . 15

3.1 Structure of static (imperfectly nested) for-loop . . . . . . . . . . 24

3.2 Example sliding window algorithm. . . . . . . . . . . 29

3.3 CSDF model applied to a sliding window application. . . . . . . . . . . 31

3.4 MDSDF graph . . . . . . . . . . 36

3.5 MDSDF delay . . . . . . . . . . 37

3.6 Specification of repetitive tasks via tilers . . . . . . . . . . 39

3.7 Modeling of the different running modes in Array-OL . . . . . . . . . . 40

3.8 Reuse chain . . . . . . . . . . 48

3.9 PICO target architecture . . . . . . . . . . 51

3.10 Kernel pseudocode with streaming input and output loops . . . . . . . . . . 52

3.11 PICO top-level code . . . . . . . . . . 53

3.12 Loop-accelerator hardware generated by the PARO compiler . . . . . . . . . . 55

3.13 Simple SDF graph that illustrates some of the challenges occurring during automatic determination of the required communication buffer sizes. . . . . . . . . . . 58

3.14 Nested loop and corresponding lattice model . . . . . . . . . . 62

3.15 Loop fusion example . . . . . . . . . . 63

3.16 Tiling operation as discussed in Section 2.2. . . . . . . . . . . 67

3.17 Communication synthesis in Omphale. . . . . . . . . . . 69

3.18 Communication synthesis in Gaspard2 . . . . . . . . . . 75

3.19 Translation of a static affine nested loop program (SANLP) into a Kahn process network . . . . . . . . . . 76

4.1 ESL design flow using SystemCoDesigner . . . . . . . . . . 82

4.2 Actor-oriented model of a Motion-JPEG decoder . . . . . . . . . . 83

4.3 Depiction of the PPM Sink actor from Fig. 4.2 along with the source code of the action f_newFrame . . . . . . . . . . 84

4.4 Extract of the architecture template used for the Motion-JPEG decoder. . . . . . . . . . . 85

5.1 WDF graph with two actors and a single edge for illustration of the introduced notation . . . . . . . . . . 95

5.24 Extension of the lifting-based wavelet transform by tiling . . . . . . . . . . 127

5.25 Spatial decorrelation with two decomposition levels . . . . . . . . . . 128

5.26 Lifting-based wavelet resource sharing . . . . . . . . . . 129

6.1 Extract of a JPEG2000 encoder . . . . . . . . . . 135

6.2 Live data elements . . . . . . . . . . 136

6.3 Iteration vectors and their relation with the communication order . . . . . . . . . . 137

6.4 Successive moduli technique. . . . . . . . . . . 140

6.5 Worst-case data element distribution for scenario (2) . . . . . . . . . . 146

6.6 Initial data elements not following the production order . . . . . . . . . . 147

6.7 Worst-case distribution of live data elements for the inverse tiling operation and sequential ASAP scheduling . . . . . . . . . . 148