
(1)

August 31 2006, Elsevier, Amsterdam

Scientific Workflows in e-Science

Dr Zhiming Zhao


System and Network Engineering, University of Amsterdam


(2)


Outline

Background

Scientific workflow management system

Virtual Laboratory for e-Science

Our approach

Challenges and research lines


(3)

Problem solving: a typical scenario in scientific research

• Analysis

• Hypothesis

• Related work

• Propose experiments

• Define steps

• Prototype computing systems

• Perform experiments

• Data collection

• Visualization

• Validation

• Adjust experiment

• Refine hypothesis

• Presentation

• Dissemination

Phases: Define problems, Experiments, Data analysis, Discovery

Activities are iterative, dynamic, and human centered.


(4)

Example scenarios

In problem analysis

• Identify domains, search for key problems, find typical methods, and review related work

In scientific experiments: scientific computing & data processing

• Define dependencies between computing and data processing tasks, and schedule their runtime behavior

In data analysis

• Visualize and compare the results of different parameters, keep meaningful configurations, and continue experiments

• Search related work, compare results

In dissemination


(5)

Computer support for problem solving

Problem Solving Environment (PSE) (E. Gallopoulos et al., IEEE Computational Science & Engineering, 1994):

• Organizes different software components/tools

• Allows a user to assemble these tools at a high level of abstraction

• Controls the runtime behavior of experiments

• Examples: MATLAB, Ptolemy, etc.

Traditional PSE: organizes and executes resources locally!

Distributed resources:

• Distributed parallel computing

• Visualization, remote resource invocation

• Distributed data sharing & dissemination

Scientific workflow management systems: a new guise of PSE!


(6)

Inside a Scientific Workflow Management System

In our view, a SWMS at least implements:

• A model for describing workflows;

• An engine for executing/managing workflows;

• Different levels of support for a user to compose, execute and control a workflow.

[Figure: Workflow (based on a certain model); Engine; Resources; User support: Composition, Engine level control, Resource level control]


(7)

Scientific Workflows in e-Science

Workflows vary across different:

• Phases of experiments: design, runtime control, dissemination;

• Abstractions of resources: concrete and abstract;

• Levels of activity details: computing, data access, search/matching, human activities.

[Figure: Experiment processes; Abstract workflows; Executable (concrete) workflows; Workflows for administration, e.g., AAA, and other issues]


(8)

Diversity in SWMS

Taverna:
- Web services based language: Scufl
- FreeFluo: engine
- Graphical visualization of workflow

Kepler:
- Actor, director
- MoML
- Execution models

Triana:
- Components
- Task graph
- Data/control flow

DAGMan:
- Computing tasks
- DAG

Pegasus:
- Based on DAGMan
- VDL
- DAG
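Several of the systems above describe a workflow as a DAG of tasks. As a minimal sketch of that idea only (it does not use DAGMan's or Pegasus's own file formats, and the task names are hypothetical), Python's standard `graphlib` module can derive a valid execution order from task dependencies:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "fetch_data": set(),
    "preprocess": {"fetch_data"},
    "simulate": {"preprocess"},
    "analyze": {"preprocess"},
    "visualize": {"simulate", "analyze"},
}


def execution_order(deps):
    """Return one valid order in which the tasks of the DAG can run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Every task appears after all of its dependencies; `simulate` and `analyze` do not depend on each other, so a real engine could run them in parallel. A cyclic dependency makes `static_order()` raise `graphlib.CycleError`.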


(9)

Virtual Laboratory for e-Science

[Figure: VL-e layers]

Application layer: Dutch telescience, Data intensive science, Medical diagnosis, Bioinformatics ASP, Biodiversity, Food Informatics

Generic e-Science framework layer

Grid layer


(10)

Mission

Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains.

A generic framework can:

• Improve the reuse of workflow components and workflows for different experiments

• Reduce the learning cost for different systems

• Allow application users to work on a consistent


(11)

Previous work: VLAM-G environment

VLAM-G

• A Grid-enabled PSE

• Data intensive applications

• Visual interface

• Two levels of workflow support

• Human interaction




(13)

Experiment Topology

– Graphical representation of self-contained data processing modules attached to each other in a workflow.

Process-Flow Template (PFT)

– Graphical representation of data elements and processing steps in an experimental procedure.

Study

– Descriptions of experimental steps represented as an instance of a PFT with references to experiment topologies.

[Figure: diagram relating PROJECT, EXPERIMENT, ARRAY MEASUREMENT, DATA ANALYSIS, COMMENT, PROPERTY, OWNER, CONTRIBUTOR and COMMENTATOR through relations such as hasExperiments, hasSteps, hasComments, hasProperties, hasOwner, hasContributors, isPerformedBy, isMadeBy, hasNextExperiment/hasPrevExperiment and hasNextStep/hasPrevStep, each marked LINK, COPY or NOREUSE]


(14)

Lessons learned

How to introduce a new PSE to a domain scientist?

• Because it has a beautiful architecture?

• Or because it allows scientists to keep their current work style?

How to use existing work?

• Do scientists need one system or more options?

How to include the user in the computing loop?

• Dynamic workflows and human-in-the-loop computing are important.

Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia


(15)

Workflow support in VL-e

Recommend suitable workflow systems for different application domains:

• Analyze typical application use cases

• Define small projects with different application domains

• Review existing workflow systems

• Recommend four workflow systems: Triana, Taverna, Kepler, and VLAM-G

In the long term:

• Extend VLAM-G and develop our own generic workflow


(16)

A workflow bus paradigm

[Figure: Workflow; Sub workflow 1, Sub workflow 2, Sub workflow 3; Workflow bus; Taverna, Kepler, Triana]

A workflow bus is a special workflow system for executing meta workflows, in which sub workflows are executed by different engines.
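The definition above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `WorkflowBus` and `SubWorkflow` names, the adapter signature, and the simple chaining of sub workflows in sequence are hypothetical, and the stub adapters do not call any real Taverna, Kepler or Triana API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubWorkflow:
    name: str
    engine: str  # name of the engine that should execute this sub workflow


# Hypothetical adapter type: takes a sub workflow name and input data,
# returns the output data produced by the engine.
EngineAdapter = Callable[[str, list], list]


class WorkflowBus:
    """Executes a meta workflow by dispatching sub workflows to engines."""

    def __init__(self) -> None:
        self._engines: Dict[str, EngineAdapter] = {}

    def register(self, engine: str, adapter: EngineAdapter) -> None:
        self._engines[engine] = adapter

    def run(self, meta_workflow: List[SubWorkflow], data: list) -> list:
        # Assumption: sub workflows form a linear chain, each one
        # consuming the output of the previous one.
        for sub in meta_workflow:
            if sub.engine not in self._engines:
                raise KeyError(f"no engine registered for {sub.engine!r}")
            data = self._engines[sub.engine](sub.name, data)
        return data


def make_stub(engine_name: str) -> EngineAdapter:
    """Build a stand-in adapter that only records which engine ran what."""
    def adapter(sub_name: str, data: list) -> list:
        return data + [f"{engine_name}:{sub_name}"]
    return adapter


bus = WorkflowBus()
for engine in ("Taverna", "Kepler", "Triana"):
    bus.register(engine, make_stub(engine))

meta = [
    SubWorkflow("sub workflow 1", "Taverna"),
    SubWorkflow("sub workflow 2", "Kepler"),
    SubWorkflow("sub workflow 3", "Triana"),
]
trace = bus.run(meta, [])
print(trace)
```

Keeping the engines behind a common adapter interface is what lets the meta workflow stay independent of any one system; a real bus would also need data conversion between engines, which this sketch leaves out.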


(17)

Applications of workflow bus

Use case 1:

• A user has a workflow in Taverna

• Some functionality is missing in Taverna but can be provided by Triana

• The user can develop the workflow in the two systems and run it via the workflow bus

Use case 2:

• A user wants to execute a Taverna or Triana workflow in multiple instances with different input data
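Use case 2 amounts to running the same workflow once per input data set. A minimal sketch of that pattern, assuming a hypothetical `run_workflow` function that stands in for one complete run of a workflow (here it only sums squares), could use a thread pool from the Python standard library:

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(input_data):
    """Hypothetical stand-in for one workflow run: sums the squares."""
    return sum(x * x for x in input_data)


def run_instances(workflow, inputs, max_workers=4):
    """Run one workflow instance per input data set, results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(workflow, inputs))


inputs = [[1, 2], [3, 4], [5]]
results = run_instances(run_workflow, inputs)
print(results)  # [5, 25, 25]
```

`pool.map` returns results in the order of the inputs, so each result can be matched back to the data set that produced it even though the instances run concurrently.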


(18)


Ongoing research

Web service in data intensive applications

Execution models for Grid workflows

Including PSE in scientific workflows


(19)

Relevance between our research and Elsevier’s work

Same context: the entire lifecycle of e-Science experiments

Different focuses:

• We focus on the runtime behavior of scientific experiments, e.g., Grid computing, data/computing intensive applications, and scheduling of computing tasks

• Elsevier highlights data search and integration on well-structured databases, research preparation, and literature search and management


(20)

Cont.

Different characteristics in workflows:

• In our workflows, processing and managing dynamic runtime data are the key patterns

• In Elsevier workflows, storing, replicating, accessing, matching and integrating static data might be more common

Facing similar challenges:

• Semantics based data search and integration

• Workflow provenance

• Collaborative interaction (workflow development, resource sharing, knowledge transfer)


(21)

Activities

International workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS06, Reading University, May 28, 2006.

• Proceedings are in LNCS, Springer Verlag.

• A special issue will be published in Scientific Programming Journal.

• http://staff.science.uva.nl/~zhiming/iccs-wses

Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid Computing conference in Amsterdam, December 2006.

• Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California)

• BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory)

• Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis)

• Taverna, Prof. Peter Rice (European Bioinformatics Institute)

• WS and Semantic issues, Dr. Steve Ross-Talbot (CEO and a co-founder of Pi4 Technologies)

• Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University)


(22)

References

1. Virtual Laboratory for e-Science: www.vl-e.nl

2. System and Network Engineering, Faculty of Science, University of Amsterdam: http://www.science.uva.nl/research/sne/

3. Z. Zhao; A. Belloum; H. Yakali; P.M.A. Sloot and L.O. Hertzberger: Dynamic Workflow in a Grid Enabled Problem Solving Environment, in Proceedings of the 5th International Conference on Computer and Information Technology (CIT2005), pp. 339-345. IEEE Computer Society Press, Shanghai, China, September 2005.

4. Z. Zhao; A. Belloum; A. Wibisono; F. Terpstra; P.T. de Boer; P.M.A. Sloot and L.O. Hertzberger: Scientific workflow management: between generality and applicability, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows in conjunction with the 5th International Conference on Quality Software, pp. 357-364. IEEE Computer Society Press, Melbourne, Australia, September 19th-21st 2005.

5. Z. Zhao; A. Belloum; P.M.A. Sloot and L.O. Hertzberger: Agent technology and scientific workflow management in an e-Science environment, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI05), pp. 19-23. IEEE Computer Society Press, Hong Kong, China, November 14th-16th 2005.

