August 31 2006, Elsevier, Amsterdam
Scientific Workflows in e-Science
Dr Zhiming Zhao
System and Network Engineering, University of Amsterdam
Outline
Background
Scientific workflow management system
Virtual Laboratory for e-Science
Our approach
Challenges and research lines
Problem solving: a typical scenario in scientific research
Define problems
• Analysis • Hypothesis • Related work
Experiments
• Propose experiments • Define steps • Prototype computing systems • Perform experiments • Data collection
Data analysis
• Visualization • Validation • Adjust experiment • Refine hypothesis
Discovery
• Presentation • Dissemination
Activities are iterative, dynamic, and human centered.
Example scenarios
In problem analysis
Identify domains, search key problems, find typical methods, and
review related work
In scientific experiments: scientific computing & data processing
Define dependencies between computing and data processing tasks,
and schedule their runtime behavior
In data analysis
Visualize and compare the results of different parameters, keep
meaningful configurations, and continue experiments
Search related work, compare results
In dissemination
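The scientific-experiment scenario above (define dependencies between tasks, then schedule them) can be sketched as follows. This is a minimal illustration, not the method of any particular workflow system; the task names are hypothetical, and the ordering uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key depends on the tasks in its value set.
dependencies = {
    "simulate": {"prepare_input"},
    "filter_data": {"simulate"},
    "visualize": {"filter_data"},
    "compare_results": {"filter_data"},
}


def schedule(deps):
    """Return batches of tasks; tasks within one batch do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for step, batch in enumerate(schedule(dependencies), start=1):
        print(step, batch)
```

Tasks that land in the same batch could run in parallel; a real scheduler would also account for resource availability, which this sketch ignores.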
Computer support for problem solving
Problem Solving Environment (E. Gallopoulos et al., IEEE CS Eng. 1994):
• Organizes different software components/tools
• Allows a user to assemble these tools at a high level of abstraction
• Controls runtime behavior of experiments
• Examples: MATLAB, Ptolemy, etc.
Traditional PSE: organizes and executes resources locally!
Distributed resources
Distributed parallel computing
Visualization, remote resource invocation
Distributed data sharing & dissemination
Scientific workflow management systems: a new guise of PSE!
Inside a Scientific Workflow Management System
In our view, a SWMS at least implements:
• A model for describing workflows;
• An engine for executing/managing workflows;
• Different levels of support for a user to compose, execute and control a workflow.
[Figure: a workflow (based on a certain model) runs on an engine over resources; user support covers composition, engine level control, and resource level control.]
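As a rough illustration of the three parts named above (a model, an engine, and user support), the sketch below uses a deliberately simple model: a workflow is an ordered list of named steps. All class and method names are hypothetical and do not correspond to any existing SWMS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Workflow:
    """Model: an ordered list of named steps."""
    steps: List[str] = field(default_factory=list)
    actions: Dict[str, Callable] = field(default_factory=dict)

    def add_step(self, name, action):
        """Composition support: the user assembles the workflow step by step."""
        self.steps.append(name)
        self.actions[name] = action
        return self


class Engine:
    """Engine: runs the steps in order and offers one simple control hook."""

    def __init__(self):
        self.skipped = set()

    def skip(self, name):
        """Engine level control: exclude a step from subsequent runs."""
        self.skipped.add(name)

    def run(self, workflow, data):
        log = []
        for name in workflow.steps:
            if name in self.skipped:
                continue
            data = workflow.actions[name](data)
            log.append(name)
        return data, log


if __name__ == "__main__":
    wf = Workflow().add_step("double", lambda x: x * 2).add_step("increment", lambda x: x + 1)
    print(Engine().run(wf, 3))
```

Resource level control (choosing where each step runs) is left out here to keep the sketch short.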
Scientific Workflows in e-Science
Workflows vary in:
• Phases of experiments: design, runtime control, dissemination;
• Abstractions of resources: concrete and abstract;
• Levels of activity details: computing, data access, search/matching, human activities; …
[Figure: experiment processes, abstract workflows, executable (concrete) workflows]
Workflows for administration, e.g., AAA, and other issues.
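One way to picture the step from an abstract workflow to an executable (concrete) one is as a binding of logical step names to actual resources. The sketch below assumes a hypothetical resource catalogue and a naive "first candidate" choice; real systems use far richer matching.

```python
# Hypothetical resource catalogue: logical service name -> available resources.
CATALOGUE = {
    "sequence_alignment": ["grid-node-a", "grid-node-b"],
    "plotting": ["local-workstation"],
}


def concretize(abstract_workflow, catalogue):
    """Bind each abstract step (a logical service name) to a concrete resource."""
    concrete = []
    for step in abstract_workflow:
        candidates = catalogue.get(step)
        if not candidates:
            raise LookupError(f"no resource offers {step!r}")
        concrete.append((step, candidates[0]))  # naive choice: first candidate
    return concrete


if __name__ == "__main__":
    print(concretize(["sequence_alignment", "plotting"], CATALOGUE))
```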
Diversity in SWMS
Taverna:
- Web services based language: Scufl
- FreeFluo: engine
- Graphical visualization of workflow
Kepler:
- Actor, director
- MoML
- Execution models
Triana:
- Components
- Task graph
- Data/control flow
DAGMan:
- Computing tasks
- DAG
Pegasus:
- Based on DAGMan
- VDL
- DAG
…
Virtual Laboratory for e-Science
Application layer: Dutch telescience, data intensive science, medical diagnosis, bioinformatics ASP, biodiversity, food informatics
Generic e-Science framework layer
Grid layer
Mission
Effectively reuse existing workflow management systems, and
provide a generic e-Science framework for different application domains.
A generic framework can
Improve the reuse of workflow components and the
workflows for different experiments
Reduce the learning cost for different systems
Allow application users to work on a consistent
Previous work: VLAM-G environment
VLAM-G
A Grid-enabled PSE
Data intensive applications
Visual interface
Two levels of workflow support
Human interaction
Experiment Topology – Graphical representation of self-contained data processing modules attached to each other in a workflow.
Process-Flow Template (PFT) – Graphical representation of data elements and processing steps in an experimental procedure.
Study – Descriptions of experimental steps represented as an instance of a PFT with references to experiment topologies.
[Figure: process-flow template example relating PROJECT, EXPERIMENT, ARRAY MEASUREMENT and DATA ANALYSIS with OWNER, CONTRIBUTOR, COMMENTATOR, COMMENT and PROPERTY elements through relations such as hasExperiments, hasSteps, hasNextStep/hasPrevStep, hasComments and isPerformedBy; each element is tagged COPY, LINK or NOREUSE.]
Lessons learned
How to introduce a new PSE to a domain
scientist?
Because it has a beautiful architecture?
Or because it can allow a scientist to keep their
current work style?
How to use existing work?
Do scientists need one system or more options?
How to include the user in the computing loop?
Dynamic workflows and human-in-the-loop computing
are important.
Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia
Workflow support in VL-e
Recommend suitable workflow systems for
different application domains:
Analyze typical application use cases
Define small projects with different application
domains
Review existing workflow systems
Recommend four workflow systems: Triana, Taverna,
Kepler, and VLAM-G
Long term:
Extend VLAM-G and develop our own generic workflow
A workflow bus paradigm
[Figure: a workflow composed of sub workflows 1, 2 and 3, connected through the workflow bus to the Taverna, Kepler and Triana engines.]
A workflow bus is a special workflow system for executing meta workflows, in which sub workflows will be executed by different engines.
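The definition above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the adapter functions are hypothetical stand-ins (no real Taverna, Kepler or Triana calls are made), and the sub workflows are assumed to run in sequence with each result passed to the next.

```python
# Hypothetical stand-ins for engine adapters; real engine invocations
# are NOT shown here and would replace these functions.
def run_on_taverna(payload):
    return f"taverna({payload})"


def run_on_kepler(payload):
    return f"kepler({payload})"


def run_on_triana(payload):
    return f"triana({payload})"


class WorkflowBus:
    """Executes a meta workflow whose sub workflows run on different engines."""

    def __init__(self):
        self.engines = {}

    def register(self, name, adapter):
        self.engines[name] = adapter

    def run(self, meta_workflow, data):
        """Run sub workflows in order, passing each result to the next one."""
        for _sub_name, engine_name in meta_workflow:
            adapter = self.engines[engine_name]  # KeyError if the engine is unknown
            data = adapter(data)
        return data


if __name__ == "__main__":
    bus = WorkflowBus()
    bus.register("taverna", run_on_taverna)
    bus.register("kepler", run_on_kepler)
    bus.register("triana", run_on_triana)
    meta = [("sub workflow 1", "taverna"), ("sub workflow 2", "kepler"), ("sub workflow 3", "triana")]
    print(bus.run(meta, "input"))
```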
Applications of workflow bus
Use case 1:
A user has a workflow in Taverna
Some functionality is missing in Taverna but can be
provided by Triana
The user can develop the workflow in the two systems and run
it via the workflow bus
Use case 2:
A user wants to execute a Taverna or Triana workflow
in multiple instances with different input data
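Use case 2 resembles a parameter sweep: the same workflow is launched once per input. A minimal sketch with the standard library follows; `run_workflow` is a hypothetical placeholder for submitting one workflow instance to an engine.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(input_data):
    """Hypothetical placeholder for one workflow instance (here: a mean value)."""
    return sum(input_data) / len(input_data)


def run_instances(inputs, max_workers=4):
    """Run one workflow instance per input; results keep the order of the inputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_workflow, inputs))


if __name__ == "__main__":
    print(run_instances([[1, 2, 3], [10, 20], [5]]))
```

Threads are used here only because the placeholder is trivial; instances submitted to remote engines would mostly wait on I/O, for which threads are also a reasonable fit.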
Ongoing research
Web services in data intensive applications
Execution models for Grid workflows
Including PSE in scientific workflows
Relevance between our research and Elsevier’s work
Same context: the entire lifecycle of e-Science experiments
Different focuses
We focus on runtime behavior of scientific
experiments, e.g., Grid computing, data/computing
intensive applications, and scheduling of computing
tasks
Elsevier highlights data search and integration on
well-structured databases, research preparation, and
literature search and management
Cont.
Different characteristics in workflows
In our workflows, processing and managing runtime dynamic
data are the key patterns
In Elsevier workflows, storing, replicating, accessing, matching and
integrating static data might be more common
Facing similar challenges:
Semantics based data search and integration
Workflow provenance
Collaborative interaction (workflow development, resource
sharing, knowledge transfer)
Activities
Int’l workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS06, Reading University, May 28, 2006.
Proceedings are in LNCS, Springer Verlag.
A special issue will be published in Scientific Programming Journal.
http://staff.science.uva.nl/~zhiming/iccs-wses
Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in the context of IEEE e-Science and Grid computing conference in Amsterdam, December 2006.
Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California)
BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory)
Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis)
Taverna, Prof. Peter Rice (European Bioinformatics Institute)
WS and Semantic issues, Dr. Steve Ross-Talbot (CEO and co-founder of Pi4 Technologies)
Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University)
References
1. Virtual Laboratory for e-Science: www.vl-e.nl
2. System and Network Engineering, Faculty of Science, University of Amsterdam: http://www.science.uva.nl/research/sne/
3. Z. Zhao; A. Belloum; H. Yakali; P.M.A. Sloot and L.O. Hertzberger: Dynamic Workflow in a Grid Enabled Problem Solving Environment, in Proceedings of the 5th International Conference on Computer and Information Technology (CIT2005), pp. 339-345. IEEE Computer Society Press, Shanghai, China, September 2005.
4. Z. Zhao; A. Belloum; A. Wibisono; F. Terpstra; P.T. de Boer; P.M.A. Sloot and L.O. Hertzberger: Scientific workflow management: between generality and applicability, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows in conjunction with the 5th International Conference on Quality Software, pp. 357-364. IEEE Computer Society Press, Melbourne, Australia, September 19th-21st 2005.
5. Z. Zhao; A. Belloum; P.M.A. Sloot and L.O. Hertzberger: Agent technology and scientific workflow management in an e-Science environment, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI05), pp. 19-23. IEEE Computer Society Press, Hong Kong, China, November 14th-16th 2005.