4. Software support
To provide software support for the HASL formalism, we developed Cosmos³ [9], a prototype software platform for HASL-based verification. In this section, we describe the Cosmos tool, including its architecture, and compare it with other platforms featuring statistical model checking functionalities: Prism [22], Uppaal-smc [23], Marcie [24] and Plasma [25]. We start with a brief summary of confidence interval estimation, the statistical method that Cosmos relies on.
³ Cosmos is an acronym of the French sentence ‘‘Concept et Outils Statistiques pour le MOdèles Stochastiques’’, which translates roughly as ‘‘Tools and Concepts for Statistical analysis of Stochastic MOdels’’.
P. Ballarini et al. / Performance Evaluation
4.1. Confidence interval estimation
In statistics, Confidence Interval (CI) estimation is a method for estimating an interval which is likely to contain the exact value that an (unknown) parameter $\theta$ of a certain population may assume. CI estimation has two distinctive features: (i) it allows for specifying how reliable the estimate should be by choosing the desired confidence level $1 - \alpha$, with $\alpha \in (0, 1)$; (ii) it allows for specifying how accurate the estimate should be by choosing the admitted error bound (i.e. the width $\delta$ of the resulting interval). In other words, if we repeatedly estimate the interval for a given $\theta$, we are guaranteed that the (possibly different) resulting intervals will contain the actual value of $\theta$ in a proportion corresponding to $(1 - \alpha)$.
Static sample size estimation. Originally, CI estimation works by collecting (through the execution of experiments) a fixed number $n$ of samples $X_1, \ldots, X_n$ of the target parameter $\theta$. Sampled values are then used to calculate the interval containing $\theta$. The general form of the $100(1 - \alpha)\%$ confidence interval for the expected value $\mu_\theta$ of $\theta$, denoted $CI^{\alpha}_{\mu_\theta}$, is:
$$CI^{\alpha}_{\mu_\theta} = PE_{\mu_\theta} \pm EB_{\alpha}$$
where $PE_{\mu_\theta}$ is a Point Estimate of $\mu_\theta$ and $EB_{\alpha}$ is the Error Bound, which corresponds to the semi-width of the CI, i.e. $EB_{\alpha} = \delta/2$. The sample mean
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
is used as (unbiased) Point Estimator of $\mu_\theta$. On the other hand, the expression for the EB depends on assumptions concerning the nature of the target parameter $\theta$ (hence of the samples $X_1, \ldots, X_n$). Cosmos handles three kinds of variables:
• Indicator variables (Bernoulli variables) used for evaluation of unknown probabilities. In this case, we provide an error
bound applying the Clopper–Pearson method [ 26 ].
• Bounded variables used for evaluation of proportions, ratios, mean number of clients in a system with finite capacity, etc.
In this case, we provide an error bound applying the Chernoff–Hoeffding method [ 27 ].
• General variables without any additional knowledge. In this case using the central limit theorem, we provide an asymptotically correct error bound based on an approximation by a normal distribution with unknown mean and variance.
There are three parameters for the interval estimation: the size of the samples, the error bound and the confidence level. In the case of indicator or bounded variables, the user provides two of them and the third one is computed before the sampling. For general variables, the size of the sample, and one more parameter (the error bound or the confidence level) must be given, and the remaining one is computed after the sampling. Thus, the sample size is fixed before the sampling.
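The relation between the three parameters can be illustrated for bounded variables. The following is a minimal sketch, assuming the standard two-sided Hoeffding inequality for samples taking values in an interval of length `value_range`; the function names are hypothetical and the exact bound implemented in Cosmos may differ.

```python
import math

def hoeffding_sample_size(half_width, alpha, value_range=1.0):
    """Smallest n such that 2*exp(-2*n*half_width^2/value_range^2) <= alpha."""
    return math.ceil(value_range ** 2 * math.log(2.0 / alpha) / (2.0 * half_width ** 2))

def hoeffding_half_width(n, alpha, value_range=1.0):
    """Error bound (semi-width of the CI) guaranteed by n samples at error level alpha."""
    return value_range * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def hoeffding_alpha(n, half_width, value_range=1.0):
    """Error level (1 - confidence level) guaranteed by n samples and a given semi-width."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * half_width ** 2 / value_range ** 2))

# Confidence level 0.95 (alpha = 0.05) and interval width 0.005 (semi-width 0.0025).
n = hoeffding_sample_size(half_width=0.0025, alpha=0.05)
print(n)
```

With a confidence level of 0.95 and an interval width of 0.005, this bound asks for roughly 295 000 samples, which is in line with the number of trajectories reported for the Chernoff–Hoeffding method in the TQS experiments.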
Dynamic sample size estimation. When the user wants to provide both the error bound and the confidence level a priori for a general variable, the number of samples is no longer fixed in advance: it is determined dynamically by a stopping condition. While an exact condition cannot be achieved, Chow and Robbins [28] provide a stopping condition ensuring that, when the error bound goes to 0, the probability that the unknown expectation belongs to the interval associated with their method converges to the confidence level. Cosmos also offers this functionality. Compared to conservative methods such as Clopper–Pearson and Chernoff–Hoeffding, the simulation time is significantly reduced, as illustrated in Fig. 6.
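A sequential procedure of this kind can be sketched as follows. This is only an illustration of the Chow–Robbins stopping rule, not the Cosmos implementation: the function name, the `min_samples` and `max_samples` guards and the use of a constant normal quantile are assumptions made for the example.

```python
import random
from statistics import NormalDist

def chow_robbins(sample, half_width, alpha, min_samples=30, max_samples=10**7):
    """Sample until variance + 1/n <= n * (half_width / z)^2, then return the CI."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # normal quantile for the error level
    n, mean, m2 = 0, 0.0, 0.0                     # running mean and sum of squared deviations
    while n < max_samples:
        x = sample()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)                  # Welford update
        if n >= min_samples:
            variance = m2 / n
            if variance + 1.0 / n <= n * (half_width / z) ** 2:
                break
    return mean - half_width, mean + half_width, n

random.seed(1)
low, high, used = chow_robbins(lambda: 1.0 if random.random() < 0.9 else 0.0,
                               half_width=0.02, alpha=0.05)
print(low, high, used)
```

The number of samples adapts to the observed variance, so an indicator variable whose probability is close to 0 or 1 stops much earlier than the worst case assumed by a static bound.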
Confidence interval for expressions. The previous paragraphs deal with the case of a single estimation. HASL expressions may include an arbitrary number of estimations, each denoted by the operator E[] (see Eq. (1)). Assume that we have to perform $n$ estimations, for which we get the CIs $[a_i, b_i]$ with confidence levels $\alpha_i$ for $i = 1, \ldots, n$ (here $\alpha_i$ denotes the confidence level itself). Then, by a union bound over the $n$ estimations, the confidence level of the whole expression is given by
$$\alpha = 1 - \sum_{i=1}^{n} (1 - \alpha_i).$$
The CI of the expression is obtained by applying operations on intervals such as:
$$[a, b] + [a', b'] = [a + a', b + b'], \quad [a, b] - [a', b'] = [a - b', b - a'], \quad [a, b] \times [a', b'] = [\min(aa', ab', ba', bb'), \max(aa', ab', ba', bb')], \quad \text{etc.}$$
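The interval operations can be written down directly. The sketch below uses a hypothetical `CI` class, and it assumes that the confidence levels are combined through a union bound, i.e. that the error levels $1 - \alpha_i$ of the operands add up; it is an illustration, not the data structure used in Cosmos.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CI:
    lo: float
    hi: float
    level: float  # confidence level of the interval, e.g. 0.99

    def _level_with(self, other):
        # Union bound: the error levels (1 - level) of both operands add up.
        return max(0.0, 1.0 - ((1.0 - self.level) + (1.0 - other.level)))

    def __add__(self, other):
        return CI(self.lo + other.lo, self.hi + other.hi, self._level_with(other))

    def __sub__(self, other):
        return CI(self.lo - other.hi, self.hi - other.lo, self._level_with(other))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return CI(min(products), max(products), self._level_with(other))

a = CI(1.0, 2.0, 0.99)
b = CI(-1.0, 0.5, 0.99)
print(a + b, a - b, a * b)
```

Each operation widens the interval and lowers the confidence level, which is why an expression with many estimations needs tighter individual estimates to reach a given overall level.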
4.2. Hypothesis testing
When one is not interested in the actual value of a statistic but rather wants to decide whether this value is above or below a threshold, hypothesis testing methods can be used. For a Bernoulli variable of unknown mean $p$, given two thresholds $p_1 < p_0$ delimiting an indifference region and two bounds on the probabilities of false positive and false negative results, the Sequential Probability Ratio Test (SPRT) [29] is an optimal sequential test for deciding whether $p \geq p_0$ or $p \leq p_1$ holds.
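The principle of the test can be sketched in a few lines. The code below is a textbook version of Wald's SPRT for a Bernoulli variable, written under the assumption that the two hypotheses are $p \geq p_0$ and $p \leq p_1$ with $p_1 < p_0$; the function name, the parameter names `alpha` and `beta` (bounds on the two error probabilities) and the `max_samples` guard are choices made for this example.

```python
import math
import random

def sprt(sample, p0, p1, alpha, beta, max_samples=10**7):
    """Wald's SPRT for H0: p >= p0 against H1: p <= p1, assuming p1 < p0."""
    assert 0.0 < p1 < p0 < 1.0
    accept_h1 = math.log((1.0 - beta) / alpha)      # upper threshold on the log-ratio
    accept_h0 = math.log(beta / (1.0 - alpha))      # lower threshold on the log-ratio
    step_success = math.log(p1 / p0)                # contribution of an observed 1
    step_failure = math.log((1.0 - p1) / (1.0 - p0))  # contribution of an observed 0
    log_ratio, n = 0.0, 0
    while n < max_samples:
        n += 1
        log_ratio += step_success if sample() else step_failure
        if log_ratio >= accept_h1:
            return "H1", n
        if log_ratio <= accept_h0:
            return "H0", n
    return "undecided", n

random.seed(0)
print(sprt(lambda: random.random() < 0.9, p0=0.55, p1=0.45, alpha=0.005, beta=0.005))
```

The number of samples is not fixed in advance: the further the true probability lies from the indifference region, the faster the log-ratio crosses one of the two thresholds.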
4.3. The Cosmos tool
Cosmos code generation scheme. Cosmos is implemented in C++ and relies on the Boost libraries for random number generation. The tool is designed according to a model-driven code generation scheme (Fig. 5): the inputs D and A (the GSPN and the LHA) are parsed in order to generate C++ code that implements the simulation of their synchronised product. The code generator takes advantage of the structure of the GSPN and the LHA to produce efficient code.
Fig. 5. Cosmos’s model-driven code generation scheme.
More precisely, the code generation works as follows. From the GSPN given as input, the tool produces C++ functions for firing transitions, checking whether a transition is enabled and computing the probability distributions. For any transition t, the set of transitions that might become enabled or disabled due to the firing of t is also generated. This information is obtained by an analysis of the structure of the Petri net. It significantly increases the speed of simulation, since only a small subset of transitions has to be tested after each firing.
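The structural analysis mentioned above can be sketched for a simplified setting. The example assumes an ordinary Petri net described only by input and output arc weights (no inhibitor arcs, priorities or marking-dependent arcs, which a GSPN may have); the function name and the dictionary encoding are hypothetical.

```python
def affected_transitions(pre, post):
    """For each transition t, list the transitions whose enabling may change when t fires.

    pre[t] and post[t] map each place to the weight of the input/output arc of t.
    """
    affected = {}
    for t in pre:
        # Places whose token count actually changes when t fires.
        changed = {p for p in set(pre[t]) | set(post[t])
                   if pre[t].get(p, 0) != post[t].get(p, 0)}
        # Only transitions reading one of these places need to be re-tested.
        affected[t] = {u for u in pre if changed & set(pre[u])}
    return affected

pre = {"t1": {"p1": 1}, "t2": {"p2": 1}, "t3": {"p4": 1}}
post = {"t1": {"p2": 1}, "t2": {"p3": 1}, "t3": {"p5": 1}}
print(affected_transitions(pre, post))
```

In this small net, firing t3 never requires re-testing t1 or t2, which is the kind of saving the precomputed sets provide during simulation.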
In order to synchronise the LHA with the GSPN efficiently, the tool generates data that link synchronised transitions of the automaton with transitions of the net. Checking the firing of an autonomous transition of the LHA requires generating a function that manages a system of linear equations associated with the transition. Due to the syntactical constraints on autonomous transitions, the function computes the exact firing time of such a transition. A timed integral occurring in a HASL formula is also efficiently determined thanks to the linear constraint requirement.
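To see why linearity yields an exact firing time, consider the simplest possible case. The sketch below assumes a single equality guard of the form sum(a_i * x_i) = c and variables that grow at constant rates between two events; this simplified guard, the function name and its parameters are assumptions of the example, not the constraint syntax of HASL.

```python
def autonomous_firing_delay(coeffs, values, rates, constant):
    """Delay after which sum(a_i * x_i) reaches `constant`, or None if it never does.

    Each variable evolves as x_i(t) = values[i] + rates[i] * t.
    """
    slope = sum(a * r for a, r in zip(coeffs, rates))
    gap = constant - sum(a * x for a, x in zip(coeffs, values))
    if gap == 0:
        return 0.0            # the guard already holds
    if slope == 0:
        return None           # the linear expression is constant: never reached
    delay = gap / slope
    return delay if delay > 0 else None   # a negative delay lies in the past

print(autonomous_firing_delay([1.0], [2.0], [0.5], 5.0))
```

Because the left-hand side is linear in time, the crossing instant is obtained in closed form and no numerical root search is needed.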
The generated code is linked with a library containing parts of the simulator that are independent of the model and the formula. This library contains the main function that determines the next event to occur by means of an event heap and the generated code. In addition a pseudo random generator computes delays for the new events that are put into the heap.
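The role of the event heap can be illustrated with a small simulator. This is a simplified sketch, assuming exponential delays only and a naive scan of all transitions after each firing; the function `simulate`, the encoding of the net and the stamp mechanism used to discard outdated events are choices made for the example and do not describe the Cosmos library.

```python
import heapq
import random

def simulate(initial_marking, transitions, horizon, rng):
    """transitions maps a name to (pre, post, rate); pre/post map places to arc weights."""
    marking = dict(initial_marking)
    heap = []       # entries: (firing_time, transition_name, stamp)
    stamp = {}      # name -> stamp of its currently valid pending event
    counter = 0

    def enabled(name):
        return all(marking.get(p, 0) >= w for p, w in transitions[name][0].items())

    def refresh(now):
        nonlocal counter
        for name in transitions:
            if enabled(name) and name not in stamp:
                counter += 1
                stamp[name] = counter
                delay = rng.expovariate(transitions[name][2])
                heapq.heappush(heap, (now + delay, name, counter))
            elif not enabled(name) and name in stamp:
                del stamp[name]             # its pending event becomes stale

    fired = []
    refresh(0.0)
    while heap:
        time, name, s = heapq.heappop(heap)  # next event to occur
        if stamp.get(name) != s:
            continue                         # skip stale events
        if time > horizon:
            break
        pre, post, _ = transitions[name]
        for p, w in pre.items():
            marking[p] -= w
        for p, w in post.items():
            marking[p] = marking.get(p, 0) + w
        del stamp[name]
        fired.append((time, name))
        refresh(time)
    return marking, fired

net = {"arrive": ({}, {"queue": 1}, 1.0), "serve": ({"queue": 1}, {}, 2.0)}
print(simulate({"queue": 0}, net, 10.0, random.Random(42))[0])
```

Replacing the full scan in `refresh` by a loop over precomputed sets of affected transitions is where the structural analysis of the net pays off.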
Cosmos launches several copies of the resulting executable code in parallel that repeatedly generate trajectories and send back the evaluation of the formulas on these trajectories. Cosmos aggregates these evaluations and stops the simulation depending on the selected statistical method (see above). The compilation time is generally negligible compared to the simulation time.
Input and output files. Cosmos uses GrML, the XML-based file format of CosyVerif [30], both for the GSPN and the LHA. Cosmos can output results in different ways. By default, the mean value and confidence interval of each HASL formula are written in a text file together with some statistics about the simulation (number of generated paths, execution time, etc.). Several options allow intermediate results, simulation traces and graphics to be output as well.
Interface. Cosmos can be used either as a command line tool or through a graphical user interface. The command line takes as parameters the paths to the GSPN and LHA files, together with several options which set the statistical parameters and the output format. Cosmos is integrated into the CosyVerif [30] platform, which provides a graphical interface.
HASL implementation. Cosmos implements a slight extension of HASL. This extension does not enhance the expressive power of HASL but adds some macros allowing a more compact syntax. First, the operator VAR is provided as a macro for $E[Y^2] - E[Y]^2$. Second, two macros compute the PDF (Probability Density Function) and the CDF (Cumulative Distribution Function). Their syntax is PDF(Y, step, start, end) (respectively CDF(Y, step, start, end)), where Y is an expression defined as in (1). PDF (resp. CDF) is translated by Cosmos into several HASL formulas, each of which computes the probability that $Y \in [\mathit{start} + i \cdot \mathit{step},\ \mathit{start} + (i + 1) \cdot \mathit{step})$ (resp. $Y \geq \mathit{start} + i \cdot \mathit{step}$) with $i < (\mathit{end} - \mathit{start})/\mathit{step}$. An example of their use is provided in the second case study. HASL and Cosmos have also been extended to handle symmetric stochastic nets [31]. In this setting, the HASL logic contains new operators and variables to express quantities defined on the numbers of coloured tokens in the places of the net.
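The expansion of the PDF macro into one estimation per bin can be mimicked on a list of sampled values. The helper names `pdf_bins` and `estimate_pdf` are hypothetical, and the sketch works on plain numbers rather than on HASL formulas.

```python
import math

def pdf_bins(start, end, step):
    """Half-open bins [start + i*step, start + (i+1)*step) for all i < (end - start)/step."""
    count = math.ceil((end - start) / step)
    return [(start + i * step, start + (i + 1) * step) for i in range(count)]

def estimate_pdf(samples, start, end, step):
    """Fraction of samples falling in each bin (one estimated probability per bin)."""
    bins = pdf_bins(start, end, step)
    counts = [0] * len(bins)
    for y in samples:
        for i, (lo, hi) in enumerate(bins):
            if lo <= y < hi:
                counts[i] += 1
                break
    return [c / len(samples) for c in counts]

print(pdf_bins(0.0, 2.0, 0.5))
print(estimate_pdf([0.1, 0.6, 0.7, 1.9], 0.0, 2.0, 0.5))
```

Each bin corresponds to one probability to estimate, so a fine step multiplies the number of formulas evaluated on every trajectory.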
4.4. Related tools
Numerous tools are available for performing SMC, some of them also performing numerical model checking. Here is a non-exhaustive list of tools freely available for universities: Cosmos [32], Plasma [33], Prism [34], Uppaal [35], Marcie [24], Apmc [36], Ymer [37], Mrmc [38] and Vesta [39]. As Apmc is partly integrated in Prism, we have discarded it. Mrmc has not been updated since 2011 and the corresponding team seems to use Uppaal. Finally, the link for downloading Vesta is no longer valid. We therefore focus on the following tools: Ymer, Prism, Uppaal, Plasma, Marcie and Cosmos.
Ymer is a statistical model checker for CTMCs and generalised semi-Markov processes described using the Prism language. Its property specification language is a fragment of CSL without the steady-state operator but including the unbounded Until .
Prism is a tool for performing model checking on probabilistic models that has been used for numerous applications. The numerical part of Prism can analyse discrete and continuous time Markov chains, Markov decision processes and probabilistic timed automata. The statistical part only deals with Markov chains as it cannot handle nondeterminism. The Prism language defines a probabilistic system as a synchronised product of reactive modules, and can thus describe large systems in a compact way. The verification procedures of Prism take as input a wide variety of languages for the specification of properties. Most of them are based on CSL or PCTL.
Uppaal is a verification tool including many formalisms: timed automata, timed games, priced timed automata, etc. It supports automata-based and game-based verification techniques and has shown its ability to analyse large scale applications. It has recently been enriched with a statistical model checker engine to verify timed systems with a stochastic semantics. The specification language is PLTL (i.e. an adaptation of LTL with path operators substituted for quantifiers) with bounded Until .
Plasma is a platform dedicated to statistical model checking. It accepts the Prism language, extended with more general distributions, as well as a dedicated biological language for the models. The property specification language is a restricted version of PLTL with a single threshold operator. Furthermore, Plasma is built with a plugin system allowing a developer to extend it, and it can be integrated in another software via a library.
Marcie is a tool for qualitative and quantitative analysis of Generalised Stochastic Petri nets. It relies on Interval Decision Diagrams (IDD) to represent symbolically the state space of the Petri net. The implementation of IDD is mostly parallel, taking advantage of multicore architectures. It has recently been extended with a simulation engine for the model checking of PLTL formulas. Like Cosmos, Marcie can deal with unbounded Until properties as long as the user guarantees the termination. It has been developed for the study of chemical reaction networks and thus facilitates the modelling of such systems.
Discussion. The formalisms are characterised by different features: model specifications (dedicated languages in Prism or standard formalisms in Cosmos, Marcie and Uppaal), expressiveness of the formalisms (supported distributions, presence of clocks, etc.) and dedicated application-oriented languages (e.g. for biological systems in Plasma and Marcie). While the probabilistic extension of Uppaal is similar to Cosmos, the differences come essentially from the initial motivations. As Uppaal intends to attribute a stochastic semantics to originally non-probabilistic timed automata, the available probability distributions are restricted (exponential for transitions having guards without upper time bound, and uniform otherwise). Cosmos, which is constructed as a statistical model checker, provides several well-known distributions (exponential, normal, deterministic, uniform, gamma, etc.). Even if any general distribution can be approximated by combining several exponential distributions, the simulation cost increases with the required accuracy. Marcie only supports exponential and immediate distributions.
The property specification is expressed by a formula for most of the tools, but it is a combination of an expression and an automaton for Cosmos. The available operators in formulas often exclude unbounded Until and nesting of probabilistic operators (only possible for Markovian models). Moreover, the evolution of the time and data variables is subject to some restrictions.
4.5. Tool evaluation
We performed several experiments aimed at evaluating Cosmos both in terms of accuracy and runtime. For this purpose we consider two popular benchmark models. The first one, a tandem queuing system (TQS), is available on the Prism web page [22]; the second one is a model of dining philosophers (DPM). We run experiments with Cosmos, Prism (version 4.0.2), Uppaal-smc (version 4.1.13), Plasma (version 1.1.4), Ymer (version 3.1) and Marcie (version 1178M). These are the stable versions available from the corresponding web sites.
The TQS is an $M/\mathrm{Cox}_2/1$ queue composed with an $M/M/1$ queue. For the experiments, we use the queue capacities $N = 5$, the arrival rate $\lambda = 20$, the service rates of the first phase in the first queue $\mu_1 = 0.2$ and $\mu'_1 = 1.8$ (for clients without second service phase), the service rate of the second phase in the first queue $\mu_2 = 2$, and the service rate in the second queue $\kappa = 4$, as in [34].
The DPM is a mutual exclusion problem where $N$ philosophers are sitting around a table. Initially thinking, they can decide to eat by taking two forks shared with their right and left neighbours. However, a contention problem may arise due to the sharing of resources (forks). For the experiments, the rate of all exponential distributions is chosen as 10.
Prism supports Chernoff–Hoeffding, SPRT methods and the sequential confidence interval computation using approximations to Student and Normal laws. Uppaal and Plasma support Chernoff–Hoeffding and SPRT methods while Ymer only supports the SPRT method. Marcie uses a static sample size algorithm which is not described in the manual. So for experiments with Marcie, we set the number of samples to the one required by the Chernoff–Hoeffding method. Cosmos provides Chernoff–Hoeffding, Chow–Robbins and Gaussian methods for the sequential confidence interval estimation and the SPRT method.
For the TQS model we consider the following time-bounded reachability measure: $\varphi_{TQS}$ ≡ the probability that the first queue in the tandem gets full within time $T$. For the dining philosopher model we consider the following measure: $\varphi_{DPM}$ ≡ the probability to reach a deadlock state before $N$ philosophers eat. A deadlock occurs when all philosophers have taken one fork. Both properties can be straightforwardly encoded in CSL and HASL, hence equivalent verification experiments can be performed by all tools.
Experiment settings. We have set the following statistical parameters: the confidence level is 0.95 and the width of the confidence interval is 0.005; for hypothesis testing, the probability of error is 0.005 and the width of the indifference region is 0.001. To fulfil these parameters, tools must generate a large number of trajectories. The other parameters have been set to their default values. Most tools can take advantage of parallelisation but, for simplicity, we use only one processor⁴ for the comparison.
⁴ These experiments have been executed on a MacBook Pro with a 2.4 GHz Intel Core 2 Duo processor.
Fig. 6. Comparison of simulation time for probability measures: (a) the philosophers model; (b) the TQS model.
Fig. 7. Runtime for a probability measure of the TQS model for sequential testing.
DPM experiments. Fig. 6 (a) refers to the runtime for the DPM as a function of the number of philosophers. Cosmos is the fastest of the tools using Chernoff–Hoeffding bounds. For 100 philosophers, Marcie is 1.4 times slower, Uppaal is 1.5 times slower, Plasma is 1.9 times slower, and Prism-Apmc is 2.5 times slower. Among the tools using sequential procedures, the two versions of Prism have similar runtime and Cosmos is up to 1.9 times faster.
TQS experiments with confidence intervals. Results about the runtime comparison with different time bounds T are reported in Fig. 6 (b). There are two kinds of behaviours for tools depending on the applied statistical method. For the first one, which corresponds to the Chernoff–Hoeffding method, the simulation time increases with the time bound T. About 295 000 trajectories are required to obtain the specified confidence interval. For the second one, which corresponds to sequential confidence interval methods, the required number of samples decreases when the time bound goes to infinity. This phenomenon is a consequence of the evolution of the satisfaction probability of φ TQS, which goes to 1 when T goes to infinity.
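The contrast between the two behaviours can be made concrete with a rough calculation. The following is a sketch under the assumption that a sequential method stops close to the sample size suggested by the normal approximation; the symbols $\varepsilon$ (semi-width of the interval), $z$ and $p$, as well as the two example probabilities, are illustrative and not taken from the experiments.

```latex
% Settings of the experiments: confidence level 0.95 (alpha = 0.05),
% interval width 0.005 (semi-width epsilon = 0.0025).

% Static (Hoeffding) sample size, independent of the probability p:
n_{CH} \;\ge\; \frac{\ln(2/\alpha)}{2\varepsilon^{2}}
       \;=\; \frac{\ln 40}{2 \cdot 0.0025^{2}} \;\approx\; 295\,000

% Normal-approximation sample size for an indicator variable of mean p:
n_{seq} \;\approx\; \frac{z_{1-\alpha/2}^{2}\; p(1-p)}{\varepsilon^{2}}

% p = 0.5 :  1.96^{2} \cdot 0.25   / 0.0025^{2} \approx 153\,664
% p = 0.99:  1.96^{2} \cdot 0.0099 / 0.0025^{2} \approx 6\,085
```

The static bound does not depend on $p$, whereas the variance term $p(1-p)$ vanishes as $p$ approaches 1, which is consistent with the decreasing number of samples observed for large time bounds.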
Cosmos is again the fastest of the tools using the Chernoff–Hoeffding method. When the time bound is 200, Plasma is approximately 2.6 times slower, Marcie is 3.7 times slower, Uppaal is 4.2 times slower and Prism-Apmc is 6.5 times slower (see Fig. 7). Among the tools using sequential procedures, when the time bound is 40 the two versions of Prism have similar runtime and Cosmos is up to 2.8 times faster.
4.5.1. TQS experiments with sequential hypothesis testing
Results on hypothesis testing are reported in Table 1. Each value is the mean over 100 experiments. The threshold value for the hypothesis is always very close to the numerical value in order to increase the number of trajectories performed by the tools. Results confirm that hypothesis testing methods are faster than confidence interval based methods. The numbers of trajectories generated by the tools are similar to each other. In most cases Cosmos is the fastest, Ymer being up to 1.8 times slower, Uppaal 2.8 times slower, and Prism 6 times slower.
Table 1. Runtime comparison for the TQS for sequential testing.
Accuracy comparison. To assess the accuracy of Cosmos, we compared its output with the one produced by Prism via both its numerical engine (Prism-n) and its statistical engine (Prism-s)⁵: results indicate that Cosmos and Prism-s are comparably accurate, with the estimated intervals always containing the (actual) value obtained with the numerical engine of Prism. Using the value computed by the numerical engine of Prism, we perform a coverage test of Cosmos: we compute the ratio of simulations that return a confidence interval which contains the real value. This ratio is always close to the confidence level.
⁵ Experiments were executed with confidence level = 99.99%, interval width = 0.01.
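A coverage test of the kind described in the accuracy comparison can be sketched as follows. The example replaces the simulator by a Bernoulli generator with a known mean and uses a normal-approximation interval; the functions `normal_ci` and `coverage`, and all numerical values, are assumptions made for illustration and do not reproduce the reported experiments.

```python
import math
import random
from statistics import NormalDist

def normal_ci(samples, level):
    """Normal-approximation confidence interval for the mean of the samples."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(variance / n)
    return mean - half, mean + half

def coverage(true_value, level, runs, samples_per_run, rng):
    """Ratio of runs whose confidence interval contains the known true value."""
    hits = 0
    for _ in range(runs):
        xs = [1.0 if rng.random() < true_value else 0.0 for _ in range(samples_per_run)]
        low, high = normal_ci(xs, level)
        hits += low <= true_value <= high
    return hits / runs

print(coverage(true_value=0.3, level=0.95, runs=500, samples_per_run=400,
               rng=random.Random(7)))
```

A ratio well below the requested level would indicate intervals that are too narrow, while a ratio far above it would point to an overly conservative method.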
