13 Choosing the Best Model for Fisheries Assessment

PER SPARRE AND PAUL J.B. HART

13.1 INTRODUCTION

Fisheries scientists have developed a suite of models for a variety of purposes ranging from fundamental descriptions of fish growth to long-term predictions of population dynamics. While many of the chapters in this volume review the basic concepts, a full description of these could easily have occupied both volumes of this book. For detailed reviews of fisheries models we recommend the excellent books by Hilborn and Walters (1992) and Quinn and Deriso (1999), and for a more general overview, we recommend Jennings et al. (2001). It is also still worth consulting Ricker (1975). Our objective in this chapter is more modest: to outline the basic concepts behind fisheries modelling, and review briefly the major modelling options that confront the fisheries biologist. We hope that this overview will help readers to see how the various methods outlined in Chapters 5–12, Volume 2, fit together, as well as providing links with various chapters in both volumes that review the basics of fish biology.

There is a risk in saying a little about models without saying ‘everything’ about them, because fisheries managers may become tempted to grab an ‘off the shelf’ model without paying enough attention to its underlying assumptions, or ensuring that it is applied to an appropriate situation. It is imperative to look at each situation afresh and to choose models that can be tailored to each situation. It is also important to test or ‘confront’ models with data (Hilborn and Mangel 1997). Our approach here is to provide a suite of examples, which we believe to be representative of contemporary fisheries models. Throughout we have tried to follow three general principles: (1) models should be as simple as possible, (2) the complexity of a model should match the objective and the available data, and (3) models should be judged by their utility rather than by the beauty of the mathematics, because they are tools, not objectives in themselves.

Mathematical models are used to distil a complicated system into a simpler one that we can readily understand and use for predictions. Our ability to model the real world depends on our ability to collect adequate data. Smart models cannot substitute for bad data. It may take hours to develop a mathematical model and months to implement it on computer, but it may take decades to collect an adequate time series of quality fisheries data. The descriptive part of a fisheries research programme, perhaps using the techniques of Geographical Information Systems (GIS; Meaden and Do Chi 1996), may sometimes be more valuable than a mathematical model, which in the worst case may be a poor reflection of the key mechanisms revealed by the data. Models may not necessarily enhance our understanding of a fisheries system.

13.2 BASIC CONCEPTS

A general expression for a mathematical model is useful for the presentation of guidelines for the choice of model (Schnute 1994). A model is usually framed in the form of a mathematical function, which can be written as

$$F : (X_t, P) \to Y_t, \tag{13.1}$$

where $X_t$ and $Y_t$ are variables at time $t$ and $P$ is a parameter. What equation (13.1) says is that the function $F$ takes values of $X_t$ and $P$ and maps them onto values of $Y_t$ such that there is a unique value of $Y_t$ for each pair of values $(X_t, P)$. In a fisheries context $X_t$ could be some state of the fish stock and $Y_t$ the catch. The dynamics of the state of the stock, say its size, is likely to be a function of not only biological processes such as births and deaths due to natural causes but also deaths due to fishing. As fishing is a variable that can be controlled we need to know the relationship between $X_t$ and the fisheries-related control variable $Z_t$. So we need a new function $G$ that will describe the way in which $X_t$ changes in response to $Z_{t-1}$. This is evaluated at $t - 1$ because it is assumed that there is a lag between the application of the control and the change in the state of the fish stock. The new function $G$ is then

$$G : (Z_{t-1}) \to X_t. \tag{13.2}$$

Equation (13.1) can now be rewritten as

$$F : (G(Z_t), P) \to Y_t. \tag{13.3}$$

So in general terms we can say that the dynamics of the system is fully accounted for by two functions of $X$, $Z$ and $P$,

$$F : (X_{t-1}, Z_t, P) \to X_t \tag{13.4a}$$

$$G : (X_t, Z_t, P) \to Y_t. \tag{13.4b}$$

In most cases of interest $F$ and $G$ and the variables $X_t$, $Y_t$, $Z_t$, and the parameters $P$, are multivalued and have to be described by vectors. As a result a fish population will be characterized in each year $t$ by a vector of state variables $X_t = (X_{1t}, X_{2t}, \ldots, X_{mt})$ with dimension $m$. $Y_t = (Y_{1t}, Y_{2t}, \ldots, Y_{nt})$ will denote the corresponding vector of $n$ observations obtained annually from this population. Suppose also that a control vector $Z_t = (Z_{1t}, Z_{2t}, \ldots, Z_{rt})$ represents $r$ quantities known to influence the population dynamics. Then the two vector equations

$$F(X_{t-1}, Z_t, P) = X_t \tag{13.5a}$$

$$G(X_t, Z_t, P) = Y_t \tag{13.5b}$$

encapsulate descriptions of system dynamics and measurement, respectively, where $P = (P_1, P_2, \ldots, P_k)$ represents a parameter vector of dimension $k$ that remains constant through time. Thus the vector $F$ governs the evolution from time $t$ to $t + 1$ of the $m$ states $X_t$, and $G$ relates the $n$ observations $Y_t$ to the current system state $X_t$. As outlined by Schnute and Richards (2001 and Chapter 6, this volume), the model (13.5a)–(13.5b) describes the progression of $X_t$ through the space of all possible states, and is therefore called a state space model.

A concrete realization of (13.5a) and (13.5b) is the discrete form of the logistic equation

$$B_{t+1} = B_t + a B_t \left(1 - \frac{B_t}{B_\infty}\right) - q E_t B_t \tag{13.6a}$$

$$C_t = q E_t B_t \tag{13.6b}$$

where $B_t$ is biomass of fish, $E_t$ is fishing effort and $C_t$ is catch, all at time $t$. $a$ is the intrinsic rate of natural increase, $B_\infty$ is the carrying capacity and $q$ is the catchability coefficient. In terms of our general form in (13.5a) and (13.5b), $m = n = r = 1$, $k = 3$, $X_t = B_t$, $Z_t = E_t$, $Y_t = C_t$, and $P = (a, B_\infty, q)$.

A model may be deterministic or stochastic. The deterministic model yields a one-to-one relationship between each $X$ and each $Y$. With the deterministic model we assume that there is no process error meaning that the equation is perfect, and we can make a precise prediction of $Y$. This outcome is unlikely in fisheries science. There is always some degree of uncertainty when predicting the development of individual fish, the size of a fish stock, or the capture rate of a fishing fleet. A stochastic model operates with many $Y$-values for each $X$-value, as will be discussed later. Deterministic models are easier to handle than are stochastic models. Also, the computations and the data demands for stochastic modelling are greater than those for deterministic models. Whether to use deterministic or stochastic modelling is an important choice.
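As a minimal sketch of how the discrete logistic model works in practice, the following Python fragment iterates equations (13.6a) and (13.6b), taken here in the usual Schaefer form $B_{t+1} = B_t + aB_t(1 - B_t/B_\infty) - qE_tB_t$ with $C_t = qE_tB_t$. The function name `simulate_logistic` and all parameter values are assumptions made for illustration; they are not estimates for any real stock.

```python
def simulate_logistic(b0, efforts, a, b_inf, q):
    """Iterate the discrete logistic model for a sequence of annual efforts.

    Returns (biomass, catches); biomass has one more entry than catches
    because it includes the starting biomass b0.
    """
    biomass = [b0]
    catches = []
    for effort in efforts:
        b_t = biomass[-1]
        catch = q * effort * b_t                              # (13.6b)
        b_next = b_t + a * b_t * (1.0 - b_t / b_inf) - catch  # (13.6a)
        biomass.append(max(b_next, 0.0))  # biomass cannot become negative
        catches.append(catch)
    return biomass, catches


# Hypothetical parameter values, chosen only for illustration.
A, B_INF, Q = 0.4, 1000.0, 0.001

# Deterministic projections under two constant effort levels.
base_biomass, base_catch = simulate_logistic(800.0, [200.0] * 100, A, B_INF, Q)
more_biomass, more_catch = simulate_logistic(800.0, [220.0] * 100, A, B_INF, Q)

print(f"final biomass, E = 200: {base_biomass[-1]:.1f}")
print(f"final biomass, E = 220: {more_biomass[-1]:.1f}")
```

Because the sketch is deterministic, each effort series gives exactly one biomass and catch trajectory. Under constant effort the biomass settles towards $B_\infty(1 - qE/a)$, provided that $qE < a$.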

The choice of parameter vector, $P = (P_1, P_2, \ldots, P_k)$, and the choice of the relationship between $X$, $Z$, $P$ and $Y$, here called $F$ and $G$, is essentially the choice of the model. The number of parameters ($k$) may be small, say 2 or 3, or it may be hundreds or thousands. The number of parameters is an important choice. You may choose to cover many details in your description of the world, or you may choose to apply the minimum number of parameters. There are advantages and disadvantages of both approaches. The model-designer is often not free to choose parameters, but is more or less constrained by the objectives of the model. It may be mandatory to introduce certain parameters into the model to meet certain objectives. Alternatively, some parameters may have to be left out, because the data available cannot be used to estimate them. This is a typical situation for fisheries models, with the number of unknowns greatly exceeding the number of observations (Schnute and Richards 2001).

It may be possible to choose plausible values for parameters that cannot be estimated. The philosophical question faced is whether it is better to ignore a feature of the system description or to use the best possible ‘guesstimate’ available. Obviously there is no general answer to the question.

Typical examples of fisheries models with few parameters are the surplus production models (equations (13.6a) and (13.6b), and Chapter 6, this volume), which do not account for the size or age composition of stocks. Models accounting for size or age usually have many more parameters. Going from single-stock models to multi-stock models clearly increases the number of parameters. Similarly, going from a one-fleet and one-area model to a multi-fleet and multi-area model increases the number of parameters yet again. The number of parameters may not be proportional to the number of components, but may be more as there may be a need to model interactions between the various components. If interactions such as that between fish stocks or fishing fleets are not accounted for in the model, one might as well have used the simpler single-species model.

The model may be applied to estimate the parameters, or to predict the value of $Y$ for an assumed value of $X$. The latter application is used to answer ‘what-if-then’ questions. For example, what will happen to the landings, $Y$, if the fishing effort, $Z$, is a certain number of fishing days per year? In terms of the logistic equation of a fishery (13.6), one might ask what happens when $E_t$ is increased by 10%.

The model may be dynamic or static. A dynamic model predicts output for a time series of years, whereas the static model predicts the average output. A static model may be said to predict what will happen ‘on average’, say, over the next 10 years, whereas the dynamic model predicts what happens in each individual year. It is usually easier to handle the static than the dynamic models, because there are more details to describe in the dynamic model, and consequently more parameters.

An example from fisheries of a dynamic model is the model of the Pacific halibut (Hippoglossus stenolepis) fishery by Thompson and Bell (1934) (Shepherd and Pope, Chapter 8, this volume). This predicted numbers of fish dying over a 13-year period when fishing mortality increased from 40% to 60%. The yield-per-recruit model of Beverton and Holt (1957) is an example of a static model, as is the Ricker stock–recruitment curve (Myers, Chapter 6, Volume 1) which gives the average recruitment for a given spawning stock biomass. Static models in fisheries are most often chosen because data for dynamic models are not available. Often the static models are called ‘Long-term prediction models’ and the dynamic models are called either ‘Short-term prediction models’ (that is, 2–3 years) or ‘Medium-term prediction models’ (10–20 years) (see also Shepherd and Pope, Chapters 7 and 8, this volume).

13.3 STOCHASTIC MODELLING

No model in fisheries can predict the exact value of any variable. To the simple model one should
ideally add stochastic vectors, $\delta$ and $\varepsilon$, so the stochastic model reads as

$$F(X_{t-1}, Z_t, P) + \delta = X_t \tag{13.7a}$$

$$G(X_t, Z_t, P) + \varepsilon = Y_t. \tag{13.7b}$$

The stochastic vectors $\delta$ and $\varepsilon$ take unpredictable values from a probability distribution which we may have some knowledge about. Usually, $\delta$ and $\varepsilon$ are assumed to be normally or log-normally distributed in fisheries models. The stochastic term accounts for all the elements not accounted for by $F(X_{t-1}, Z_t, P)$ and $G(X_t, Z_t, P)$. If the model actually reflects the true relationship between $X$ and $Y$, which is rarely the case in any fisheries model, the stochastic term has a known mean value which is usually zero. However, the fisheries models are always incomplete with an unknown bias.

In more detail a stochastic version of (13.6a) and (13.6b) requires three probability distributions to be defined. These are

$$P(X_t \mid X_{t-1}, Z_t, P), \qquad P(Y_t \mid X_t, Z_t, P), \qquad P(X_0 \mid P).$$

The third item determines the initial state of the system. Any distribution can describe the variation in $X_t$, $Y_t$ and $X_0$.

The stochastic terms, $\delta$ and $\varepsilon$, can be thought of as containing two components, the measurement error and the process error. Each can be expressed as a linear combination of two terms, $\delta = \delta_m + \delta_p$ and $\varepsilon = \varepsilon_m + \varepsilon_p$. The measurement error ($\delta_m$, $\varepsilon_m$) is the deviation expected if the model is correct, and the process error ($\delta_p$, $\varepsilon_p$) is the deviation caused by insufficiencies in the model (Schnute 1994). In a dynamic model, the process errors will appear in the entire dynamic system, whereas the measurement errors will occur individually in each time step.

As an example of process error and measurement error, we may consider a research survey, which suddenly gives an unexpectedly high density of fish. This may be due to bad survey design, which would lead to measurement error, or to an unexpectedly high stock productivity, which would be the result of process error. A discussion of how to distinguish between the two sources of error is given in Hilborn and Walters (1992).

The way in which $Y_t$ may vary is shown in Fig. 13.1. Where stochasticity is accounted for in (13.7a) and (13.7b) by the vectors of residuals $\delta$ and $\varepsilon$, they may have a probability distribution of an assumed form or a distribution estimated from time series of observations of the pair $(X_t, Y_t)$.

[Fig. 13.1 Output from a stochastic simulation in terms of the equation $Y + \varepsilon = F(X, Z, P)$. Axes: Frequency against Output ($Y_1, \ldots, Y_{12}$).]

When using a stochastic model for prediction, the standard procedure is to let a computer program repeat the same prediction or simulations many times, say 1000 times or 10 000 times (the Monte Carlo method; Hilborn and Mangel 1997). In each simulation the computer program draws the values of parameters from a random-number generator. Eventually, the probability distribution is estimated by the frequency distribution of $Y$ (see Fig. 13.1). This means that a stochastic model requires as input the characteristics of the probability distribution for the parameter estimates. For a general introduction, see for example Hilborn and Mangel (1997) and Manly (1997). One of the earliest attempts at introducing a stochastic element into a fishery model was by Beddington and May (1977). They attached stochastic process variation to the parameter $a$ of the Schaefer surplus yield equation (equations (13.6a) and (13.6b), and Schnute and Richards, Chapter 6, this volume) and showed that the coefficient of variation of yield increased as effort increased, so making the prediction of yield less and less reliable as the fishery took more and more fish.
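The Monte Carlo procedure described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended implementation: the logistic model is used in the Schaefer form, only the parameter $a$ is treated as uncertain, it is drawn from a log-normal distribution, and the median, spread, effort and other values are hypothetical. The function names are likewise invented for the example.

```python
import math
import random
import statistics


def final_catch(a, b_inf, q, effort, b0, years):
    """Run the discrete logistic model and return the catch in the last year."""
    biomass = b0
    catch = 0.0
    for _ in range(years):
        catch = q * effort * biomass
        biomass = max(biomass + a * biomass * (1.0 - biomass / b_inf) - catch, 0.0)
    return catch


def monte_carlo(n_runs, seed=1):
    """Repeat the prediction n_runs times, drawing a new value of a each time."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    outputs = []
    for _ in range(n_runs):
        # Assumption: a is log-normal with median 0.4 and log-scale sd 0.2.
        a = rng.lognormvariate(math.log(0.4), 0.2)
        outputs.append(final_catch(a, 1000.0, 0.001, 200.0, 500.0, 30))
    return outputs


def frequency_table(values, n_bins=12):
    """Count how many simulated outputs fall in each of n_bins equal classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return lo, width, counts


outputs = monte_carlo(1000)
lowest, class_width, counts = frequency_table(outputs)
print("mean catch:", round(statistics.mean(outputs), 1))
print("sd of catch:", round(statistics.stdev(outputs), 1))
print("frequencies:", counts)
```

The list `counts` plays the role of the frequency distribution of the output: with enough runs it approximates the probability distribution of the predicted catch implied by the assumed distribution of $a$.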

13.4 MECHANICS OF CHOOSING A MODEL

The choice of a model will depend on what is required as output, and will be constrained by data available to estimate parameters. For a manager the requirement is most likely to be how the fishery will behave under a given measure. Existing models tailored to the particular fishery may meet this objective. For scientists doing pure research, the model may help to explain the workings of nature, and this may often require creation of a new method from first principles. In all cases the task of choosing a model can be broken into five steps:

1 Choosing the states, $X = (X_1, X_2, \ldots, X_m)$.
2 Choosing the control variables $Z = (Z_1, Z_2, \ldots, Z_r)$.
3 Choosing the output, $Y = (Y_1, Y_2, \ldots, Y_n)$.
4 Choosing or estimating the parameters, $P = (P_1, P_2, \ldots, P_k)$ and the relationships between $X$, $Z$, $Y$ and $P$.
5 Choosing the assumption about the stochastic terms, $\delta = (\delta_1, \delta_2, \ldots, \delta_r)$ and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_r)$.

Figure 13.2 illustrates some key elements in the process of choosing a model. The output, $Y$, is selected to match some objectives defined either by the scientist or perhaps more often by a customer, which is often also the funding body. Also the funding and the manpower available for the creation of the model will place obvious constraints on what can be achieved.

[Fig. 13.2 Flowchart for selection of model comprising the steps: Define objectives and identify existing data and data to be collected subject to available funding. Then choose the input, $X$, the control variables, $Z$, the output, $Y$, the parameters, $P$, the relationships between $X$, $Y$ and $P$, the stochastic term, $\varepsilon$. Then estimate parameters, $P$, test the model and eventually apply it. For further explanation see text. Boxes in the flowchart include: Choice of model(s); Estimate P; Test of model(s); Feedback to customer; Apply model.]
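One way to keep the five choices explicit is to write them down as a small record before any fitting is attempted. The sketch below does this in Python for the surplus production model of equations (13.6a) and (13.6b). The class `ModelSpec`, its field names and the error assumption entered for the example are hypothetical and serve only to illustrate the bookkeeping.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ModelSpec:
    """Record of the five choices that together define a model."""
    states: Tuple[str, ...]      # step 1: X
    controls: Tuple[str, ...]    # step 2: Z
    outputs: Tuple[str, ...]     # step 3: Y
    parameters: Tuple[str, ...]  # step 4: P
    relationships: str           # step 4: how X, Z, Y and P are related
    error_assumption: str        # step 5: assumption about the stochastic terms

    def dimensions(self) -> Dict[str, int]:
        """Return the dimensions m, r, n and k used in the text."""
        return {
            "m": len(self.states),
            "r": len(self.controls),
            "n": len(self.outputs),
            "k": len(self.parameters),
        }


# The surplus production model expressed as a specification.
schaefer = ModelSpec(
    states=("biomass B_t",),
    controls=("fishing effort E_t",),
    outputs=("catch C_t",),
    parameters=("a", "B_inf", "q"),
    relationships="discrete logistic, equations (13.6a) and (13.6b)",
    error_assumption="log-normal (assumed here for illustration only)",
)

print(schaefer.dimensions())
```

Counting the entries gives a quick check of how many parameters the available data would have to support.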

Once the objectives are defined clearly, the next step is to investigate the existing data ($X$) for use as input to the model. Depending on the funding, one may (or may not) consider collection of additional data ($X'$). For example, to apply an age-structured model, age composition data must be available.

The choice of model should aim at utilizing all relevant and available data. Within each family of fisheries model, there are usually a small number of standard relationships, as listed in Table 13.1. The differences between models are due mainly to differences in the degree to which input is aggregated. For example, are data given by geographical areas, by month or by year, and by fishing fleet?

Once the relationships between $X$, $Z$, $P$ and $Y$ have been determined, and this is usually the easiest part of the task, the next step is to estimate the parameters and then test if the model fits the data. If the model does not fit the data, it should be modified. Ideally, one should work simultaneously with a number of alternative models (Hilborn and Mangel 1997). In many cases, if the aims of modelling are known, and the manager is using standard fisheries data, it is possible to use models already available in the literature.

A typical example of model choice for the management of a fishery would be a situation where the management agency has data on effort exerted by the fleet ($X_{Fl}$) measured as days at sea and landings in weight of each species ($Y_{Sp}$). The fishery scientist now needs to choose a model which relates the input variable $X$ to the output variable $Y$. An example might be $Y_{Sp} = q X_{Fl} B$ where $q$ is the catchability coefficient and $B$ is the biomass vector for each species. Alternatively, the job could be done using a surplus production model (Schnute and Richards, Chapter 6, this volume). In the service of realism, the data on $X$ and $Y$ could be divided by fleet, area and time in the following way:

$X_{1,Fl,p,y,Ar,g}$ = Effort of fleet Fl in time period p (e.g. month) of year y in area Ar using gear g
$Y_{1,Sp,Sc,Fl,p,y,Ar,g}$ = Landings of species Sp, commercial size category Sc caught by fleet Fl in time p (e.g. month) of year y in area Ar using gear g

Very often, fisheries scientists are constrained by managers who want a particular type of output ($Y$) from a model. An example might be where managers want to reduce the amount of discarding, and to that end, they want to introduce technical measures. Suppose further that one of the measures is to close a certain sea area for fishing by certain types of vessels using certain types of gears with certain mesh sizes. An example is the so-called Plaice-box in the North Sea (Pastoors et al. 2000). The output may then be the ‘discard ogive’ before the introduction and after the introduction of the closed area. The discard ogive can be parameterized by $DL_{50\%}$ and $DL_{75\%}$. To be able to respond to such a request, a number of preconditions must be met. Data must be disaggregated by area, fleet, gear, mesh size, species and body length. A traditional surplus production model cannot be used, as this type of model does not cover many of the features of the problem. Accordingly, a more complex age-structured model is required.

13.5 ESTIMATION OF PARAMETERS

Estimation of the parameters, $P$, from the observations $(X, Y)$, can be made by, for example, minimization of the so-called modified $\chi^2$ criterion (see for example Sokal and Rohlf 1981):

$$\chi^2 = \sum_{i=1} \sum_{v=1} \frac{\left(F(X_{iv}, P) - Y_{iv}\right)^2}{F(X_{iv}, P)}, \tag{13.9}$$

when $v = 1, 2, \ldots, m$ sets of observation $(X_{1v}, X_{2v}, \ldots, X_{mv})$ are available. There are other equally well-established techniques for parameter estimation, such as the maximum-likelihood method. Thus, there is a choice of methodology when using the model for parameter estimation, although the results will be almost the same when data fit the model well. The problems emerge when there is a large variation of observations around the values predicted by the model. In this case, the choice of estimation model may have
great impact on the estimated values, and sometimes in fisheries the methods are so elastic and the data so bad that it becomes almost up to the researcher to decide the parameter value, by manipulating the assumptions behind the estimation procedure. It might perhaps be better in these pathological cases to admit that the estimation failed, and start the search for either better data or better models.
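The sensitivity to the choice of criterion can be illustrated with a deliberately small example. The sketch below fits a one-parameter model, $F(X, P) = qX$, to two hypothetical data sets, once by minimizing the modified $\chi^2$ criterion of equation (13.9) and once by ordinary least squares. The data, the model and the golden-section search used as the minimizer are all assumptions made for the illustration; any reliable optimizer would serve.

```python
import math


def modified_chi2(q, xs, ys):
    """Modified chi-square criterion (13.9) for the model F(X, q) = q * X."""
    return sum((q * x - y) ** 2 / (q * x) for x, y in zip(xs, ys))


def sum_of_squares(q, xs, ys):
    """Ordinary least-squares criterion for the same model."""
    return sum((q * x - y) ** 2 for x, y in zip(xs, ys))


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function of one variable on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0


def fit_both(xs, ys):
    """Return the estimates of q under the two criteria."""
    q_chi2 = golden_section_min(lambda q: modified_chi2(q, xs, ys), 1e-6, 5.0)
    q_ls = golden_section_min(lambda q: sum_of_squares(q, xs, ys), 1e-6, 5.0)
    return q_chi2, q_ls


# Hypothetical data: effort X and catch Y, first with small then large scatter.
effort = [10.0, 20.0, 30.0, 40.0]
catch_good_fit = [5.1, 9.8, 15.2, 19.9]
catch_poor_fit = [2.0, 16.0, 9.0, 28.0]

good = fit_both(effort, catch_good_fit)
poor = fit_both(effort, catch_poor_fit)
print("small scatter: chi2 %.4f, least squares %.4f" % good)
print("large scatter: chi2 %.4f, least squares %.4f" % poor)
```

With the first data set the two criteria give nearly the same estimate of $q$; with the second the estimates move apart, which is the situation in which the choice of estimation procedure starts to matter.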

Together with the estimation of parameter values, the standard methods of parameter estimation will provide estimates of the probability distribution of the parameter estimates themselves. These in turn may be used as input for stochastic modelling.
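Where a standard package is not at hand, one simple way to approximate the distribution of a parameter estimate is the non-parametric bootstrap. This is a different technique from the analytical results that statistical packages report, and it is shown here only as a sketch. The data pairs, the model (catch proportional to effort) and the function names are hypothetical.

```python
import random
import statistics


def estimate_q(pairs):
    """Least-squares slope through the origin for the model catch = q * effort."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


def bootstrap_q(pairs, n_boot=2000, seed=42):
    """Re-estimate q on data sets resampled with replacement from the pairs."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]
        estimates.append(estimate_q(resample))
    return estimates


# Hypothetical (effort, catch) observations.
observations = [
    (10.0, 5.6), (15.0, 6.9), (20.0, 10.8), (25.0, 11.7),
    (30.0, 16.1), (35.0, 16.4), (40.0, 21.0), (45.0, 21.8),
]

q_hat = estimate_q(observations)
q_boot = bootstrap_q(observations)
q_mean = statistics.mean(q_boot)
q_sd = statistics.stdev(q_boot)
print(f"point estimate {q_hat:.3f}, bootstrap mean {q_mean:.3f}, sd {q_sd:.3f}")
```

The mean and standard deviation of `q_boot`, or the resampled values themselves, could then supply the parameter distribution that a stochastic simulation requires as input.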

The most common methodology for parameter estimation in fisheries science is represented by the Statistical Analysis System (SAS) computerized statistical system (e.g. DiIorio 2000). There is no space here for a deeper review of the mathematical features of estimation, but we refer readers to the large literature on mathematical modelling and parameter estimation (see Hilborn and Walters 1992, and Hilborn and Mangel 1997 for an introduction to the literature). It is strongly recommended always to follow the standards of generally accepted textbooks or computerized systems for parameter estimation, such as SAS or S+ (e.g. Krause and Olson 2000), and to avoid inventing ad hoc estimation methods. The SAS procedures ‘GLM’ (General Linear Models) and ‘MIXED’ (mixed models; Littell et al. 1999) appear to be able to handle almost any estimation problem encountered in fisheries research. In North America some fisheries scientists are using AD Model Builder (http://otter-rsch.com/admodel.com, AD Model Builder 2001), which is an efficient commercial package designed for non-linear estimation. AD Model Builder makes it easy to code almost any model and to investigate its statistical properties. It is recommended for estimation when models are large, complicated and non-linear, particularly when they are based on many observations and have many parameters (see also Pitcher, Chapter 9, this volume).