Size-based Methods in Fisheries Assessment

9.2.5 Non-parametric methods

Most non-parametric methods work by scanning a range of L∞ and k values and working out a goodness-of-fit (GOF) for each combination. The best GOF is searched for by the user, by automatic search, or by a combination of both. The best fit gives the estimate of growth. Mortality is estimated from the numbers in the cohorts sliced from the original size distribution using these growth parameters.

Usually, these methods do this by fitting a growth curve through a whole set of samples taken through time. The von Bertalanffy growth curve, or its seasonal modification, is ordinarily employed; it is possible to use other growth models, or even empirical growth values, but these options have rarely been used. A goodness-of-fit function, based on how well the growth curve passes through the ‘peaks and troughs’, is maximized over a range of values of L∞ and k. In this way growth, and sometimes mortality, is estimated along with the dissection of the length–frequency curves.

Three of these non-parametric methods are outlined in this chapter: ELEFAN, originally devised by Daniel Pauly; SLCA, originated by John Shepherd; and the Projection Matrix method, which was developed by Andrew Rosenberg and Marinelle Basson (Rosenberg et al. 1986) from an original method by Shepherd (1987b).

ELEFAN (electronic length–frequency analysis)

Daniel Pauly was the first to realize the potential of this type of method, and working versions of his original ELEFAN first appeared in the late 1970s (Pauly and David 1980). Nowadays, the modern version of this length–frequency method is the ELEFAN 1 module of the widely used FiSAT package distributed by FAO (Food and Agriculture Organization) (Gayanilo et al. 1996). Pauly (1987) has written a very clear review of the basis of the method.

How ELEFAN works. ELEFAN works by attempting to find a maximum for a goodness-of-fit function based on peaks and troughs: the ‘explained sum of peaks’. This is based on how often a von Bertalanffy growth curve hits modes in the data. During fitting, growth curves with different parameters are run and mapped, and the maximum of the scoring function is chosen as the best fit. A seasonally modified growth curve can easily be substituted for the standard von Bertalanffy curve; in fact, the same goodness-of-fit technique could be used for any growth curve or pattern.

The peaks are identified as the length classes standing out above a 5-point moving average; troughs are the areas below the moving average. The number of peaks gives the maximum ‘available sum of peaks’ (ASP). A von Bertalanffy curve for the specified L∞ and k is traced through the data starting at the base of the first peak. A point is scored each time the curve hits one of the peaks, and a point is deducted each time it hits a trough. This is repeated for starting times equal to the base of each peak. The maximum value is the ‘explained sum of peaks’ (ESP). The goodness-of-fit function is the ratio ESP/ASP. This process is repeated for all the required combinations of L∞ and k, the goodness-of-fit is mapped, and the maximum value is chosen as giving the best growth parameters. For each combination of k and L∞, it is possible to search for the value of t_0, the starting point within a year for the growth curve relative to the data, which maximizes the ESP/ASP ratio. Note that absolute ages are needed to find the true t_0.

Simulations show that ELEFAN can give clear and correct answers where peaks are well separated in the data, but that it tends to underestimate k (Rosenberg and Beddington 1987). There are two problems with the ELEFAN technique. First, it seems sensitive to the appearance of discrete modes in the data. Second, because it is an ad hoc method, it lacks an explicit statistical error structure and therefore provides neither standard errors of the estimates nor a guide to performance in any situation. The latter problem, however, could today be investigated using Monte Carlo simulation methods (Hilborn and Mangel 1997).

Like the other non-parametric methods, ELEFAN does not directly evaluate multiple recruitments during a year, such as occur in many tropical fisheries, although there is a recruitment pattern routine which helps to detect multiple recruitment pulses. Alternative growth models can be used instead of the von Bertalanffy, and one that has been frequently employed is a seasonal version of this growth curve.
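The peak-and-trough scoring described above can be sketched in a few lines of Python. This is a simplified reading of the text, not the ELEFAN I routine in FiSAT (which scores restructured frequencies rather than simple hit counts). Assumptions made here: all samples share the same equal-width length classes; sample times are in years; t_0 is taken as zero; the annual cohorts are traced by adding whole years to the starting age; each peak and each trough class is counted at most once per sample; and the ‘base’ of a peak is taken as the mid-point of its lowest class. The function names are hypothetical.

```python
import math

def moving_average(counts, width=5):
    """Centred moving average; the window is shortened at the ends."""
    half = width // 2
    out = []
    for i in range(len(counts)):
        lo, hi = max(0, i - half), min(len(counts), i + half + 1)
        out.append(sum(counts[lo:hi]) / (hi - lo))
    return out

def peaks_and_troughs(counts):
    """Label each class +1 (peak), -1 (trough) or 0, and list runs of peak classes."""
    ma = moving_average(counts)
    labels = [1 if c > m else -1 if c < m else 0 for c, m in zip(counts, ma)]
    runs, start = [], None
    for i, lab in enumerate(labels + [0]):  # sentinel closes a final run
        if lab == 1 and start is None:
            start = i
        elif lab != 1 and start is not None:
            runs.append((start, i - 1))
            start = None
    return labels, runs

def elefan_gof(samples, lower, width, linf, k, max_cohorts=50):
    """ESP/ASP for one (L_inf, k) pair.

    samples: list of (time_in_years, counts), all using the same length classes.
    """
    info = [peaks_and_troughs(c) for _, c in samples]
    asp = sum(len(runs) for _, runs in info)  # available sum of peaks
    if asp == 0:
        return 0.0
    nbins = len(samples[0][1])
    best = -math.inf
    for s, (t_start, _) in enumerate(samples):
        for first, _ in info[s][1]:  # start from the base of each peak
            base = lower + (first + 0.5) * width
            if base >= linf:
                continue
            age0 = -math.log(1 - base / linf) / k  # relative age, t0 = 0
            score = 0
            for j, (t, _) in enumerate(samples):
                labels, runs = info[j]
                hit = set()
                for n in range(-max_cohorts, max_cohorts + 1):  # annual cohorts
                    age = age0 + (t - t_start) + n
                    if age > 0:
                        b = int((linf * (1 - math.exp(-k * age)) - lower) // width)
                        if 0 <= b < nbins:
                            hit.add(b)
                score += sum(1 for lo, hi in runs if any(lo <= c <= hi for c in hit))
                score -= sum(1 for c in hit if labels[c] == -1)
            best = max(best, score)  # best score is the ESP
    return best / asp
```

Scanning a grid of L∞ and k values with elefan_gof and mapping the results gives the kind of GOF surface discussed later in this section.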

SLCA (Shepherd’s length-composition analysis)

Shepherd (1987a) introduced an objective goodness-of-fit function for detecting peaks and troughs, using a damped sine-wave function borrowed from time-series analysis of diffraction patterns. The damped sine-wave function emulates the decreasing spacing of mean lengths-at-age of the von Bertalanffy curve. The SLCA method is conceptually very similar to ELEFAN, the value of a scoring function being mapped against a range of values of L∞ and k.

How SLCA works. Values of L∞ and k are chosen for a von Bertalanffy curve. For each length interval L, t_min and t_max are calculated from the growth equation as the ages corresponding to the start and end of the interval, and t̄ is the average of t_max and t_min. The test function T_L is estimated as:

$$T_L = \frac{\sin(\pi Q)}{\pi Q}\,\cos\left[2\pi\left(\bar{t} - t_s\right)\right] \tag{9.6}$$

where Q = (t_max − t_min) and t_s is the proportion of the year since recruitment when the sample was taken. The GOF function is then calculated over all length groups as:

(9.7)

where N_L is the number in each length class, and Δt_L is the time needed to grow through each length class:

$$\Delta t_L = -\frac{1}{k}\,\ln\left(\frac{L_\infty - L_u}{L_\infty - L_d}\right) \tag{9.8}$$

where L_u is the upper bound of the length class and L_d is the lower bound. This modification was introduced by Pauly and Arreguin-Sanchez (1995).

To estimate t_0, this is run with t_0 set to zero, to give S_a, then again with t_0 set to 0.25, giving S_b. (Note: this covers the likely range of t_0 values.) The maximum score, S_m, for the current combination of growth parameters is then given by:

$$S_m = \sqrt{S_a^2 + S_b^2} \tag{9.9}$$

For any one pair of values of L∞ and k, t_0 can then easily be found as:

$$t_0 = \frac{\arctan\left(S_b / S_a\right)}{2\pi} \tag{9.10}$$

The above is repeated for all the L∞ and k combinations under consideration, and the values of goodness-of-fit, S_m, are entered into a table of results so that the maximum may be identified. Contours of S_m may be mapped to avoid picking local maxima. As with ELEFAN, the upper length classes need truncating to avoid bias from the ‘pile-up’ effect. The ‘pile-up’ effect can be minimized by the use of the Pauly and Arreguin-Sanchez modification.

Provided that ages, as defined by the start point in the analysis, are known, Shepherd’s method can provide a direct estimate of t_0. Published simulations suggest that it is more robust than the ELEFAN algorithm (Basson et al. 1988). Provided that the modes for the younger fish are reasonably clear in the samples, it seems less sensitive to the appearance of modes overall. But Terceiro and Idoine (1990) showed that SLCA suffers from the same general problems as the other methods.

The algorithm of SLCA is firmly linked to the von Bertalanffy model, and it would be hard to modify it for seasonal growth or alternative growth models.
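A sketch of the SLCA calculation for one (L∞, k) pair, following equations (9.6), (9.8), (9.9) and (9.10). Equation (9.7) is not reproduced in this text, so the per-class weight √(N_L/Δt_L) used below is an assumption standing in for it, not Shepherd’s published score; check the form given by Shepherd (1987a) or Pauly and Arreguin-Sanchez (1995) before relying on the numbers. Other assumptions: equal-width length classes, t_s in years since recruitment, and classes whose upper bound reaches L∞ are dropped (the truncation recommended for the pile-up effect). Note that Q in equation (9.6) equals Δt_L in equation (9.8). Function names are hypothetical.

```python
import math

def slca_run(samples, lower, width, linf, k, t0=0.0):
    """One SLCA pass for a given t0: weighted sum of test functions T_L.

    samples: list of (t_s, counts), t_s = fraction of the year since recruitment.
    """
    total = 0.0
    for t_s, counts in samples:
        for i, n in enumerate(counts):
            l_d, l_u = lower + i * width, lower + (i + 1) * width
            if n <= 0 or l_u >= linf:  # drop classes reaching L_inf (pile-up)
                continue
            t_min = t0 - math.log(1 - l_d / linf) / k  # age at lower bound
            t_max = t0 - math.log(1 - l_u / linf) / k  # age at upper bound
            q = t_max - t_min                          # equals Delta t_L, eq. (9.8)
            t_bar = 0.5 * (t_min + t_max)
            damping = math.sin(math.pi * q) / (math.pi * q)
            cycle = math.cos(2 * math.pi * (t_bar - t_s))
            weight = math.sqrt(n / q)  # ASSUMED stand-in for eq. (9.7)
            total += weight * damping * cycle          # T_L of eq. (9.6), weighted
    return total

def slca_fit(samples, lower, width, linf, k):
    """Return (S_m, t0) for one (L_inf, k) pair, eqs (9.9) and (9.10)."""
    s_a = slca_run(samples, lower, width, linf, k, t0=0.0)
    s_b = slca_run(samples, lower, width, linf, k, t0=0.25)
    s_m = math.hypot(s_a, s_b)
    t0 = math.atan2(s_b, s_a) / (2 * math.pi)  # atan2 keeps the correct quadrant
    return s_m, t0
```

math.atan2 is used instead of a plain arctangent so that t_0 lands in the correct quadrant; it agrees with equation (9.10) whenever S_a > 0. Because shifting t_0 only shifts t̄, the score at the returned t_0 equals S_m, which is why two runs (at 0 and 0.25) are enough.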

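Projection matrix method

The basis of the projection matrix method (Rosenberg et al. 1986) is elegant, simple and different from the previous two methods. The numbers in any size group at time (t + 1) can be predicted from the numbers in that group at time t, using growth, mortality and recruitment from length groups smaller in size:

$$f\left[g, Z, f(t)_i\right] \rightarrow f\left\{g, Z, f(t+1)_i\right\} \tag{9.11}$$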

Chapter 9

Now, if constant mortality is assumed over the time interval, or the pattern with size is known, only the growth model parameters will affect the projected numbers.

How the projection matrix works. First, using the first set of parameters for the growth curve, the expected length frequency is projected forwards from the first sample in the set. Secondly, the goodness-of-fit, G, is calculated by comparing the expected frequency in each class i, proj_i, with the actual length frequency, obs_i, using least squares:

$$G = \sum_i \left(\mathrm{obs}_i - \mathrm{proj}_i\right)^2 \tag{9.12}$$

Thirdly, the projection of expected values is repeated using the next set of values for the growth curve parameters. This is repeated for the whole range of parameters, and the G values are tabulated and contoured so that, as with SLCA, the best fit can be chosen. Note that here a minimum value of G is sought. As with SLCA and ELEFAN, the upper length classes need truncating to avoid bias from the ‘pile-up’ effect.

The projection matrix method can perform well under a wide range of conditions (Basson et al. 1988). Unlike all the other non-parametric and graphical methods, the projection matrix does not rely on the appearance of ‘peaks and troughs’ (modes) in the data, an advantage it shares with parametric distribution mixture methods. It is also robust against an increase of variance in length with age. A further advantage is that any growth model could be used for the projection. In its basic form it suffers from the same multiple recruitment problem as ELEFAN and SLCA but, like ELEFAN, it could easily be modified to deal with this.

In my opinion, of the three non-parametric length–frequency analysis methods, the projection matrix, as the least ad hoc, is the one that would best stand further statistical development. Unresolved problems in the present version appear to be how to evaluate shifts in the date of origin of the projections, and whether projections from growth year 1 should be applied to samples taken in subsequent calendar years. Seasonality in growth needs to be included, but this should not be difficult, as any growth model is easily incorporated in the projection.
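A minimal sketch of the projection-and-compare steps, under simplifying assumptions that are not part of the published method: each length class moves deterministically from its mid-point by the von Bertalanffy increment, mortality is a constant Z, no new recruits enter, and the top length class acts as a plus-group. The first sample is projected onto each later sample and the squared differences of equation (9.12) are summed. Function names are hypothetical.

```python
import math

def growth_matrix(lower, width, nbins, linf, k, dt):
    """0/1 matrix P[j][i]: class i (at its mid-point) grows into class j in dt years."""
    P = [[0.0] * nbins for _ in range(nbins)]
    for i in range(nbins):
        mid = lower + (i + 0.5) * width
        if mid >= linf:
            P[i][i] = 1.0  # at or beyond L_inf: no further growth
            continue
        grown = linf - (linf - mid) * math.exp(-k * dt)  # von Bertalanffy increment
        j = min(int((grown - lower) // width), nbins - 1)  # top class is a plus-group
        P[j][i] = 1.0
    return P

def project(counts, P, z, dt):
    """Apply the growth matrix and constant mortality z over dt years (no recruits)."""
    survival = math.exp(-z * dt)
    n = len(counts)
    return [survival * sum(P[j][i] * counts[i] for i in range(n)) for j in range(n)]

def projection_g(samples, lower, width, linf, k, z):
    """Least-squares G of eq. (9.12): first sample projected onto each later one."""
    t_first, first = samples[0]
    g = 0.0
    for t, obs in samples[1:]:
        dt = t - t_first
        P = growth_matrix(lower, width, len(first), linf, k, dt)
        proj = project(first, P, z, dt)
        g += sum((o - p) ** 2 for o, p in zip(obs, proj))
    return g
```

Evaluating projection_g over a grid of L∞ and k and keeping the minimum reproduces the third step above; because any function of age can replace the von Bertalanffy increment in growth_matrix, the sketch also shows why alternative growth models are easy to use here.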

Summary: non-parametric methods

All three non-parametric methods can easily give silly answers, and should only be applied with care and with insight into the ecology of the fish under study. Using Monte Carlo simulations, Isaac (1990) compared how SLCA and ELEFAN perform under a variety of conditions and provides some helpful guidelines. For example, SLCA seems to be better for slow-growing fish and ELEFAN better for fast-growing fish.

All methods based on the von Bertalanffy curve suffer from multiple optima at harmonic combinations of k and L∞ (see Kleiber and Pauly 1991), which is not surprising given that a number of alternative fits to length–frequency data will be statistically reasonable. A degree of subjectivity is inevitable in interpreting the GOF response surface: all the methods can generate multiple peaks along ridges of high GOF (see example below), and simulations show that in some cases a high peak is not the correct answer. The recommended non-parametric approach to length–frequency analysis is to try to find solutions which are robust against the particular method used. If all three non-parametric methods indicate similar peak GOFs, then you can have some confidence in the answers. Where they differ, decisions have to be based on additional knowledge of the species in question.

1 The far-horizon (‘rubber tuna’) problem. The far-horizon problem occurs when a good statistical fit is obtained at high L∞ and k. This fit implies very rapid growth: by analogy, a k of 2 might correspond to the ‘growth rate’ of a tuna-shaped rubber balloon inflated rapidly with a tyre pump, an absurdly fast growth rate. A good fit like this would imply that all the bumps and wiggles in the length–frequency data are only noise and that there is only one, or a few, cohorts present. Cohort slicing, which assigns fish to ages (see below), and examination of the fitted growth curve against the data histograms will show whether the suggested far-horizon fit is realistic. SLCA is especially prone to this problem, and, to avoid it, it is recommended to scale the GOF by dividing by (L∞ × k). The general remedy for all methods is to beware of far-horizon solutions unless (a) you have good evidence that fish actually grow that fast (some do, e.g. Coryphaena), and (b) you are satisfied with the implications shown by cohort slicing.

2 The near-horizon (‘bumpy road’) problem. The near-horizon problem occurs when a good statistical fit is obtained at very low or very high L∞ and low k, where L∞ is very close to L_max. Cohort slicing on these growth values will reveal many age classes. A good fit here implies that every little bump and wiggle in the length–frequency distribution is a cohort, meaning an absurdly slow growth rate. The only remedy is to beware of solutions which are very close to L_max and to examine the implications with cohort slicing. In general, k values less than 0.05 are suspect, but of course this rule can mask the growth rates of fish that are long-lived and genuinely slow growing, such as sharks, orange roughy (Hoplostethus atlanticus), Pacific rockfish (Sebastes spp.) or sturgeon (Acipenseridae).

3 General approach to multiple optima. It is not always easy to choose one from a set of many GOF peaks, especially if you get them from different methods. In the last resort you may have to retain several peaks right through to the assessment stage and examine the management implications of each one. Additional information can be brought to bear upon the problem. For example, one can look at the implications for age structure of alternative peaks using cohort slicing, or otolith or scale readings of subsamples.

A powerful general approach is to filter the results using data from similar species which have been analysed elsewhere. Pauly and Munro (1984) demonstrated what they termed an ‘auximetric’ relationship, ‘Phi prime’, between L∞ and k that is consistent across species:

$$\Phi' = \log k + 2\log L_\infty \tag{9.13}$$

Analysis of thousands of such relationships shows that members of a taxon have closely similar auximetric ratios, and so plotting published values of log L∞ against log k can be a good guide to the accuracy of estimates from length–frequency analysis.

Figure 9.3 shows an example of such an analysis of small pelagic zooplanktivores from some African lakes (Pitcher et al. 1996). Values of L∞ and k that fall outside the boundaries of the regions characteristic of each species are unlikely to be correct and can be avoided. It is recommended to draw up auximetric plots like this one when performing new analyses. Phi prime values can be obtained for many species and locations from the FishBase website, where there is also an online routine for plotting the equivalent of Fig. 9.3 (www.fishbase.org).

[Figure 9.3: log k plotted against log L-inf. Series: Limnothrissa – Tanganyika, Limnothrissa – Kivu, Limnothrissa – Kariba, Limnothrissa – Cahora Bassa, Engraulicypris – Malawi, Rastrineobola – Victoria.]

Fig. 9.3 Auximetric plot of published growth parameters of three species of small pelagic cyprinids and clupeids from African lakes. Points within lines indicate values for each species consistent with the Phi prime analysis. Points outside the lines are suspect and should only be used with caution. Circular symbols are cyprinids; square symbols are sardines. (Source: data taken from Pitcher et al. 1996, where full details of sources may be found.)
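Screening candidate growth parameters against published values with equation (9.13) can be sketched as follows. The acceptable band of Φ′ is taken here as the range of the published values widened by a margin; both the band rule and the margin value are illustrative assumptions, not the boundaries drawn in Fig. 9.3. Logarithms are base 10, and L∞ is assumed to be in centimetres, the convention commonly used when Φ′ values are tabulated. Function names are hypothetical.

```python
import math

def phi_prime(k, linf):
    """Phi prime, eq. (9.13): log10 k + 2 log10 L_inf (L_inf in cm)."""
    return math.log10(k) + 2 * math.log10(linf)

def phi_prime_band(published, margin=0.1):
    """Band of Phi prime spanned by published (L_inf, k) pairs, widened by a margin."""
    values = [phi_prime(k, linf) for linf, k in published]
    return min(values) - margin, max(values) + margin

def screen(candidates, published, margin=0.1):
    """Keep only candidate (L_inf, k) pairs whose Phi prime falls inside the band."""
    lo, hi = phi_prime_band(published, margin)
    return [(linf, k) for linf, k in candidates if lo <= phi_prime(k, linf) <= hi]
```

For example, with published values near Φ′ = 3, a ‘rubber tuna’ candidate such as L∞ = 200 cm with k = 2 (Φ′ ≈ 4.9) is rejected, as is a ‘bumpy road’ candidate with k = 0.01 at L∞ = 100 cm (Φ′ = 2).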

An example of non-parametric analysis. To illustrate the use of the three non-parametric methods, this section compares the fits obtained to some generated test data. Six bi-monthly samples of around 1500 individuals from a hypothetical population are plotted in Fig. 9.4. Fish growth is normally distributed (coefficient of variation = 0.1) around a von Bertalanffy curve with L∞ = 1000 mm and k = 0.1. The instantaneous mortality rate is set to 0.2 and samples are taken with equal catchability, so that sample sizes reflect abundance.

Fig. 9.4 Length–frequency plots of samples taken from a hypothetical fish population with growth and mortality parameters as described in the text.
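Test data of the kind described above can be generated with a short simulation. The population structure is an assumption made for illustration and will not reproduce Fig. 9.4 exactly: constant annual recruitment, age classes 1 to 30 years, t_0 = 0, no recruitment between samples, and samples taken every two months. Mean length at age follows the von Bertalanffy curve, individual lengths are normal with CV 0.1, and sample size declines in proportion to abundance, as equal catchability implies. The function names and the seed are hypothetical.

```python
import math
import random

def simulate_samples(linf=1000.0, k=0.1, z=0.2, cv=0.1, n_first=1500,
                     n_samples=6, max_age=30, seed=1):
    """Return [(t, lengths)] for samples taken every two months (t in years)."""
    rng = random.Random(seed)
    ages = list(range(1, max_age + 1))          # ASSUMED age classes 1..max_age
    weights = [math.exp(-z * a) for a in ages]  # constant recruitment, constant Z
    samples = []
    for s in range(n_samples):
        t = s / 6.0
        size = round(n_first * math.exp(-z * t))  # equal catchability
        cohorts = rng.choices(ages, weights=weights, k=size)
        lengths = []
        for a in cohorts:
            mean = linf * (1 - math.exp(-k * (a + t)))  # von Bertalanffy, t0 = 0
            lengths.append(rng.gauss(mean, cv * mean))
        samples.append((t, lengths))
    return samples

def to_counts(lengths, lower, width, nbins):
    """Bin lengths into equal-width classes; values outside the range are dropped."""
    counts = [0] * nbins
    for x in lengths:
        b = int((x - lower) // width)
        if 0 <= b < nbins:
            counts[b] += 1
    return counts
```

Binned with to_counts, these samples can be fed to the scoring sketches for ELEFAN, SLCA and the projection matrix to build GOF surfaces like those in Plate 1.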

Goodness-of-fit surfaces for values of L∞ from 40 to 1500 mm, and k from 0.05 to 0.3, are drawn in Plate 1 for each of the three non-parametric length–frequency analysis methods (see colour plate). Coloured shading runs from red (low) to purple (high). The correct value is indicated by a star on the response surface and cross-wires on the base. All three methods exhibit the banana-shaped area of alternative fits very clearly, and this is intrinsic to any method that attempts to fit a von Bertalanffy curve.

For these data, the projection matrix method is the only one to find the true value. However, its peak GOF is a small bump in the middle of a wide-domed plateau, and it would not have been found without very fine-scale intervals in the search procedure. Plate 1 also demonstrates one of the problems with both ELEFAN and SLCA: both show the highest GOF in banana-shaped peaks that increase at high L∞ and low k (the ‘bumpy road’ effect).

SLCA has a smoother GOF surface than ELEFAN. Hence, for SLCA, automated search procedures (such as are embedded in the LFDA package) are less likely to find a spurious local peak. On the other hand, ELEFAN, which has an interesting-looking rough and ridged surface that might confuse a local search procedure, exhibits a huge trough of very unlikely values near to the correct values that might help in focusing attention on the correct area. This example reinforces the recommendation to run all three of the non-parametric methods on any data you may wish to analyse.

9.2.6 Parametric methods

Parametric length–frequency methods work by calculating a GOF between the sample data and a distribution mixture specified by its component parameters. The history of these methods is reviewed by Macdonald and Pitcher (1979). GOF is calculated as the difference between the sample data and the fitted mixture of distributions. The methods usually work by searching automatically for a maximum GOF, but the user can also intervene and guide the fitting process. The number of component cohorts is usually the choice of the user, guided by the GOF values of alternatives. Alternative fits with different numbers of component age groups can be compared:

$$\chi^2 = \sum_L \frac{\left(\mathrm{obs}_L - f_L\right)^2}{f_L}$$

where obs_L is the observed number in length class L and f_L is the number expected from the fitted mixture.

This is the basis of statistical mixture analysis, originally embodied in the MIX technique (Macdonald and Pitcher 1979), although its statistical roots go back to Peterson (1891). The essential calculations, using f(L) from the first two equations in this chapter, can today be easily performed on a spreadsheet (e.g. Fig. 9.5).

[Figure 9.5: frequency plotted against length (0–35).]

Fig. 9.5 Example of length–frequency analysis as a statistical dissection of a distribution mixture: the MIX technique. Shaded area represents histogram of length frequencies from fish sample data. Thin lines are normal distributions; thick line is overall mixture. Parameters of the component normal distributions are adjusted until the difference between the overall mixture and the histogram is at a minimum (least squares). Source: figures from a spreadsheet that may be obtained from www.fisheries.ubc.ca/projects/lbased.htm.

The main problem in using the MIX approach is obtaining the number of components in the mixture. The approach recommended by Macdonald and Pitcher (1979) and Macdonald and Green (1988) is to get the best fit for h − 1, h and h + 1 components, where h is the guessed number of components, the final choice of the number of age classes being made mainly on the basis of the minimum chi-square. Rosenberg and Beddington (1987) show that MIX growth parameter estimates are quite robust against small mistakes in obtaining the number of components. A more complex but essentially similar statistical method, MULTIFAN (Fournier et al. 1990), gets around the problem by using a von Bertalanffy curve to provide the number of cohorts in a similar fashion to the non-parametric methods. In fact, results from MIX and the more complex multi-sample MULTIFAN are generally very similar (Wise et al. 1994; Kerstan …).

Experience suggests that MIX is robust for single-sample analysis, although it tends to underestimate k (Rosenberg and Beddington 1987). Where there is a series of samples, it has advantages if there is any reason to suspect that growth does not follow a von Bertalanffy curve. This can happen in some fish that switch to piscivory during their lifespan (LeCren 1992). The additional work in the multi-sample MIX technique is to join cohorts in successive samples using an MPA-like method, which can be both an advantage and a disadvantage. Modifications to the MIX approach can easily incorporate information about growth (e.g. Liu et al. 1989), either as starting values for mean cohort sizes and/or as additional constraints on the fitting process. Schnute and Fournier (1980) published an alternative version of this process.
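The h − 1, h, h + 1 comparison recommended by Macdonald and Pitcher (1979) can be sketched with a normal mixture fitted by the EM algorithm. This is a stand-in, not the MIX algorithm itself: the E and M steps treat every fish in a length class as if it lay at the class mid-point, components are initialised evenly across the occupied length range, and the chi-square uses expected class frequencies f_L from the normal CDF. The final choice of h is left to the user, as the text advises; the raw chi-square will usually fall as components are added, so degrees of freedom should be weighed too. Function names are hypothetical.

```python
import math

def norm_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def norm_cdf(x, m, s):
    return 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))

def em_fit(mids, counts, h, iters=300, min_sd=1e-3):
    """Fit h normal components to grouped counts by EM (mid-point approximation)."""
    occupied = [x for x, c in zip(mids, counts) if c > 0]
    lo, hi = min(occupied), max(occupied)
    means = [lo + (j + 0.5) * (hi - lo) / h for j in range(h)]  # spread evenly
    sds = [max((hi - lo) / (2 * h), min_sd)] * h
    props = [1.0 / h] * h
    total = sum(counts)
    for _ in range(iters):
        # E-step: share of each length class belonging to each component
        resp = []
        for x in mids:
            dens = [p * norm_pdf(x, m, s) for p, m, s in zip(props, means, sds)]
            tot = sum(dens)
            resp.append([d / tot if tot > 0 else 0.0 for d in dens])
        # M-step: count-weighted proportions, means and standard deviations
        for j in range(h):
            w = [c * r[j] for c, r in zip(counts, resp)]
            wj = sum(w)
            if wj <= 0:
                continue
            props[j] = wj / total
            means[j] = sum(wi * x for wi, x in zip(w, mids)) / wj
            var = sum(wi * (x - means[j]) ** 2 for wi, x in zip(w, mids)) / wj
            sds[j] = max(math.sqrt(var), min_sd)
    return props, means, sds

def chi_square(edges, counts, props, means, sds):
    """Chi-square between observed counts and expected class frequencies f_L."""
    total = sum(counts)
    chi = 0.0
    for lo, hi, obs in zip(edges[:-1], edges[1:], counts):
        f = total * sum(p * (norm_cdf(hi, m, s) - norm_cdf(lo, m, s))
                        for p, m, s in zip(props, means, sds))
        if f > 1e-9:  # skip classes with essentially no expected fish
            chi += (obs - f) ** 2 / f
    return chi

def compare_h(edges, counts, h):
    """Fit h - 1, h and h + 1 components; return {n: (chi_square, fit)}."""
    mids = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
    results = {}
    for n in (h - 1, h, h + 1):
        if n >= 1:
            fit = em_fit(mids, counts, n)
            results[n] = (chi_square(edges, counts, *fit), fit)
    return results
```

Starting values from growth information (as in the modifications cited above) could replace the even spread used in em_fit.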

In the tropics, there is often more than one cohort recruiting each year, as a consequence of monsoon-like seasonality in productivity. For example, Koranteng and Pitcher (1987) used MIX to analyse length–frequency data for a West African sparid fishery, where a cohort recruited after each of the two major upwellings each year. The plot of estimated means from MIX was best joined up using a strong assumption that there were two cohorts per year. In fact, similar results can be obtained using ELEFAN if a similar assumption is made (Pauly, personal communication).

An example of the detailed analysis of cohorts of a small freshwater fish is presented in Fig. 9.6 (data from Pitcher 1971). Samples were taken using back-pack electrofishing approximately every 4 weeks and subjected to mixture analysis using the MIX routines. The progression of mean lengths of each cohort identified from the samples was traced using information from seasonal growth (see Pitcher and Macdonald 1973). The picture was complicated by the hatching of several cohorts of eggs from separate groups of spawning fish in summer. Figure 9.6 shows three such young-of-the-year and 1+ age-group cohorts traced from births in 1966 and 1967, but such multiple cohorts were not detected in 1968. This detailed analysis of the growth, and mortality, of sub-cohorts from different hatch-groups would have been difficult with the non-parametric techniques. Overall, the best advice is to plot out all means against time and age so that effects like this can be detected.

[Figure 9.6: mean length (mm) of each cohort plotted against time, March 1967 to June 1969.]

Fig. 9.6 Example of tracing cohort progressions following length–frequency analysis of samples of 90–300 minnows (Phoxinus phoxinus) from a 0.5 km reach of the Seacourt Stream, Berkshire (UK) in 1967–9. Mean lengths estimated from each cohort are represented by different symbols: lines join up the most likely progressions due to growth. (Source: data redrawn from Pitcher 1971.)

(Rosenberg and Beddington 1988). This is very similar to using an age–length key (Gulland and Rosenberg 1990; Sparre and Hart, Chapter 13, this volume). The relative numbers of fish in the fitted mixture components give the annual mortality rates directly.
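Turning the fitted numbers in successive annual components into mortality rates can be sketched as below. It assumes constant recruitment and equal catchability, so that the ratio of adjacent components reflects survival over one year; Z is the instantaneous rate and A = 1 − e^(−Z) the annual mortality rate. The pooled estimate simply averages Z over all adjacent pairs. Function names are hypothetical.

```python
import math

def mortality_from_components(numbers):
    """(Z, A) between successive annual age components, youngest first."""
    rates = []
    for younger, older in zip(numbers[:-1], numbers[1:]):
        survival = older / younger
        rates.append((-math.log(survival), 1.0 - survival))
    return rates

def pooled_z(numbers):
    """Average Z over all adjacent pairs: ln(N_first / N_last) / (n - 1)."""
    return math.log(numbers[0] / numbers[-1]) / (len(numbers) - 1)
```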

But for non-parametric methods, this information will not be available, so the overlap classes may have to be split on an ad hoc basis to mimic the changing component proportions. After slicing, the relative numbers in adjacent cohorts may be used to estimate mortality rates. A better method is to track samples through time and examine mor-