5.1.8 Data sampling and stratification
Consideration of the coverage, location, quantity and quality of data sampling also forms part of the information cycle, generally as one of the final stages prior to the actual collection of data. At the beginning, or during revision, of an information programme, it is common to undertake fishery frame surveys or censuses in which a complete enumeration (100% coverage) of the basic structure of the fishery sector is compiled, including production, infrastructure, employment and community dependence; these may also include environmental baselines. From these surveys, decisions can be made whether to maintain complete enumeration or to conduct sample surveys in such a way that the estimate from the data samples is as close as possible to the true value for the data population. The difference between these two is the bias of the estimate, and it is extremely important to calculate and account for this bias by statistical methods. Bias can be reduced by increasing sample size but also in other ways, partly by random sampling to avoid sampling error and partly at the design stage of the programme. Sampling, of course, takes much less effort than complete enumeration and therefore costs less. Some data will always require complete enumeration, such as the information needed to control quota allocations or fishing licence limits.
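The relationship between a census value and a sample estimate can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the fleet size, the landings figures and the function names are invented, and simple random sampling of vessels is assumed. It raises the mean of a sample to a fleet total and, by repeating the survey many times, shows both the average deviation of the estimate from the census value and the scatter of individual estimates around it.

```python
import random
import statistics

def census_total(landings):
    """Complete enumeration: add up every unit in the data population."""
    return sum(landings)

def sample_estimate_total(landings, n, rng):
    """Raise the mean of a simple random sample of n units to the population size."""
    sample = rng.sample(landings, n)
    return len(landings) * statistics.mean(sample)

rng = random.Random(1)
# Hypothetical daily landings (kg) for a fleet of 200 vessels (invented data).
fleet = [rng.gauss(500, 120) for _ in range(200)]
true_total = census_total(fleet)

# Repeat a 10% sample survey many times to see how the estimates behave.
estimates = [sample_estimate_total(fleet, 20, rng) for _ in range(2000)]
mean_relative_error = (statistics.mean(estimates) - true_total) / true_total
relative_spread = statistics.stdev(estimates) / true_total

print(f"average deviation from census: {mean_relative_error:.2%}")
print(f"scatter of single surveys:     {relative_spread:.2%}")
```

In this sketch the average deviation is driven only by chance, because the sample is drawn at random; a non-random choice of vessels (for example, only those landing at the largest port) could shift it systematically however many vessels were sampled.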
The main problem with data from sample surveys, quite apart from sample error and accuracy, is that the data population from which they are taken may not be evenly distributed across the locations (strata) of the data, whether in geographic space, time or another dimension. Subdividing a data population into groups or strata and then randomly sampling those can reduce the data variability to that which represents real differences between the strata, which are then amenable to comparative analysis. The major strata in fisheries are usually one or a combination of space, time, landings, vessels, gears, enterprises, trade, people, the environment and the specific requirements of at-sea fishery-independent research. Subdividing the data population into minor strata, and then randomly sampling those, will enable further reduction of data variability. For example, a major stratum for fish trade might comprise markets/auctions, intermediaries/wholesalers/retailers and exporters/importers as minor strata.
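The effect of stratification on variability can be sketched numerically. The example below is a minimal illustration, not a prescribed method: the two strata ("inshore" and "offshore"), their sizes and their catch figures are invented, proportional allocation of samples is assumed, and the finite population correction is ignored. It compares the standard error of a stratified mean with that of a simple random sample of the same total size.

```python
import random
import statistics

def stratified_mean(strata_sizes, samples):
    """Weight each stratum's sample mean by its share of the data population."""
    total = sum(strata_sizes.values())
    return sum(strata_sizes[h] / total * statistics.mean(samples[h]) for h in samples)

def stratified_se(strata_sizes, samples):
    """Standard error of the stratified mean (finite population correction ignored)."""
    total = sum(strata_sizes.values())
    variance = sum(
        (strata_sizes[h] / total) ** 2 * statistics.variance(samples[h]) / len(samples[h])
        for h in samples
    )
    return variance ** 0.5

rng = random.Random(7)
# Hypothetical catch per trip (kg) in two strata with very different levels.
population = {
    "inshore": [rng.gauss(100, 10) for _ in range(600)],
    "offshore": [rng.gauss(400, 30) for _ in range(400)],
}
sizes = {h: len(values) for h, values in population.items()}

# 50 samples in total, allocated in proportion to stratum size (30 and 20).
samples = {
    "inshore": rng.sample(population["inshore"], 30),
    "offshore": rng.sample(population["offshore"], 20),
}
strat_se = stratified_se(sizes, samples)

# The same effort spent on a simple random sample that ignores the strata.
pooled = rng.sample(population["inshore"] + population["offshore"], 50)
srs_se = statistics.stdev(pooled) / len(pooled) ** 0.5

print(f"stratified estimate: {stratified_mean(sizes, samples):.1f} (SE {strat_se:.1f})")
print(f"unstratified estimate: {statistics.mean(pooled):.1f} (SE {srs_se:.1f})")
```

Because most of the variability in this invented population lies between the two strata rather than within them, the stratified standard error is much the smaller of the two for the same number of samples.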
The key operational constraint of cost is also what drives stratification. The amount of effort, defined as the number of samples, and hence the cost, falls as bias is eliminated by stratification. At the outset, decisions on the quantity of data by stratum may often be made based on the results of censuses, but simulation can also be undertaken, particularly when revising sampling design based on experience. In general, of course, the larger the sample, the higher the accuracy. However, increased accuracy is not proportional to sample size, and hence not to the cost or effort applied. Depending on the purposes of the data, there may well be a cost/accuracy combination that satisfies both operational and statistical constraints. Even during sampling operations, such as very expensive trawl or bioacoustic surveys, it is usually wise to estimate bias continuously and to curtail sampling (hence costs) once a minimum satisfactory bias is reached.
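The non-proportional relationship between sample size and accuracy, and the idea of curtailing sampling once a satisfactory level is reached, can be sketched as follows. This is an illustration under assumptions that are not in the text: the standard error of a mean under simple random sampling (standard deviation divided by the square root of the sample size) stands in for the accuracy measure, the relative standard error is used as the stopping criterion, and the function names, threshold and catch figures are hypothetical.

```python
import math
import random
import statistics

def samples_needed(sd, target_se):
    """Smallest n with sd / sqrt(n) <= target_se (simple random sampling assumed)."""
    return math.ceil((sd / target_se) ** 2)

def sample_until_precise(observations, target_rse, min_n=5):
    """Take observations one at a time; stop once the relative standard error
    of the running mean falls to target_rse. Assumes a positive mean."""
    taken = []
    for value in observations:
        taken.append(value)
        if len(taken) >= min_n:
            mean = statistics.mean(taken)
            rse = statistics.stdev(taken) / math.sqrt(len(taken)) / mean
            if rse <= target_rse:
                break
    return taken

# Halving the target standard error quadruples the number of samples required.
print(samples_needed(sd=100, target_se=10), samples_needed(sd=100, target_se=5))

rng = random.Random(3)
# Hypothetical catch per haul (kg) for up to 1000 planned survey hauls.
planned_hauls = [rng.gauss(50, 10) for _ in range(1000)]
hauls_taken = sample_until_precise(planned_hauls, target_rse=0.05)
print(f"stopped after {len(hauls_taken)} of {len(planned_hauls)} planned hauls")
```

The first function makes the cost argument explicit: each further gain in precision requires a disproportionately larger sample. The second shows one possible stopping rule; in practice the criterion and threshold would depend on the purposes of the data.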
Poor data quality, even assuming sample predictability from stratification, can limit any sample’s value. Quality is also related to cost since it links directly to operational possibilities and constraints. Compliant participants, realistic data collection schedules, good training and equipment all take time and cost money. Indeed, the likely quality of data obtainable from the application of available resources for an envisaged data collection scheme will need to be taken into account as far back as the derivation of the performance indicator under consideration. If the quality of data from a devised system cannot match the required accuracy of the indicator then no amount of redesign of the sampling system will substitute, and
its estimation should be dropped or another choice of indicator be made.