
and a denominator representing animal time at risk. Animal time at risk was calculated by multiplying the total cattle loaded by the sum of voyage days plus discharge days. More recent years of data from the SMDB contained detailed records of mortalities per voyage day for export voyages. These records were used to create a dataset with one row per animal, recording the survival time for that animal. Each row included a variable measuring time at risk for that animal. For animals that survived, time at risk was equal to the length of the voyage in days; for animals that died, it was equal to the voyage day on which death occurred. These data were used for survival analysis.
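The two constructions described above can be sketched in Python as follows. This is a minimal illustration only, not the code used in the project: the record layout, the function names and the `deaths_by_day` input (a mapping from voyage day to the count of deaths on that day) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnimalRecord:
    """One row per animal (hypothetical layout, not the SMDB schema)."""
    voyage_id: str
    time_at_risk_days: int
    died: bool


def aggregate_time_at_risk(cattle_loaded, voyage_days, discharge_days):
    """Animal time at risk: cattle loaded x (voyage days + discharge days)."""
    return cattle_loaded * (voyage_days + discharge_days)


def expand_voyage(voyage_id, cattle_loaded, voyage_days, deaths_by_day):
    """Expand per-day death counts into one survival record per animal.

    Animals that died get time at risk equal to the voyage day of death;
    survivors get time at risk equal to the voyage length in days.
    """
    total_deaths = sum(deaths_by_day.values())
    if total_deaths > cattle_loaded:
        raise ValueError("more deaths recorded than cattle loaded")

    records = []
    for day in sorted(deaths_by_day):
        if not 1 <= day <= voyage_days:
            raise ValueError(f"death recorded on day {day}, outside the voyage")
        records.extend(
            AnimalRecord(voyage_id, day, True) for _ in range(deaths_by_day[day])
        )

    survivors = cattle_loaded - total_deaths
    records.extend(
        AnimalRecord(voyage_id, voyage_days, False) for _ in range(survivors)
    )
    return records
```

For instance, `expand_voyage("V1", 10, 20, {3: 1, 15: 2})` yields ten rows: three deaths with time at risk of 3, 15 and 15 days, and seven survivors with 20 days each.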

3.8 Cattle movement data

NLIS data on animal movement histories were obtained from the NLIS database for Western Australia for those cattle exported on a subset of voyages that had been enrolled in Stage 3 of this project. Extensive data exploration and cleansing were required to relate NLIS database records to particular export assembly feedlot codes and to dates of departure for enrolled voyages, because the date recorded in the NLIS database for an animal movement is not necessarily identical to the actual date of movement (there is some leeway allowing users to enter movement data within several days of animal movements). PIC codes were used to derive property locations using additional data sources including the public brands database (Department of Agriculture and Food Western Australia, 2014), Google Earth and saleyard postcode locations. All location data were then aggregated to shire level for analysis and reporting to ensure de-identification. The resulting dataset was then used to infer location (consolidated to the shire level and also to a corresponding climate zone), number of moves and distance travelled for each animal, both for the whole of life and for the 90-day period prior to the ship sailing from a port of loading. These movements were viewed using a Geographic Information System to identify general patterns.
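As an illustration of the per-animal summaries described above, the sketch below counts moves and sums distance for the whole of life and for the 90 days before sailing. It rests on several assumptions that are not stated in the report: each move is represented by a date and a pair of (latitude, longitude) points such as shire centroids, distance is taken as straight-line (great-circle) distance, and the function and field names are hypothetical.

```python
import math
from datetime import date, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(origin, destination):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def summarise_moves(moves, sailing_date, window_days=90):
    """Summarise one animal's moves.

    moves: iterable of (move_date, origin_latlon, destination_latlon).
    Moves dated after the sailing date are ignored.
    """
    window_start = sailing_date - timedelta(days=window_days)
    summary = {"life_moves": 0, "life_km": 0.0,
               "recent_moves": 0, "recent_km": 0.0}
    for move_date, origin, destination in moves:
        if move_date > sailing_date:
            continue
        km = haversine_km(origin, destination)
        summary["life_moves"] += 1
        summary["life_km"] += km
        if move_date >= window_start:
            summary["recent_moves"] += 1
            summary["recent_km"] += km
    return summary
```

Because recorded movement dates may differ from actual dates by several days, a real analysis would probably need some tolerance around the 90-day boundary; the sketch applies the window strictly.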

3.9 Statistical analyses

Voyage mortality percentage was calculated as the cumulative incidence of deaths during the voyage, with a denominator representing the total count of cattle loaded onto the ship at the ports of loading and a numerator representing the count of deaths observed for the entire voyage, from the first day of loading to the last day of discharge. Voyage mortality percentage is only directly comparable between voyages of similar duration. Where voyages are of differing durations, the voyage mortality percentage may provide a biased measure of mortality risk.

Voyage mortality rate was estimated using Poisson regression that incorporated cattle-days at risk into the estimation process. Cattle-days at risk were estimated by multiplying the total animals at risk (number of cattle loaded minus half of the deaths that occurred during the voyage) by voyage duration in days. The voyage mortality rate provides a true cumulative incidence rate that is adjusted for voyage duration. Voyage mortality rate when estimated in this way assumes that the risk of mortality is the same on each voyage day for the duration of the voyage. It is acknowledged that mortality risk is in fact unlikely to be constant over the duration of any voyage and may vary from day to day.

Daily mortality rate can be estimated if data are available on the count of deaths that occurred on each voyage day. This outcome allows appreciation of the variability or dynamic nature of mortality risk from day to day during the course of a voyage.

Descriptive results are presented as summary counts and percentages for the number of animals that died or that returned positive test results for specific pathogen testing, or for causes of death. Denominators for percentage estimation were based on numbers of animals in a sampled group or loaded onto a ship as part of a participating voyage.
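The arithmetic behind these measures can be sketched as follows. The function names and the scaling to deaths per 1,000 cattle-days are assumptions made for illustration. The crude rate shown is a simple single-voyage calculation of deaths divided by cattle-days at risk; it is not the Poisson regression model itself.

```python
def voyage_mortality_percentage(cattle_loaded, deaths):
    """Cumulative incidence of death over the voyage, as a percentage."""
    return 100.0 * deaths / cattle_loaded


def cattle_days_at_risk(cattle_loaded, deaths, voyage_days):
    """(Cattle loaded minus half of the deaths) multiplied by voyage days."""
    return (cattle_loaded - deaths / 2) * voyage_days


def crude_mortality_rate(cattle_loaded, deaths, voyage_days, per=1000):
    """Deaths per `per` cattle-days at risk (scaling is an assumption)."""
    return per * deaths / cattle_days_at_risk(cattle_loaded, deaths, voyage_days)


# Two hypothetical voyages with the same percentage but different durations.
short_voyage = crude_mortality_rate(10000, 20, voyage_days=10)
long_voyage = crude_mortality_rate(10000, 20, voyage_days=20)
```

Both hypothetical voyages have a mortality percentage of 0.2%, yet the 10-day voyage has roughly 0.20 deaths per 1,000 cattle-days and the 20-day voyage roughly 0.10, which illustrates why percentages are only comparable between voyages of similar duration.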
Poisson and negative binomial regression modelling was used to analyse count data (number of animals that died or that tested positive for defined outcomes) to screen for associations between outcomes and relevant explanatory variables. Regression models were run as univariable screening models to test for unadjusted associations. In some cases multivariable models were run by adding multiple significant explanatory variables from screening models into one multivariable model to look for adjusted effects (effects of one explanatory variable that are adjusted for other variables in the same model). In many cases it was not possible to use multivariable models because of the sparseness of the data and the presence of confounding between many explanatory variables. In these cases results are presented only for univariable models, with appropriate caution in interpretation of findings.

Cattle are typically managed in groups during export (property of origin group, transport, feedlot pen, voyage, ship), leading to statistical clustering. Data from cattle in the same cluster unit (same mob, same ship, etc.) are likely to be more similar than data from cattle in different clusters. The impact of clustering is to increase the risk of correlation between data points, and this in turn may lead to biased results from statistical models unless appropriate analytic adjustments are made. In this report statistical models were adjusted where possible for clustering, by adding either fixed effects or random effects to models, to ensure that estimates for other effects were unlikely to be biased by clustering in the data.

There were occasions where linear regression models were applied for continuous outcomes such as core body temperature, and logistic regression or chi-squared tests for binary outcomes (dead/alive, disease positive/negative).
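To make the idea of a univariable screening model concrete, the sketch below computes a crude incidence rate ratio for a single two-level explanatory variable, with a Wald confidence interval on the log scale. For one binary covariate and a log time-at-risk offset, the Poisson regression point estimate equals this crude ratio. The function name and inputs are assumptions, and the sketch deliberately ignores the overdispersion and clustering adjustments described above, so its interval would generally be too narrow for clustered data.

```python
import math


def crude_rate_ratio(deaths_exposed, days_exposed, deaths_ref, days_ref, z=1.96):
    """Incidence rate ratio (exposed vs reference) with a Wald interval.

    z=1.96 gives an approximate 95% interval, matching alpha = 0.05.
    Returns (irr, lower, upper).
    """
    if deaths_exposed <= 0 or deaths_ref <= 0:
        raise ValueError("both groups need at least one event")
    if days_exposed <= 0 or days_ref <= 0:
        raise ValueError("time at risk must be positive")

    irr = (deaths_exposed / days_exposed) / (deaths_ref / days_ref)
    # Standard error of log(IRR) under the Poisson assumption.
    se_log_irr = math.sqrt(1 / deaths_exposed + 1 / deaths_ref)
    lower = math.exp(math.log(irr) - z * se_log_irr)
    upper = math.exp(math.log(irr) + z * se_log_irr)
    return irr, lower, upper
```

An interval that excludes 1 corresponds to an unadjusted association that would pass a screening threshold of 0.05. Adjusting for other variables or for clustering requires a fitted regression model rather than this closed-form calculation.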
The Spearman rank correlation test was used to test for correlations between the presence of various organisms, as detected by qPCR, in nasal secretions and lung.

A flexible parametric survival model (Royston-Parmar model) was applied in Stata to daily mortality data created from SMDB data (Royston and Lambert 2011). An initial model was fitted that incorporated proportional hazards assumptions (no interaction between destination and time). The approach involved fitting splines using five degrees of freedom and then generating predicted hazards for defined time periods, based on the range of time periods covered by both levels of the destination variable, out to day 35. Model output was used to describe mortality rates per day of voyage (daily mortality rate).

Descriptive summary tables were prepared in Microsoft Excel® and statistical analyses were conducted using Stata (www.stata.com) and the R software environment for statistical computing (www.r-project.org). All analyses applied a significance threshold of alpha = 0.05.

4 Results – Stage 1

4.1 General approach