
J. Jamrozik, L.R. Schaeffer / Livestock Production Science 67 (2000) 143–153

sufficient for many research tasks and saves the user from software development time and from possible programming errors. However, efficient, routine genetic evaluations from specially developed software can save time for delivery of results and may be necessary when general software cannot accommodate the model or data size.

Iteration on data (Schaeffer and Kennedy, 1986) has been used widely as a method of solving MME. The MME are not constructed explicitly, but data files are read each round of iteration or stored in memory, and diagonal elements, right hand sides, and solutions need to be stored in memory. Iteration on data allows for a variety of techniques to be applied, such as Gauss–Seidel, Jacobi, or combinations of both, sparse matrix techniques (Misztal, 1999), transformations to simplify multiple trait problems (Ducrocq and Besbes, 1993), or parallel processor algorithms for solving large sparse equation systems (Madsen and Larsen, 1998).

The scale of the equations to be solved has dramatically increased with the introduction of test day (TD) models (Ptak and Schaeffer, 1993) for genetic evaluation of dairy cattle. Reents et al. (1995) presented an efficient iteration on data algorithm for a multiple lactation TD model that has been applied in Canada and Germany. Random regression (RR) TD models (Jamrozik et al., 1997) added another level of complexity, mainly through an enormous increase in the size of the MME. Savings in computing requirements for TD models have been suggested by using transformations (Van der Werf et al., 1998). Canonical transformations for the random regression model do not lead to a series of univariate analyses, but to a multiple trait model of reduced rank in which only the variables with significant eigenvalues are evaluated. Then the multiple trait model with missing traits of Ducrocq and Besbes (1993) is applied. Parallel computing techniques have also been attempted (Stranden, 1999).

The introduction of a multiple lactation, multiple trait, random regression test day (TD) model in Canada in February 1999 was possible through the design and development of specialized software. Milk, fat, and protein yields plus somatic cell scores within each of the first three lactations are simultaneously analyzed in the Canadian Test Day Model (CTDM). Random regressions were based on Wilmink's function (Wilmink, 1987) with three parameters per trait giving 72 equations per cow with TD records and 36 equations per animal without data. For the Holstein breed, CTDM required processing over 21 million TD records on 1.3 million cows in 2 million contemporary groups and 2.2 million animals in total. The total number of equations was more than 135 million.

Several requests were received from Europe by Canadian Dairy Network (CDN) to acquire the computer programs used in the CTDM, but the cost of the programs was a major obstacle. The decision was made, therefore, to publish the gory computational details in this journal so that others may write their own programs if they want. Also, the details given here can serve as the beginning of the history on computing algorithms for random regression models. First attempts, such as given here, are usually replaced with better algorithms over time. Thus, the objectives of this paper were to present the computing details used in the CTDM, to present an outline for an alternative computing procedure that uses less memory and disk space, and to compare the computing requirements of the two algorithms when applied to data of the Canadian Jersey dairy breed.

2. Material and methods

2.1. Model

The CTDM has been described by Schaeffer et al. (2000a) with a discussion of the experiences in using a TD model for routine genetic evaluation. In matrix notation, the multiple lactation, multiple trait, random regression TD model could be written as

\[ y = Hc + Xb + Wp + Za + e, \]

where

y is the vector of observations on T traits in L lactations, ordered traits within lactations,
c is the vector of fixed contemporary group effects defined as herd-test date-parity subclasses,
b is the vector of fixed regression coefficients nested within time period–region–age-parity–season subclasses,
p is the vector of random regression coefficients for animal permanent environmental (PE) effects,
a is the vector of random regression coefficients for animal additive genetic effects,
e is the vector of random residual effects,
H is an incidence matrix that relates contemporary groups to observations, and
X, W, Z are matrices of covariates involving number of days in milk associated with a cow on a given test date and corresponding to the observations.

The corresponding mixed model equations (MME) for this model are

\[
\begin{pmatrix}
H'R^{-1}H & H'R^{-1}X & H'R^{-1}W & H'R^{-1}Z \\
X'R^{-1}H & X'R^{-1}X & X'R^{-1}W & X'R^{-1}Z \\
W'R^{-1}H & W'R^{-1}X & W'R^{-1}W + P^{-1} & W'R^{-1}Z \\
Z'R^{-1}H & Z'R^{-1}X & Z'R^{-1}W & Z'R^{-1}Z + G^{-1}
\end{pmatrix}
\begin{pmatrix} \hat{c} \\ \hat{b} \\ \hat{p} \\ \hat{a} \end{pmatrix}
=
\begin{pmatrix} H'R^{-1}y \\ X'R^{-1}y \\ W'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}.
\]

Let R be the number of regression coefficients in the function used for the lactation curve, i.e. for Wilmink's function, R = 3. The expectations and covariance matrices are

\[
E\begin{pmatrix} y \\ p \\ a \\ e \end{pmatrix} = \begin{pmatrix} Hc + Xb \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]

and

\[
\mathrm{Var}\begin{pmatrix} p \\ a \\ e \end{pmatrix} = \begin{pmatrix} P & 0 & 0 \\ 0 & G & 0 \\ 0 & 0 & R \end{pmatrix},
\]

where

\[ P = I \otimes P_0, \quad G = A \otimes G_0, \quad \text{and} \quad R = \sum\nolimits^{+} R_{ij}, \]

where

A is the additive genetic relationship matrix,
$P_0$, $G_0$ are covariance matrices of order RTL for the PE and genetic regression coefficients, respectively, and
$R_{ij}$ is the covariance matrix for cow i on a given test day j.

The rank of $R_{ij}$ can vary from 1 to T depending on the traits that are missing on a cow. The values in $R_{ij}$ depend on the lactation number and number of days in milk within that lactation.

Genetic groups for unknown parents are included in the vector a, for simplicity of notation, and in the definition of A, the matrix of additive genetic relationships. In the CTDM, the fixed regressions and the random genetic and PE regressions were modelled after Wilmink's function (Wilmink, 1987), but in general, these regressions do not need to be modelled after the same function. For example, the fixed regressions could be modelled as classification variables for every 10 days in milk because the shapes of the lactation curves would be allowed to take whatever form was appropriate rather than being forced to fit a particular function.

Let

NPE equal the number of animals with TD records,
NAN equal the total number of animals,
NCG equal the number of contemporary groups, and
NFR equal the number of subclasses of fixed regressions.

Then the total number of equations in MME for this model would be

\[ NEQ = NCG \cdot T + NFR \cdot T \cdot R + (NPE + NAN) \cdot R \cdot T \cdot L, \]

but because NFR is usually much smaller than NCG, NPE, or NAN, and because NCG does not involve any regression functions, then NEQ can be roughly approximated by

\[ NEQ = (NPE + NAN) \cdot R \cdot T \cdot L. \]

Suppose that R = 3, L = 3, NPE = 1 000 000, NAN = 2 000 000, and T = 4, then NEQ would be 108 000 000. If all calculations are performed as double precision, i.e. eight bytes per variable, then storing the solutions, diagonal elements, and right hand sides to MME would require a minimum of 2.6 gigabytes (GB) of memory. If the computer on which the calculations are to be performed has only 2 GB of memory, then not all of these elements can be stored in memory at one time or computations must be done in single precision, or a combination of both. In the United States NAN would be greater than 10 million for their Holstein population.

2.2. Algorithm A

A multiple trait model provides an obvious blocking structure of MME by traits. With the TD model, blocks can be defined on different levels of generality. Two different blocks will be used through the description of Algorithm A, which is the algorithm currently used with the CTDM. Record blocks (RB) are determined by the residual covariance matrix, $R_{ij}$, on a given test day for an animal. These blocks are of order T or 4 for the CTDM. RB matches the way in which data are stored, i.e. four traits within an animal on a given test day. The contemporary group effects have diagonal blocks of order T as well as T elements in the right hand sides (RHS). That is, $H'R^{-1}H$ is a block diagonal matrix with blocks of order T.

The other type of blocking is called an animal block, AB, which is defined by $G_0$ and $P_0$ of order RTL or 36 in the CTDM. That is, $W'R^{-1}W$ and $Z'R^{-1}Z$ are block diagonal matrices with blocks of order RTL. Data associated with an AB or RB are processed at the same time, and all equations pertaining to a block are solved simultaneously. Data processing in blocks is simple and speeds convergence, but to implement a blocking strategy requires specific preparation of data files.

2.2.1. Data files

Two types of data files are required; the pedigree file and the data file with TD yields. The pedigree file, PEDIGREE, contains one record per individual. The PEDIGREE file must be such that animals are ordered and numbered from youngest to oldest. The PEDIGREE file is needed for determining the elements of $A^{-1}$. Recall that the relationship matrix inverse can be decomposed as

\[ A^{-1} = T'D^{-2}T, \]

where T is a triangular matrix with ones on the diagonals and at most two non-zero elements per row in columns corresponding to the parents of an animal with value equal to −0.5, and $D^{-2}$ is a diagonal matrix. In a non-inbred population the diagonal elements are 2 if both parents are known, 4/3 if only one parent is known, and 1 if both parents are unknown. With an inbred population then there are many more possible values for these diagonal elements which can be computed using the methods of Meuwissen and Luo (1992). The variables in the PEDIGREE file are

Animal number (ID),
Its sire ID,
Its dam ID, and
Value from $D^{-2}$.

The ID numbers of animals were consecutive from 1 to NAN, i.e. youngest to oldest sequence. Genetic groups were assigned for all missing parents and were numbered from NAN + 1 onwards. Obviously, there must be another file that links the consecutive ID number to an animal's registration number, name, and ownership, but such a file is not needed in the computation process. The entire PEDIGREE file is stored in memory during the iteration process, and requires approximately 16 × NAN bytes of memory.

The TD records data file on all cows contains the following information:

Animal ID, matches the ID in PEDIGREE file
Cow ID, numbering for PE effects
Contemporary group number
Fixed regression subclass number
Days in milk (DIM)
Parity number
Missing traits code
Accuracy of TD yields code
Yields for milk, fat, protein and somatic cell score.

Levels of all effects have to be numbered consecutively. Missing trait codes tell which traits are missing on that test day. In CTDM, all records have milk yields present, but other traits may be missing. The missing trait codes specify the correct $R_{ij}$ to be used in conjunction with parity number and DIM. TD yields are estimates of 24 h yield and if estimated from two supervised weighings receives an accuracy of 100. If 24 h yield is estimated from an evening or morning weighing only, then accuracy is 89. If 24 h yields are estimated from one weighing in herds that are milking three times a day, then accuracy would be lower (around 80). These numbers are provided by the milk recording organizations (Schaeffer et al., 2000b). Each record in the yield file requires 20 + 4T bytes of storage. With over 21 million records in this file for Canadian Holsteins, storage of the information in memory is impossible, and therefore the file must be re-read during each iteration. In fact, two copies of the data file are needed: one sorted by contemporary group numbers (CG file) and one sorted by cow ID (COW file), and each file needs to be read once during every round of iteration. Reading these files in an efficient manner is facilitated by special input/output (I/O) routines in the C language and writing the data in an unformatted manner.

2.2.2. Iteration scheme

Prior to iteration the diagonal blocks for contemporary groups, animal PE, and animal genetic effects need to be created, inverted, and stored on disk as three separate data files written in standard FORTRAN 77 as unformatted. Animal genetic and PE diagonal blocks for cows with TD records are functions of DIM on which the cow's records were made. Because there is a very large number of possible combinations of DIM, missing trait codes, and accuracy codes, these diagonal blocks have to be created and stored explicitly. For animals without TD records, i.e. ancestors, the diagonal block for the genetic effect is

\[ a^{ii} G_0^{-1}, \]

where $a^{ii}$ is the diagonal element of $A^{-1}$ for animal i, which can be created as needed or stored using an implicit representation as shown by Tier and Graser (1991).

Algorithm A requires memory storage space for solution vectors for all effects in the model and for a work vector large enough to store the RHS for animal additive genetic effects, and this vector must be double precision because the RHS of the MME for some animals can become quite large in magnitude. Elements of RHS for CG and PE effects are created sequentially while processing the CG file and COW file, respectively. Because the fixed regression effects are relatively small in number of levels, the solutions, diagonal blocks, and RHS for fixed regressions are stored in memory. The PEDIGREE file is also stored in memory, as mentioned previously. Inverses of the residual covariance matrices by parity number, four DIM intervals, missing trait code, and accuracy of TD yields are stored in memory as half-stored T × T matrices. Inverses for $G_0$ and $P_0$ are created prior to iteration.

The iteration process proceeds as follows:

1. The CG file is read sequentially.

(a) All records within a CG are stored in memory and the appropriate $R_{ij}^{-1}$ is selected for each record based on DIM, parity number, missing trait combination, and accuracy of the TD information. Let the model for the jth TD record in the ith CG and kth fixed regression subclass be

\[ y_{ijk} = c_i + X_{ij} b_k + W_{ij} p + Z_{ij} a + e_{ijk}. \]

(b) Adjust the observations for the current solutions for fixed regressions, animal PE, and animal additive genetic effects, and accumulate into the RHS for that CG (call it CGRHS),

\[ CGRHS = \sum_j R_{ij}^{-1} \left( y_{ijk} - X_{ij} b_k - W_{ij} p - Z_{ij} a \right). \]

Because Wilmink's function is used for both fixed and random regressions in the CTDM, the values of the covariates that appear in $X_{ij}$, $W_{ij}$, and $Z_{ij}$ are the same. This is not essential, but it does make programming a little easier.

(c) Adjust the observations in each TD record for animal PE and animal additive genetic effects and accumulate into the RHS for the kth fixed regression subclass (call them $FRHS_k$),

\[ FRHS_k = FRHS_k + X_{ij}' R_{ij}^{-1} \left( y_{ijk} - W_{ij} p - Z_{ij} a \right). \]

(d) After all records in a CG have been processed, read in the inverse of the diagonal block for that CG,

\[ HINV = \left( H_i' R_i^{-1} H_i \right)^{-1}, \]

and obtain a new solution for that CG,

\[ c_i = HINV \cdot CGRHS. \]

(e) Go through the records for that CG again and adjust the RHS of the fixed regressions for the new CG solution,

\[ FRHS_k = FRHS_k - X_{ij}' R_{ij}^{-1} c_i. \]

Continue until all CG have been processed.

2. Compute new solutions for the fixed regressions. The block diagonal inverses are already stored in memory,

\[ b_k = \left( X'R^{-1}X \right)_k^{-1} FRHS_k \]

for all k from 1 to NFR.

3. The COW file is processed next. Now let the model for the jth TD record on the ith cow be denoted as

\[ y_{ij} = H_{ij} c + X_{ij} b + W_{ij} p_i + Z_{ij} a_i + e_{ij}, \]

with $\mathrm{Var}(e_{ij}) = R_{ij}$. For simplicity, the same subscript, i, has been used to denote PE and animal genetic effects, but remember the PE effects are referenced by the cow ID in the data file and animal genetic effects are referenced by the animal ID.

(a) Read and store in memory all TD records for a given cow.

(b) Adjust the observations for fixed regressions, CG, and animal genetic effects and accumulate in the RHS for the PE effects (i.e. a 36 by 1 vector).

\[ PERHS = \sum_j W_{ij}' R_{ij}^{-1} \left( y_{ij} - H_{ij} c - X_{ij} b - Z_{ij} a_i \right). \]

(c) Adjust the observations for CG and fixed regressions and accumulate into the RHS for animal genetic effects, which is the large work vector in memory for all animals,

\[ ARHS_i = ARHS_i + Z_{ij}' R_{ij}^{-1} \left( y_{ij} - H_{ij} c - X_{ij} b \right). \]

(d) Read in the diagonal block inverse for the animal PE effect,

\[ WINV = \left( W_i' R_i^{-1} W_i + P_0^{-1} \right)^{-1}, \]

and compute the new animal PE solution as

\[ p_i = WINV \cdot PERHS. \]

(e) Adjust the animal genetic RHS for the new animal PE solution,

\[ ARHS_i = ARHS_i - Z_{ij}' R_{ij}^{-1} W_{ij} p_i. \]

4. To get new animal genetic solutions, the PEDIGREE file in memory must be processed. Remember that animals are sorted from youngest to oldest, and that this ordering is critical. Let i represent the ith animal, s represents the sire of animal i, and d represents the dam of animal i, and let $a^{km}$ represent elements of $A^{-1}$ between animals k and m.

(a) Adjust the animal genetic RHS for its sire and dam solutions,

\[ ARHS_i = ARHS_i - G_0^{-1} \left( a^{is} a_s + a^{id} a_d \right). \]

(b) If the animal has TD records, read in the inverted diagonal block for animal i as

\[ ZINV = \left( Z_i' R_i^{-1} Z_i + a^{ii} G_0^{-1} \right)^{-1}, \]

or if the animal has no TD records, then

\[ ZINV = \left( a^{ii} G_0^{-1} \right)^{-1}. \]

Calculate a new animal genetic solution vector as

\[ a_i = ZINV \cdot ARHS_i. \]

(c) Adjust the sire and dam genetic RHS for the new animal genetic solution and the solution for its mate as

\[ ARHS_s = ARHS_s - G_0^{-1} \left( a^{si} a_i + a^{sd} a_d \right), \]

and

\[ ARHS_d = ARHS_d - G_0^{-1} \left( a^{di} a_i + a^{ds} a_s \right). \]

5. Solve for new genetic group solutions as

\[ a_i = \left( a^{ii} G_0^{-1} \right)^{-1} ARHS_i. \]

The iteration process is continued until satisfactory convergence is obtained. The CTDM applies up to 300 iterations, but utilizes solutions from a previous run as starting values in the iteration.

2.3. Algorithm B

Inspection of the MME for the CTDM reveals the following structures as a result of partitioning according to lactation number. That is, for the block of equations of each cow,

\[
W'R^{-1}W + P_0^{-1} =
\begin{pmatrix}
W_1'R_1^{-1}W_1 + P^{11} & P^{12} & P^{13} \\
P^{21} & W_2'R_2^{-1}W_2 + P^{22} & P^{23} \\
P^{31} & P^{32} & W_3'R_3^{-1}W_3 + P^{33}
\end{pmatrix},
\]

with a similar structure for $Z'R^{-1}Z + G^{-1}$, and

\[
W'R^{-1}Z =
\begin{pmatrix}
W_1'R_1^{-1}Z_1 & 0 & 0 \\
0 & W_2'R_2^{-1}Z_2 & 0 \\
0 & 0 & W_3'R_3^{-1}Z_3
\end{pmatrix}.
\]

Note that there are no data connections between lactations, but only connections via the non-zero covariances of PE and genetic effects between lactations. These structures suggest blocking PE and genetic effects on a lactation by lactation basis, rather than all three lactations simultaneously. Let

\[
P_0^{-1} =
\begin{pmatrix}
P^{11} & P^{12} & P^{13} \\
P^{21} & P^{22} & P^{23} \\
P^{31} & P^{32} & P^{33}
\end{pmatrix}
\]

and

\[
G_0^{-1} =
\begin{pmatrix}
G^{11} & G^{12} & G^{13} \\
G^{21} & G^{22} & G^{23} \\
G^{31} & G^{32} & G^{33}
\end{pmatrix},
\]

where each partition is of order RT.

2.3.1. Data files

The PEDIGREE file is needed as before with no changes. The TD records data files are also the same as before except that the COW file must be sorted by cow ID within parity number. The CG file, sorted by contemporary group numbers, remains the same.

2.3.2. Iteration scheme

Blocks are now defined by animals within lactations. The order of these blocks is RT rather than RTL. For the CTDM with R = 3, T = 4, and L = 3, the block size of a half-stored matrix per animal was of length 666, and blocks by animals within lactation would only be of length 78, but three could be necessary for each animal. In terms of disk space this would be only 35% of that required for the larger blocks. If an animal does not have any second or third lactation TD records, then their inverted diagonal blocks for those lactations are not written to disk because those blocks would be equal to $(P^{22})^{-1}$ and $(P^{33})^{-1}$, respectively for PE effects, for example, and there is no need to write multiple copies of these matrices. Thus, the actual savings in disk space would be greater than 65%. Cows, however, need to be coded in the program to know which ones do not have records in second or third lactations.

Memory storage is still required for solutions to all effects in the model, but now the RHS for animal genetic effects only needs to be large enough for all animals for one lactation, i.e. NAN × R × T rather than NAN × R × T × L. The iteration process proceeds as follows:

1. The CG file is read sequentially and calculations are performed exactly the same as in Algorithm A. RHS for fixed regressions are handled in the same manner.

2. New solutions for fixed regressions are calculated as in Algorithm A.

3. The COW file is processed. Remember that this file is now sorted by cow ID within parity number. Let the model for the jth TD record on the ith cow in lactation m be

\[ y_{ijm} = H_{ijm} c + X_{ijm} b + W_{ijm} p_{im} + Z_{ijm} a_{im} + e_{ijm}, \]

and $\mathrm{Var}(e_{ijm}) = R_{ijm}$.

(a) Read and store in memory all TD records for a cow and determine the appropriate $R_{ijm}^{-1}$.

(b) Adjust the observations for contemporary groups, fixed regressions, and animal genetic effects and accumulate in the RHS for the PE effects (i.e. within lactation RHS is a 12 by 1 vector),

\[ PERHS = \sum_j W_{ijm}' R_{ijm}^{-1} \left( y_{ijm} - H_{ijm} c - X_{ijm} b - Z_{ijm} a_{im} \right). \]

(c) Further adjustment to PERHS is needed for the PE effects in the other lactations which are correlated to the PE effects in lactation m for cow i,

\[ PERHS = PERHS - \sum_{\ell \neq m} P^{\ell m} p_{i\ell}. \]

(d) Adjust the observations for CG and fixed regressions and accumulate over j into the RHS for animal genetic effects for cow i.

\[ ARHS_{im} = ARHS_{im} + Z_{ijm}' R_{ijm}^{-1} \left( y_{ijm} - H_{ijm} c - X_{ijm} b \right). \]

(e) If a cow has TD records in lactation m, then read in the inverted diagonal block for cow i PE effects in lactation m,

\[ WINV = \left( W_{im}' R_{im}^{-1} W_{im} + P^{mm} \right)^{-1}, \]

otherwise,

\[ WINV = \left( P^{mm} \right)^{-1}. \]

Compute the new animal PE solution for lactation m as

\[ p_{im} = WINV \cdot PERHS. \]

(f) Adjust the animal genetic RHS for animal i and lactation m for the new PE solution,

\[ ARHS_{im} = ARHS_{im} - Z_{ijm}' R_{ijm}^{-1} W_{ijm} p_{im}. \]

4. The animal genetic solutions for lactation m are obtained by processing the PEDIGREE file in memory.

(a) Adjust the animal's RHS for the sire and dam solutions in all lactations as

\[ ARHS_{im} = ARHS_{im} - \sum_{\ell=1}^{L} G^{m\ell} \left( a^{is} a_{s\ell} + a^{id} a_{d\ell} \right). \]

(b) Further adjustment to $ARHS_{im}$ is needed for the genetic effects in the other lactations which are correlated to the genetic effects in lactation m for cow i.

\[ ARHS_{im} = ARHS_{im} - \sum_{\ell \neq m} a^{ii} G^{\ell m} a_{i\ell}. \]

(c) If the animal has TD records in lactation m then read in the inverted diagonal block for animal i,

\[ ZINV = \left( Z_{im}' R_{im}^{-1} Z_{im} + a^{ii} G^{mm} \right)^{-1}, \]

otherwise,

\[ ZINV = \left( a^{ii} G^{mm} \right)^{-1}. \]

Calculate a new animal genetic solution vector as

\[ a_{im} = ZINV \cdot ARHS_{im}. \]

(d) Adjust the sire and dam genetic RHS for the new animal genetic solution and for the mate's genetic solution as

\[ ARHS_{sm} = ARHS_{sm} - \sum_{\ell=1}^{L} G^{m\ell} \left( a^{si} a_{i\ell} + a^{sd} a_{d\ell} \right), \]

and

\[ ARHS_{dm} = ARHS_{dm} - \sum_{\ell=1}^{L} G^{m\ell} \left( a^{di} a_{i\ell} + a^{ds} a_{s\ell} \right). \]

5. Solve for new genetic group solutions in lactation m as

\[ a_{im} = \left( a^{ii} G^{mm} \right)^{-1} ARHS_{im}. \]

2.4. Comparison of algorithms

Algorithms A and B were applied to the national Canadian Jersey dairy data set. Data were 543 769 TD records from the first three lactations of 35 502 cows (= NPE) that calved after January 1, 1988. The total number of animals in the evaluation was 69 946 (= NAN). Contemporary groups, formed on the basis of herd-test date-parity subclasses with second and third parities combined numbered 71 038 (= NCG). Seventeen phantom parent groups were formed for unknown sires and dams based on sex of parent and year of birth of offspring. The number of fixed regression subclasses, formed on the basis of region–parity-age at calving-season of calving, was 38 (= NFR). The model for each trait was the same and was described in detail by Schaeffer et al. (2000a). Wilmink's function was utilized so that R = 3.

The MME comprised a total of 4 081 096 equations. Starting solutions for all effects were zero for both algorithms prior to iteration. Algorithms were compared on the basis of total computing time per iteration, convergence properties, and memory and disk storage requirements. Convergence was attained when the sum of squares of differences in animal genetic solutions between iterations divided by the sum of squares of animal genetic solutions in the latest iteration all times 100 was less than 0.00001. This criterion is unitless compared to a comparison of the squared differences between actual and regenerated right hand sides which would require additional storage to re-generate the right hand sides for comparisons.

Algorithms were written and implemented in standard FORTRAN 77. Programs were run on an HP-UX 9000/800 workstation. All solution and RHS vectors were declared as single precision except for the RHS work vector that was allocated for NAN animal genetic effects, which was declared as double precision, and was critical to achieving convergence in the Holstein breed.

Table 2
Expected storage requirement for Algorithm B as applied to Canadian Holstein data set for different numbers of covariates as random regressions

Number of covariates    Memory space (MB)    Disk storage (MB)
3                       715                  2433
4                       954                  4242
5                       1192                 6552
6                       1430                 9360
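
The equation counts and block sizes quoted in Sections 2.1 and 2.3.2 follow from simple arithmetic. The sketch below, in Python, reproduces them; the function names are hypothetical and are not taken from the original FORTRAN 77 programs. It assumes that 1 GB is counted as 10^9 bytes and that three vectors of length NEQ (solutions, diagonal elements, right hand sides) are held in memory, which is the reading that reproduces the 2.6 GB figure.

```python
def mme_size(ncg, nfr, npe, nan, r, t, l):
    """Return (full, approximate) numbers of mixed model equations.

    full        = NCG*T + NFR*T*R + (NPE + NAN)*R*T*L
    approximate = (NPE + NAN)*R*T*L   (animal equations only)
    """
    animal_equations = (npe + nan) * r * t * l
    full = ncg * t + nfr * t * r + animal_equations
    return full, animal_equations


def half_stored_length(order):
    """Number of elements in a half-stored symmetric matrix of given order."""
    return order * (order + 1) // 2


def vector_memory_gb(neq, n_vectors=3, bytes_per_value=8):
    """Memory for n_vectors vectors of length neq, assuming 1 GB = 1e9 bytes."""
    return neq * n_vectors * bytes_per_value / 1e9


if __name__ == "__main__":
    R, T, L = 3, 4, 3
    # Worked example from Section 2.1 (NCG and NFR set to zero: approximation only).
    _, approx = mme_size(0, 0, 1_000_000, 2_000_000, R, T, L)
    print("approximate NEQ:", approx)
    print("memory (GB), double precision:", round(vector_memory_gb(approx), 1))

    # Block sizes from Section 2.3.2.
    block_a = half_stored_length(R * T * L)   # one block per animal, Algorithm A
    block_b = half_stored_length(R * T)       # one block per animal-lactation
    print("half-stored lengths:", block_a, block_b)
    print("disk ratio B/A with three lactation blocks:", round(L * block_b / block_a, 2))
```

Halving `bytes_per_value` to 4 shows the effect of single precision storage, which is the trade-off discussed for machines with only 2 GB of memory.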

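
The decomposition of the relationship matrix inverse as T'D^{-2}T, described in Section 2.2.1, can be illustrated on a toy pedigree. The following Python sketch is only an illustration under stated assumptions: the population is non-inbred, an unknown parent is coded as 0 rather than assigned to a genetic group as in the CTDM, the matrix is stored densely, and the function names are hypothetical. Each pedigree record contributes the outer product of its row of T, scaled by its value from D^{-2}.

```python
def d_inverse_squared(sire, dam):
    """Diagonal element of D^-2 for a non-inbred animal (0 = unknown parent)."""
    known_parents = (sire != 0) + (dam != 0)
    return {2: 2.0, 1: 4.0 / 3.0, 0: 1.0}[known_parents]


def a_inverse(pedigree):
    """Build a dense A^-1 from (animal, sire, dam) records with IDs 1..n.

    For each animal, the row of T has 1 in its own column and -0.5 in the
    columns of its known parents; A^-1 accumulates t_j * d * t_k over rows.
    """
    n = len(pedigree)
    ainv = [[0.0] * n for _ in range(n)]
    for animal, sire, dam in pedigree:
        d = d_inverse_squared(sire, dam)
        row = [(animal, 1.0)] + [(p, -0.5) for p in (sire, dam) if p != 0]
        for j, tj in row:
            for k, tk in row:
                ainv[j - 1][k - 1] += tj * d * tk
    return ainv


if __name__ == "__main__":
    # Youngest animal first, as in the PEDIGREE file: animal 1 has sire 2 and dam 3.
    toy_pedigree = [(1, 2, 3), (2, 0, 0), (3, 0, 0)]
    for line in a_inverse(toy_pedigree):
        print(line)
```

In the iteration schemes only the elements linking an animal to its sire and dam are needed, so a production program would compute these contributions on the fly from the PEDIGREE record instead of storing the matrix.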
3. Results