Plant Science (Elsevier), Vol. 159, Issue 1, October 2000

terminator sequencing chemistry and ABI377 automatic DNA sequencers. Clones were sequenced from the 5′ end only, and sequences of more than 450 bp with fewer than 2 Ns were selected for further analysis. After vector trimming, each EST was subjected to both a BLASTN 2.0.5 and a BLASTX 2.0.5 search [19] against the non-redundant protein and nucleotide databases of GenBank, EMBL, PIR and Swiss-Prot (releases 1.4.99). The BLAST score (bits), using the BLOSUM62 and the PAM250 matrices for BLASTN and BLASTX respectively, was used to classify the alignments into strong, medium and weak homology. For details of individual EST sequences contact the Centre for Plant Conservation Genetics, e-mail rhenry@scu.edu.au.

2.3. Library quality

Contaminating sequences (non-grape and non-cDNA) were detected by searching primary BLASTN matches for ESTs with very strong homology (log P(N) or E value < −37) to viral, fungal, bacterial, mitochondrial, chloroplast and ribosomal RNA genes. These usually fell into two distinct groups based on P(N) values. The weaker matches (score < 150, log P(N) > −30) are likely to represent grape cDNAs with sequences similar to the non-grape or non-cDNA BLAST matches. Grape ESTs were found with this level of homology to mammalian genes (see Table 2), suggesting that this level of homology is consistent with highly conserved genes.

A comparison of the proportion of full-length and near full-length cDNAs represented in each of the libraries was made by assessing the position in the BLAST match of the 5′ end of the region of homology identified in the BLAST output. This underestimates the number of cDNA clones that are full-length or near full-length, because of evolutionary changes in the start site and because full alignments were not carried out. However, it is a quick way of obtaining data directly from the BLAST output in order to compare library quality without lengthy analysis.

2.4. Cellular roles of primary homologues

The putative cellular roles of the transcripts identified by ESTs with strong or nominal homology with known proteins were assigned by examination of the GenBank 'definition' entries of the primary BLAST matches, using a functional catalogue of plant genes [14] based on the yeast functional catalogue [13]. The assignments were carried out by the authors from their knowledge of biochemistry and plant physiology, and with reference to the JDBC website. The plant gene catalogue was modified to include an additional category (21) for genes related to specific areas of plant and grapevine development.
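
The screening and classification rules described in Sections 2.2 and 2.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure the authors ran: the `Hit` record, the function names and the source labels are hypothetical. The score cut-offs of 150 and 80 bits and the P(N) limit of 0.5 are the ones quoted in the Results, and the contaminant rule is assumed to combine the log P(N) < −37 criterion with the score of at least 150 given in the Table 1 footnotes.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Cut-offs quoted in the text; every name in this block is hypothetical.
STRONG_SCORE = 150.0       # bits: strong homology at or above this score
NOMINAL_SCORE = 80.0       # bits: nominal homology from here up to STRONG_SCORE
MAX_P_MEASURABLE = 0.5     # P(N) below this counts as measurable homology
CONTAMINANT_LOG_P = -37.0  # log10 P(N) below this marks a very strong match

# Assumed labels for the non-grape / non-cDNA sources listed in Section 2.3.
CONTAMINANT_SOURCES = frozenset(
    {"viral", "fungal", "bacterial", "mitochondrial", "chloroplast", "ribosomal"}
)


@dataclass(frozen=True)
class Hit:
    """Primary BLAST match of one EST (hypothetical record layout)."""
    score: float    # BLAST bit score
    p_value: float  # P(N) reported by BLAST
    source: str     # curated origin of the matched sequence


def log10_p(p_value: float) -> float:
    """Return log10 P(N); a reported P(N) of 0 maps to minus infinity."""
    return math.log10(p_value) if p_value > 0 else float("-inf")


def homology_class(hit: Optional[Hit]) -> str:
    """Classify an EST as strong, nominal, weak or no match."""
    # The text defines P(N) < 0.5 as measurable and P(N) > 0.5 as no match;
    # treating exactly 0.5 as no match is an assumption of this sketch.
    if hit is None or hit.p_value >= MAX_P_MEASURABLE:
        return "no match"
    if hit.score >= STRONG_SCORE:
        return "strong"
    if hit.score >= NOMINAL_SCORE:
        return "nominal"
    return "weak"


def is_contaminant(hit: Hit) -> bool:
    """Flag very strong matches to non-grape or non-cDNA sequences."""
    return (
        hit.source in CONTAMINANT_SOURCES
        and hit.score >= STRONG_SCORE
        and log10_p(hit.p_value) < CONTAMINANT_LOG_P
    )


# Toy hits, not data from the study.
examples = [
    Hit(score=320.0, p_value=1e-60, source="ribosomal"),
    Hit(score=120.0, p_value=1e-25, source="bacterial"),
    Hit(score=60.0, p_value=0.01, source="plant"),
]
for hit in examples:
    print(hit.source, homology_class(hit), is_contaminant(hit))
```

Under these rules a weaker match to a bacterial gene (score below 150, log P(N) above −30) is kept and classified as a grape cDNA, in line with the interpretation given in Section 2.3.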

3. Results

Sequencing of two grape cDNA libraries produced 2479 ESTs from berry tissue and 2438 from leaf; 99% were greater than 400 bp after trimming vector sequences, and all had fewer than 2 ambiguous bases. Library quality was assessed by estimating the proportion of contaminating non-cDNA and pathogen sequences and the 5′ integrity from BLASTN and BLASTX outputs (see Table 1). The libraries were of reasonable quality, having only 5% contamination with non-cDNA (mainly ribosomal RNA) and pathogen sequences. At least 40–50% of the cDNAs were near full length. The berry library made from total RNA contains more full-length sequences than the leaf library made from mRNA, and a similar number of ribosomal RNA and other non-cDNA sequences. This confirms that the use of total RNA rather than mRNA in the first step of library construction has not resulted in reduced reverse transcription or increased false priming during first strand synthesis.

Table 1
Library quality

                                                     Berry (%)   Leaf (%)
Contaminating sequences
  Ribosomal RNA sequences (a)                          3.39        3.40
  Other non-cDNA sequences (b)                         1.60        0.97
  Pathogen cDNAs (c)                                   0.08        1.05
cDNA length
  Contain translational start site (d)                38.4        28.8
  Within 10 amino acids of translational start (e)    45.1        56.9

a ESTs with matches to ribosomal RNA (BLASTN score ≥ 150, log P(N) < −37).
b ESTs with matches to mitochondrial, chloroplast and vector DNA (BLASTN score ≥ 150, log P(N) < −37).
c ESTs with matches to viral and other known pathogens (BLASTN score ≥ 150, log P(N) < −37).
d Region of homology with BLASTX match contains the translational start site.
e Region of homology with BLASTX match begins within 10 amino acids of the translational start site.

Fig. 1. Overview of the combined ESTs from both libraries. Primary BLAST homologues are grouped by source.

3.1. Overview of BLAST results

After removal of ribosomal and pathogen sequences, primary homologies with P(N) < 0.5, determined using BLASTX and the non-redundant protein database, were classified as strong (31%; BLAST score ≥ 150), nominal (26%; 150 > score ≥ 80) and weak (27%; score < 80). The remaining ESTs, with P(N) > 0.5, were considered to have no match (16%). The ESTs with measurable homology (P(N) < 0.5) were further classified according to the source of their BLAST match, and Fig. 1 gives an overview of the 4661 grape ESTs. Seventy-two percent matched plant genes. The small percentage of matches to Vitis genes reflects the small number of grape genes (fewer than 100) listed in the non-redundant databases. Eleven percent of the ESTs matched non-plant genes: 4.6% of ESTs matched the Bacteria and Archaea, 2.2% simple eukaryotes (including fungi, yeasts and protozoans), 1.7% invertebrate metazoans and 2.8% vertebrates.

Within each of the major taxa groups, the degree of homology (see Table 2) reflected evolutionary relationships only in part. Grape genes had a higher degree of homology with genes from higher plants than with genes from other kingdoms. However, all groups apart from higher plants (e.g. fungi, bacteria and vertebrates) had similar proportions of high, medium and low homology to the grape ESTs, which does not reflect their evolutionary distances from higher plants. A small proportion (11%) of vertebrate genes retained strong homology with the grape ESTs, suggesting significant numbers of very highly conserved genes from prokaryotes to vertebrates and higher plants.

3.2. Redundancy within and between the libraries

Redundancy of ESTs with strong or nominal homology (BLAST score > 80) was analyzed by examining the primary BLAST matches to identify ESTs likely to represent the same gene. Fig. 2 gives an overview of the redundancy analysis based on BLAST matches. The leaf and berry libraries had similar proportions of single and multiple copy ESTs. Twenty-two percent of the genes in each library, or 12% of the total number of distinct genes, were common to both libraries (see Fig. 2). Overall, 2330 distinct gene matches were obtained from the total ESTs of both libraries.

Table 2
Degree of homology within major taxa groups

Taxa group                     Strong (a) (%)   Nominal (a) (%)   Weak (a) (%)
Total ESTs (b)                     36               31               32
Plant (c)                          41.2             33.0             25.8
Simple eukaryotes (d)               5.9             24.8             69.3
Prokaryotes and viruses (e)         6.9             23.2             69.9
Invertebrate metazoa (f)            2.5             22.5             75.0
Vertebrates (g)                    11.6             19.4             69.0

a ESTs with measurable homology (P(N) < 0.5) divided into strong (BLAST score ≥ 150), nominal (150 > score ≥ 80) and weak (score < 80).
b Total ESTs from both the leaf and berry libraries.
c ESTs with homology to sequences from the Viridiplantae.
d ESTs with homology to sequences from the Fungi and protozoa.
e ESTs with homology to sequences from the Bacteria, Archaea and viruses.
f ESTs with homology to sequences from the non-chordate metazoa.
g ESTs with homology to sequences from the Chordata.

Fig. 2. Redundancy in berry and leaf libraries. Single and multiple copy ESTs are shown for the leaf (A) and berry (B) libraries, with ESTs common to both libraries represented by the striped areas.

3.3. Abundance of transcripts with predicted cellular roles in berry and leaf

Examination of the primary BLAST matches revealed four classes of ESTs with varying potential to predict their cellular function (see Table 3). ESTs in class A, matching sequences of known proteins with strong and nominal homology, are likely to be transcripts of genes with similar functions. The function of the BLAST match has been used to assign cellular roles to this group (see below). Cellular roles could not be reliably assigned to ESTs in class B ('weak homology') without carrying out motif analysis.
Almost 10% of the ESTs (class C) matched 'unknown protein' or 'putative protein' entries with no indication of the function of the gene product. Most of these were ESTs from other species that had been entered into the non-redundant database. The fourth class, D, of 756 ESTs with no matches, may contain gene sequences novel to grapes, along with ESTs from other species that have not yet been put into the public domain.

The putative cellular roles of the transcripts identified by ESTs of class A were assigned by examination of primary BLAST homologues, using 13 putative cellular roles for plant genes and multiple subcategories (see Table 4). About 8% of class A ESTs in the leaf library and 13% in the berry library could not be assigned with certainty (unclassified). A major proportion of the leaf ESTs (36%) represented transcripts involved in photosynthesis. These were grouped separately from the rest of the transcripts involved in energy production (category 2); in other cases categories were combined to give an overview of the major cellular roles of the classified transcripts (see Fig. 3). Compared to the leaf library, the berry library had a smaller proportion of transcripts involved in photosynthesis (3%), but a higher proportion involved in defense and resistance (18% compared to 7%), energy minus photosynthesis (7% compared to 4%), signal transduction (8% compared to 5%), cell structure and plant growth (15% compared to 8%), and protein production and processing (24% compared to 17%). Primary and secondary metabolism transcripts were represented by a similar proportion of ESTs in each of the libraries.

A detailed breakdown of the percentage of transcripts in each subcategory for the leaf and berry libraries is shown in Table 4. The high proportion of disease and defense category transcripts in berry was largely due to elevated defense-regulated (11.02), stress response (11.05) and detoxification (11.06) sub-categories.
In the energy (02) minus photosynthesis, cell division (03) and plant development (21), signal transduction (10) and transporters (07) categories, the differences between leaf and berry transcript abundance in all of the sub-categories reflect the differences in the main category. Increased proportions of ESTs involved in sugars and polysaccharide metabolism (01.05) and vesicular traffic (08.07) in the berry, and an increased proportion of cytoskeleton (09.04) transcripts in the leaf, reflect differences in berry and leaf physiology and biochemistry. Although the primary metabolism category has similar numbers overall, the berry transcripts are elevated for sugars and polysaccharides (01.05) and the leaf transcripts are elevated for the amino acid (01.01) and lipid and sterol (01.06) sub-categories. Ribosomal RNA synthesis (04.01), ribosomal proteins (05.01) and folding and stability (06.01) sub-categories do not reflect the increased proportion of berry transcripts found in the rest of these categories.

Table 3
Classification of ESTs according to function prediction

BLAST match                                                   N leaf (a)    N berry (b)   N total (c)
Class A: known function, strong and nominal homology (d)     1102 (47.6)   1176 (50.1)   2278 (48.9)
Class B: known function, weak homology (e)                     637 (27.5)    531 (22.6)   1168 (25.1)
Class C: unknown function (f)                                  191 (8.3)     268 (11.4)    459 (9.8)
Class D: no match (g)                                          383 (16.6)    373 (15.9)    756 (16.2)

a Number (and percentage in brackets) of ESTs in each class of the leaf library ESTs.
b Number (and percentage in brackets) of ESTs in each class of the berry library ESTs.
c Number (and percentage in brackets) of ESTs in each class of the combined ESTs.
d ESTs with a BLAST match to genes encoding proteins of known or putative function, BLAST score ≥ 80.
e ESTs with a BLAST match to genes encoding proteins of known or putative function, BLAST score < 80.
f ESTs with a BLAST match to genes encoding 'unknown, putative, or hypothetical proteins' with no reference to any function or known protein.
g ESTs with a BLAST match P(N) > 0.5.

Table 4
Abundance of transcripts with various roles in the leaf and berry libraries (a)

Functional category leaf (b) berry (c) berry (c) Functional category leaf (b) 10.25 11.43 07 Transporters 1.45 3.99 01 Metabolism 07.01 Ions 0.36 1.81 0.64 4.36 01.01 Amino acid 0.36 0.36 07.07 Sugars 0.00 0.36 01.02 Nitrogen and sulphur 07.10 Amino acids 0.00 0.18 01.03 Nucleotides 0.54 0.45 07.13 Lipids 0.45 0.64 0.36 01.04 Phosphate 0.27 07.22 ATPase 0.09 01.05 Sugars and polysaccharides 0.91 2.63 4.26 07.25 ABC-type 0.09 1.81 0.09 01.06 Lipid and sterol 2.27 0.91 1.00 07.99 Others 0.45 1.45 01.07 Cofactors 08 Intracellular traffic .64 3.99 1.81 02 Energy less photosynthesis 7.08 02.01 Glycolysis 1.63 08.01 Nuclear 0.00 0.27 1.27 08.04 Mitochondrial 0.27 02.02 Gluconeogenesis 0.09 0.27 1.72 08.07 Vesicular 0.18 0.54 1.00 0.18 02.07 Pentose phosphate 08.10 Peroxisomal 0.18 02.10 TCA pathway 0.27 0.27 0.54 08.16 Extracellular 0.00 0.27 0.18 02.13 Respiration 0.18 2.36 1.81 02.20 E-transport 09 Cell structure 4.26 6.26 02.30 Photosynthesis 3.36 09.01 Cell wall 1.18 3.36 35.57 09.04 Cytoskeleton 1.72 1.63 09.07 ER/Golgi 0.00 2.00 0.45 03 Cell growth/division 1.45 09.13 Chromosomes 0.36 03.01 Cell growth 0.09 0.36 0.09 09.16 Mitochondria 0.09 0.00 0.00 03.16 DNA synth/replication 0.09 0.27 0.27 09.19 Peroxisome 0.00 0.27 03.19 Recombination/repair 09.25 Vacuole 0.18 0.82 0.18 0.18 03.22 Cell cycle 0.54 0.45 09.26 Chloroplast 0.64 0.27 03.26 Growth regulators 09.99 Others 0.09 03.99 Other 0.00 0.09 0.27 10 Signal transduction 4.72 5.81 8.17 8.89 04 Transcription 10.01 Receptors 0.64 04.01 rRNA synthesis 0.73 0.64 0.64 10.04 Mediators 1.09 0.09 2.27 04.10 tRNA synthesis 0.36 04.19 mRNA synthesis 0.64 10.0404 Kinases 1.54 2.36 0.36 10.0407 Phosphatases 0.27 0.82 0.73 0.54 04.1901 General TFs 4.99 2.72 10.0410 G proteins 1.18 2.09 04.1904 Specific TFs 04.1907 Chromatin modification 0.09 0.36 11 Disease/defense 6.90 1.36 18.78 04.22 mRNA processing 1.09 11.01 Resistance genes 0.18 0.36 11.02 Defense-regulated 1.45 10.98 5.44 05 Protein synthesis 6.53 5.90 4.08 11.03 Cell death 0.36 0.73 05.01 Ribosomal proteins 11.05 Stress responses 1.54 4.99 4.63 2.00 05.04 Translation factors 0.09 0.09 11.06 Detoxification 3.18 7.62 05.07 Translation control 11.99 Others 0.18 05.10 tRNA synthases 0.00 0.09 0.00 0.00 0.27 05.99 Others 20 Secondary metabolism 3.18 2.63 20.1 Phenylpropanoids/phenolics 1.54 7.08 1.00 06 Protein destination and storage 4.54 0.82 0.73 20.2 Terpenoids 0.73 0.82 06.01 Folding and stability 20.4 Non-protein 0.18 0.09 0.18 0.18 06.04 Targeting 0.64 0.27 20.5 Amines 0.09 0.00 06.07 Modification 20.99 Others 0.64 0.64 06.13 Proteolysis 3.09 5.35 0.18 06.20 Storage proteins 0.27 21 Development 1.18 2.00 21.0 General 0.09 0.18 13.43 8.35 21.1 Leaf and stem 0.27 0.73 Unclassified 21.2 Fruit and flowers 0.82 1.27

a ESTs matching known proteins with BLASTX scores > 80 have been classified into functional categories.
b % of ESTs in each category for the leaf library.
c % of ESTs in each category for the berry library.
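
The redundancy analysis of Section 3.2 amounts to grouping ESTs by their primary BLAST match and comparing the resulting gene sets between the two libraries. The following is a minimal sketch of that bookkeeping, assuming each EST has already been reduced to the identifier of its primary match. The function name, the dictionary keys and the toy gene labels are hypothetical and are not data from this study.

```python
from collections import Counter
from typing import Dict, Iterable


def redundancy_summary(
    leaf_matches: Iterable[str], berry_matches: Iterable[str]
) -> Dict[str, float]:
    """Summarise single/multiple copy genes and genes shared by two libraries.

    Each input item is the identifier of the primary BLAST match of one EST,
    so ESTs with the same identifier are taken to represent the same gene.
    """
    leaf_counts = Counter(leaf_matches)
    berry_counts = Counter(berry_matches)

    leaf_genes = set(leaf_counts)
    berry_genes = set(berry_counts)
    shared = leaf_genes & berry_genes    # genes seen in both libraries
    distinct = leaf_genes | berry_genes  # all distinct genes overall

    return {
        "leaf_single": sum(1 for n in leaf_counts.values() if n == 1),
        "leaf_multiple": sum(1 for n in leaf_counts.values() if n > 1),
        "berry_single": sum(1 for n in berry_counts.values() if n == 1),
        "berry_multiple": sum(1 for n in berry_counts.values() if n > 1),
        "shared": len(shared),
        "distinct_total": len(distinct),
        "shared_fraction_of_total": len(shared) / len(distinct) if distinct else 0.0,
    }


# Toy input: one primary-match label per EST (illustrative labels only).
leaf = ["rbcS", "rbcS", "cab", "actin", "ubiquitin"]
berry = ["chitinase", "actin", "ubiquitin", "ubiquitin", "thaumatin"]

summary = redundancy_summary(leaf, berry)
for key, value in summary.items():
    print(key, value)
```

The two percentages quoted in Section 3.2 are consistent with each other under the simplifying assumption that both libraries contain a similar number n of distinct genes: if 22% of each library is shared, the combined set holds 2n − 0.22n = 1.78n genes, and 0.22n / 1.78n is approximately 12%.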

4. Discussion