As the scope of EST projects continues to increase, EST data on large numbers of expressed genes
from a variety of tissues, cell types and developmental stages is becoming available. Analysis of
ESTs is now being used to identify novel genes exhibiting specific expression patterns, and to
assign gene function to certain cell types.
In order to analyze gene expression in specific samples from EST data, the number of transcripts
must be quantified from the sampled ESTs. In an unbiased cDNA library, the number of ESTs
matching a particular gene should reflect the abundance of the corresponding cDNA in the library
and the level of its mRNA in the tissue from which the library was derived. EST sampling can be an
effective and quantitative measure of steady-state mRNA levels [2]. This ‘electronic northern’
approach has been used to develop expression profiles in a range of human [3], mouse, invertebrate [4],
and plant tissues [5].
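The counting behind an ‘electronic northern’ can be illustrated with a minimal sketch. It assumes each EST has already been assigned to a gene (for example through its primary BLAST match); the function name `est_abundance` and the gene labels are hypothetical and are not taken from the studies cited above.

```python
from collections import Counter


def est_abundance(est_gene_hits):
    """Return {gene: (EST count, fraction of library)}.

    est_gene_hits holds one gene label per sequenced EST. In an unbiased
    library the fraction approximates relative transcript abundance.
    """
    counts = Counter(est_gene_hits)
    total = sum(counts.values())
    # An empty library yields an empty dict, so no division by zero occurs.
    return {gene: (n, n / total) for gene, n in counts.items()}


# Hypothetical gene assignments for five ESTs from one library.
leaf_hits = ["rbcS", "rbcS", "cab", "rbcS", "actin"]
profile = est_abundance(leaf_hits)
```

Comparing the fractions obtained for the same gene in two libraries gives a simple relative expression estimate; with small EST counts the differences should be treated with caution.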
Analysis of transcripts from EST data has also been used to discover novel genes expressed only in
mouse renal proximal tubule [6] and human granulocytes [7], as well as genes expressed at different
levels in human neoplastic cells compared to their normal counterparts [8] and in cultured cells after
nerve growth factor treatment [2]. Recently reported methods of EST analysis [5] allow
identification of ESTs showing similar expression profiles, thereby raising the possibility of assigning
function to novel genes by virtue of their having expression patterns similar to those of genes related
in function. These applications will be extended by the use of high-density DNA arrays and chips
(for review see [9]) and methods such as SAGE [10], which allow expression levels of even larger
numbers of transcripts to be analyzed.
When data is available from sufficient libraries, temporal and/or developmental expression profiles
can be generated. This has been demonstrated for six stages of soybean embryo development [11]. As
with conventional northerns, this data can be used to help understand more fully the function of the
corresponding gene products in biological processes, but with the EST approach this is extended
to thousands of genes. Analysis of the expression of large numbers of genes, combined with
knowledge of their function, allows us to perceive the global picture of biological processes in different
cell types. These studies have been initiated by using primary BLAST homologues to divide ESTs
matching known proteins into functional categories defined for bacteria [12] and modified for
yeast [13] and plants [14]. Data for this type of analysis is available for 37 human tissues [3],
Schistosoma mansoni [4], cabbage bud flowers [15], and wood-forming tissues of poplar [16]. Here we
report data from the analysis of the first 5000 grape ESTs of the Centre for Plant Conservation
Genetics’ grape project. Sequenced from two Chardonnay libraries, one derived from leaf tissue and
the other from berry, the grape ESTs have allowed us to look at global gene expression in two fairly
specialized types of plant cells.
2. Methods
2.1. cDNA library preparation
Total RNA was isolated as described in [17]
from fully expanded Chardonnay leaves with no obvious signs of senescence. Poly(A)+ RNA was
then purified using oligo(dT) cellulose according to the manufacturer’s (Boehringer Mannheim)
protocol. Total RNA was isolated from pre-veraison Chardonnay berries, average size 1.0–1.4 cm, using
CTAB extraction and lithium chloride precipitation [18]. This procedure gave low yields of RNA,
but the RNA appeared to be of good quality as judged by agarose electrophoresis (not shown).
Double-stranded cDNAs were prepared from 5 µg of poly(A)+ leaf RNA or 25 µg of total berry RNA
using a Stratagene cDNA synthesis kit (cat. no. 200401) according to the manufacturer’s protocol.
This resulted in the production of hemimethylated cDNAs with 5′ EcoRI and 3′ XhoI restriction
sites; these were cloned, asymmetrically, into EcoRI-XhoI digested pBluescript SK II (+).
Following ligation, plasmids were transformed into electro-competent XL1-Blue-MRF’ cells
(Stratagene) and plated for blue-white selection; all analyses were performed on white colonies.
Total libraries of approximately 50 000 recombinants were obtained in both cases.
2.2. Sequencing and initial data processing
DNA sequencing of plasmid clones was performed by the Australian Genome Research Facility (AGRF)
using a modified alkaline lysis DNA preparation method, ABI Prism BigDye terminator sequencing
chemistry and ABI377 automatic DNA sequencers. Clones were sequenced from the 5′ end only, and
sequences of more than 450 bp with fewer than 2 Ns were selected for further analysis. After
vector trimming, each EST was subjected to both a BLASTN 2.0.5 and a BLASTX 2.0.5 search [19]
against the non-redundant protein and nucleotide databases of GenBank, EMBL, PIR and Swiss-Prot
(releases 1.4.99). The BLAST score (bits), using the BLOSUM62 and the PAM250 matrices for BLASTN
and BLASTX respectively, was used to classify the alignments into strong, medium and weak homology.
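The sequence selection rule stated above (more than 450 bp, fewer than 2 Ns) can be expressed as a short filter. This is only an illustrative sketch: the function name `passes_quality_filter` is hypothetical, and the actual processing pipeline used for the grape ESTs is not described in the text. The score cut-offs separating strong, medium and weak homology are not given here, so they are deliberately left out.

```python
MIN_LENGTH = 450  # sequences must be longer than 450 bp
MAX_N = 2         # sequences must contain fewer than 2 ambiguous bases (N)


def passes_quality_filter(seq):
    """Return True if a vector-trimmed read meets both selection criteria."""
    seq = seq.upper()
    return len(seq) > MIN_LENGTH and seq.count("N") < MAX_N


# Hypothetical reads: only the first meets both criteria.
reads = ["A" * 451, "A" * 450, "A" * 450 + "NN"]
selected = [r for r in reads if passes_quality_filter(r)]
```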
For details of individual EST sequences contact the Centre for Plant Conservation Genetics, e-mail
rhenry@scu.edu.au.
2.3. Library quality
Contaminating sequences (non-grape and non-cDNA) were detected by searching primary BLASTN
matches for ESTs with very strong homology (log P(N) or E value < −37) to viral, fungal,
bacterial, mitochondrial, chloroplast, and ribosomal RNA genes. These usually fell into two
distinct groups based on P(N) values. The weaker matches (score < 150, log P(N) > −30) are likely
to represent grape cDNAs with sequences similar to the non-grape or non-cDNA BLAST matches.
Grape ESTs were found with this level of homology to mammalian genes (see Table 2), suggesting
that this level of homology is consistent with highly conserved genes.
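The two groups described above can be separated with a simple decision rule. The sketch below is an assumption-laden illustration rather than the procedure actually used: the function name `classify_match` and the category labels are hypothetical, the logarithm is assumed to be base 10, and matches falling between the two stated thresholds are simply reported as unresolved because the text does not say how they were handled.

```python
# Hypothetical labels for the non-grape / non-cDNA sources named in the text.
NON_TARGET = {"viral", "fungal", "bacterial",
              "mitochondrial", "chloroplast", "rRNA"}


def classify_match(category, log_p, score=None):
    """Classify a primary BLASTN match by source category and log P(N)."""
    if category not in NON_TARGET:
        return "grape cDNA"
    if log_p < -37:
        # Very strong homology to a non-target source: treat as contaminant.
        return "contaminant"
    if log_p > -30 and (score is None or score < 150):
        # Weaker match: probably a grape cDNA with a conserved sequence.
        return "probable grape cDNA"
    return "unresolved"


example = classify_match("chloroplast", -80.0, score=400)
```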
The proportion of full-length and near-full-length cDNAs represented in each library was
compared by assessing the position, within the BLAST match, of the 5′ end of the region of
homology identified in the BLAST output. This will underestimate the number of cDNA clones that
are full-length or near full-length, because start sites change through evolution and full
alignments were not carried out. However, it is a quick way of obtaining data directly from the
BLAST output in order to compare library quality without lengthy analysis.
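One way to turn this comparison into a number is sketched below. It assumes that, for each EST, the coordinate in the matched database sequence at which the region of homology begins has been read from the BLAST output. The function name `near_full_length_fraction` and the tolerance of 10 positions are hypothetical; the text does not state what cut-off, if any, was applied.

```python
def near_full_length_fraction(subject_starts, tolerance=10):
    """Fraction of ESTs whose homology begins near the start of the match.

    subject_starts: 1-based start coordinates of the region of homology in
    the matched database sequence, one per EST.
    tolerance: assumed cut-off; starts at or below it count as near
    full-length.
    """
    if not subject_starts:
        return 0.0
    hits = sum(1 for start in subject_starts if start <= tolerance)
    return hits / len(subject_starts)


# Hypothetical start coordinates for four ESTs from one library.
leaf_fraction = near_full_length_fraction([1, 5, 120, 300])
```

Computing the same fraction for each library gives a like-for-like comparison, even though the absolute values underestimate the true proportion of full-length clones.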
2.4. Cellular roles of primary homologues
The putative cellular roles of the transcripts identified by ESTs with strong or nominal
homology with known proteins were assigned by examination of primary BLAST matches (GenBank
‘definition’ entries), using a functional catalogue of plant genes [14] based on the yeast
functional catalogue [13]. The assignments were carried out by the authors from their knowledge
of biochemistry and plant physiology, and by reference to the JDBC website. The plant gene
catalogue was modified to include an additional category (21) for genes related to specific
areas of plant and grapevine development.
3. Results