Directory UMM :Data Elmu:jurnal:P:PlantScience:Plant Science_BioMedNet:061-080:
108
Chasing the dream: plant EST microarrays
Todd Richmond* and Shauna Somerville†
DNA microarray technology is poised to make an important
contribution to the field of plant biology. Stimulated by recent
funding programs, expressed sequence tag sequencing and
microarray production either has begun or is being
contemplated for most economically important plant species.
Although the DNA microarray technology is still being refined,
the basic methods are well established. The real challenges lie
in data analysis and data management. To fully realize the value
of this technology, centralized databases that are capable of
storing microarray expression data and managing information
from a variety of sources will be needed. These information
resources are under development and will help usher in a new
era in plant functional genomics.
Addresses
Carnegie Institution of Washington, Department of Plant Biology,
Stanford, California 94305, USA
*e-mail: [email protected]
†e-mail: [email protected]
Current Opinion in Plant Biology 2000, 3:108–116
1369-5266/00/$ — see front matter © 2000 Elsevier Science Ltd.
All rights reserved.
Abbreviations
EST
expressed sequence tag
LIMS
laboratory information management system
oligo
oligonucleotide
Introduction
Microarrays are widely recognized as a significant technological advance providing genome-scale information about
gene expression patterns [1,2•,3–6]. Two types of microarrays currently exist: the DNA-fragment-based and the
oligonucleotide (oligo)-based microarrays. These differ
primarily in the nature of the DNA fixed to the solid support. In this review, we focus on the DNA-fragment-based
arrays, in which PCR-amplified inserts of partially
sequenced cDNA clones are spotted onto glass microscope
slides [7••].
A number of reviews describe the methods and outline the
advantages of DNA-fragment microarrays [2,3,7••,8,9••].
As stated in many of them, the major advantages of
microarrays derive from the fact that data can be collected
for large numbers of genes in one experiment permitting
true genome-scale sampling of gene expression patterns. It
is important to note that microarrays, like northern blots,
provide a measure of steady-state RNA levels. Although
such information can be very valuable, not all biological
processes will be regulated at this level of gene expression
and other methods are needed to characterize processes
regulated at other levels.
Three general types of experiments can be performed with
microarrays. In ‘marker discovery’ experiments, the goal is
to discover a limited number of highly specific marker
genes for a cell type, a developmental stage or an environmental treatment. In such experiments, the researcher is
often interested only in genes that show a dramatic and
selective induction or repression of expression. The second type of experiment, an exploratory or ‘biology
discovery’ experiment, is one in which expression information about all genes is used to provide an integrative
view of a plant’s response to a treatment. Such experiments have the potential to provide new insights into
biological processes. In an example drawn from human
biology, Iyer et al. [10•] demonstrated that serum-starved
cells respond to the addition of fresh serum in part by
inducing a suite of genes involved in wounding. The
wound response had not previously been implicated in this
widely studied biological model. The third type of experiments is the ‘gene-function discovery’ experiments. As
microarray data accumulates in centralized public databases, hypotheses as to the function of novel, uncharacterized
genes can be generated by in silico analyses of when and
where such genes exhibit altered expression. These kinds
of experiments will be very beneficial to the field of plant
biology in terms of both developing a broader view of how
plants function and rapidly identifying novel genes most
worthy of detailed characterization. In this review, we
highlight some of our observations on the use of expressed
sequence tag (EST)-based microarrays, and, in particular,
on some of the plant-specific concerns, on the basis of our
experience with Arabidopsis microarrays.
Construction of microarrays
The advantage of constructing microarrays from EST clones
rather than from anonymous clones is that the subsequent
analysis of gene expression patterns is much more informative. EST clones are the obvious choice for Arabidopsis,
because of the availability of a large collection from a public
stock center — the Arabidopsis Biological Resource Center.
There are a few limitations associated with this collection
that should be noted. Most of the Arabidopsis ESTs consist
of only 300–400 basepairs sequenced from the 5′ end of a
longer cDNA clone [11]. Thus, the occurrence of chimeric
clones consisting of sequences from unrelated genes will go
undetected. Genes specifically induced under specialized
environmental conditions or in specialized cells types will
be under-represented in the EST collection, which was prepared from RNA from only four plant tissues. Recent
estimates suggest that only a third of Arabidopsis genes
[12,13] are represented among the current set of ~45,000
Arabidopsis ESTs.
The complete sequence of the Arabidopsis genome is
scheduled to be available in 2000 [14], and the first comprehensive annotation to identify candidate genes will
be completed shortly afterwards [15,16]. Thus, for
Chasing the dream: plant EST microarrays Richmond and Somerville
109
Table 1
EST or microarray projects funded by the National Science Foundation Plant Genome Program in 1999 (see [1] for projects funded
in 1998).
Project
Principal investigator/Institution
Resources to be developed
Tools for potato structural and
functional genomics
B Baker/University of California
at Berkeley
55,000 Potato EST sequences
Potato cDNA microarrays
Global expression studies for the
Arabidopsis genome
J Ecker/University of Pennsylvania
8000 Fully sequenced cDNA clones
Arabidopsis oligoarrays
Expression information from 100 conditions
High-density DNA microarrays for global
gene expression studies in plants
K Ketchem/TIGR
DNA-Fragment-Based arrays of all sequences
of Arabidopsis chromosome 2
Functional and comparative genomics of
disease resistance gene homologs
R Michelmore/
University of California at Davis
Sequences of resistance genes from
Arabidopsis, rice and maize
Specialized microarrays for Arabidopsis,
rice and maize
Structure and function of the expressed
portion of the wheat genomes
C Qualaset/University of California
10,000 Wheat ESTs
Genomics of wood formation in Loblolly pine
R Sederoff/North Carolina State University
EST sequencing from Loblolly pine
Pine microarrays
High-throughput mapping tools for
maize genomics
P Schnable/Iowa State University
Sequence 3′ end of 20,000 maize ESTs
TIGR, The Institute of Genome Research.
Arabidopsis, genes not represented by an EST clone can be
recovered from genomic sequence information. It should
be noted that the computer programs used to identify candidate genes are imperfect, and it may take some time to
fully identify and describe all Arabidopsis genes properly.
Therefore, Arabidopsis microarrays constructed in the near
future may include sequences that are not expressed and
there may be missing genes that are unusual enough to be
missed by gene-finding software programs.
Newer EST projects have noted the problems with the
Arabidopsis EST collection and will sequence a larger number of clones from larger collections of diverse libraries. For
example, The Institute for Genomic Research will sequence
150,000 tomato clones [17], and 300,000 soybean clones will
be sequenced as part of the Soybean EST project [18]
(Tables 1 and 2). Extensive EST sequencing projects have
also been initiated for sorghum, rice, cotton, Medicago truncatula, maize and wheat (Tables 1 and 2), and these will
provide a very useful foundation for future microarray projects in agronomically and horticulturally important plants.
For organisms lacking extensive collections of sequenced
cDNA clones, anonymous clones from normalized or subtracted libraries can be used [8]. Targets that exhibit
expression patterns of interest to the user can be
sequenced at the end of the expression experiment. One
disadvantage of constructing arrays from anonymous
clones is that gene identity will only be determined for
sequenced clones, whereas information about genes with
static expression patterns will be lost.
One additional consideration that is unique to DNA-fragment-based microarrays is that handling mistakes during
the production phase can lead to DNAs being mislabeled or
contaminated before spotting on the slides. In most commercial microarray facilities, the DNAs are re-sequenced as
part of the quality assurance process. In the absence of such
quality control, microarray users should verify key microarray results (see section on Data verification).
Cross-hybridization between related sequences is another
limitation of using EST clones to construct microarrays.
Cross-hybridization to a weakly expressed gene by a probe
for a highly expressed, related gene will confound data collected for the weakly expressed gene. Current estimates are
that sequences with >70% sequence identity over >200
nucleotides are likely to exhibit some degree of crosshybridization under standard conditions [19•]. As estimates
of Arabidopsis genes occurring in gene families range from >
12% [13] to 50% [12], this problem is of substantial concern.
Thus, in future projects, there is some advantage to developing a non-redundant set of completely sequenced cDNA
clones. For Arabidopsis, the availability of genomic
sequence will help address some of the limitations associated with the EST clones. Primer pairs can be designed to
amplify gene-specific fragments for each member of a gene
family for spotting on microarrays.
Sampling and experimental design
The composite nature of plant organs adds additional constraints to the interpretation of microarray data. Gene
expression in plant leaves, for example, is a composite of
the expression patterns of various cell types (e.g. mesophyll, epidermal, vascular) and of various spatially distinct
parts (e.g. upper cell layers versus lower cell layers). At present, our ability to sample single cells or a small number of
uniform cells is restricted. Also, it is difficult or impossible
110
Genome studies and molecular genetics
Table 2
US Department of Agriculture funded EST or microarrays projects active in the year 2000.
Project
Principal investigator/Institution
Resources to be developed
A public soybean EST project
R Shoemaker/ARS, Ames, Iowa
250,000—300,000 EST Soybean
sequences
Identification of genes involved in
plant response to water deficit
J Mullet/Texas A&M University
Specialized sorghum cDNA
microarrays
Genetic engineering of oilseed crops
J Ohlrogge/Michigan State University
Specialized Arabidopsis cDNA
microarrays
The effects of external stimuli on plant responses
at the molecular and cellular levels
E Davies/North Carolina State University
Arabidopsis microarrays
Universal chip for genotyping single
nucleotide polymorphisms
B Lemieux/University of Delaware
Microarray for mapping
SNPs in Brassica oleracea
Genomic analysis of seed quality traits in corn
B Lemieux/University of Delaware
Specialized maize cDNA microarrays
Regulation of metabolism in developing
seeds of Arabidopsis
C Benning/Michigan State University
Arabidopsis cDNA microarrays
Molecular biology of plant development relating
to the needs of Michigan horticulture
S van Nocker/Michigan State University
Sour cherry cDNA microarrays
The effect of environment on grain development
productivity and quality in wheat
W Hurkman/USDA, Albany, California
Wheat cDNA microarrays
Nodulation in V. unguiculata with Rhizobium
or Bradyrhizobium after treatment with biosolids
C Cousin/University of the District of Columbia
Nodule ESTs for Vigna unguiculata
Novel methods for sequence analysis of the
maize genome
W McCombie/Cold Spring Harbor Laboratory
ESTs for maize
ARS, Agricultural Research Service; SNP, single nucleotide polymorphism; USDA, US Department of Agriculture.
to obtain highly synchronized populations of Arabidopsis
cells. These limitations should be considered when interpreting microarray expression data.
As with any experiment, experimental design and sample
replication should be considered when designing microarray experiments. One of the strengths of the microarray
technique is that relatively minor changes in gene expression can be measured. However, the resolving power of
each experiment will be determined by the biological variation in the plant samples and the technical variation
associated with the microarray technology (see Probe
preparation and hybridization).
In designing an experiment, researchers should consider
whether they wish to use the microarrays for ‘marker discovery’ or whether they wish to gain genome-scale knowledge of
a biological process in ‘biology discovery’ experiments. In
the first case, a limited number of genes are identified as of
interest from a microarray experiment and then used in subsequent experiments to characterize a specific treatment or
developmental stage. Such an experiment might consist of
several replicates of two samples harvested at one time point
under a single condition. A limited number of microarrays are
required and the microarray data serves as the starting point
for a project rather than the endpoint. In the second case, in
which a detailed profile of gene expression is desired, a large
number of samples are generated during the course of
analyzing a variety of time points, treatment levels, developmental stages or genetic lines. Such experiments are
extremely rich in information, although costly to perform.
Probe preparation and hybridization
The design of experiments and collection of sample materials is only the first step in microarray experimentation.
There is a whole range of technical issues involved with
microarray hybridization and the collection of hybridization signals [7••]. Only a few of these points are specific to
plant EST microarrays, and thus we will address them only
briefly here.
Typically, two samples (e.g. from treated and control
plants) are labeled with two distinct fluorochromes, mixed
and hybridized simultaneously to the DNA-fragmentbased microarrays. Thus, DNA-fragment microarrays are
best suited to side-by-side comparisons of the two samples
hybridized to the same slide (Type I experiments [7••] or
‘marker discovery’ experiments). The data values are often
expressed as a ratio of the signals collected for the two
samples. Comparisons of multiple samples can be accomplished by using one sample as a common reference
(Type II experiments [7••] or ‘biology discovery’ experiments). Thus, for each given gene, the ratio of the
fluorescent signals derived from the two samples provides
a relative measure of gene expression (i.e. whether mRNA
accumulation is higher, lower or unchanged in one sample
relative to the second sample).
The original methods for preparing a probe required >1 µg
of poly(A)+ RNA [7••]; however, it can be difficult to
obtain this much mRNA from specialized tissues or cell
types. In the future, methods of probe preparation that are
based on total RNA or adapted for use with very small
Chasing the dream: plant EST microarrays Richmond and Somerville
111
Table 3
Microarray data analysis software.
Name
Type
Reference/URL
QuantArray
Data visualization
GSI Lumonics
http://www.genscan.com/lifescience_frame/software.htm
Stingray
Data visualization, clustering
Molecular Applications Group
http://www.mag.com
Array Vision
Data extraction, visualization
Imaging Research, Inc.
http://www.imagingresearch.com
Spotfire
Data visualization, clustering
http://www.spotfire.com
GeneCluster
Clustering using self-organizing maps
http://www.genome.wi.mit.edu/MPR/GeneCluster/GeneCluster.html [34]
Clustering Tools
Clustering
Pangea Systems, Inc.
http://www.pangeasystems.com
Cluster
Hierarchical clustering
http://rana.stanford.edu/clustering/ [30]
GeneSight
Data visualization; hierarchical and
neural network clustering
Biodiscovery
http://www.biodiscovery.com/genesight.html
GeneSpring
Data visualization; clustering; database links
Silicon Genetics
http://www.sigenetics.com/GeneSpring/
GenEx
Public web database; data storage;
data visualization
National Center for Genome Resources; Silicon Genetics
http://www.ncgr.org/research/genex/; http://genex.sigenetics.com/
arraySCOUT
Data visualization; clustering, database links
Lion Biosciences
http://www.lionbioscience.com
ArrayDB
LIMS
National Human Genome Research Institute
http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/
LifeTools
LIMS
Incyte
http://www.incyte.com/products/lifetools/lifetools.html
GeneChip Information
Systems
LIMS
Affymetrix
http://www.affymetrix.com/products/bioinformatics.html
Microarray Database
LIMS
Stanford University
http://genome-www4.stanford.edu:8100/
amounts of mRNA, and that use more efficient methods of
label incorporation, will extend the utility of microarray
analysis to difficult samples.
One of the chief technical specifications of microarrays is
sensitivity. Sensitivity, determined by the minimal signal
that can be reliably detected above background, is dependent in part on background levels and on the specific
incorporation of label into the probes. Claims have been
made that EST microarrays can detect 1 part in 100,000
[20•] to 500,000 [21]. Incyte Pharmaceuticals indicates that
with their GEM microarrays [22•], 2 pg out of 600 ng of
poly(A) RNA (1:300,000) can be detected [23•]. These levels are generally considered sufficient to detect a mRNA
present at a few copies per cell [20•].
Another measure of sensitivity is the ability to reliably
determine minor changes in mRNA levels between two
samples. Incyte suggests the technical variation associated
with their GEM arrays, as measured by the coefficient of
variation, is 15% [23•]. Our own experiments, using replicate 2500-element Arabidopsis arrays and experimental
samples, show an average coefficient of variation that is
closer to 37%. Incyte GEM arrays can be used to determine
a differential expression greater than 1.75-fold. In our laboratory, we have established standards for minimal
expression levels of more than three standard deviations
above the background of the slide and for differential
expression of more than three-fold when analyzing three
independent replicates.
Data collection
Microarray hybridization signals are collected by specialized scanners, which are essentially computer-controlled
inverted scanning fluorescent microscopes, with most models providing confocal imaging. The output of these readers
is typically a 16-bit TIFF image file, limiting the dynamic
range of the data to less than five orders of magnitude.
The adoption of charge-coupled device (CCD)-based
detectors [24] may increase the range of signal detection as
improved methods and slide coatings [25] reduce background noise. The TIFF files are processed with image
analysis software (often supplied with the microarray reader) to produce a measure of the hybridization signal for
each DNA sample on the array. The ideal image analysis
software package would be able to detect automatically
each valid spot on the array, flag bad spots, subtract the
112
Genome studies and molecular genetics
Figure 1
Small number of changes
Medium number of changes
Large number of changes
6
5
Channel 2 (log10)
4
3
2
1
0
0
1
2
3
4
5
6 0
2
3
4
5
6 0
1
2
3
4
5
6
Channel 1 (log10)
100
Channel 1/Channel 2 ratio
1
House 200
House 20
Mean
Student
10
1
0.1
0
500
1000
1500
2000 2500 0
500
1000
1500
2000 2500 0
500
1000
1500
2000
2500
Clones
Current Opinion in Plant Biology
Comparison of normalization methods. Upper panels, plot of
normalized signal intensities (average of three replicates) from three
microarray experiments, demonstrating the range of differential
expression that can be observed. Signals were normalized using a
Student normalization and readjusted to original values using an overall
mean value. Dashed lines indicate two-fold and three-fold differential
expression cutoffs. Lower panels, channel 1/channel 2 ratio for
experiments depicted in upper panels. Data was normalized using
each of four normalization methods (Student, mean fluorescence, or
housekeeping) before calculation of the ratio value. For housekeeping
normalizations, a subset of 20 or 200 clones were chosen as
expression controls from an initial data set of 2462 clones, based on
the data from 114 hybridizations. Only genes that had medium to high
expression were used, and the average expression ratio of these genes
did not deviate >10% from 1.0 across all hybridizations.
local background, and output a mean signal-intensity
above background for each spot [26,27].
can be used to find differentially expressed genes. Our laboratory uses a collection of custom Perl scripts [28] to
process and analyze microarray data (T Richmond, unpublished data). Companies that provide microarrays or
microarray services often provide the necessary software
and information to examine the experimental data
(Table 3). For large-scale microarray projects, laboratory
information management systems (LIMS) are essential.
There are a number of microarray projects developing
database tools for storing, processing and retrieving
microarray data [27,29–33]. Although often freely available, the software tools being developed by these groups
are not amenable to inexperienced researchers, requiring
computer skills for their installation and maintenance that
are beyond those of the typical biologist.
Data analysis
The real heart of microarray experimentation and, ironically, the most underdeveloped part, is data analysis. When an
experiment is finished, the researcher is left with a series
of signal intensities from thousands of clones. The true
potential of microarrays as tools for gaining a global, highly integrated view of plant biology cannot be realized
without large-scale relational databases to house data from
multiple experiments and sophisticated data mining and
pattern-recognition software tools for extracting multidimensional information from microarray data sets [26].
For simple comparisons, as is typical for marker-discovery
experiments, there are a variety of data analysis methods.
For small data sets, spreadsheets and statistical software
Regardless of the software program used, the first step in
data analysis involves the normalization of data. In order to
Chasing the dream: plant EST microarrays Richmond and Somerville
compare separate fluorescent channels, or the results from
separate experiments, the raw data must be transformed
into normalized values. Several different methods of data
normalization can be used. One method identifies a standard set of ‘housekeeping’ genes with expression levels
that remain relatively constant under a variety of different
conditions and uses their aggregate expression level to
adjust the fluorescent signal of each target to a normalized
value. The challenge is to identify a significant number
(20–200) of medium to highly expressed genes with constant expression over a wide range of experimental
conditions for each plant species. This requires a relatively large initial set of data in order to choose such genes.
Another method of normalization, which is not dependent
on a predetermined set of housekeeping genes, makes the
assumption that under the conditions being tested, most
genes will not change in expression, and uses the mean,
mode or median fluorescence for each channel as a means
of normalization. Although this may work in most cases,
there are instances where this assumption can break down.
For specialized cDNA microarrays with a limited number
of genes, many of the genes may change in expression and
thus the addition of a large number of controls may be necessary. Figure 1 gives an example of the data collected
from three different experiments, showing the range of differential expression that can be seen.
Another simple method of normalization is to convert the
fluorescence of each target spot into a normalized value
that is based on the standard deviation (Student normalization) or the mean and standard deviation (Z-score
normalization) of the fluorescence of each channel as a
whole. (For Student normalization, S = χ/σ; for Z-score
normalization, Z = (χ–µ)/σ, where χ = sample value,
µ = mean of the channel data, and σ = standard deviation
of the channel data.) We carried out a series of experiments
in triplicate to compare the methods of normalization
described above. Figure 1 shows the comparison of methods for three experiments, demonstrating that there is
little difference among the normalization methods. The
chief differences appear at the cutoff thresholds for differential expression (typically three-fold); small changes in
the threshold (0.1–0.5-fold) can result in large changes in
the number of differentially expressed genes. We have
found the Student normalization to be the simplest to use
and typically use it for our experiments.
Side-by-side comparisons quickly inspire larger experiments that are based on time-courses, dose-response series
or multiple treatments. For examining multiple data sets,
cluster analysis is the most common method [30,32,34,35].
Genes can be grouped together according to their similarity in pattern of gene expression using a variety of different
computer algorithms. From such analysis, the patterns of
expression of thousands of genes across a wide number of
conditions can be reduced to a smaller number of distinct
patterns. Depending of the resolving power of the data set
113
and software programs, the number of clusters and their
associated patterns may reflect the number of distinct
responses associated with a treatment or developmental
sequence — responses that may not always be reflected in
cytological or physiological parameters that can be readily
measured. As an example of such methods, cluster analysis
of the expression data for the cell cycle in yeast revealed
the expected stages of G0, G1, G2, S and M in this wellcharacterized experimental system [36].
Clustering is also a powerful tool for predicting the function of previously uncharacterized genes. The utility of
this approach has been clearly demonstrated in a number
of systems [10•,36,37]. Iyer et al. [10•] identified over
200 previously unknown genes whose expression was regulated during the response of fibroblasts to serum. The
coordinate expression of uncharacterized genes with genes
of known function provides clues to the function of the
unknown genes (the ‘guilt-by-association’ rule for genefunction discovery [20•]). Plant biologists will be able to
conduct similar experiments, classifying genes on the basis
of their coordinated expression in response to environmental stresses or developmental stages. For plants with
fully sequenced genomes, another outcome of such cluster
analysis is the discovery of candidate cis-acting regulatory
elements from the identification of sequence elements
shared in common among all genes of a cluster [36,38].
Such discoveries are likely to accelerate future discoveries
of transcription factors and other DNA-binding proteins.
Microarray databases and information
management
The uniform goal of microarray data analysis software is to
provide an easy way for the researcher to identify potentially interesting candidate genes and to provide
convenient, detailed information about these genes. What
readily becomes apparent when studying thousands of
genes simultaneously is that one has personal knowledge
of only a small fraction of genes. Information that is readily accessible at the point of data analysis is therefore an
extremely valuable aid in interpreting results from
microarray experiments. Towards this goal, microarray
databases should ideally contain links to a multitude of
external information sources [39•], including citation databases [40], sequence databases [41], metabolic pathway
databases [42,43] and protein domain databases [44].
Where possible, microarray databases should also contain
links to information regarding expression data collected
from other sources, map position, predicted protein motifs,
functional classifications, related genes in both plants and
other organisms, known alleles and mutations, interacting
protein partners and availability of DNA stocks, seed
stocks and so on. For most plant species, including
Arabidopsis, such information is now being collected in a
systematic fashion.
The eventual goal is to have a unified and comprehensive
information resource, which will link microarray expression
114
Genome studies and molecular genetics
data to all available gene-related data. There are several
groups working on establishing public repositories for
microarray expression data [45,46]. The recently funded
TAIR (The Arabidopsis Information Resource) project [47],
with its aim of providing a link from Arabidopsis to all other
plants and vice versa (S Rhee, C Somerville, personal communication), is the first step in this direction for plant biologists.
One innovation is to use microarrays in ‘reverse Southern’
experiments to monitor changes in gene-copy number
[51]. The application of these methods can potentially be
very helpful in the analysis of the various kinds of chromosomal mutants that exist in plants. In addition, such an
analysis could be extended to population genetic studies of
gene evolution through duplication and deletions.
Data verification
Further in the future, a method of generating ‘virtual
masks’ using optical devices [52•,53•] and inkjet-based
methods (Packard Instruments Company, Meriden,
Connecticut) have been proposed as alternative methods to
the established photolithography-based method for producing oligo-based microarrays or the spotting method for
DNA-fragment microarrays. These technologies may lead
to reductions in production costs and make the technology
more accessible to all plant biologists. There are various
other technical innovations being introduced that will
enhance microarray experimentation, including multi-laser
scanners that permit the use of more than two fluorochromes at a time (GSI Lumonics, Watertown,
Massachusetts; Virtek, Woburn, Massachusetts), and new
dyes with improved properties (Molecular Probes, Eugene,
Oregon; New England Nuclear, Boston, Massachusetts).
Once candidate gene(s) of interest have been identified,
researchers must decide how to proceed. As mentioned
above, the gene identities of key genes should be verified
by sequencing the spotted EST. In addition, many reviewers will not yet accept microarray data on its own, requiring
secondary confirmation of differential expression.
There are a number of standard and newly emerging techniques that can be used to obtain secondary confirmation:
RNA gel blots, quantitative or competitive PCR, and
quantitative ribonuclease protection assays. Each has its
own advantages, depending on the tissue and/or conditions
being evaluated. RNA gel blots are simple and straightforward if sufficient amounts of tissue are available.
Quantitative PCR, especially using real-time instruments
such as the LightCycler [48] or the TaqMan [49] systems,
is the method of choice if source material is limiting.
Difficulties with primer design, genomic DNA contamination, and choice of internal controls, however, can make
setting up these systems tedious, especially if the expression pattern of a large number of genes must be evaluated.
Future prospects
For Arabidopsis and plants of significant commercial interest (e.g. maize, soybean, rice), commercial vendors will
probably produce microarrays for sale. There are many
advantages to commercializing the production of microarrays. As long as the product is commercially viable, the
companies can invest significantly more resources in producing large numbers of high-quality microarrays than can
an academic laboratory. Ideally, all but the most specialized
microarrays will be produced commercially in the future.
The future of Arabidopsis microarrays is likely to diverge
from those to be developed for other plant species. With
the completion of the genome sequence, full genome
microarrays can be constructed. In addition, the availability of high-quality, error-free genomic sequence will
permit the construction of oligo-based microarrays. The
National Science Foundation has recently funded a group
headed by J Ecker to produce an Arabidopsis gene expression oligo-array in collaboration with Affymetrix
(Sunnyvale, California; Table 1). In addition to geneexpression studies, oligo-based microarrays have been
designed for use in scoring single nucleotide polymorphisms molecular markers in population surveys and in
genetic crosses [50]. M Mindrinos in R Davis’ group
(Stanford University) is currently developing such a ‘mapping chip’ for Arabidopsis.
Conclusions
DNA-fragment-based microarrays offer the promise of dramatically enhancing our understanding of all aspects of
plant biology. Microarrays are proving to be very valuable
for the initial functional characterization of unknown genes
as well as enabling detailed analyses of already well-known
processes. This technology will greatly facilitate the analysis of mutants and other genetic resources that are
available for plants. The full potential of this technology
depends on the establishment of public databases to house
microarray expression data so that a maximum number of
researchers can bring their expertise and intuition to bear
on the interpretation of the expression data.
Acknowledgements
Supported in part by the US Department of Agriculture, the US
Department of Energy and the Carnegie Institution of Washington. We also
thank Robin Shealy (University of Illinois at Urbana-Champagne) for his
helpful advice about data normalization. We thank Gert de Boer, Greg
Galloway, Wolf Scheible and John Vogel for their comments and suggestions
on the manuscript. TAR thanks C Somerville for his support of his research.
Publication # 1427 of the Carnegie Institution of Washington.
References and recommended reading
Papers of particular interest, published within the annual period of review,
have been highlighted as:
• of special interest
•• of outstanding interest
1.
Baldwin D, Crane V, Rice D: A comparison of gel-based, nylon filter
and microarray techniques to detect differential RNA expression
in plants. Curr Opin Plant Biol 1999, 2:96-103.
2. Brown PO, Botstein D: Exploring the new world of the genome
•
with DNA microarrays. Nat Genet 1999, 21:33-37.
A thoughtful commentary on the impact of DNA microarrays.
3.
Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM: Expression
profiling using cDNA microarrays. Nat Genet 1999, 21:10-14.
Chasing the dream: plant EST microarrays Richmond and Somerville
4.
Granjeaud S, Bertucci F, Jordan BR: Expression profiling: DNA
arrays in many guises. Bioessays 1999, 21:781-790.
5.
Lemieux B, Aharoni A, Schena M: DNA chip technology. Mol Breed
1998, 4:277-289.
6.
Lipshutz RJ, Fodor SPA, Gingeras TR, Lockhart DJ: High density
synthetic oligonucleotide arrays. Nat Genet 1999, 21:20-24.
7.
Eisen MB, Brown PO: DNA arrays for analysis of gene expression.
•• Methods Enzymol 1999, 303:179-205.
This paper provides a thorough description of the DNA-fragment-based
microarray technology.
8.
Kehoe D, Villand P, Somerville S: DNA microarrays for studies of
higher plants and other photosynthetic organisms. Trends Plant
Sci 1999, 4:38-41.
9. Schena M (ed): DNA Microarrays: A Practical Approach. New York:
•• Oxford University Press; 1999.
A collection of chapters covering all aspects of DNA microarray technology,
including a number of useful protocols. Includes a broad overview of the field
and detailed experimental advice.
10. Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM,
•
Staudt LM, Hudson J Jnr, Boguski MS et al.: The transcriptional
program in the response of human fibroblasts to serum. Science
1999, 283:83-87
Demonstrates the power of cluster analysis for identifying the function of
unknown genes by clustering with genes of known identity and function.
11. Newman T, de Bruijn FJ, Green P, Keegstra K, Kende H, McIntosh L,
Ohlrogge J, Raikhel N, Somerville S, Thomashaw M et al.: Genes
galore: a summary of methods for accessing results from largescale partial sequencing of anonymous Arabidopsis cDNA clones.
Plant Physiol 1994, 106:1241-1255.
12. Lin, X, Kaul, S, Rounsley, S, Shea, TP, Benito, M-I, Town, CD et al.:
Sequence and analysis of chromosome 2 of Arabidopsis thaliana.
Nature 1999, 402:761-768.
13. Mayer K, Schüller C, Wambutt R, Murphy G, Volckaert G, Pohl T,
Dusterhoft A, Stiekema W, Entian KD, Terryn N et al.: Sequence and
analysis of chromosome 4 of the plant Arabidopsis thaliana.
Nature 1999, 402:769-777.
14. The Arabidopsis Information Resource (TAIR): Arabidopsis Genome
Initiative. URL http://www.arabidopsis.org/agi.html
15. The Institute for Genomic Research (TIGR): Arabidopsis Genome
Annotation Database (AGAD). URL http://www.tigr.org/tdb/at/agad
16. Munich Information Center for Protein Sequences (MIPS):
Arabidopsis thaliana database (MATDB). URL
http://websvr.mips.biochem.mpg.de/proj/thal/
17.
The Institute for Genomic Research (TIGR): TIGR Tomato Gene
Index. URL http://www.tigr.org/tdb/lgi/index.html
18. Soybean EST Project: The Soybean EST Project Home Page.
URL http://macgrant.agron.iastate.edu/soybeanest.html
19. Richmond CS, Glasner JD, Mau R, Jin H, Blattner FR: Genome-wide
•
expression profiling in Escherichia coli K-12. Nucleic Acids Res
1999, 27:3821-3835.
A detailed analysis comparing nylon-based arrays using 32P and glass slides
using fluorescent dyes. The authors show that radioactive- and fluorescentbased methods produce nearly equivalent results.
20. Ruan Y, Gilmore J, Conner T: Towards Arabidopsis genome
•
analysis: monitoring expression profiles of 1400 genes using
cDNA microarrays. Plant J 1998 15:821-833.
This is the first article that provides an example of the use of large DNA
microarrays in plant biology.
21. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel
human genome analysis: microarray-based expression
monitoring of 1 000 genes. Proc Natl Acad Sci USA
1996, 93:10614-10619.
22. Incyte Pharmaceuticals: Gene Expression Microarrays.
•
URL http://gem.incyte.com/gem/index.shtml
Incyte produces the first commercially available plant microarray, an
Arabidopsis GEM array containing 7 942 sequence verified elements representing 6 068 genes/clusters.
23. Incyte Pharmaceuticals: Incyte GEM Array Reproducibility Study.
•
URL http://gem.incyte.com/gem/GEM-reproducibility.pdf
This study, albeit unrefereed, is one of the few that examines the sensitivity
and reproducibility of DNA microarrays.
115
24. Che D, Bao P, Li R, Daywalt M, Ruffalo T, Shi J, Prokhorova A,
Lublinsky A, Lermer N, Müller UR: A novel surface, attachment
chemistry and CCD-based multicolor imaging system for analysis
of genomic DNA arrays. Proc Scanning 1999, 21:63-64.
25. Beier M, Hoheisel JD: Versatile derivatisation of solid support
media for covalent bonding on DNA-microchips. Nucleic Acids Res
1999, 27:1970-1977.
26. Bassett DE, Eisen MB, Boguski MS: Gene expression informatics:
it’s all in your mine. Nat Genet 1999, 21:51-55.
27.
Ermolaeva O, Rastogi M, Pruitt KD, Schuler GD, Bittner ML, Chen Y,
Simon R, Meltzer P, Trent JM, Boguski MS: Data management for
gene expression arrays. Nat Genet 1998, 20:19-23.
28. Perl: Perl Mongers: The Perl Advocacy People.
URL http://www.perl.org
29. Bittner M, Meltzer P, Trent J: Data analysis and integration: of steps
and arrows. Nat Genet 1999, 22:213-215.
30. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and
display of genome-wide expression patterns. Proc Natl Acad Sci
USA 1998, 95:14863-14868.
31. Imbert MC, Nguyen VK, Granjeaud S, Nguyen C, Jordan BR:
LABNOTE, a laboratory notebook system designed for academic
genomics groups. Nucleic Acids Res 1999, 27:601-607.
32. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E,
Lander ES, Golub TR: Interpreting patterns of gene expression
with self-organizing maps: methods and application to
hematopoietic differentiation. Proc Natl Acad Sci USA
1999 96:2907-2912.
33. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM:
Systematic determination of genetic network architecture. Nat
Genet 1999, 22:281-285.
34. Toronen P, Kolehmainen M, Wong G, Castren E: Analysis of gene
expression data using self-organizing maps. FEBS Lett
1999, 451:142-146.
35. Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S et al.: Largescale temporal gene expression mapping of central nervous
system development. Proc Natl Acad Sci USA 1998, 95:334-339.
36. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB,
Brown PO, Botstein D, Futcher B: Comprehensive identification of
cell cycle-regulated genes of the yeast Saccharomyces cerevisiae
by microarray hybridization. Mol Biol Cell 1998, 9:3273-3297.
37.
DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic
control of gene expression on a genomic scale. Science
1997, 278:680-686.
38. Zhang MQ: Promoter analysis of co-regulated genes in the yeast
genome. Comput Chem 1999, 23:233-250.
39. Burks C: Molecular biology database list. Nucleic Acids Res
•
1999, 27:1-9.
A comprehensive list of brief descriptions and pointers to internet sites for
various molecular biology databases.
40. National Library of Medicine: PubMed. URL http://www.ncbi.nlm.nih.gov/
PubMed/
41. National Center for Biotechnology Information: GenBank.
URL http://www.ncbi.nlm.nih.gov/
42. Japanese Human Genome Project: Kyoto Encyclopedia of Genes
and Genomes (KEGG). URL http://www.genome.ad.jp/kegg/kegg2.html
43. Kanehisa M: A database for post-genome analysis. Trends Genet
1997, 13:375-376.
44. Pfam Consortium: The Pfam database of protein domains and
HMMs. URL http://pfam.wustl.edu/
45. European Bioinformatics Institute: The ArrayExpress Database.
URL http://www.ebi.ac.uk/arrayexpress/
46. National Center for Genome Resources: GeneX: a Collaborative
Internet Database and Toolset for Gene Expression Data.
URL http://www.ncgr.org/research/genex/
47.
Carnegie Institution of Washington and National Center for Genome
Resources: The Arabidopsis Information Resource (TAIR).
URL http://www.arabidopsis.org
116
Genome studies and molecular genetics
48. Roche Molecular Biochemicals: Molecular Biochemicals.
URL http://biochem.roche.com/
49. PE Biosystems: Applied Biosystems. URL http://www.pebio.com/ab/
50. Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G,
Perkins N, Winchester E, Spencer J et al.: Large-scale identification,
mapping, and genotyping of single-nucleotide polymorphisms in the
human genome. Science 1998, 280:1077-1082.
51. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A,
Williams CF, Jeffrey SS, Botsein D, Brown PO: Genome-wide
analysis of DNA copy-number changes using cDNA microarrays.
Nat Genet 1999, 23:41-46.
52. Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F, Sussman M,
•
Cerrina F: Maskless fabrication of light-directed oligonucleotide
microarrays using a digital micromirror array. Nat Biotechnol 1999,
17:974-978.
Both this paper and [53•] outline an innovative use of new technology that
may revolutionize oligonucleotide array construction.
53. Gardner S: Digital Optical Chemistry System. URL
•
http://pompous.swmed.edu/expbio/doc/
This website, like [52•], describes a system that uses a unique UV light
projector to manufacture biological/chemical arrays using UV photochemistry. This new technology may revolutionize oligonucleotide array
construction.
Chasing the dream: plant EST microarrays
Todd Richmond* and Shauna Somerville†
DNA microarray technology is poised to make an important
contribution to the field of plant biology. Stimulated by recent
funding programs, expressed sequence tag sequencing and
microarray production either has begun or is being
contemplated for most economically important plant species.
Although the DNA microarray technology is still being refined,
the basic methods are well established. The real challenges lie
in data analysis and data management. To fully realize the value
of this technology, centralized databases that are capable of
storing microarray expression data and managing information
from a variety of sources will be needed. These information
resources are under development and will help usher in a new
era in plant functional genomics.
Addresses
Carnegie Institution of Washington, Department of Plant Biology,
Stanford, California 94305, USA
*e-mail: [email protected]
†e-mail: [email protected]
Current Opinion in Plant Biology 2000, 3:108–116
1369-5266/00/$ — see front matter © 2000 Elsevier Science Ltd.
All rights reserved.
Abbreviations
EST
expressed sequence tag
LIMS
laboratory information management system
oligo
oligonucleotide
Introduction
Microarrays are widely recognized as a significant technological advance providing genome-scale information about
gene expression patterns [1,2•,3–6]. Two types of microarrays currently exist: the DNA-fragment-based and the
oligonucleotide (oligo)-based microarrays. These differ
primarily in the nature of the DNA fixed to the solid support. In this review, we focus on the DNA-fragment-based
arrays, in which PCR-amplified inserts of partially
sequenced cDNA clones are spotted onto glass microscope
slides [7••].
A number of reviews describe the methods and outline the
advantages of DNA-fragment microarrays [2,3,7••,8,9••].
As stated in many of them, the major advantages of
microarrays derive from the fact that data can be collected
for large numbers of genes in one experiment permitting
true genome-scale sampling of gene expression patterns. It
is important to note that microarrays, like northern blots,
provide a measure of steady-state RNA levels. Although
such information can be very valuable, not all biological
processes will be regulated at this level of gene expression
and other methods are needed to characterize processes
regulated at other levels.
Three general types of experiments can be performed with
microarrays. In ‘marker discovery’ experiments, the goal is
to discover a limited number of highly specific marker
genes for a cell type, a developmental stage or an environmental treatment. In such experiments, the researcher is
often interested only in genes that show a dramatic and
selective induction or repression of expression. The second type of experiment, an exploratory or ‘biology
discovery’ experiment, is one in which expression information about all genes is used to provide an integrative
view of a plant’s response to a treatment. Such experiments have the potential to provide new insights into
biological processes. In an example drawn from human
biology, Iyer et al. [10•] demonstrated that serum-starved
cells respond to the addition of fresh serum in part by
inducing a suite of genes involved in wounding. The
wound response had not previously been implicated in this
widely studied biological model. The third type of experiments is the ‘gene-function discovery’ experiments. As
microarray data accumulates in centralized public databases, hypotheses as to the function of novel, uncharacterized
genes can be generated by in silico analyses of when and
where such genes exhibit altered expression. These kinds
of experiments will be very beneficial to the field of plant
biology in terms of both developing a broader view of how
plants function and rapidly identifying novel genes most
worthy of detailed characterization. In this review, we
highlight some of our observations on the use of expressed
sequence tag (EST)-based microarrays, and, in particular,
on some of the plant-specific concerns, on the basis of our
experience with Arabidopsis microarrays.
Construction of microarrays
The advantage of constructing microarrays from EST clones
rather than from anonymous clones is that the subsequent
analysis of gene expression patterns is much more informative. EST clones are the obvious choice for Arabidopsis,
because of the availability of a large collection from a public
stock center — the Arabidopsis Biological Resource Center.
There are a few limitations associated with this collection
that should be noted. Most of the Arabidopsis ESTs consist
of only 300–400 basepairs sequenced from the 5′ end of a
longer cDNA clone [11]. Thus, the occurrence of chimeric
clones consisting of sequences from unrelated genes will go
undetected. Genes specifically induced under specialized
environmental conditions or in specialized cells types will
be under-represented in the EST collection, which was prepared from RNA from only four plant tissues. Recent
estimates suggest that only a third of Arabidopsis genes
[12,13] are represented among the current set of ~45,000
Arabidopsis ESTs.
The complete sequence of the Arabidopsis genome is
scheduled to be available in 2000 [14], and the first comprehensive annotation to identify candidate genes will
be completed shortly afterwards [15,16]. Thus, for
Chasing the dream: plant EST microarrays Richmond and Somerville
109
Table 1
EST or microarray projects funded by the National Science Foundation Plant Genome Program in 1999 (see [1] for projects funded
in 1998).
Project
Principal investigator/Institution
Resources to be developed
Tools for potato structural and
functional genomics
B Baker/University of California
at Berkeley
55,000 Potato EST sequences
Potato cDNA microarrays
Global expression studies for the
Arabidopsis genome
J Ecker/University of Pennsylvania
8000 Fully sequenced cDNA clones
Arabidopsis oligoarrays
Expression information from 100 conditions
High-density DNA microarrays for global
gene expression studies in plants
K Ketchem/TIGR
DNA-Fragment-Based arrays of all sequences
of Arabidopsis chromosome 2
Functional and comparative genomics of
disease resistance gene homologs
R Michelmore/
University of California at Davis
Sequences of resistance genes from
Arabidopsis, rice and maize
Specialized microarrays for Arabidopsis,
rice and maize
Structure and function of the expressed
portion of the wheat genomes
C Qualaset/University of California
10,000 Wheat ESTs
Genomics of wood formation in Loblolly pine
R Sederoff/North Carolina State University
EST sequencing from Loblolly pine
Pine microarrays
High-throughput mapping tools for
maize genomics
P Schnable/Iowa State University
Sequence 3′ end of 20,000 maize ESTs
TIGR, The Institute of Genome Research.
Arabidopsis, genes not represented by an EST clone can be
recovered from genomic sequence information. It should
be noted that the computer programs used to identify candidate genes are imperfect, and it may take some time to
fully identify and describe all Arabidopsis genes properly.
Therefore, Arabidopsis microarrays constructed in the near
future may include sequences that are not expressed and
there may be missing genes that are unusual enough to be
missed by gene-finding software programs.
Newer EST projects have noted the problems with the
Arabidopsis EST collection and will sequence a larger number of clones from larger collections of diverse libraries. For
example, The Institute for Genomic Research will sequence
150,000 tomato clones [17], and 300,000 soybean clones will
be sequenced as part of the Soybean EST project [18]
(Tables 1 and 2). Extensive EST sequencing projects have
also been initiated for sorghum, rice, cotton, Medicago truncatula, maize and wheat (Tables 1 and 2), and these will
provide a very useful foundation for future microarray projects in agronomically and horticulturally important plants.
For organisms lacking extensive collections of sequenced
cDNA clones, anonymous clones from normalized or subtracted libraries can be used [8]. Targets that exhibit
expression patterns of interest to the user can be
sequenced at the end of the expression experiment. One
disadvantage of constructing arrays from anonymous
clones is that gene identity will only be determined for
sequenced clones, whereas information about genes with
static expression patterns will be lost.
One additional consideration that is unique to DNA-fragment-based microarrays is that handling mistakes during
the production phase can lead to DNAs being mislabeled or
contaminated before spotting on the slides. In most commercial microarray facilities, the DNAs are re-sequenced as
part of the quality assurance process. In the absence of such
quality control, microarray users should verify key microarray results (see section on Data verification).
Cross-hybridization between related sequences is another
limitation of using EST clones to construct microarrays.
Cross-hybridization to a weakly expressed gene by a probe
for a highly expressed, related gene will confound data collected for the weakly expressed gene. Current estimates are
that sequences with >70% sequence identity over >200
nucleotides are likely to exhibit some degree of crosshybridization under standard conditions [19•]. As estimates
of Arabidopsis genes occurring in gene families range from >
12% [13] to 50% [12], this problem is of substantial concern.
Thus, in future projects, there is some advantage to developing a non-redundant set of completely sequenced cDNA
clones. For Arabidopsis, the availability of genomic
sequence will help address some of the limitations associated with the EST clones. Primer pairs can be designed to
amplify gene-specific fragments for each member of a gene
family for spotting on microarrays.
Sampling and experimental design
The composite nature of plant organs adds additional constraints to the interpretation of microarray data. Gene
expression in plant leaves, for example, is a composite of
the expression patterns of various cell types (e.g. mesophyll, epidermal, vascular) and of various spatially distinct
parts (e.g. upper cell layers versus lower cell layers). At present, our ability to sample single cells or a small number of
uniform cells is restricted. Also, it is difficult or impossible
110
Genome studies and molecular genetics
Table 2
US Department of Agriculture funded EST or microarrays projects active in the year 2000.
Project
Principal investigator/Institution
Resources to be developed
A public soybean EST project
R Shoemaker/ARS, Ames, Iowa
250,000—300,000 EST Soybean
sequences
Identification of genes involved in
plant response to water deficit
J Mullet/Texas A&M University
Specialized sorghum cDNA
microarrays
Genetic engineering of oilseed crops
J Ohlrogge/Michigan State University
Specialized Arabidopsis cDNA
microarrays
The effects of external stimuli on plant responses
at the molecular and cellular levels
E Davies/North Carolina State University
Arabidopsis microarrays
Universal chip for genotyping single
nucleotide polymorphisms
B Lemieux/University of Delaware
Microarray for mapping
SNPs in Brassica oleracea
Genomic analysis of seed quality traits in corn
B Lemieux/University of Delaware
Specialized maize cDNA microarrays
Regulation of metabolism in developing
seeds of Arabidopsis
C Benning/Michigan State University
Arabidopsis cDNA microarrays
Molecular biology of plant development relating
to the needs of Michigan horticulture
S van Nocker/Michigan State University
Sour cherry cDNA microarrays
The effect of environment on grain development
productivity and quality in wheat
W Hurkman/USDA, Albany, California
Wheat cDNA microarrays
Nodulation in V. unguiculata with Rhizobium
or Bradyrhizobium after treatment with biosolids
C Cousin/University of the District of Columbia
Nodule ESTs for Vigna unguiculata
Novel methods for sequence analysis of the
maize genome
W McCombie/Cold Spring Harbor Laboratory
ESTs for maize
ARS, Agricultural Research Service; SNP, single nucleotide polymorphism; USDA, US Department of Agriculture.
to obtain highly synchronized populations of Arabidopsis
cells. These limitations should be considered when interpreting microarray expression data.
As with any experiment, experimental design and sample
replication should be considered when designing microarray experiments. One of the strengths of the microarray
technique is that relatively minor changes in gene expression can be measured. However, the resolving power of
each experiment will be determined by the biological variation in the plant samples and the technical variation
associated with the microarray technology (see Probe
preparation and hybridization).
In designing an experiment, researchers should consider
whether they wish to use the microarrays for ‘marker discovery’ or whether they wish to gain genome-scale knowledge of
a biological process in ‘biology discovery’ experiments. In
the first case, a limited number of genes are identified as of
interest from a microarray experiment and then used in subsequent experiments to characterize a specific treatment or
developmental stage. Such an experiment might consist of
several replicates of two samples harvested at one time point
under a single condition. A limited number of microarrays are
required and the microarray data serves as the starting point
for a project rather than the endpoint. In the second case, in
which a detailed profile of gene expression is desired, a large
number of samples are generated during the course of
analyzing a variety of time points, treatment levels, developmental stages or genetic lines. Such experiments are
extremely rich in information, although costly to perform.
Probe preparation and hybridization
The design of experiments and collection of sample materials is only the first step in microarray experimentation.
There is a whole range of technical issues involved with
microarray hybridization and the collection of hybridization signals [7••]. Only a few of these points are specific to
plant EST microarrays, and thus we will address them only
briefly here.
Typically, two samples (e.g. from treated and control
plants) are labeled with two distinct fluorochromes, mixed
and hybridized simultaneously to the DNA-fragmentbased microarrays. Thus, DNA-fragment microarrays are
best suited to side-by-side comparisons of the two samples
hybridized to the same slide (Type I experiments [7••] or
‘marker discovery’ experiments). The data values are often
expressed as a ratio of the signals collected for the two
samples. Comparisons of multiple samples can be accomplished by using one sample as a common reference
(Type II experiments [7••] or ‘biology discovery’ experiments). Thus, for each given gene, the ratio of the
fluorescent signals derived from the two samples provides
a relative measure of gene expression (i.e. whether mRNA
accumulation is higher, lower or unchanged in one sample
relative to the second sample).
The original methods for preparing a probe required >1 µg
of poly(A)+ RNA [7••]; however, it can be difficult to
obtain this much mRNA from specialized tissues or cell
types. In the future, methods of probe preparation that are
based on total RNA or adapted for use with very small
Chasing the dream: plant EST microarrays Richmond and Somerville
111
Table 3
Microarray data analysis software.
Name
Type
Reference/URL
QuantArray
Data visualization
GSI Lumonics
http://www.genscan.com/lifescience_frame/software.htm
Stingray
Data visualization, clustering
Molecular Applications Group
http://www.mag.com
Array Vision
Data extraction, visualization
Imaging Research, Inc.
http://www.imagingresearch.com
Spotfire
Data visualization, clustering
http://www.spotfire.com
GeneCluster
Clustering using self-organizing maps
http://www.genome.wi.mit.edu/MPR/GeneCluster/GeneCluster.html [34]
Clustering Tools
Clustering
Pangea Systems, Inc.
http://www.pangeasystems.com
Cluster
Hierarchical clustering
http://rana.stanford.edu/clustering/ [30]
GeneSight
Data visualization; hierarchical and
neural network clustering
Biodiscovery
http://www.biodiscovery.com/genesight.html
GeneSpring
Data visualization; clustering; database links
Silicon Genetics
http://www.sigenetics.com/GeneSpring/
GenEx
Public web database; data storage;
data visualization
National Center for Genome Resources; Silicon Genetics
http://www.ncgr.org/research/genex/; http://genex.sigenetics.com/
arraySCOUT
Data visualization; clustering, database links
Lion Biosciences
http://www.lionbioscience.com
ArrayDB
LIMS
National Human Genome Research Institute
http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/
LifeTools
LIMS
Incyte
http://www.incyte.com/products/lifetools/lifetools.html
GeneChip Information
Systems
LIMS
Affymetrix
http://www.affymetrix.com/products/bioinformatics.html
Microarray Database
LIMS
Stanford University
http://genome-www4.stanford.edu:8100/
amounts of mRNA, and that use more efficient methods of
label incorporation, will extend the utility of microarray
analysis to difficult samples.
One of the chief technical specifications of microarrays is
sensitivity. Sensitivity, determined by the minimal signal
that can be reliably detected above background, is dependent in part on background levels and on the specific
incorporation of label into the probes. Claims have been
made that EST microarrays can detect 1 part in 100,000
[20•] to 500,000 [21]. Incyte Pharmaceuticals indicates that
with their GEM microarrays [22•], 2 pg out of 600 ng of
poly(A) RNA (1:300,000) can be detected [23•]. These levels are generally considered sufficient to detect a mRNA
present at a few copies per cell [20•].
Another measure of sensitivity is the ability to reliably
determine minor changes in mRNA levels between two
samples. Incyte suggests the technical variation associated
with their GEM arrays, as measured by the coefficient of
variation, is 15% [23•]. Our own experiments, using replicate 2500-element Arabidopsis arrays and experimental
samples, show an average coefficient of variation that is
closer to 37%. Incyte GEM arrays can be used to determine
a differential expression greater than 1.75-fold. In our laboratory, we have established standards for minimal
expression levels of more than three standard deviations
above the background of the slide and for differential
expression of more than three-fold when analyzing three
independent replicates.
Data collection
Microarray hybridization signals are collected by specialized scanners, which are essentially computer-controlled
inverted scanning fluorescent microscopes, with most models providing confocal imaging. The output of these readers
is typically a 16-bit TIFF image file, limiting the dynamic
range of the data to less than five orders of magnitude.
The adoption of charge-coupled device (CCD)-based
detectors [24] may increase the range of signal detection as
improved methods and slide coatings [25] reduce background noise. The TIFF files are processed with image
analysis software (often supplied with the microarray reader) to produce a measure of the hybridization signal for
each DNA sample on the array. The ideal image analysis
software package would be able to detect automatically
each valid spot on the array, flag bad spots, subtract the
112
Genome studies and molecular genetics
Figure 1
Small number of changes
Medium number of changes
Large number of changes
6
5
Channel 2 (log10)
4
3
2
1
0
0
1
2
3
4
5
6 0
2
3
4
5
6 0
1
2
3
4
5
6
Channel 1 (log10)
100
Channel 1/Channel 2 ratio
1
House 200
House 20
Mean
Student
10
1
0.1
0
500
1000
1500
2000 2500 0
500
1000
1500
2000 2500 0
500
1000
1500
2000
2500
Clones
Current Opinion in Plant Biology
Comparison of normalization methods. Upper panels, plot of
normalized signal intensities (average of three replicates) from three
microarray experiments, demonstrating the range of differential
expression that can be observed. Signals were normalized using a
Student normalization and readjusted to original values using an overall
mean value. Dashed lines indicate two-fold and three-fold differential
expression cutoffs. Lower panels, channel 1/channel 2 ratio for
experiments depicted in upper panels. Data was normalized using
each of four normalization methods (Student, mean fluorescence, or
housekeeping) before calculation of the ratio value. For housekeeping
normalizations, a subset of 20 or 200 clones were chosen as
expression controls from an initial data set of 2462 clones, based on
the data from 114 hybridizations. Only genes that had medium to high
expression were used, and the average expression ratio of these genes
did not deviate >10% from 1.0 across all hybridizations.
local background, and output a mean signal-intensity
above background for each spot [26,27].
can be used to find differentially expressed genes. Our laboratory uses a collection of custom Perl scripts [28] to
process and analyze microarray data (T Richmond, unpublished data). Companies that provide microarrays or
microarray services often provide the necessary software
and information to examine the experimental data
(Table 3). For large-scale microarray projects, laboratory
information management systems (LIMS) are essential.
There are a number of microarray projects developing
database tools for storing, processing and retrieving
microarray data [27,29–33]. Although often freely available, the software tools being developed by these groups
are not amenable to inexperienced researchers, requiring
computer skills for their installation and maintenance that
are beyond those of the typical biologist.
Data analysis
The real heart of microarray experimentation and, ironically, the most underdeveloped part, is data analysis. When an
experiment is finished, the researcher is left with a series
of signal intensities from thousands of clones. The true
potential of microarrays as tools for gaining a global, highly integrated view of plant biology cannot be realized
without large-scale relational databases to house data from
multiple experiments and sophisticated data mining and
pattern-recognition software tools for extracting multidimensional information from microarray data sets [26].
For simple comparisons, as is typical for marker-discovery
experiments, there are a variety of data analysis methods.
For small data sets, spreadsheets and statistical software
Regardless of the software program used, the first step in
data analysis involves the normalization of data. In order to
Chasing the dream: plant EST microarrays Richmond and Somerville
compare separate fluorescent channels, or the results from
separate experiments, the raw data must be transformed
into normalized values. Several different methods of data
normalization can be used. One method identifies a standard set of ‘housekeeping’ genes with expression levels
that remain relatively constant under a variety of different
conditions and uses their aggregate expression level to
adjust the fluorescent signal of each target to a normalized
value. The challenge is to identify a significant number
(20–200) of medium to highly expressed genes with constant expression over a wide range of experimental
conditions for each plant species. This requires a relatively large initial set of data in order to choose such genes.
Another method of normalization, which is not dependent
on a predetermined set of housekeeping genes, makes the
assumption that under the conditions being tested, most
genes will not change in expression, and uses the mean,
mode or median fluorescence for each channel as a means
of normalization. Although this may work in most cases,
there are instances where this assumption can break down.
For specialized cDNA microarrays with a limited number
of genes, many of the genes may change in expression and
thus the addition of a large number of controls may be necessary. Figure 1 gives an example of the data collected
from three different experiments, showing the range of differential expression that can be seen.
Another simple method of normalization is to convert the
fluorescence of each target spot into a normalized value
that is based on the standard deviation (Student normalization) or the mean and standard deviation (Z-score
normalization) of the fluorescence of each channel as a
whole. (For Student normalization, S = χ/σ; for Z-score
normalization, Z = (χ–µ)/σ, where χ = sample value,
µ = mean of the channel data, and σ = standard deviation
of the channel data.) We carried out a series of experiments
in triplicate to compare the methods of normalization
described above. Figure 1 shows the comparison of methods for three experiments, demonstrating that there is
little difference among the normalization methods. The
chief differences appear at the cutoff thresholds for differential expression (typically three-fold); small changes in
the threshold (0.1–0.5-fold) can result in large changes in
the number of differentially expressed genes. We have
found the Student normalization to be the simplest to use
and typically use it for our experiments.
Side-by-side comparisons quickly inspire larger experiments that are based on time-courses, dose-response series
or multiple treatments. For examining multiple data sets,
cluster analysis is the most common method [30,32,34,35].
Genes can be grouped together according to their similarity in pattern of gene expression using a variety of different
computer algorithms. From such analysis, the patterns of
expression of thousands of genes across a wide number of
conditions can be reduced to a smaller number of distinct
patterns. Depending of the resolving power of the data set
113
and software programs, the number of clusters and their
associated patterns may reflect the number of distinct
responses associated with a treatment or developmental
sequence — responses that may not always be reflected in
cytological or physiological parameters that can be readily
measured. As an example of such methods, cluster analysis
of the expression data for the cell cycle in yeast revealed
the expected stages of G0, G1, G2, S and M in this wellcharacterized experimental system [36].
Clustering is also a powerful tool for predicting the function of previously uncharacterized genes. The utility of
this approach has been clearly demonstrated in a number
of systems [10•,36,37]. Iyer et al. [10•] identified over
200 previously unknown genes whose expression was regulated during the response of fibroblasts to serum. The
coordinate expression of uncharacterized genes with genes
of known function provides clues to the function of the
unknown genes (the ‘guilt-by-association’ rule for genefunction discovery [20•]). Plant biologists will be able to
conduct similar experiments, classifying genes on the basis
of their coordinated expression in response to environmental stresses or developmental stages. For plants with
fully sequenced genomes, another outcome of such cluster
analysis is the discovery of candidate cis-acting regulatory
elements from the identification of sequence elements
shared in common among all genes of a cluster [36,38].
Such discoveries are likely to accelerate future discoveries
of transcription factors and other DNA-binding proteins.
Microarray databases and information
management
The uniform goal of microarray data analysis software is to
provide an easy way for the researcher to identify potentially interesting candidate genes and to provide
convenient, detailed information about these genes. What
readily becomes apparent when studying thousands of
genes simultaneously is that one has personal knowledge
of only a small fraction of genes. Information that is readily accessible at the point of data analysis is therefore an
extremely valuable aid in interpreting results from
microarray experiments. Towards this goal, microarray
databases should ideally contain links to a multitude of
external information sources [39•], including citation databases [40], sequence databases [41], metabolic pathway
databases [42,43] and protein domain databases [44].
Where possible, microarray databases should also contain
links to information regarding expression data collected
from other sources, map position, predicted protein motifs,
functional classifications, related genes in both plants and
other organisms, known alleles and mutations, interacting
protein partners and availability of DNA stocks, seed
stocks and so on. For most plant species, including
Arabidopsis, such information is now being collected in a
systematic fashion.
The eventual goal is to have a unified and comprehensive
information resource, which will link microarray expression
114
Genome studies and molecular genetics
data to all available gene-related data. There are several
groups working on establishing public repositories for
microarray expression data [45,46]. The recently funded
TAIR (The Arabidopsis Information Resource) project [47],
with its aim of providing a link from Arabidopsis to all other
plants and vice versa (S Rhee, C Somerville, personal communication), is the first step in this direction for plant biologists.
One innovation is to use microarrays in ‘reverse Southern’
experiments to monitor changes in gene-copy number
[51]. The application of these methods can potentially be
very helpful in the analysis of the various kinds of chromosomal mutants that exist in plants. In addition, such an
analysis could be extended to population genetic studies of
gene evolution through duplication and deletions.
Data verification
Further in the future, a method of generating ‘virtual
masks’ using optical devices [52•,53•] and inkjet-based
methods (Packard Instruments Company, Meriden,
Connecticut) have been proposed as alternative methods to
the established photolithography-based method for producing oligo-based microarrays or the spotting method for
DNA-fragment microarrays. These technologies may lead
to reductions in production costs and make the technology
more accessible to all plant biologists. There are various
other technical innovations being introduced that will
enhance microarray experimentation, including multi-laser
scanners that permit the use of more than two fluorochromes at a time (GSI Lumonics, Watertown,
Massachusetts; Virtek, Woburn, Massachusetts), and new
dyes with improved properties (Molecular Probes, Eugene,
Oregon; New England Nuclear, Boston, Massachusetts).
Once candidate gene(s) of interest have been identified,
researchers must decide how to proceed. As mentioned
above, the gene identities of key genes should be verified
by sequencing the spotted EST. In addition, many reviewers will not yet accept microarray data on its own, requiring
secondary confirmation of differential expression.
There are a number of standard and newly emerging techniques that can be used to obtain secondary confirmation:
RNA gel blots, quantitative or competitive PCR, and
quantitative ribonuclease protection assays. Each has its
own advantages, depending on the tissue and/or conditions
being evaluated. RNA gel blots are simple and straightforward if sufficient amounts of tissue are available.
Quantitative PCR, especially using real-time instruments
such as the LightCycler [48] or the TaqMan [49] systems,
is the method of choice if source material is limiting.
Difficulties with primer design, genomic DNA contamination, and choice of internal controls, however, can make
setting up these systems tedious, especially if the expression pattern of a large number of genes must be evaluated.
Future prospects
For Arabidopsis and plants of significant commercial interest (e.g. maize, soybean, rice), commercial vendors will
probably produce microarrays for sale. There are many
advantages to commercializing the production of microarrays. As long as the product is commercially viable, the
companies can invest significantly more resources in producing large numbers of high-quality microarrays than can
an academic laboratory. Ideally, all but the most specialized
microarrays will be produced commercially in the future.
The future of Arabidopsis microarrays is likely to diverge
from those to be developed for other plant species. With
the completion of the genome sequence, full genome
microarrays can be constructed. In addition, the availability of high-quality, error-free genomic sequence will
permit the construction of oligo-based microarrays. The
National Science Foundation has recently funded a group
headed by J Ecker to produce an Arabidopsis gene expression oligo-array in collaboration with Affymetrix
(Sunnyvale, California; Table 1). In addition to geneexpression studies, oligo-based microarrays have been
designed for use in scoring single nucleotide polymorphisms molecular markers in population surveys and in
genetic crosses [50]. M Mindrinos in R Davis’ group
(Stanford University) is currently developing such a ‘mapping chip’ for Arabidopsis.
Conclusions
DNA-fragment-based microarrays offer the promise of dramatically enhancing our understanding of all aspects of
plant biology. Microarrays are proving to be very valuable
for the initial functional characterization of unknown genes
as well as enabling detailed analyses of already well-known
processes. This technology will greatly facilitate the analysis of mutants and other genetic resources that are
available for plants. The full potential of this technology
depends on the establishment of public databases to house
microarray expression data so that a maximum number of
researchers can bring their expertise and intuition to bear
on the interpretation of the expression data.
Acknowledgements
Supported in part by the US Department of Agriculture, the US
Department of Energy and the Carnegie Institution of Washington. We also
thank Robin Shealy (University of Illinois at Urbana-Champagne) for his
helpful advice about data normalization. We thank Gert de Boer, Greg
Galloway, Wolf Scheible and John Vogel for their comments and suggestions
on the manuscript. TAR thanks C Somerville for his support of his research.
Publication # 1427 of the Carnegie Institution of Washington.
References and recommended reading
Papers of particular interest, published within the annual period of review,
have been highlighted as:
• of special interest
•• of outstanding interest
1.
Baldwin D, Crane V, Rice D: A comparison of gel-based, nylon filter
and microarray techniques to detect differential RNA expression
in plants. Curr Opin Plant Biol 1999, 2:96-103.
2. Brown PO, Botstein D: Exploring the new world of the genome
•
with DNA microarrays. Nat Genet 1999, 21:33-37.
A thoughtful commentary on the impact of DNA microarrays.
3.
Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM: Expression
profiling using cDNA microarrays. Nat Genet 1999, 21:10-14.
Chasing the dream: plant EST microarrays Richmond and Somerville
4.
Granjeaud S, Bertucci F, Jordan BR: Expression profiling: DNA
arrays in many guises. Bioessays 1999, 21:781-790.
5.
Lemieux B, Aharoni A, Schena M: DNA chip technology. Mol Breed
1998, 4:277-289.
6.
Lipshutz RJ, Fodor SPA, Gingeras TR, Lockhart DJ: High density
synthetic oligonucleotide arrays. Nat Genet 1999, 21:20-24.
7.
Eisen MB, Brown PO: DNA arrays for analysis of gene expression.
•• Methods Enzymol 1999, 303:179-205.
This paper provides a thorough description of the DNA-fragment-based
microarray technology.
8.
Kehoe D, Villand P, Somerville S: DNA microarrays for studies of
higher plants and other photosynthetic organisms. Trends Plant
Sci 1999, 4:38-41.
9. Schena M (ed): DNA Microarrays: A Practical Approach. New York:
•• Oxford University Press; 1999.
A collection of chapters covering all aspects of DNA microarray technology,
including a number of useful protocols. Includes a broad overview of the field
and detailed experimental advice.
10. Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM,
•
Staudt LM, Hudson J Jnr, Boguski MS et al.: The transcriptional
program in the response of human fibroblasts to serum. Science
1999, 283:83-87
Demonstrates the power of cluster analysis for identifying the function of
unknown genes by clustering with genes of known identity and function.
11. Newman T, de Bruijn FJ, Green P, Keegstra K, Kende H, McIntosh L,
Ohlrogge J, Raikhel N, Somerville S, Thomashaw M et al.: Genes
galore: a summary of methods for accessing results from largescale partial sequencing of anonymous Arabidopsis cDNA clones.
Plant Physiol 1994, 106:1241-1255.
12. Lin, X, Kaul, S, Rounsley, S, Shea, TP, Benito, M-I, Town, CD et al.:
Sequence and analysis of chromosome 2 of Arabidopsis thaliana.
Nature 1999, 402:761-768.
13. Mayer K, Schüller C, Wambutt R, Murphy G, Volckaert G, Pohl T,
Dusterhoft A, Stiekema W, Entian KD, Terryn N et al.: Sequence and
analysis of chromosome 4 of the plant Arabidopsis thaliana.
Nature 1999, 402:769-777.
14. The Arabidopsis Information Resource (TAIR): Arabidopsis Genome
Initiative. URL http://www.arabidopsis.org/agi.html
15. The Institute for Genomic Research (TIGR): Arabidopsis Genome
Annotation Database (AGAD). URL http://www.tigr.org/tdb/at/agad
16. Munich Information Center for Protein Sequences (MIPS):
Arabidopsis thaliana database (MATDB). URL
http://websvr.mips.biochem.mpg.de/proj/thal/
17.
The Institute for Genomic Research (TIGR): TIGR Tomato Gene
Index. URL http://www.tigr.org/tdb/lgi/index.html
18. Soybean EST Project: The Soybean EST Project Home Page.
URL http://macgrant.agron.iastate.edu/soybeanest.html
19. Richmond CS, Glasner JD, Mau R, Jin H, Blattner FR: Genome-wide
•
expression profiling in Escherichia coli K-12. Nucleic Acids Res
1999, 27:3821-3835.
A detailed analysis comparing nylon-based arrays using 32P and glass slides
using fluorescent dyes. The authors show that radioactive- and fluorescentbased methods produce nearly equivalent results.
20. Ruan Y, Gilmore J, Conner T: Towards Arabidopsis genome
•
analysis: monitoring expression profiles of 1400 genes using
cDNA microarrays. Plant J 1998 15:821-833.
This is the first article that provides an example of the use of large DNA
microarrays in plant biology.
21. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel
human genome analysis: microarray-based expression
monitoring of 1 000 genes. Proc Natl Acad Sci USA
1996, 93:10614-10619.
22. Incyte Pharmaceuticals: Gene Expression Microarrays.
•
URL http://gem.incyte.com/gem/index.shtml
Incyte produces the first commercially available plant microarray, an
Arabidopsis GEM array containing 7 942 sequence verified elements representing 6 068 genes/clusters.
23. Incyte Pharmaceuticals: Incyte GEM Array Reproducibility Study.
•
URL http://gem.incyte.com/gem/GEM-reproducibility.pdf
This study, albeit unrefereed, is one of the few that examines the sensitivity
and reproducibility of DNA microarrays.
115
24. Che D, Bao P, Li R, Daywalt M, Ruffalo T, Shi J, Prokhorova A,
Lublinsky A, Lermer N, Müller UR: A novel surface, attachment
chemistry and CCD-based multicolor imaging system for analysis
of genomic DNA arrays. Proc Scanning 1999, 21:63-64.
25. Beier M, Hoheisel JD: Versatile derivatisation of solid support
media for covalent bonding on DNA-microchips. Nucleic Acids Res
1999, 27:1970-1977.
26. Bassett DE, Eisen MB, Boguski MS: Gene expression informatics:
it’s all in your mine. Nat Genet 1999, 21:51-55.
27.
Ermolaeva O, Rastogi M, Pruitt KD, Schuler GD, Bittner ML, Chen Y,
Simon R, Meltzer P, Trent JM, Boguski MS: Data management for
gene expression arrays. Nat Genet 1998, 20:19-23.
28. Perl: Perl Mongers: The Perl Advocacy People.
URL http://www.perl.org
29. Bittner M, Meltzer P, Trent J: Data analysis and integration: of steps
and arrows. Nat Genet 1999, 22:213-215.
30. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and
display of genome-wide expression patterns. Proc Natl Acad Sci
USA 1998, 95:14863-14868.
31. Imbert MC, Nguyen VK, Granjeaud S, Nguyen C, Jordan BR:
LABNOTE, a laboratory notebook system designed for academic
genomics groups. Nucleic Acids Res 1999, 27:601-607.
32. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E,
Lander ES, Golub TR: Interpreting patterns of gene expression
with self-organizing maps: methods and application to
hematopoietic differentiation. Proc Natl Acad Sci USA
1999 96:2907-2912.
33. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM:
Systematic determination of genetic network architecture. Nat
Genet 1999, 22:281-285.
34. Toronen P, Kolehmainen M, Wong G, Castren E: Analysis of gene
expression data using self-organizing maps. FEBS Lett
1999, 451:142-146.
35. Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S et al.: Largescale temporal gene expression mapping of central nervous
system development. Proc Natl Acad Sci USA 1998, 95:334-339.
36. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB,
Brown PO, Botstein D, Futcher B: Comprehensive identification of
cell cycle-regulated genes of the yeast Saccharomyces cerevisiae
by microarray hybridization. Mol Biol Cell 1998, 9:3273-3297.
37.
DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic
control of gene expression on a genomic scale. Science
1997, 278:680-686.
38. Zhang MQ: Promoter analysis of co-regulated genes in the yeast
genome. Comput Chem 1999, 23:233-250.
39. Burks C: Molecular biology database list. Nucleic Acids Res
•
1999, 27:1-9.
A comprehensive list of brief descriptions and pointers to internet sites for
various molecular biology databases.
40. National Library of Medicine: PubMed. URL http://www.ncbi.nlm.nih.gov/
PubMed/
41. National Center for Biotechnology Information: GenBank.
URL http://www.ncbi.nlm.nih.gov/
42. Japanese Human Genome Project: Kyoto Encyclopedia of Genes
and Genomes (KEGG). URL http://www.genome.ad.jp/kegg/kegg2.html
43. Kanehisa M: A database for post-genome analysis. Trends Genet
1997, 13:375-376.
44. Pfam Consortium: The Pfam database of protein domains and
HMMs. URL http://pfam.wustl.edu/
45. European Bioinformatics Institute: The ArrayExpress Database.
URL http://www.ebi.ac.uk/arrayexpress/
46. National Center for Genome Resources: GeneX: a Collaborative
Internet Database and Toolset for Gene Expression Data.
URL http://www.ncgr.org/research/genex/
47.
Carnegie Institution of Washington and National Center for Genome
Resources: The Arabidopsis Information Resource (TAIR).
URL http://www.arabidopsis.org
116
Genome studies and molecular genetics
48. Roche Molecular Biochemicals: Molecular Biochemicals.
URL http://biochem.roche.com/
49. PE Biosystems: Applied Biosystems. URL http://www.pebio.com/ab/
50. Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G,
Perkins N, Winchester E, Spencer J et al.: Large-scale identification,
mapping, and genotyping of single-nucleotide polymorphisms in the
human genome. Science 1998, 280:1077-1082.
51. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A,
Williams CF, Jeffrey SS, Botsein D, Brown PO: Genome-wide
analysis of DNA copy-number changes using cDNA microarrays.
Nat Genet 1999, 23:41-46.
52. Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F, Sussman M,
•
Cerrina F: Maskless fabrication of light-directed oligonucleotide
microarrays using a digital micromirror array. Nat Biotechnol 1999,
17:974-978.
Both this paper and [53•] outline an innovative use of new technology that
may revolutionize oligonucleotide array construction.
53. Gardner S: Digital Optical Chemistry System. URL
•
http://pompous.swmed.edu/expbio/doc/
This website, like [52•], describes a system that uses a unique UV light
projector to manufacture biological/chemical arrays using UV photochemistry. This new technology may revolutionize oligonucleotide array
construction.