Analytical Method Evaluation and Discove
Supporting Information
ABSTRACT: Profiling techniques such as microarrays, proteomics, and metabolomics are used widely to assess the overall effects of genetic background, environmental stimuli, growth stage, or transgene expression in plants. To assess the potential regulatory use of these techniques in agricultural biotechnology, we carried out microarray and metabolomic studies of 3 different tissues from 11 conventional maize varieties. We measured technical variations for both microarrays and metabolomics, compared results from individual plants and corresponding pooled samples, and documented variations detected among different varieties with individual plants or pooled samples. Both microarray and metabolomic technologies are reproducible and can be used to detect plant-to-plant and variety-to-variety differences. A pooling strategy lowered sample variations for both microarray and metabolomics while capturing variety-to-variety variation. However, unknown genomic sequences differing between maize varieties might hinder the application of microarrays. High-throughput metabolomics could be useful as a tool for the characterization of transgenic crops. However, researchers will have to take into consideration the impact on the detection and quantitation of a wide range of metabolites on experimental design as well as validation and interpretation of results.
KEYWORDS: metabolomics, Zea mays, maize, microarray
■ INTRODUCTION
Global demand for food is increasing rapidly, a trend that is expected to continue for many years. This trend coincides with the growth of the world’s population, the limited availability of arable land and irrigation water, and global environmental changes. 1−3 In addition to traditional plant breeding, biotechnology has become a main focus in the effort to meet the global food demand. The main crops targeted for genetic engineering include maize, soy, cotton, oilseed, canola/rapeseed, rice, potato, staple cereal plants, and vegetables. 2
The introduction of genetically modified (GM) crops has presented technical, regulatory, and social challenges. 4,5 Detailed studies are required to demonstrate that food and feed produced from agricultural products developed through biotechnology are as safe as conventional counterparts, not posing risks to the environment or human health. 6−8 In the early 2000s, the concept of substantial equivalence emerged for testing the equivalence of GM and corresponding conventional crops. 5,9 The introduction of a single gene of interest should preferably affect only the desired trait. The biochemical composition of the crop should otherwise be comparable to that of a parental strain or a variety similar to the parental line. 10 Therefore, compositional analysis covering key nutrients and antinutrients is recommended by the Organization for Economic Cooperation and Development (OECD). This targeted approach, focusing on the majority of the compositional components, 11−14 has been widely accepted by international regulatory agencies as part of the concept of substantial equivalence and applied to the assessment of the safety of GM crops. 9,14,15
The development of “-omics” profiling offers powerful high-throughput tools for biomedical and agricultural studies. Because nontargeted profiling technologies can screen many components simultaneously, they have the potential to provide insight into complicated metabolic pathways and their interconnections. Such technologies therefore could represent valuable analytical approaches for the assessment of substantial equivalence for GM plants. 10,15−17 The challenges in the use of these methods are due to the complexity of the data sets and the use of different technological platforms and software that might generate artifacts, biases, and nonuniform data representations. 18
Although nontargeted surveys of the overall transcriptome, proteome, or metabolome of a plant at one snapshot in time and tissue are gaining attention, 19,20 these technologies are not yet validated within the regulatory framework and therefore not at present officially recommended for safety evaluations of GM plants. A major challenge is to determine whether any detected differences are due to genetic manipulation through bio-
Received: August 5, 2013
Revised: February 23, 2014
Accepted: February 24, 2014
Published: February 24, 2014
© 2014 American Chemical Society
dx.doi.org/10.1021/jf405652j | J. Agric. Food Chem. 2014, 62, 2997−3009
[...] environments, or even stochastic differences between plants. For this purpose it is necessary to evaluate the reproducibility of these analytical methods and the natural variation of the results obtained by applying them to crop species, such as maize. Without this understanding it would be impossible to interpret the -omics data and declare equivalence. Therefore, the International Life Sciences Institute (ILSI) recommended establishing baseline ranges for natural variations and validating these -omics technologies before they can be used for regulatory assessment of biotech crops. 15 This paper is directed toward fulfilling this function for transcriptomic and metabolomic methods.
Microarray analysis of transcriptomes is available for both model and crop plants, including Arabidopsis, maize, rice, potato, tomato, soy, pepper, barley, Brassica, and sugar cane. 21 Microarrays provide high-throughput, simultaneous detection of differences in mRNA abundance between samples for thousands of genes. Use of microarray technology for safety assessment of GM crops faces some challenges. First, nucleic acid probe hybridization is not able to detect genes expressed at very low levels or genes with alternate splicing forms. Second, it is difficult to achieve high reproducibility for microarray experiments due to variations resulting from sample handling, experiment processes, environmental impact on plants, and crop variety differences. 22−24
Technologies for simultaneous analysis of metabolites have been developed 25,26 and offer the possibility of surveying significantly more metabolites than conventional chemical analyses in a much shorter time and with much lower cost per analyte. However, comparing data from different laboratories remains challenging. This challenge is usually due to relative rather than absolute quantification and different methodologies adopted by different groups, including equipment platforms and statistical analysis methods. High sample-to-sample and experiment-to-experiment variability, even within the same laboratory, and the wide concentration range of the same metabolite between plants add to the complexity of the analysis. 10 We applied microarray and metabolomic technologies to a randomized field study as conventionally used in regulatory studies. To evaluate the reproducibility and technical variations of the microarray and metabolomic technologies, the samples were tested individually or as pools of plants, and RNA and metabolites were extracted and analyzed by microarray and GC-MS. Overall, we evaluated the reproducibility of the microarray and metabolomic technologies to explore the capability of these methodologies in our experimental settings to detect the natural variation of gene expression and metabolite levels between plants and maize varieties.
■ MATERIALS AND METHODS
Plant Tissue. Seven inbred and four non-GM commercial hybrid maize varieties were planted in a randomized plot at DuPont Stine Haskell Research Center, Newark, DE, USA. Twenty-five seeds were sown per row for each variety. Leaves at the V5 growth stage and immature kernels at 25 days after pollination (DAP) were collected in the morning between 8:30 a.m. and 12 p.m. for microarray and GC-MS-based metabolomics.
Three leaf punches avoiding midribs were collected at the middle of the V5 leaf area and placed on dry ice immediately after harvest, transported to the laboratory on dry ice, and stored at −80 °C before processing for metabolomic analysis. The remaining leaf was collected [...] the laboratory on dry ice, and stored at −80 °C before processing for microarray analysis.
For 25 DAP kernels, 10 kernels in the middle row of the ear were collected for metabolomics, and the remaining kernels were used for microarray analysis. The ears at 25 DAP were removed from the plants and placed on wet ice immediately after harvest and transported to the laboratory on wet ice. Immature kernels were removed from the cobs, frozen in liquid nitrogen, and stored at −80 °C before processing for microarray and metabolomics analyses.
Mature kernels at R6 growth stage (about 60 DAP) were also collected for metabolomics analysis. The ears at R6 stage were removed from the plants and placed on wet ice immediately after harvest and transported to the laboratory on wet ice. Ten mature kernels in the middle row of the ear were removed from the cob, frozen in liquid nitrogen, and stored at −80 °C before processing for metabolomics analyses.
For microarray analysis, tissues were ground into fine powders. For metabolomics analysis, tissues were lyophilized before they were ground to fine powders. Additional pooled samples were obtained by combining equivalent amounts of ground material from three individual plants.
Microarray. Total RNA was isolated from ground frozen tissue using the EZNA SQ RNA Isolation Kit (Omega Bio-Tek, Norcross, GA, USA), treated with DNase-I, and used for mRNA isolation with an Illustra mRNA Purification Kit (GE Biosciences, Pittsburgh, PA, USA). The total RNA and mRNA samples were visualized and quantified on a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). Each mRNA sample was converted into double-stranded DNA by an in vitro transcription reaction and labeled with Cy3 fluorescent dye using the Low RNA Input Fluorescent Linear Amplification Kit (Agilent Technologies). The cRNA product was purified with an Agencourt RNAClean Kit (Beckman Coulter, Indianapolis, IN, USA). Hybridizations were performed overnight with equal amounts of labeled cRNA to a custom 4x44K Maize Oligo Microarray from Agilent Technologies according to Agilent’s One-Color Microarray-Based Gene Expression Analysis protocol. After hybridization, the microarray slides were washed and immediately scanned with a G2505C DNA microarray scanner (Agilent Technologies). The images were visually inspected for artifacts, and feature intensities were extracted, filtered, and normalized with Feature Extraction software (v 10.5.1.1) (Agilent Technologies). Quality control and downstream analysis were performed using data analysis tools in Genedata Expressionist and the statistical language R. Further data analysis and bioinformatic analyses were carried out according to methods described in Hayes et al. 27
Metabolomics. Metabolites were extracted from approximately 3 mg (dry weight) of lyophilized tissues for each sample. In a 1.1 mL polypropylene microtube containing two 5/32 in. stainless steel ball bearings, 500 μL of chloroform/methanol/water (2:5:2, v/v/v) solution containing 0.015 mg of ribitol as an internal standard was added to each sample. Samples were homogenized in a 2000 Geno/Grinder ball mill at setting 1650 for 1 min and then rotated at 4 °C for 30 min before being centrifuged at 1454g for 15 min at 4 °C. Aliquots (300 μL) were transferred to 1.8 mL high recovery glass autosampler vials, evaporated to dryness in a speed vac, and redissolved in 50 μL of 20 mg mL −1 methoxyamine hydrochloride in pyridine. The vials were capped, agitated with a vortex mixer, and incubated in an orbital shaker at 30 °C for 90 min to form methoxyamine derivatives. Next, 80 μL of N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) was added to each sample by a Gerstel autosampler 30 min prior to injection to form trimethylsilyl (TMS) derivatives and to minimize sample variations due to derivatization differences. This “just in time” derivatization eliminates variation due to differences in reaction time or temperature. Furthermore, the gas chromatograph inlet liner and septum were replaced daily, mitigating the known influence of sample residue in the inlet on trimethylsilylation completeness. 28 However, trimethylsilylation can vary with the sample matrix. 28 Thus, for molecules such as amino acids that present multiple reaction sites leading to the possibility of two or more chemical derivatives, the relative abundance
of these trimethylsilylated forms can vary among the three different tissue types assayed in this study.
Figure 1. CVs of gene expression calculated from technical repeat microarrays: (A) percentages of genes within different CV ranges; (B) CV distributions generated by TIBCO Spotfire (x-axis, CV values in log scale; y-axis, gene counts); (C) plots of log10 (CV) (y-axis) against log10 (mean) (x-axis) (curves are polynomial fittings generated by TIBCO Spotfire); (D) log10 (CV) values of inflection points calculated from curves in panel C.
The derivatized samples were separated by gas chromatography on a Restek 30 m × 0.25 mm × 0.25 μm film thickness Rtx-5Sil MS column with a 10 m Integra-Guard column. One microliter injections were made with a 1:30 split ratio using a Gerstel autosampler. An Agilent 6890N gas chromatograph was programmed for an initial temperature of 80 °C for 0.5 min and increased to 350 °C at a rate of 18 °C min −1, at which it was held for 2 min before being cooled rapidly to 80 °C and held there for 5 min in preparation for the next run. The injector and transfer line temperatures were 230 and 250 °C, respectively, and the source temperature was 200 °C. Helium was used as the carrier gas with a constant flow rate of 1 mL min −1 maintained by electronic pressure control. Data acquisition was performed on a LECO Pegasus III time-of-flight mass spectrometer with an acquisition rate of 10 spectra s −1 in the mass range of m/z 45−600. An electron beam of 70 eV was used to generate spectra. Detector voltage was 1750 V. An instrument autotune for mass calibration using perfluorotributylamine (PFTBA) was performed prior to each sample sequence.
Metabolomics Data Processing and Analysis. Raw Leco GC-MS .peg datafiles were converted into .netcdf (Andi) formats using Leco ChromaTof ver. 4.13 software. Data preprocessing was performed with Genedata Refiner MS ver. 5.2.1 software. For each .netcdf file, retention times were converted into retention indices using an in-house program. Preprocessing consisted of gridding chromatograms in the m/z value (80−437) and retention index dimensions, subtracting chemical noise, aligning the retention indices of each selected ion chromatogram, and detecting nominal mass peaks, using empirically optimized settings for each process. Data from each of the three tissue types were processed separately to maximize alignment and peak picking. The resulting three matrices consisted of intensities for each m/z value−retention index combination and each sample. The aligned and denoised data matrices were passed to Genedata Analyst ver. 2.1 software, where each intensity value by sample was normalized for both the ribitol internal standard signal and sample dry weight.
Because m/z value−retention index fingerprint data are redundant, significant signatures were reduced to named known metabolites on the basis of matching both the retention index and mass spectrum to those of authentic standards. Relative quantitation of each metabolite in each sample was derived from the intensity of each metabolite’s representative m/z value obtained from the Genedata Analyst output. In a few cases, peak heights obtained from ChromaTof quantification ion chromatograms were used instead when signals were below the threshold set for fingerprinting and thus not present in the Genedata Analyst output. Metabolite detection from either source was dependent on reaching a conservative limit of detection to mitigate false-positive peaks that would have an undue effect on subsequent statistical analyses. Percent CV values were calculated for each metabolite across selected samples. Data matrices were reformatted and imported into the PLS_Toolbox version 7.0.1 (Eigenvector Research, Inc.), with which principal component analysis (PCA) was performed on autoscaled (mean centered and each variable scaled to unit variance) data.
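The normalization, percent CV, and autoscaling steps named under Metabolomics Data Processing and Analysis can be illustrated with a minimal Python sketch. It is not the Genedata or PLS_Toolbox implementation. It assumes that normalization means dividing each intensity by the ribitol signal and by the sample dry weight, and that the sample standard deviation (n − 1) is used, neither of which the text specifies. All function names and numbers are hypothetical, and the PCA step itself is not reproduced.

```python
from statistics import mean, stdev


def normalize(intensity, ribitol_intensity, dry_weight_mg):
    """Scale a raw intensity by the internal-standard signal and the dry weight.

    Assumption: both corrections are applied as simple divisions.
    """
    return intensity / (ribitol_intensity * dry_weight_mg)


def percent_cv(values):
    """Percent CV: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


def autoscale(column):
    """Mean-center one variable and scale it to unit variance.

    A constant column has zero standard deviation and cannot be autoscaled.
    """
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


# Hypothetical values for one metabolite in four technical repeats.
raw = [1200.0, 1500.0, 900.0, 1350.0]
ribitol = [1000.0, 1200.0, 800.0, 1100.0]
dry_weight = [3.0, 3.1, 2.9, 3.0]

normalized = [normalize(r, i, w) for r, i, w in zip(raw, ribitol, dry_weight)]
cv = percent_cv(normalized)
scaled = autoscale(normalized)
```

Autoscaling gives every metabolite the same weight in the PCA regardless of its absolute abundance, which is why it is applied before the decomposition.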
Experimental Design. For both microarray and metabolomics experiments, 11 maize varieties were used, including (1) 7 inbred lines, PHG9B (high oil), H31 (low oil), PH2WBS (high protein), PH2WBR (low protein), PH0GP (median starch), PH14T (median starch), and 658 (low starch); and (2) 4 commodity hybrid lines, 38B85, 37Y12, 34A15, and 34P88. These lines were chosen as a partial representation of the range of U.S. cultivated maize diversity and include lines differing in protein, oil, and starch contents. Three types of tissues, V5 leaf, 25 DAP immature kernel, and mature kernel, were used for metabolomic experiments. Because the mature kernels are dormant and have very limited gene expression, 29,30 only the V5 leaf and the 25 DAP immature kernels were used for microarray experiments. Due to limited tissue availability for some varieties, some microarray or metabolomic experiments were not conducted. For microarray technical repeat controls, eight independent RNA samples were isolated from either V5 leaves or 25 DAP immature kernels of a single plant from the high-oil variety PHG9B and the low-oil line H31 and used for eight different microarray hybridizations. The signal differences among these hybridizations were considered the technical variation of the microarray methodology. Similarly, eight independent metabolite extractions were made from bulk collections of V5 leaves or 25 DAP immature kernels of PHG9B and H31 and from bulk mature kernels of PH2WBS (high-protein line) and PHG9B. They were used for independent GC-MS analyses and technical variation assessment. Multiple sample preparation and testing steps were used to evaluate the reproducibility of both technical methods. Sample variations were also evaluated by comparing data from different individual plants and different pooled plants.
qRT-PCR. Genes, primers, and probes are listed in Supplementary Table 1 in the Supporting Information. Primers and probes were designed with Primer Express 3.0.1 (Applied Biosystems, Carlsbad, CA, USA) and purchased from Integrated DNA Technologies, Inc. (Coralville, IA, USA). First-strand cDNA was synthesized from the same mRNA samples used for microarray. Fifteen pooled samples from either V5 leaf or 25 DAP kernel were chosen on the basis of sample availability. For each RT reaction, 240 ng of mRNA was used as a template in a total volume of 80 μL following the manufacturer’s instruction for the SuperScript VILO cDNA synthesis kit (Invitrogen, Carlsbad, CA, USA). All qRT-PCR primers and TaqMan probes were tested for specificity by BLAST search against the NCBI public sequence database. The qPCR reactions were carried out in 384-well plates in a ViiA 7 real-time PCR machine (Applied Biosystems) using the TaqMan Gene Expression Master Mix (Applied Biosystems). The qPCR program was 50 °C for 2 min, 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min. Each reaction contained 200 nM of each primer, 100 nM of probe, and 2 μL of the RT reaction solution as template in a final volume of 20 μL. Every reaction was repeated three times. The ViiA 7 software V1.2 was used to record and process the data. The Rn (normalized reporter) values of each reaction for every cycle were exported and used to calculate the single-well qPCR efficiencies using a real-time PCR Miner program. 31
■ RESULTS
Microarray Reproducibility. To compare the reproducibility of expression levels of the same genes on repeated microarrays, the data were analyzed using correlation statistics. The coefficient of variation (CV) for each set of repeats was calculated and compared as an indication of reproducibility. 32 Mean CV values of gene transcripts for technical repeats were 0.25, 0.23, 0.33, and 0.23 for 25 DAP PHG9B, 25 DAP H31, V5 leaf PHG9B, and V5 leaf H31 samples, respectively, relatively low compared to the microarray literature, 33−35 indicating good technical reproducibility. Expression of most genes on the microarrays had low CV values, with 82.4, 92.1, 90.4, and 91.6% of genes from V5 leaf PHG9B, V5 leaf H31, 25 DAP PHG9B, and 25 DAP H31 microarrays exhibiting CV values below 0.5 (Figure 1A). In addition, these CV values showed log-normal distribution centered at 0.1 (Figure 1B), indicating good reproducibility. However, the reproducibility of the eight technical repeat microarrays for PHG9B V5 leaves was slightly lower than that of the other technical repeat microarrays (Figure 1A). Alternatively, the CV values were log transformed and plotted against the log transformed mean values. Polynomial curve fitting showed, as expected, that CV values decreased as the mean intensities increased (Figure 1C). The inflection points calculated on the basis of the polynomial curves showed that technical repeat microarrays for PHG9B V5 leaves have higher background noise (Figure 1D), similar to that shown by the CV distributions (Figure 1A).
We further investigated the reproducibility of microarray results by a linear regression model correlating data between any pair of microarrays within each group. Four groups were analyzed this way for both V5 leaf and 25 DAP kernel samples, including the eight technical arrays for H31, the six individual biological repeats for H31, the eight technical arrays for PHG9B, and the nine individual biological repeats for PHG9B. The box plots represent the distributions of R square values of all pairwise comparisons of linear regression modeling (Figure 2). The technical replicates had consistently higher correlations than the biological replicates (Figure 2). We conclude that gene expression variation from microarrays resulted primarily from maize variety differences rather than from plant-to-plant differences, pooled sample-to-sample differences, or technical variations, indicating that the method is sensitive enough to detect biological variation among individual or pooled plant samples.
Figure 2. Technical reproducibility of microarrays. Pairwise correlation coefficients between pairs of technical replicates (T) or between pairs of biological repeats (I). Box plots were generated with TIBCO Spotfire. The white bar represents the median value. Edges of boxes represent values at the 75th and 25th percentiles. Edges of bars represent the ranges of values with outside dots as outliers.
Correlation between Gene Expression of Individual and Pooled Samples. Next, CV values for the microarrays from the six varieties analyzed both as individual samples (I) and as pooled samples (P) were calculated. For each variety, six or nine individual plants and two or three pools of samples (three plants per pool) were analyzed. The pools were created by combining equal amounts of RNA extracts from individual plants. Overall, 73.7% (34A15_I)−93.3% (38B85_P) of the genes had CV values below 0.5 from V5 leaf samples, and 73.8% (38B85_I)−96.8% (PHG9B_P) of genes had CV values below 0.5 from 25 DAP kernel samples (Supporting Information Supplementary Table 2), representing very good experimental reproducibility. The distributions of the CV values from both 25 DAP kernel and V5 leaf samples are shown in Supplementary Figure 1 in the Supporting Information. When the overall CV distribution patterns of individual or pooled samples were compared, microarrays for 25 DAP kernel samples showed larger CV differences, compared to V5 leaf samples. Additionally, log10 (CV) versus log10 (mean) plots were generated for all of the microarrays to reveal the relationship between CV and mean intensities (data not shown). The inflection point values were very similar to what was shown by the CV distribution patterns (Supporting Information Supplementary Figure 1).
The Pearson’s correlation coefficient was calculated by comparing mean gene expressions between individual samples and pooled samples for each variety and tissue type. The R values were between 0.9743 and 0.9959 (Table 1), indicating that the signals obtained from individual plants and pooled plants were highly correlated and similar. When samples from the same variety but different tissue types were compared, the Pearson correlation R values were between 0.234 and 0.317 (Table 1), indicating significant gene expression differences between leaves and kernels, as expected. For every variety−tissue combination, the pooled samples showed a smaller mean CV value than the one from corresponding individual samples (Supporting Information Supplementary Table 3). Therefore, the plant-to-plant variation detected from the same variety was reduced by pooling three plants into a single sample, essentially transforming plant-to-plant variation into sample-to-sample variation. In addition, the distribution patterns of CV values from I and P samples were very similar (Supporting Information Supplementary Figure 2), indicating that our pooling strategy was efficient in capturing the variations existing among maize varieties while realizing a cost savings.
Table 1. (Rows 1 and 2) Mean Gene Expression Levels Detected in Individual Plants (I) and Pooled Plants (P) for the Same Tissue Type of a Variety Are Highly Correlated (a); (Rows 3 and 4) Mean Gene Expression Detected by Microarrays Are Not Correlated between V5 Leaf and 25 DAP (b)
row | sample | PHG9B | H31 | 34A15 | 38B85 | PH2WBR | PH2WBS
1 | V5 leaf
(a) Values are Pearson correlation coefficients (R) comparing mean gene expression intensities between all I and all P samples for each variety−tissue combination, calculated using Excel function PEARSON.
(b) Values are Pearson correlation coefficients (R) comparing gene expression intensities from V5 leaf and 25 DAP for the same plant samples, calculated using Excel function PEARSON, for individual plants (I) or pooled plants (P).
Gene Expression Differences between Varieties. To evaluate gene expression variation between maize varieties, mean microarray spot intensities from six or nine individual samples (I) and two or three pooled samples, with three individuals per pool (P), were determined and used for CV calculations comparing the six varieties that had both I and P samples.
When the CV distributions of individual samples representing variety-to-variety variations (Supporting Information Supplementary Figure 2 and Table 4) were compared to the CV distributions representing plant-to-plant variation within a certain variety (Supporting Information Supplementary Figure 3 and Table 2), we found that the former were larger. For V5 leaves, 65.4% of genes showed a CV value <0.5 when different maize varieties were compared (Supporting Information Supplementary Figure 2 and Table 4), but 73.7% (34A15)−93.1% (38B85) of genes had a CV <0.5 when individual plants within a given variety were compared (Supporting Information Supplementary Figure 1 and Table 2). For 25 DAP kernels, 68% of genes showed a CV value <0.5 when different maize varieties were compared (Supporting Information Supplementary Figure 2 and Table 4), but 73.8% (38B85)−85.1% (H31) of genes had a CV <0.5 when individual plants within a given variety were compared (Supporting Information Supplementary Figure 1 and Table 2). These results indicate that higher variations exist among different maize varieties compared to those among individual plants of the same variety, likely due to the genetic differences and/or genetic and environmental interactions affecting gene expression among varieties.
In addition, the variety-to-variety variation detected in 25 DAP kernels is similar to that among V5 leaf tissues on the basis of their CV distributions (Supporting Information Supplementary Figure 2), indicating that gene expression variations among different maize varieties are similar between these two tissue types.
Confirmation of Microarray Results by qRT-PCR. To confirm the gene expression levels measured by the microarray experiments, two groups of 18 different genes were chosen for V5 leaf and 25 DAP kernel, respectively (Supporting Information Supplementary Table 1). The expression levels of these genes rank at the 80th or 50th percentile across all microarrays. Expression of these genes was measured by qRT-PCR reactions using the same RNA samples used for microarrays. Due to limited sample availability and possible polymorphisms among different maize varieties at primer annealing locations, we used a real-time PCR Miner program 31 that has been validated by many other groups 36−43 to monitor the single-well qRT-PCR efficiency. The dapA gene was used as a control for comparison between microarray and qRT-PCR data. Gene expression levels from qRT-PCR reactions were calculated on the basis of the dapA expression and compared to levels detected on microarrays that were also quantified on the basis of the dapA expression. The ratio of expression levels for each gene detected by these two techniques was log transformed for proper comparison (Figure 3).
For V5 leaf tissue samples, two genes (pco602011 and pco603626) were not amplified from any of the 15 templates by qRT-PCR and therefore not included in the analysis. Two genes, pco627753 and pco643043 (genes 2 and 11 in Figure 3A, respectively), had expression detected only in some of the samples, and four genes, pco624384, pco521467, pco652567, and pco658406 (genes 13, 14, 15, and 16 in Figure 3A, respectively), showed higher expression (ca. 2−32-fold higher), relative to microarray, in all 15 samples. For the remaining genes tested, the expression levels were close to those measured by the microarrays, although there were some variety-specific expression differences between the two techniques (Figure 3A).
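The comparisons reported in this part of the Results rest on a few simple statistics: a per-gene CV across replicate arrays, the share of genes with CV below 0.5, a Pearson R between mean expression in individual and pooled samples, and a log ratio of qRT-PCR to microarray levels after each is expressed relative to dapA. The Python sketch below illustrates them under stated assumptions and is not the authors' workflow, which used Excel's PEARSON function, TIBCO Spotfire, and R. The sample standard deviation and the base-2 logarithm (taken from the Figure 3 caption) are assumptions, and the gene names and numbers are hypothetical.

```python
from math import log2, sqrt
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation for one gene: sample standard deviation / mean."""
    return stdev(values) / mean(values)


def fraction_below(cvs, cutoff=0.5):
    """Fraction of genes whose CV falls below the cutoff."""
    return sum(1 for c in cvs if c < cutoff) / len(cvs)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def log2_ratio(qpcr, qpcr_ref, array, array_ref):
    """log2 of the reference-normalized qRT-PCR level over the reference-normalized array level."""
    return log2((qpcr / qpcr_ref) / (array / array_ref))


# Hypothetical intensities: one list of replicate arrays per gene.
individual = {
    "geneA": [100.0, 110.0, 90.0],
    "geneB": [20.0, 60.0, 10.0],
    "geneC": [500.0, 520.0, 480.0],
}
pooled = {
    "geneA": [102.0, 98.0],
    "geneB": [28.0, 32.0],
    "geneC": [495.0, 505.0],
}

genes = list(individual)
cvs = [cv(individual[g]) for g in genes]
share_low_cv = fraction_below(cvs)
r = pearson_r([mean(individual[g]) for g in genes],
              [mean(pooled[g]) for g in genes])
# Relative qRT-PCR level 8/2 = 4 against relative array level 400/200 = 2.
ratio = log2_ratio(qpcr=8.0, qpcr_ref=2.0, array=400.0, array_ref=200.0)
```

On this scale a log2 ratio of 1 corresponds to a 2-fold difference between the two techniques and a ratio of 5 to a 32-fold difference, the range quoted above for the four V5 leaf genes.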
For 25 DAP kernel samples, one gene (pco621453) did not
show any amplification from qRT-PCR and was not included for further analysis. Two genes, pco653893 and pco598383 (genes 14 and 15 in Figure 3B, respectively), were amplified [...]
Figure 3. Gene expression analysis comparing qRT-PCR to microarray hybridization with V5 leaf (A) or 25 DAP (B) samples. Numbers are log 2 transformed ratios between expression levels detected by qRT-PCR and microarray that were defined against dapA expression levels.
Metabolomic Data Analysis. The three processed data matrices contained 3891 metabolomic signatures or fingerprints (m/z value−RI combinations) for V5 leaves, 4300 for 25 DAP immature kernels, and 3891 for mature kernels. Of these, 87−103 metabolites were successfully identified in tissues examined. These numbers are substantially reduced relative to the raw data set because (1) the inherent redundancy in metabolomics signatures was eliminated, as each metabolite can be represented by multiple m/z values in its electron impact mass spectrum, and (2) metabolites for which the identity could not be unambiguously established were ignored.
To evaluate technical variations including the sampling, analytical, and data analysis variability, eight technical repeats were produced from V5 leaves of PHG9B and H31, 25 DAP kernels from PHG9B and H31, and mature kernels from PH2WBS and PHG9B. For each tissue−variety combination, a single bulk tissue sample was aliquoted into eight extraction tubes, producing eight metabolomic samples. Mean CV values calculated from technical repeat metabolite relative levels were
between 0.33 and 0.54, and median values were between 0.27 and 0.46, indicating good reproducibility despite some outliers in the upper ranges (Supporting Information Supplementary Figure 3). The majority of metabolites detected showed CV values <0.6, with 76.7 and 87.4% metabolites from V5 leaves of PHG9B and H31, 76.1 and 80.2% of metabolites from 25 DAP immature kernels of PHG9B and H31, and 77.6 and 60.9% of metabolites from mature seeds of PH2WBS and PHG9B, respectively (Supporting Information Supplementary Figure 3).
We also found that pooled samples had lower variances compared to the individual samples (Figure 4; Supporting Information Supplementary Table 5), similar to what was observed from the transcript data. Particularly, metabolomics for mature seed samples showed much less variation than for immature seeds (Supporting Information Supplementary Table 5), probably due to a less complex metabolome and terminal differentiation state of mature kernels.
Figure 4. CV distribution of metabolite levels detected from V5 leaf (A), 25 DAP immature kernel (B), and mature kernels (C).
Mean metabolite levels detected from individual and pooled samples of the same tissue−variety combination are highly
Tissue or Variety Separation Based on Metabolomics. When the metabolites detected from the three different tissues were compared, PCA clearly indicated tissue separation (Figure 5), reflecting tissue specificity of metabolic processes, as expected. However, PCA revealed variety specificity for only certain variety−tissue combinations. For example, for V5 leaf tissues, there was clear separation of PH2WBR, PH14T, and H31 from other varieties based on PC1 and PC3 (Figure 6A). Likewise, PH2WBS and PH2WBR in 25 DAP kernels were readily distinguished from other varieties with PC1 and PC4 (Figure 6E). For mature kernels, PH2WBS and PHG9B showed good separation from other varieties based on PC2 and PC3 (Figure 6I).
For the most part, the tissue and variety classifications observed with individual plants were also evident in pooled plant samples, although sometimes with different principal component projections (Figures 5 and 6A,J). This result suggests that pooling did not degrade the discriminating power
afforded by individual samples. Interestingly, the combined correlated, with Pearson correlation R values all close to 1
percent variance included in the PCA scores plots was slightly (Table 2). For PHG9B and H31, the high correlations are
higher for pooled samples compared to that generated for analogous individual samples, suggesting that pooling removed
Table 2. Mean Metabolite Levels Detected from Individual
some uninformative signal.
Plant (I) and Pooled Plants (P) Samples for the Same Tissue a Loadings associated with examples of the above variety Type of a Variety Are Highly Correlated
classifications were selected graphically (Figures 6C,D; G,H; and K,L; in purple) and listed in Supplementary Table 6 in the
variety V5 leaf
25 DAP
mature
Supporting Information. The very significant increases in the
amount of amino acids in developing kernels, including
glutamic acid, glutamine, histidine, leucine, lysine, pyroglutamic
38B85 0.9988
0.9994
acid (which could be derived from glutamine during sample
PH2WBS
0.9089
0.9952
preparation), and tryptophan, are expected for PH2WBS, a
PH2WBR 0.9985
genotype with elevated grain protein. Explanations for the
PH0GP 0.9970
0.9971
genotype-specific differences (loadings) in the other tissues are
less obvious. For the three examples shown, loadings from
PH14T 0.9994
0.9981
0.9992
pooled plants were very similar to those from individual plants.
PHG9B 0.9993
0.9933
0.9940
Thus, pooling generated similar PCA scores and loadings,
maintaining the ability to classify sample groups (varieties) as Pearson correlation coefficients (R) by Excel function PEARSON.
H31 0.9980
0.9891
well as to identify the prominent metabolites underlying said classifications.
observed despite the fact that the individual samples were from In this GC-MS metabolomic study, we also found that some nine different plants, compared to the technical variation tests
metabolites were detected in only one or two tissue types. for which metabolites were extracted from eight aliquots of the
Among individual plant samples, there were 19 metabolites same plant sample. These results also demonstrate consistent
uniquely detected in V5 leaves, 2 only in immature kernels, and derivatization, GC, MS data acquisition, and data processing.
3 only in mature kernels (Table 3). It is expected that the
Figure 5. PCA score plots from individual or pooled plants showing tissue specificity of metabolomes.
Figure 6. PCA scores and loadings plots from individual plants (A, C; E, G; I, K) or pooled plants (B, D; F, H; J, L) showing classifications of PH2WBR from the leaf metabolome (A−D), PH2WBS from the 25 DAP kernel metabolome (E−H), and PH2WBS from the mature kernel metabolome (I−L). Significant loadings are shown in purple.
metabolome of leaves is more divergent than that of immature or mature kernels. Some of these metabolites may be present but not detected in other tissues, given our conservative limit of peak detection. Moreover, immature and mature kernels contain more polysaccharides by weight than leaves. Because approximately 3 mg dry weight samples were used for all three tissue types, it is expected that the concentration of many small-molecule metabolites will be greater in leaf than in kernel samples. This could result in apparent tissue specificity, as seen in Table 3.
Table 3. Apparent Tissue-Specific Metabolites
V5 leaf: tyramine (polyamine); tryptophan (amino acid); chlorogenic acid (phenolic acid); citramalic acid (organic acid); dehydroascorbic acid, secondary peak 1 (vitamin); dehydroascorbic acid, secondary peak 2 (vitamin); dehydroascorbic acid, secondary peak 3 (vitamin); heptadecanoic acid (fatty acid); itaconic acid (organic acid); maleic acid (organic acid); pyruvic acid (organic acid); salicylic acid (phenolic acid); cis-caffeic acid (phenolic acid); trans-caffeic acid (phenolic acid); α-tocopherol (vitamin); rhamnose (sugar); trehalose (sugar); glyceric acid-3-phosphate (phosphorylated acid); phytol (alkane alcohol); margaric acid (fatty acid)
25 DAP: myristic acid (fatty acid); cysteine, partial derivative (amino acid)
mature kernel: adenosine-5-monophosphate (nucleic acid); pipecolic acid (organic acid)
Range and Variations of Metabolite Abundances. We observed large ranges in relative levels for many metabolites across all varieties. The ratio between the maximum value and the minimum value detected for a metabolite in individual samples ranged from 1.8 to 1663 for V5 leaf tissues, from 3.3 to 16815 for immature kernels, and from 2.7 to 585 for mature kernels (Supporting Information Supplementary Table 7). However, when samples were pooled, the ranges narrowed to 1.4−167 for V5 leaves, 2.1−4828 for immature kernels, and 1.6−86 for mature kernels (Supporting Information Supplementary Table 7). Similarly, when the mean values within each variety for each metabolite from either individual samples or pooled samples were compared across all varieties with box plots, the pooled samples showed a much narrower distribution compared to the individual samples (data not shown). This observation indicated that the biological variation among individual plants combined with variety variation was very large. However, our pooling strategy effectively decreased the biological variation between plants. The actual relative levels are specific to the current data set and should not be compared to other data sets, unless they were processed (aligned and scaled) together.
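As a rough illustration of the max/min ratio used in the Range and Variations paragraph, the following Python sketch computes that ratio for one metabolite before and after pooling. The relative levels are hypothetical, the function names are ours, and averaging measured values is only an assumed stand-in for physically pooling tissue from three plants, so the numbers show the direction of the effect rather than reproduce any value from Supplementary Table 7.

```python
from statistics import mean


def max_min_ratio(levels):
    """Return max/min for one metabolite's relative levels (all must be > 0)."""
    lowest, highest = min(levels), max(levels)
    if lowest <= 0:
        raise ValueError("relative levels must be positive to form a ratio")
    return highest / lowest


def pool_means(levels, pool_size):
    """Average consecutive groups of `pool_size` samples.

    Assumption: averaging measured levels approximates pooling tissue
    before extraction; the study pooled tissue, not numbers.
    """
    if pool_size <= 0 or len(levels) % pool_size != 0:
        raise ValueError("sample count must be a positive multiple of pool_size")
    return [mean(levels[i:i + pool_size]) for i in range(0, len(levels), pool_size)]


# Hypothetical relative levels of one metabolite in nine individual plants.
individual = [1.0, 4.0, 10.0, 2.0, 2.0, 5.0, 3.0, 6.0, 3.0]
pooled = pool_means(individual, pool_size=3)  # three pools of three plants

print(max_min_ratio(individual))  # 10.0
print(pooled)                     # [5.0, 3.0, 4.0]
print(max_min_ratio(pooled))      # 5/3, about 1.67
```

In this toy case the ratio drops from 10 to about 1.67 because extreme individual values are averaged within each pool, which is the same direction of change reported for the pooled samples.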
Table 4. Metabolites with Relatively Stable or Highly Variable Levels among Different Maize Varieties and Their CV Values a
mature kernel 0.31 acetic acid
V5 leaf
25 DAP immature kernel
0.29 β -sitosterol 0.30 arabinose
0.30 acetic acid
0.28 campesterol 0.13 β -sitosterol
0.39 arabinose
0.36 erythritol 0.23 campesterol
0.34 aspartic acid part. deriv
0.37 glycerol-3-phosphate 0.40 cellobiose deriv 1
0.32 benzoic acid
0.37 linoleic acid 0.38 cis-aconitic acid
0.30 ethanolamine
0.38 malic acid 0.39 ferulic acid
0.37 fructose deriv 1
0.35 myo-inositol 0.33 galactitol
0.39 galactose
0.31 palmitic acid 0.24 glycerol
0.24 glucose deriv 1
0.31 stigmasterol 0.38 glycerol-3-phosphate
0.38 glyceric acid
0.19 sucrose 0.40 heptadecanoic acid
0.36 isoleucine part. deriv
0.39 tyrosine 0.30 linoleic acid
0.34 leucine part. deriv
0.36 malic acid
0.17 myo-inositol
0.36 mannose
0.30 palmitic acid
0.23 myo-inositol
0.17 phytol
0.25 phosphoric acid 0.37 serine part. deriv
0.35 stigmasterol
0.30 succinic acid
0.14 sucrose
0.38 sucrose
0.36 trans-caffeic acid
0.31 threonine part. deriv
0.31 xylitol
0.35 tyrosine
0.28 xylose deriv 1 0.32 xylose deriv 2
1.24 asparagine 1.79 asparagine part. deriv
1.60 2-aminobutyric acid
1.10 β -alanine part. deriv
2.15 β -amyrin 1.52 aspartic acid
2.14 cis-aconitic acid
2.63 cellobiose deriv 1 2.04 benzoic acid
1.39 citric acid
1.05 cysteine part. deriv 1.21 dehydroascorbic acid
2.26 gluconic acid
1.28 dehydroascorbic acid 1.54 dehydroascorbic acid second peak 1
1.49 glutamic acid
1.03 GABA 1.71 ethanolamine
2.70 glutamine part. deriv
1.04 glucose deriv 2 1.36 glutamic acid
1.84 histidine
1.83 glucose-6-phosphate deriv 2 2.14 glutamine part. deriv
1.63 isocitric acid
1.70 glutamine part. deriv 1.19 glycine
1.25 linolenic acid
1.04 glyceric acid 1.22 glycine part. deriv
2.56 myristic acid
1.60 maltose 1.07 isoleucine
1.03 oleic acid
1.25 pipecolic acid 1.11 leucine
3.43 p-coumaric acid
1.14 pipecolic acid part. deriv 1.45 ornithine
1.57 pyroglutamic acid
1.32 pyroglutamic acid 1.06 serine
1.16 xylitol 1.60 trehalose
1.64 xylose deriv 2 1.21 tyrosine
a CV values were calculated from metabolite levels from all individual (I) and pooled (P) samples for all varieties. Only metabolites that were detected from all I and P samples of all varieties were included for the calculation. Only metabolites with CV values of <0.40 (relatively stable) and >1.00 (highly variable) are shown. Metabolites in italic were found in all three tissues, and metabolites in bold were found in two tissues for the same category.
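The classification behind Table 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the CV is taken as the sample standard deviation divided by the mean (the text does not say which standard deviation was used), the cutoffs are the CV <0.4 and CV >1 values quoted in the text, and the metabolite names and levels are hypothetical.

```python
from statistics import mean, stdev

STABLE_BELOW = 0.40    # cutoff quoted in the text for "relatively stable"
VARIABLE_ABOVE = 1.00  # cutoff quoted in the text for "highly variable"


def coefficient_of_variation(levels):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    if len(levels) < 2:
        raise ValueError("need at least two values to estimate a CV")
    average = mean(levels)
    if average <= 0:
        raise ValueError("mean relative level must be positive")
    return stdev(levels) / average


def classify(levels):
    """Label one metabolite by its CV across samples."""
    cv = coefficient_of_variation(levels)
    if cv < STABLE_BELOW:
        return "relatively stable"
    if cv > VARIABLE_ABOVE:
        return "highly variable"
    return "not listed"


# Hypothetical relative levels across samples for three placeholder metabolites.
example_levels = {
    "metabolite_A": [9.0, 10.0, 11.0],
    "metabolite_B": [1.0, 1.0, 1.0, 1.0, 16.0],
    "metabolite_C": [5.0, 10.0, 15.0],
}

for name, levels in example_levels.items():
    print(name, round(coefficient_of_variation(levels), 2), classify(levels))
```

Metabolites falling between the two cutoffs are labeled "not listed" here, mirroring the fact that Table 4 shows only the two extremes.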
Multiple derivative forms for certain metabolites are characteristic of GC-MS-based metabolomics, as illustrated by asparagine in Supplementary Table 7 in the Supporting Information. Asparagine with four TMS groups (one attached to the carboxyl and three to the amines) was found in mature kernels, whereas asparagine with just three TMS moieties (one attached to the carboxyl and two to the amines) was specific to V5 leaves. This dichotomy might be explained by differential trimethylsilylation due to the different sample matrices. 28 Consequently, comparing metabolomes across tissue types or species should be undertaken with caution.
We also compared the levels of metabolites in all samples across all varieties for a given tissue type and identified metabolites that are quite stable as well as those that are highly variable among varieties. There were 21 metabolites from V5 leaves, 20 metabolites from 25 DAP immature kernels, and 11 metabolites from the mature kernels that showed a CV value of <0.4 across all varieties (Table 4), representing tissue-specific stable metabolomes. Among them, sucrose and myo-inositol were identified from all three tissues, and another 10 metabolites appeared in two tissue types. On the other hand, there were 17 metabolites from the V5 leaves, 13 from the 25 DAP immature kernels, and 16 from the mature kernels that showed CV values >1, indicating that these metabolites are highly variable among different maize varieties (Table 4). A partial derivative form of glutamine seemed to be highly variable in all three tissue types, and another three metabolites were highly variable for two tissue types. The high variability of the partial derivative of glutamine may be due, at least in part, to inconsistent transformation to pyroglutamic acid, which was
associated with trimethylsilylation.
As with gene expression levels detected from microarrays, mean metabolite abundances from individual samples or pooled samples were calculated for each variety and used to calculate CV values among varieties. For all three tissue types, metabolite variations among different varieties detected from individual or pooled samples are well-correlated. Linear regression R² values are 0.90, 0.92, and 0.82 for V5 leaves, 25 DAP developing kernels, and mature kernels, respectively. Furthermore, the CV distribution patterns are very similar between individual and pooled samples (Supporting Information Supplementary Figure 4). For V5 leaves, 43.7% of metabolites showed higher CV values in pooled samples compared to individual samples. In 25 DAP developing kernels and mature kernels, the numbers are 61.5 and 47.6%. This observation indicated that using pooled samples revealed variety-to-variety metabolomic variation similar to that using individual samples.
■ DISCUSSION
Thorough evaluation of the applicability and limitations of the -omics technologies for food safety assessment is necessary before their acceptance for this purpose. Toward this end, we evaluated high-throughput gene expression and metabolomic technologies by characterizing the transcriptomes and metabolomes of several conventional maize varieties using alternative protocols. Our observations led us to conclude that in applying these methods to regulatory issues, consideration should be given to natural variation in the maize transcriptome and to the high degree of variation in metabolite concentrations between plant varieties and individuals of the same variety.
Technical Variation. To validate methods for both microarray and metabolomics, selected samples were analyzed multiple times to serve as technical repeats. The CV distribution for the technical microarrays showed small variations between different microarray runs for the same sample (Figure 1), validating the method and technical consistency. When compared to CVs detected from individual plant samples, the technical CVs are much smaller (Figure 1A; Supporting Information Supplementary Table 2), indicating that our microarray technology is consistent and sensitive enough to detect biological variations outside technical variations. The data correlation analysis among repeat arrays comparing individual samples and technical repeat samples confirmed this conclusion (Figure 2). Technical plus biological CVs detected from metabolomics, however, were much larger compared to microarrays (Supporting Information Supplementary Figure 3). This increase is not unexpected because the expression of many metabolites is dynamically affected by the microenvironment. Furthermore, different metabolites have very different physical and biochemical properties as well as
design of multiple pools with multiple individual samples in each pool was established as a compromise. 44−47 Thus, the ability to detect the difference between biological subject-to-subject variations and the experimental technical variations is combined with the efficiency of the pooling strategy designed to reduce overall variance. The larger the individual-to-individual variability is, as compared to technical variability, the greater the reduction of variability achieved by pooling samples. 44,45
We designed the microarray and metabolomics experiments to include both individual samples and sample pools. Gene expression levels detected from microarray and metabolite abundances both showed very good correlation between individual and pooled samples within the same tissue type and variety (Tables 1 and 2). Using pooled samples lowered sample-to-sample variation, resulting in lower CV values (Figure 4; Supporting Information Supplementary Figure 1 and Tables 2, 3, 5, and 6). Interestingly, pooling microarray samples reduced the CV values more dramatically for 25 DAP samples compared to V5 leaf samples (Supporting Information Supplementary Figure 1C,D), presumably due to the higher transcriptome variation among 25 DAP individual samples compared to V5 leaf individual samples.
The mean CV values for each variety calculated from either individual plants or pooled samples represent variety-to-variety variations. The CV values representing variety-to-variety variations were similar when obtained from either individual plants or pooled samples for both microarrays and metabolomics (Supporting Information Supplementary Figures 2 and 4). Furthermore, the distribution patterns of variety CV values were similar for both microarrays and metabolomics. A slight increase of variation in pooled samples compared with individual samples from microarrays was detected (Supporting Information Supplementary Figure 2 and Table 3), presumably due to fewer pooled samples.
Pooling plants prior to analysis also did not adversely affect the ability to classify tissues or varieties or identify discriminating metabolites by PCA (Figures 5 and 6). In fact, pooling appeared to enhance discriminating power, presumably by eliminating some noise from the data sets. Overall, our pooling strategy of three sample pools of three is a cost-saving design that does not sacrifice analytical power.
qRT-PCR and Microarray. The use of microarray profiling for comparative assessment of biotech crops requires a gene expression sequence database for probe design, gene annotation, and expression level interpretation. For many plant species, genomes or transcriptomes have not been completely sequenced except for a few model genotypes. The maize genome has an especially high level of DNA sequence polymorphisms, approximately an order of magnitude higher