Analytical Method Evaluation and Discove

Supporting Information

ABSTRACT: Profiling techniques such as microarrays, proteomics, and metabolomics are used widely to assess the overall effects of genetic background, environmental stimuli, growth stage, or transgene expression in plants. To assess the potential regulatory use of these techniques in agricultural biotechnology, we carried out microarray and metabolomic studies of 3 different tissues from 11 conventional maize varieties. We measured technical variations for both microarrays and metabolomics, compared results from individual plants and corresponding pooled samples, and documented variations detected among different varieties with individual plants or pooled samples. Both microarray and metabolomic technologies are reproducible and can be used to detect plant-to-plant and variety-to-variety differences. A pooling strategy lowered sample variations for both microarray and metabolomics while capturing variety-to-variety variation. However, unknown genomic sequences differing between maize varieties might hinder the application of microarrays. High-throughput metabolomics could be useful as a tool for the characterization of transgenic crops. However, researchers will have to take into consideration the impact on the detection and quantitation of a wide range of metabolites on experimental design as well as validation and interpretation of results.

KEYWORDS: metabolomics, Zea mays, maize, microarray

Received: August 5, 2013
Revised: February 23, 2014
Accepted: February 24, 2014
Published: February 24, 2014

© 2014 American Chemical Society

dx.doi.org/10.1021/jf405652j | J. Agric. Food Chem. 2014, 62, 2997−3009

■ INTRODUCTION

Global demand for food is increasing rapidly, a trend that is expected to continue for many years. This trend coincides with the growth of the world’s population, the limited availability of arable land and irrigation water, and global environmental changes.1−3 In addition to traditional plant breeding, biotechnology has become a main focus in the effort to meet the global food demand. The main crops targeted for genetic engineering include maize, soy, cotton, oilseed, canola/rapeseed, rice, potato, staple cereal plants, and vegetables.2

The introduction of genetically modified (GM) crops has presented technical, regulatory, and social challenges.4,5 Detailed studies are required to demonstrate that food and feed produced from agricultural products developed through biotechnology are as safe as conventional counterparts, not posing risks to the environment or human health.6−8 In the early 2000s, the concept of substantial equivalence emerged for testing the equivalence of GM and corresponding conventional crops.5,9 The introduction of a single gene of interest should preferably affect only the desired trait. The biochemical composition of the crop should otherwise be comparable to that of a parental strain or a variety similar to the parental line.10 Therefore, compositional analysis covering key nutrients and antinutrients is recommended by the Organization for Economic Cooperation and Development (OECD). This targeted approach, focusing on the majority of the compositional components,11−14 has been widely accepted by international regulatory agencies as part of the concept of substantial equivalence and applied to the assessment of the safety of GM crops.9,14,15

The development of “-omics” profiling offers powerful high-throughput tools for biomedical and agricultural studies. Because nontargeted profiling technologies can screen many components simultaneously, they have the potential to provide insight into complicated metabolic pathways and their interconnections. Such technologies therefore could represent valuable analytical approaches for the assessment of substantial equivalence for GM plants.10,15−17 The challenges in the use of these methods are due to the complexity of the data sets and the use of different technological platforms and software that might generate artifacts, biases, and nonuniform data representations.18

Although nontargeted surveys of the overall transcriptome, proteome, or metabolome of a plant at one snapshot in time and tissue are gaining attention,19,20 these technologies are not yet validated within the regulatory framework and therefore not at present officially recommended for safety evaluations of GM plants. A major challenge is to determine whether any detected differences are due to genetic manipulation through bio-

[...] environments, or even stochastic differences between plants. For this purpose it is necessary to evaluate the reproducibility of these analytical methods and natural variation of the results of applying these methods to crop species, such as maize. Without this understanding it would be impossible to interpret the -omics data and declare equivalence. Therefore, the International Life Sciences Institute (ILSI) recommended establishing baseline ranges for natural variations and validating these -omics technologies before they can be used for regulatory assessment of biotech crops.15 This paper is directed toward fulfilling this function for transcriptomic and metabolomic methods.

Microarray analysis of transcriptomes is available for both model and crop plants, including Arabidopsis, maize, rice, potato, tomato, soy, pepper, barley, Brassica, and sugar cane.21 Microarrays provide high-throughput, simultaneous detection of differences in mRNA abundance between samples for thousands of genes. Use of microarray technology for safety assessment of GM crops faces some challenges. First, nucleic acid probe hybridization is not able to detect genes expressed at very low levels or genes with alternate splicing forms. Second, it is difficult to achieve high reproducibility for microarray experiments due to variations resulting from sample handling, experiment processes, environmental impact on plants, and crop variety differences.22−24

Technologies for simultaneous analysis of metabolites have been developed25,26 and offer the possibility of surveying significantly more metabolites than conventional chemical analyses in a much shorter time and with much lower cost per analyte. However, comparing data from different laboratories remains challenging. This challenge is usually due to relative rather than absolute quantification and different methodologies adopted by different groups, including equipment platforms and statistical analysis methods. High sample-to-sample and experiment-to-experiment variability, even within the same laboratory, and the wide concentration range of the same metabolite between plants add to the complexity of the analysis.10 We applied microarray and metabolomic technologies to a randomized field study as conventionally used in regulatory studies. To evaluate the reproducibility and technical variations of the microarray and metabolomic technologies, the samples were tested individually or as pools of plants, and RNA and metabolites were extracted and analyzed by microarray and GC-MS. Overall, we evaluated the reproducibility of the microarray and metabolomic technologies to explore the capability of these methodologies in our experimental settings to detect the natural variation of gene expression and metabolite levels between plants and maize varieties.

■ MATERIALS AND METHODS

Plant Tissue. Seven inbred and four non-GM commercial hybrid maize varieties were planted in a randomized plot at DuPont Stine Haskell Research Center, Newark, DE, USA. Twenty-five seeds were sown per row for each variety. Leaves at the V5 growth stage and immature kernels at 25 days after pollination (DAP) were collected in the morning between 8:30 a.m. and 12 p.m. for microarray and GC-MS-based metabolomics.

Three leaf punches avoiding midribs were collected at the middle of the V5 leaf area and placed on dry ice immediately after harvest, transported to the laboratory on dry ice, and stored at −80 °C before processing for metabolomic analysis. The remaining leaf was collected [...] the laboratory on dry ice, and stored at −80 °C before processing for microarray analysis.

For 25 DAP kernels, 10 kernels in the middle row of the ear were collected for metabolomics, and the remaining kernels were used for microarray analysis. The ears at 25 DAP were removed from the plants and placed on wet ice immediately after harvest and transported to the laboratory on wet ice. Immature kernels were removed from the cobs, frozen in liquid nitrogen, and stored at −80 °C before processing for microarray and metabolomics analyses.

Mature kernels at R6 growth stage (about 60 DAP) were also collected for metabolomics analysis. The ears at R6 stage were removed from the plants and placed on wet ice immediately after harvest and transported to the laboratory on wet ice. Ten mature kernels in the middle row of the ear were removed from the cob, frozen in liquid nitrogen, and stored at −80 °C before processing for metabolomics analyses.

For microarray analysis, tissues were ground into fine powders. For metabolomics analysis, tissues were lyophilized before they were ground to fine powders. Additional pooled samples were obtained by combining equivalent amounts of ground material from three individual plants.

Microarray. Total RNA was isolated from ground frozen tissue using the EZNA SQ RNA Isolation Kit (Omega Bio-Tek, Norcross, GA, USA), treated with DNase-I, and used for mRNA isolation with an Illustra mRNA Purification Kit (GE Biosciences, Pittsburgh, PA, USA). The total RNA and mRNA samples were visualized and quantified on a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). Each mRNA sample was converted into double-stranded DNA by an in vitro transcription reaction and labeled with Cy3 fluorescent dye using the Low RNA Input Fluorescent Linear Amplification Kit (Agilent Technologies). The cRNA product was purified with an Agencourt RNAClean Kit (Beckman Coulter, Indianapolis, IN, USA). Hybridizations were performed overnight with equal amounts of labeled cRNA to a custom 4x44K Maize Oligo Microarray from Agilent Technologies according to Agilent’s One-Color Microarray-Based Gene Expression Analysis protocol. After hybridization, the microarray slides were washed and immediately scanned with a G2505C DNA microarray scanner (Agilent Technologies). The images were visually inspected for artifacts, and feature intensities were extracted, filtered, and normalized with Feature Extraction software (v 10.5.1.1) (Agilent Technologies). Quality control and downstream analysis were performed using data analysis tools in Genedata Expressionist and the statistical language R. Further data analysis and bioinformatic analyses were carried out according to methods described in Hayes et al.27

Metabolomics. Metabolites were extracted from approximately 3 mg (dry weight) of lyophilized tissues for each sample. In a 1.1 mL polypropylene microtube containing two 5/32 in. stainless steel ball bearings, 500 μL of a chloroform/methanol/water (2:5:2, v/v/v) solution containing 0.015 mg of ribitol as internal standard was added to each sample. Samples were homogenized in a 2000 Geno/Grinder ball mill at setting 1650 for 1 min and then rotated at 4 °C for 30 min before being centrifuged at 1454g for 15 min at 4 °C. Aliquots (300 μL) were transferred to 1.8 mL high recovery glass autosampler vials, evaporated to dryness in a speed vac, and redissolved in 50 μL of 20 mg mL−1 methoxyamine hydrochloride in pyridine. The vials were capped, agitated with a vortex mixer, and incubated in an orbital shaker at 30 °C for 90 min to form methoxyamine derivatives. Next, 80 μL of N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) was added to each sample by a Gerstel autosampler 30 min prior to injection to form trimethylsilyl (TMS) derivatives, minimizing sample variations due to derivatization differences. This “just in time” derivatization eliminates variation due to differences in reaction time or temperature. Furthermore, the gas chromatograph inlet liner and septum were replaced daily, mitigating the known influence of sample residue in the inlet on trimethylsilylation completeness.28 However, trimethylsilylation can vary with the sample matrix.28 Thus, for molecules such as amino acids that present multiple reaction sites leading to the possibility of two or more chemical derivatives, the relative abundance of these trimethylsilylated forms can vary among the three different tissue types assayed in this study.

Figure 1. CVs of gene expression calculated from technical repeat microarrays: (A) percentages of genes within different CV ranges; (B) CV distributions generated by TIBCO Spotfire (x-axis, CV values in log scale; y-axis, gene counts); (C) plots of log10(CV) (y-axis) against log10(mean) (x-axis) (curves are polynomial fittings generated by TIBCO Spotfire); (D) log10(CV) values of inflection points calculated from curves in panel C.

The derivatized samples were separated by gas chromatography on a Restek 30 m × 0.25 mm × 0.25 μm film thickness Rtx-5Sil MS column with a 10 m Integra-Guard column. One microliter injections were made with a 1:30 split ratio using a Gerstel autosampler. An Agilent 6890N gas chromatograph was programmed for an initial temperature of 80 °C for 0.5 min and increased to 350 °C at a rate of 18 °C min−1, at which it was held for 2 min before being cooled rapidly to 80 °C and held there for 5 min in preparation for the next run. The injector and transfer line temperatures were 230 and 250 °C, respectively, and the source temperature was 200 °C. Helium was used as the carrier gas with a constant flow rate of 1 mL min−1 maintained by electronic pressure control. Data acquisition was performed on a LECO Pegasus III time-of-flight mass spectrometer with an acquisition rate of 10 spectra s−1 in the mass range of m/z 45−600. An electron beam of 70 eV was used to generate spectra. Detector voltage was 1750 V. An instrument autotune for mass calibration using perfluorotributylamine (PFTBA) was performed prior to each sample sequence.

Metabolomics Data Processing and Analysis. Raw Leco GC-MS .peg datafiles were converted into .netcdf (Andi) formats using Leco ChromaTof ver. 4.13 software. Data preprocessing was performed with Genedata Refiner MS ver. 5.2.1 software. For each .netcdf file, retention times were converted into retention indices using an in-house program. Preprocessing consisted of gridding chromatograms in the m/z value (80−437) and retention index dimensions, subtracting chemical noise, aligning the retention indices of each selected ion chromatogram, and detecting nominal mass peaks, using empirically optimized settings for each process. Data from each of the three tissue types were processed separately to maximize alignment and peak picking. The resulting three matrices consisted of intensities for each m/z value−retention index combination and each sample. The aligned and denoised data matrices were passed to Genedata Analyst ver. 2.1 software, where each intensity value by sample was normalized for both the ribitol internal standard signal and sample dry weight.

Because m/z value−retention index fingerprint data are redundant, significant signatures were reduced to named known metabolites on the basis of matching both the retention index and mass spectrum to those of authentic standards. Relative quantitation of each metabolite in each sample was derived from the intensity of each metabolite’s representative m/z value obtained from the Genedata Analyst output. In a few cases, peak heights obtained from ChromaTof quantification ion chromatograms were used instead when signals were below the threshold set for fingerprinting and thus not present in the Genedata Analyst output. Metabolite detection from either source was dependent on reaching a conservative limit of detection to mitigate false-positive peaks that would have an undue effect on subsequent statistical analyses. Percent CV values were calculated for each metabolite across selected samples. Data matrices were reformatted and imported into the PLS_Toolbox version 7.0.1 (Eigenvector Research, Inc.), with which principal component analysis (PCA) was performed on autoscaled (mean centered and each variable scaled to unit variance) data.
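The normalization and percent CV steps can be illustrated with a short Python sketch. This is not the Genedata Analyst procedure, whose internals are not described here; it assumes that normalization means dividing each sample's intensities by the product of its ribitol signal and its dry weight, and that percent CV uses the sample standard deviation. The function names and the numbers in the example are hypothetical.

```python
import numpy as np


def normalize_intensities(raw, ribitol_signal, dry_weight_mg):
    """Scale each sample (row) by its internal-standard signal and dry weight.

    Assumption: normalization is a simple division by ribitol signal x dry weight.
    raw: (n_samples, n_features); the other two arguments: (n_samples,).
    """
    raw = np.asarray(raw, dtype=float)
    scale = np.asarray(ribitol_signal, dtype=float) * np.asarray(dry_weight_mg, dtype=float)
    return raw / scale[:, None]


def percent_cv(values):
    """Percent CV of each column: 100 * sample SD / mean (assumes ddof=1)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    sd = values.std(axis=0, ddof=1)
    return 100.0 * sd / mean


if __name__ == "__main__":
    # Hypothetical intensities for 3 technical repeats x 2 metabolites.
    raw = [[1200.0, 300.0], [1500.0, 390.0], [900.0, 210.0]]
    ribitol = [100.0, 125.0, 80.0]   # hypothetical internal-standard signals
    weight = [3.0, 3.0, 3.0]         # approx. 3 mg dry weight per sample
    normalized = normalize_intensities(raw, ribitol, weight)
    print(percent_cv(normalized))
```

Dividing by the internal-standard signal removes run-to-run differences in extraction and injection, so the remaining CV reflects variation in the metabolite itself rather than in sample handling.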

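Autoscaling followed by PCA can likewise be sketched in a few lines. The study used PLS_Toolbox, a MATLAB package; the code below is a NumPy stand-in based on singular value decomposition, offered only to show what "autoscaled" means in practice. The function names are hypothetical, and the sketch assumes no variable is constant across samples.

```python
import numpy as np


def autoscale(X):
    """Mean-center each variable (column) and scale it to unit variance.

    Assumes every column has nonzero variance; a constant column would
    produce a division by zero.
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X, n_components=2):
    """PCA of autoscaled data via SVD.

    Returns sample scores, variable loadings, and the fraction of total
    variance explained by each retained component.
    """
    Z = autoscale(X)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, Vt[:n_components], explained[:n_components]


if __name__ == "__main__":
    # Hypothetical matrix: 4 samples x 3 metabolites.
    X = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 6.0, 5.0], [4.0, 8.0, 2.0]]
    scores, loadings, explained = pca_scores(X)
    print(scores)
    print(explained)
```

Because autoscaling gives every metabolite unit variance, abundant metabolites do not dominate the principal components merely because of their larger absolute signal.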
Experimental Design. For both microarray and metabolomics experiments, 11 maize varieties were used, including (1) 7 inbred lines, PHG9B (high oil), H31 (low oil), PH2WBS (high protein), PH2WBR (low protein), PH0GP (median starch), PH14T (median starch), and 658 (low starch); and (2) 4 commodity hybrid lines, 38B85, 37Y12, 34A15, and 34P88. These lines were chosen as a partial representation of the range of U.S. cultivated maize diversity and include lines differing in protein, oil, and starch contents. Three types of tissues, V5 leaf, 25 DAP immature kernel, and mature kernel, were used for metabolomic experiments. Because the mature kernels are dormant and have very limited gene expression,29,30 only the V5 leaf and the 25 DAP immature kernels were used for microarray experiments. Due to limited tissue availability for some varieties, some microarray or metabolomic experiments were not conducted. For microarray technical repeat controls, eight independent RNA samples were isolated from either V5 leaves or 25 DAP immature kernels of a single plant from the high-oil variety PHG9B and the low-oil line H31 and used for eight different microarray hybridizations. The signal differences among these hybridizations were considered the technical variation of the microarray methodology. Similarly, eight independent metabolite extractions were made from bulk collections of V5 leaves or 25 DAP immature kernels of PHG9B and H31 and from bulk mature kernels of PH2WBS (high-protein line) and PHG9B. They were used for independent GC-MS analyses and technical variation assessment. Multiple sample preparation and testing steps were used to evaluate the reproducibility of both technical methods. Sample variations were also evaluated by comparing data from different individual plants and different pooled plants.

qRT-PCR. Genes, primers, and probes are listed in Supplementary Table 1 in the Supporting Information. Primers and probes were designed with Primer Express 3.0.1 (Applied Biosystems, Carlsbad, CA, USA) and purchased from Integrated DNA Technologies, Inc. (Coralville, IA, USA). First-strand cDNA was synthesized from the same mRNA samples used for microarray. Fifteen pooled samples from either V5 leaf or 25 DAP kernel were chosen on the basis of sample availability. For each RT reaction, 240 ng of mRNA was used as a template in a total volume of 80 μL following the manufacturer’s instruction for the SuperScript VILO cDNA synthesis kit (Invitrogen, Carlsbad, CA, USA). All qRT-PCR primers and TaqMan probes were designed using the Primer Express program (Applied Biosystems) and tested for specificity by Blast search against the NCBI public sequence database. The qPCR reactions were carried out in 384-well plates in a ViiA 7 real-time-PCR machine (Applied Biosystems) using the TaqMan Gene Expression Master Mix (Applied Biosystems). The qPCR program was 50 °C for 2 min, 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min. Each reaction contained 200 nM of each primer, 100 nM of probe, and 2 μL of the RT reaction solution as template in a final volume of 20 μL. Every reaction was repeated three times. The ViiA 7 software V1.2 was used to record and process the data. The Rn (normalized reporter) values of each reaction for every cycle were exported and used to calculate the single-well qPCR efficiencies using a real-time PCR Miner program.31

■ RESULTS

Microarray Reproducibility. To compare the reproducibility of expression levels of the same genes on repeated microarrays, the data were analyzed using correlation statistics. The coefficient of variation (CV) for each set of repeats was calculated and compared as an indication of reproducibility.32 Mean CV values of gene transcripts for technical repeats were 0.25, 0.23, 0.33, and 0.23 for 25 DAP PHG9B, 25 DAP H31, V5 leaf PHG9B, and V5 leaf H31 samples, respectively, relatively low compared to the microarray literature,33−35 indicating good technical reproducibility. Expression of most genes on the microarrays had low CV values, with 82.4, 92.1, 90.4, and 91.6% of genes from V5 leaf PHG9B, V5 leaf H31, 25 DAP PHG9B, and 25 DAP H31 microarrays exhibiting CV values below 0.5 (Figure 1A). In addition, these CV values showed log-normal distribution centered at 0.1 (Figure 1B), indicating good reproducibility. However, the variability among the eight technical repeat microarrays for PHG9B V5 leaves was a little higher than that of the other technical repeat microarrays (Figure 1A). Alternatively, the CV values were log transformed and plotted against the log transformed mean values. Polynomial curve fitting showed, as expected, that CV values decreased as the mean intensities increased (Figure 1C). The inflection points calculated on the basis of the polynomial curves showed that technical repeat microarrays for PHG9B V5 leaves have higher background noise (Figure 1D), similar to that shown by the CV distributions (Figure 1A).

We further investigated the reproducibility of microarray results by a linear regression model correlating data between any pair of microarrays within each group. Four groups were analyzed this way for both V5 leaf and 25 DAP kernel samples, including the eight technical arrays for H31, the six individual biological repeats for H31, the eight technical arrays for PHG9B, and the nine individual biological repeats for PHG9B. The box plots represent the distributions of R square values of all pairwise comparisons of linear regression modeling (Figure 2). The technical replicates had consistently higher correlations than the biological replicates (Figure 2). We conclude that gene expression variation from microarrays resulted primarily from maize variety differences rather than from plant-to-plant differences, pooled sample-to-sample differences, or technical variations, indicating that the method is sensitive enough to detect biological variation among individual or pooled plant samples.

Figure 2. Technical reproducibility of microarrays. Pairwise correlation coefficients between pairs of technical replicates (T) or between pairs of biological repeats (I). Box plots were generated with TIBCO Spotfire. The white bar represents the median value. Edges of boxes represent values at the 75th and 25th percentiles. Edges of bars represent the ranges of values with outside dots as outliers.

Correlation between Gene Expression of Individual and Pooled Samples. Next, CV values for the microarrays from the six varieties analyzed both as individual samples (I) and as pooled samples (P) were calculated. For each variety, six or nine individual plants and two or three pools of samples (three plants per pool) were analyzed. The pools were created by combining equal amounts of RNA extracts from individual plants. Overall, 73.7% (34A15_I)−93.3% (38B85_P) of the genes had CV values below 0.5 from V5 leaf samples, and 73.8% (38B85_I)−96.8% (PHG9B_P) of genes had CV values below 0.5 from 25 DAP kernel samples (Supporting Information Supplementary Table 2), representing very good experimental reproducibility. The distributions of the CV values from both 25 DAP kernel and V5 leaf samples are shown in Supplementary Figure 1 in the Supporting Information. When the overall CV distribution patterns of individual or pooled samples were compared, microarrays for 25 DAP kernel samples showed larger CV differences, compared to V5 leaf samples. Additionally, log10(CV) versus log10(mean) plots were generated for all of the microarrays to reveal the relationship between CV and mean intensities (data not shown). The inflection point values were very similar to what was shown by the CV distribution patterns (Supporting Information Supplementary Figure 1).

The Pearson’s correlation coefficient was calculated by comparing mean gene expressions between individual samples and pooled samples for each variety and tissue type. The R values were between 0.9743 and 0.9959 (Table 1), indicating that the signals obtained from individual plants and pooled plants were highly correlated and similar. When samples from the same variety but different tissue types were compared, the Pearson correlation R values were between 0.234 and 0.317 (Table 1), indicating significant gene expression differences between leaves and kernels, as expected. For every variety−tissue combination, the pooled samples showed a smaller mean CV value than the one from corresponding individual samples (Supporting Information Supplementary Table 3). Therefore, the plant-to-plant variation detected from the same variety was reduced by pooling three plants into a single sample, essentially transforming plant-to-plant variation into sample-to-sample variation. In addition, the distribution patterns of CV values from I and P samples were very similar (Supporting Information Supplementary Figure 2), indicating that our pooling strategy was efficient in capturing the variations existing among maize varieties while realizing a cost savings.

Table 1. (Rows 1 and 2) Mean Gene Expression Levels Detected in Individual Plants (I) and Pooled Plants (P) for the Same Tissue Type of a Variety Are Highly Correlated;a (Rows 3 and 4) Mean Gene Expression Detected by Microarrays Are Not Correlated between V5 Leaf and 25 DAPb

row   sample    PHG9B   H31   34A15   38B85   PH2WBR   PH2WBS
1     V5 leaf

a Values are Pearson correlation coefficients (R) comparing mean gene expression intensities between all I and all P samples for each variety−tissue combination, calculated using Excel function PEARSON. b Values are Pearson correlation coefficients (R) comparing gene expression intensities from V5 leaf and 25 DAP for the same plant samples, calculated using Excel function PEARSON, for individual plants (I) or pooled plants (P).

Gene Expression Differences between Varieties. To evaluate gene expression variation between maize varieties, mean microarray spot intensities from six or nine individual samples (I) and two or three pooled samples, with three individuals per pool (P), were determined and used for CV calculations comparing the six varieties that had both I and P samples.

When the CV distributions of individual samples representing variety-to-variety variations (Supporting Information Supplementary Figure 2 and Table 4) were compared to the CV distributions representing plant-to-plant variation within a certain variety (Supporting Information Supplementary Figure 3 and Table 2), we found that the former were larger. For V5 leaves, 65.4% of genes showed a CV value <0.5 when different maize varieties were compared (Supporting Information Supplementary Figure 2 and Table 4), but 73.7% (34A15)−93.1% (38B85) of genes had a CV <0.5 when individual plants within a given variety were compared (Supporting Information Supplementary Figure 1 and Table 2). For 25 DAP kernels, 68% of genes showed a CV value <0.5 when different maize varieties were compared (Supporting Information Supplementary Figure 2 and Table 4), but 73.8% (38B85)−85.1% (H31) of genes had a CV <0.5 when individual plants within a given variety were compared (Supporting Information Supplementary Figure 1 and Table 2). These results indicate that higher variations exist among different maize varieties compared to those among individual plants of the same variety, likely due to the genetic differences and/or genetic and environmental interactions affecting gene expression among varieties.

In addition, the variety-to-variety variation detected in 25 DAP kernels is similar to that among V5 leaf tissues on the basis of their CV distributions (Supporting Information Supplementary Figure 2), indicating that gene expression variations among different maize varieties are similar between these two tissue types.

Confirmation of Microarray Results by qRT-PCR. To confirm the gene expression levels measured by the microarray experiments, two groups of 18 different genes were chosen for V5 leaf and 25 DAP kernel, respectively (Supporting Information Supplementary Table 1). The expression levels of these genes are ranked across all microarrays at the 80th or 50th percentiles. Expression of these genes was measured by qRT-PCR reactions using the same RNA samples used for microarrays. Due to limited sample availability and possible polymorphisms among different maize varieties at primer annealing locations, we used a real-time PCR Miner program31 that has been validated by many other groups36−43 to monitor the single-well qRT-PCR efficiency. The dapA gene was used as a control for comparison between microarray and qRT-PCR data. Gene expression levels from qRT-PCR reactions were calculated on the basis of the dapA expression and compared to levels detected on microarrays that were also quantified on the basis of the dapA expression. The ratio of expression levels for each gene detected by these two techniques was log transformed for proper comparison (Figure 3).

For V5 leaf tissue samples, two genes (pco602011 and pco603626) were not amplified from any of the 15 templates by qRT-PCR and therefore not included in the analysis. Two genes, pco627753 and pco643043 (genes 2 and 11 in Figure 3A, respectively), had expression detected only in some of the samples, and four genes, pco624384, pco521467, pco652567, and pco658406 (genes 13, 14, 15, and 16 in Figure 3A, respectively), showed higher expression (ca. 2−32-fold higher), relative to microarray, in all 15 samples. For the remaining genes tested, the expression levels were close to those measured by the microarrays, although there were some variety-specific expression differences between the two techniques (Figure 3A).
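The comparison of the two techniques can be made concrete with a small Python sketch. It assumes, hypothetically, that the starting quantity in each qPCR well is proportional to (1 + E)^(−Ct), where E is the single-well efficiency and Ct the threshold cycle reported by the efficiency-estimation step; the exact formula used in the study is not stated here. Both measurements are expressed relative to dapA before the log2 ratio is taken. All function names and numbers are illustrative.

```python
import math


def qpcr_starting_quantity(efficiency, ct):
    """Relative starting quantity for one well.

    Assumption: quantity is proportional to (1 + E) ** (-Ct), with E the
    fractional per-cycle efficiency (E = 1.0 means perfect doubling).
    """
    return (1.0 + efficiency) ** (-ct)


def log2_ratio_qpcr_vs_array(qpcr_gene, qpcr_dapa, array_gene, array_dapa):
    """log2 of (qRT-PCR level relative to dapA) / (microarray level relative to dapA)."""
    qpcr_relative = qpcr_gene / qpcr_dapa
    array_relative = array_gene / array_dapa
    return math.log2(qpcr_relative / array_relative)


if __name__ == "__main__":
    # Hypothetical wells: target gene at Ct 22, dapA at Ct 24, both with E = 1.0.
    gene_q = qpcr_starting_quantity(1.0, 22)
    dapa_q = qpcr_starting_quantity(1.0, 24)
    # Hypothetical microarray intensities for the same gene and for dapA.
    ratio = log2_ratio_qpcr_vs_array(gene_q, dapa_q, 2000.0, 1000.0)
    print(ratio)
```

On this scale a value of 0 means the two techniques agree, and the 2−32-fold higher qRT-PCR expression reported for four V5 leaf genes corresponds to log2 ratios between 1 and 5.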

For 25 DAP kernel samples, one gene (pco621453) did not

show any amplification from qRT-PCR and was not included for further analysis. Two genes, pco653893 and pco598383 (genes 14 and 15 in Figure 3B, respectively), were amplified

Figure 3. Gene expression analysis comparing qRT-PCR to microarray hybridization with V5 leaf (A) or 25 DAP (B) samples. Numbers are log2-transformed ratios between expression levels detected by qRT-PCR and microarray that were defined against dapA expression levels.

Metabolomic Data Analysis. The three processed data matrices contained 3891 metabolomic signatures or fingerprints (m/z value−RI combinations) for V5 leaves, 4300 for 25 DAP immature kernels, and 3891 for mature kernels. Of these, 87−103 metabolites were successfully identified in the tissues examined. These numbers are substantially reduced relative to the raw data set for two reasons: (1) the inherent redundancy in metabolomics signatures, wherein each metabolite can be represented by multiple m/z values in its electron impact mass spectrum, was eliminated, and (2) metabolites for which the identity could not be unambiguously established were ignored.

To evaluate technical variations including the sampling, analytical, and data analysis variability, eight technical repeats were produced from V5 leaves of PHG9B and H31, 25 DAP kernels from PHG9B and H31, and mature kernels from PH2WBS and PHG9B. For each tissue−variety combination, a single bulk tissue sample was aliquoted into eight extraction tubes, producing eight metabolomic samples. Mean CV values calculated from technical repeat metabolite relative levels were between 0.33 and 0.54, and median values were between 0.27 and 0.46, indicating good reproducibility despite some outliers in the upper ranges (Supporting Information Supplementary Figure 3). The majority of metabolites detected showed CV values <0.6, with 76.7 and 87.4% of metabolites from V5 leaves of PHG9B and H31, 76.1 and 80.2% of metabolites from 25 DAP immature kernels of PHG9B and H31, and 77.6 and 60.9% of metabolites from mature seeds of PH2WBS and PHG9B, respectively (Supporting Information Supplementary Figure 3).

We also found that pooled samples had lower variances compared to the individual samples (Figure 4; Supporting Information Supplementary Table 5), similar to what was observed from the transcript data. Particularly, metabolomics for mature seed samples showed much less variation than for immature seeds (Supporting Information Supplementary Table 5), probably due to a less complex metabolome and terminal differentiation state of mature kernels.

Mean metabolite levels detected from individual and pooled samples of the same tissue−variety combination are highly correlated, with Pearson correlation R values all close to 1 (Table 2). For PHG9B and H31, the high correlations are observed despite the fact that the individual samples were from nine different plants, compared to the technical variation tests for which metabolites were extracted from eight aliquots of the same plant sample. These results also demonstrate consistent derivatization, GC, MS data acquisition, and data processing.

Figure 4. CV distribution of metabolite levels detected from V5 leaf (A), 25 DAP immature kernel (B), and mature kernels (C).

Table 2. Mean Metabolite Levels Detected from Individual Plant (I) and Pooled Plants (P) Samples for the Same Tissue Type of a Variety Are Highly Correlated a

variety V5 leaf 25 DAP mature
38B85 0.9988 0.9994
PH2WBS 0.9089 0.9952
PH2WBR 0.9985
PH0GP 0.9970 0.9971
PH14T 0.9994 0.9981 0.9992
PHG9B 0.9993 0.9933 0.9940
H31 0.9980 0.9891

a Pearson correlation coefficients (R) by Excel function PEARSON.

Tissue or Variety Separation Based on Metabolomics. When the metabolites detected from the three different tissues were compared, PCA clearly indicated tissue separation (Figure 5), reflecting tissue specificity of metabolic processes, as expected. However, PCA revealed variety specificity for only certain variety−tissue combinations. For example, for V5 leaf tissues, there was clear separation of PH2WBR, PH14T, and H31 from other varieties based on PC1 and PC3 (Figure 6A). Likewise, PH2WBS and PH2WBR in 25 DAP kernels were readily distinguished from other varieties with PC1 and PC4 (Figure 6E). For mature kernels, PH2WBS and PHG9B showed good separation from other varieties based on PC2 and PC3 (Figure 6I).

For the most part, the tissue and variety classifications observed with individual plants were also evident in pooled plant samples, although sometimes with different principal component projections (Figures 5 and 6A,J). This result suggests that pooling did not degrade the discriminating power afforded by individual samples. Interestingly, the combined percent variance included in the PCA scores plots was slightly higher for pooled samples compared to that generated for analogous individual samples, suggesting that pooling removed some uninformative signal.

Loadings associated with examples of the above variety classifications were selected graphically (Figures 6C,D; G,H; and K,L; in purple) and listed in Supplementary Table 6 in the Supporting Information. The very significant increases in the amount of amino acids in developing kernels, including glutamic acid, glutamine, histidine, leucine, lysine, pyroglutamic acid (which could be derived from glutamine during sample preparation), and tryptophan, are expected for PH2WBS, a genotype with elevated grain protein. Explanations for the genotype-specific differences (loadings) in the other tissues are less obvious. For the three examples shown, loadings from pooled plants were very similar to those from individual plants. Thus, pooling generated similar PCA scores and loadings, maintaining the ability to classify sample groups (varieties) as well as to identify the prominent metabolites underlying said classifications.

In this GC-MS metabolomic study, we also found that some metabolites were detected in only one or two tissue types. Among individual plant samples, there were 19 metabolites uniquely detected in V5 leaves, 2 only in immature kernels, and 3 only in mature kernels (Table 3). It is expected that the

metabolome of leaves is more divergent than that of immature or mature kernels. Some of these metabolites are present but not detected in other tissues, given our conservative limit of peak detection. Moreover, immature and mature kernels contain more polysaccharides by weight than leaves. Because approximately 3 mg dry weight samples were used for all three tissue types, it is expected that the concentration of many small molecule metabolites will be greater in leaf than in kernel samples. This could result in apparent tissue specificity, as seen in Table 3.

Table 3. Apparent Tissue-Specific Metabolites

tissue          metabolite                                class
V5 leaf         tyramine                                  polyamine
                tryptophan                                amino acid
                chlorogenic acid                          phenolic acid
                citramalic acid                           organic acid
                dehydroascorbic acid, secondary peak 1    vitamin
                dehydroascorbic acid, secondary peak 2    vitamin
                dehydroascorbic acid, secondary peak 3    vitamin
                heptadecanoic acid                        fatty acid
                itaconic acid                             organic acid
                maleic acid                               organic acid
                pyruvic acid                              organic acid
                salicylic acid                            phenolic acid
                cis-caffeic acid                          phenolic acid
                trans-caffeic acid                        phenolic acid
                α-tocopherol                              vitamin
                rhamnose                                  sugar
                trehalose                                 sugar
                glyceric acid-3-phosphate                 phosphorylated acid
                phytol                                    alkane alcohol
                margaric acid                             fatty acid
25 DAP          myristic acid                             fatty acid
                cysteine, partial derivative              amino acid
mature kernel   adenosine-5-monophosphate                 nucleic acid
                pipecolic acid                            organic acid

Figure 5. PCA score plots from individual or pooled plants showing tissue specificity of metabolomes.

Figure 6. PCA scores and loadings plots from individual plants (A, C; E, G; I, K) or pooled plants (B, D; F, H; J, L) showing classifications of PH2WBR from the leaf metabolome (A−D), PH2WBS from the 25 DAP kernel metabolome (E−H), and PH2WBS from the mature kernel metabolome (I−L). Significant loadings are shown in purple.

Range and Variations of Metabolite Abundances. We observed large ranges in relative levels for many metabolites across all varieties. The ratio between the maximum value and the minimum value detected for a metabolite in individual samples ranged from 1.8 to 1663 for V5 leaf tissues, from 3.3 to 16815 for immature kernels, and from 2.7 to 585 for mature kernels (Supporting Information Supplementary Table 7). However, when samples were pooled, the ranges narrowed to 1.4−167 for V5 leaves, 2.1−4828 for immature kernels, and 1.6−86 for mature kernels (Supporting Information Supplementary Table 7). Similarly, when the mean values within each variety for each metabolite from either individual samples or pooled samples were compared across all varieties with box plots, the pooled samples showed much narrower distribution compared to the individual samples (data not shown). This observation indicated that the biological variation among individual plants combined with variety variation was very large. However, our pooling strategy effectively decreased the biological variation between plants. The actual relative levels are specific to the current data set and should not be compared to other data sets, unless they were processed (aligned and scaled) together.
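The max/min ratios reported above, and the way pooling narrows them, can be illustrated with a minimal Python sketch. The metabolite levels are hypothetical, and averaging measured values in groups of three is only a numerical stand-in for physically pooling tissue before extraction; the function names `fold_range` and `pool_means` are introduced here for illustration and are not part of the study's data processing.

```python
from statistics import mean


def fold_range(levels):
    """Return the max/min ratio of one metabolite's relative levels."""
    if min(levels) <= 0:
        raise ValueError("fold range needs strictly positive levels")
    return max(levels) / min(levels)


def pool_means(levels, pool_size):
    """Average consecutive groups of pool_size values (numerical stand-in for pooling)."""
    if len(levels) % pool_size != 0:
        raise ValueError("number of samples must be a multiple of pool_size")
    return [mean(levels[i:i + pool_size]) for i in range(0, len(levels), pool_size)]


# Hypothetical relative levels of one metabolite in nine individual plants.
individual = [1.0, 4.0, 2.5, 0.5, 3.0, 2.0, 6.0, 1.5, 1.5]
pooled = pool_means(individual, pool_size=3)  # three pools of three plants

print(f"individual fold range: {fold_range(individual):.2f}")
print(f"pooled fold range:     {fold_range(pooled):.2f}")
```

With these made-up values the ratio drops from 12 for individual plants to about 1.6 for the three pools. Because every pool mean lies between the smallest and largest individual value, the pooled ratio can never exceed the individual one under this simplified model.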

Table 4. Metabolites with Relatively Stable or Highly Variable Levels among Different Maize Varieties and Their CV Values a

V5 leaf                                   25 DAP immature kernel            mature kernel
CV    metabolite                          CV    metabolite                  CV    metabolite

relatively stable
0.31  acetic acid                         0.30  acetic acid                 0.29  β-sitosterol
0.30  arabinose                           0.39  arabinose                   0.28  campesterol
0.13  β-sitosterol                        0.34  aspartic acid part. deriv   0.36  erythritol
0.23  campesterol                         0.32  benzoic acid                0.37  glycerol-3-phosphate
0.40  cellobiose deriv 1                  0.30  ethanolamine                0.37  linoleic acid
0.38  cis-aconitic acid                   0.37  fructose deriv 1            0.38  malic acid
0.39  ferulic acid                        0.39  galactose                   0.35  myo-inositol
0.33  galactitol                          0.24  glucose deriv 1             0.31  palmitic acid
0.24  glycerol                            0.38  glyceric acid               0.31  stigmasterol
0.38  glycerol-3-phosphate                0.36  isoleucine part. deriv      0.19  sucrose
0.40  heptadecanoic acid                  0.34  leucine part. deriv         0.39  tyrosine
0.30  linoleic acid                       0.36  mannose
0.36  malic acid                          0.23  myo-inositol
0.17  myo-inositol                        0.25  phosphoric acid
0.30  palmitic acid                       0.30  succinic acid
0.17  phytol                              0.38  sucrose
0.37  serine part. deriv                  0.31  threonine part. deriv
0.35  stigmasterol                        0.35  tyrosine
0.14  sucrose                             0.28  xylose deriv 1
0.36  trans-caffeic acid                  0.32  xylose deriv 2
0.31  xylitol

highly variable
1.79  asparagine part. deriv              1.60  2-aminobutyric acid         1.24  asparagine
1.52  aspartic acid                       1.10  β-alanine part. deriv       2.15  β-amyrin
2.04  benzoic acid                        2.14  cis-aconitic acid           2.63  cellobiose deriv 1
1.21  dehydroascorbic acid                1.39  citric acid                 1.05  cysteine part. deriv
1.54  dehydroascorbic acid second peak 1  2.26  gluconic acid               1.28  dehydroascorbic acid
1.71  ethanolamine                        1.49  glutamic acid               1.03  GABA
1.36  glutamic acid                       2.70  glutamine part. deriv       1.04  glucose deriv 2
2.14  glutamine part. deriv               1.84  histidine                   1.83  glucose-6-phosphate deriv 2
1.19  glycine                             1.63  isocitric acid              1.70  glutamine part. deriv
1.22  glycine part. deriv                 1.25  linolenic acid              1.04  glyceric acid
1.07  isoleucine                          2.56  myristic acid               1.60  maltose
1.11  leucine                             1.03  oleic acid                  1.25  pipecolic acid
1.45  ornithine                           3.43  p-coumaric acid             1.14  pipecolic acid part. deriv
1.06  serine                              1.57  pyroglutamic acid           1.32  pyroglutamic acid
1.60  trehalose                                                             1.16  xylitol
1.21  tyrosine                                                              1.64  xylose deriv 2

a CV values were calculated from metabolite levels from all individual (I) and pooled (P) samples for all varieties. Only metabolites that were detected from all I and P samples of all varieties were included for the calculation. Only metabolites with CV values of <0.40 (relatively stable) and >1.00 (highly variable) are shown. Metabolites in italic were found in all three tissues, and metabolites in bold were found in two tissues for the same category.
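As a rough illustration of how a metabolite might be assigned to the two categories of Table 4, the following Python sketch computes a CV across varieties and compares it with two cutoffs. The data are hypothetical, the use of the sample standard deviation is an assumption (the estimator is not stated here), the cutoffs are plain parameters that default to the <0.4 and >1 values quoted in the text, and the "intermediate" label is introduced only for this example.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed estimator)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(values) / m


def classify(cv, stable_max=0.4, variable_min=1.0):
    """Label a metabolite by its CV across varieties."""
    if cv < stable_max:
        return "relatively stable"
    if cv > variable_min:
        return "highly variable"
    return "intermediate"  # hypothetical label for everything in between


# Hypothetical mean relative levels of three metabolites in several varieties.
levels = {
    "metabolite_A": [10.0, 11.0, 9.0, 10.0],
    "metabolite_B": [1.0, 1.0, 1.0, 17.0],
    "metabolite_C": [2.0, 4.0, 6.0],
}

labels = {name: classify(coefficient_of_variation(v)) for name, v in levels.items()}
for name, label in labels.items():
    print(name, round(coefficient_of_variation(levels[name]), 2), label)
```

Keeping the cutoffs as parameters makes it easy to see how the stable and variable lists would grow or shrink if a different threshold were chosen.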

Multiple derivative forms for certain metabolites are characteristic of GC-MS-based metabolomics, as illustrated by asparagine in Supplementary Table 7 in the Supporting Information. Asparagine with four TMS groups (one attached to the carboxyl and three to the amines) was found in mature kernels, whereas asparagine with just three TMS moieties (one attached to the carboxyl and two to the amines) was specific to V5 leaves. This dichotomy might be explained by differential trimethylsilylation due to the different sample matrices. 28 Consequently, comparing metabolomes across tissue types or species should be undertaken with caution.

We also compared the levels of metabolites in all samples across all varieties for a given tissue type and identified metabolites that are quite stable as well as those that are highly variable among varieties. There were 21 metabolites from V5 leaves, 20 metabolites from 25 DAP immature kernels, and 11 metabolites from the mature kernels that showed a CV value of <0.4 across all varieties (Table 4), representing tissue-specific stable metabolomes. Among them, sucrose and myo-inositol were identified from all three tissues, and another 10 metabolites appeared in two tissue types. On the other hand, there were 17 metabolites from the V5 leaves, 13 from the 25 DAP immature kernels, and 16 from the mature kernels that showed CV values >1, indicating that these metabolites are highly variable among different maize varieties (Table 4). A partial derivative form of glutamine seemed to be highly variable in all three tissue types, and another three metabolites were highly variable for two tissue types. The high variability of the partial derivative of glutamine may be due, at least in part, to inconsistent transformation to pyroglutamic acid, which was

associated with trimethylsilylation.

As with gene expression levels detected from microarrays, mean metabolite abundances from individual samples or pooled samples were calculated for each variety and used to calculate CV values among varieties. For all three tissue types, metabolite variations among different varieties detected from individual or pooled samples are well-correlated. Linear regression R² values are 0.90, 0.92, and 0.82 for V5 leaves, 25 DAP developing kernels, and mature kernels, respectively. Furthermore, the CV distribution patterns are very similar between individual and pooled samples (Supporting Information Supplementary Figure 4). For V5 leaves, 43.7% of metabolites showed higher CV values in pooled samples compared to individual samples. In 25 DAP developing kernels and mature kernels, the numbers are 61.5 and 47.6%. This observation indicated that using pooled samples revealed variety-to-variety metabolomic variation similar to that using individual samples.

■ DISCUSSION

Thorough evaluation of the applicability and limitations of the -omics technologies for food safety assessment is necessary before their acceptance for this purpose. Toward this end, we evaluated high-throughput gene expression and metabolomic technologies by characterizing the transcriptomes and metabolomes of several conventional maize varieties using alternative protocols. Our observations led us to conclude that in applying these methods to regulatory issues, consideration should be given to natural variation in maize transcriptome and to the high degree of variation in metabolite concentrations between plant varieties and individuals of the same variety.

Technical Variation. To validate methods for both microarray and metabolomics, selected samples were analyzed multiple times to serve as technical repeats. The CV distribution for the technical microarrays showed small variations between different microarray runs for the same sample (Figure 1), validating the method and technical consistency. When compared to CVs detected from individual plant samples, the technical CVs are much smaller (Figure 1A; Supporting Information Supplementary Table 2), indicating that our microarray technology is consistent and sensitive enough to detect biological variations outside technical variations. The data correlation analysis among repeat arrays comparing individual samples and technical repeat samples confirmed this conclusion (Figure 2). Technical plus biological CVs detected from metabolomics, however, were much larger compared to microarrays (Supporting Information Supplementary Figure 3). This increase is not unexpected because expression of many metabolites is dynamically affected by microenvironment. Furthermore, different metabolites have very different physical and biochemical properties as well as

design of multiple pools with multiple individual samples in each pool was established as a compromise. 44−47 Thus, the ability to detect the difference between biological subject-to-subject variations and the experimental technical variations is combined with the efficiency of the pooling strategy designed to reduce overall variance. The larger the individual-to-individual variability is, as compared to technical variability, the greater the reduction of variability is achieved by pooling samples. 44,45

We designed the microarray and metabolomics experiments to include both individual samples and sample pools. Gene expression levels detected from microarray and metabolite abundances both showed very good correlation between individual and pooled samples within the same tissue type and variety (Tables 1 and 2). Using pooled samples lowered sample-to-sample variation, resulting in lower CV values (Figure 4; Supporting Information Supplementary Figure 1 and Tables 2, 3, 5, and 6). Interestingly, pooling microarray samples reduced the CV values more dramatically for 25 DAP samples compared to V5 leaf samples (Supporting Information Supplementary Figure 1C,D), presumably due to the higher transcriptome variation among 25 DAP individual samples compared to V5 leaf individual samples.

The mean CV values for each variety calculated from either individual plants or pooled samples represent variety-to-variety variations. The CV values representing variety-to-variety variations were similar when obtained from either individual plants or pooled samples for both microarrays and metabolomics (Supporting Information Supplementary Figures 2 and 4). Furthermore, the distribution patterns of variety CV values were similar for both microarrays and metabolomics. A slight increase of variation in pooled samples compared with individual samples from microarrays was detected (Supporting Information Supplementary Figure 2 and Table 3), presumably due to fewer pooled samples.

Pooling plants prior to analysis also did not adversely affect the ability to classify tissues or varieties or identify discriminating metabolites by PCA (Figures 5 and 6). In fact, pooling appeared to enhance discriminating power, presumably by eliminating some noise from the data sets. Overall, our pooling strategy of three sample pools of three is a cost-saving design that does not sacrifice analytical power.

qRT-PCR and Microarray. The use of microarray profiling for comparative assessment of biotech crops requires a gene expression sequence database for probe design, gene annotation, and expression level interpretation. For many plant species, genomes or transcriptomes have not been completely sequenced except for a few model genotypes. The maize genome has an especially high level of DNA sequence polymorphisms, approximately an order of magnitude higher
