NanoUPLC-MSE proteomic data assessment of soybean seeds using the Uniprot database
Abstract
BACKGROUND Recombinant DNA technology has been extensively employed to generate a variety of products from genetically modified organisms (GMOs) over the last decade, and the development of technologies capable of analyzing these products is crucial to understanding gene expression patterns. Liquid chromatography coupled with mass spectrometry is a powerful tool for analyzing protein contents and possible expression modifications in GMOs. Specifically, the NanoUPLC-MSE technique provides rapid protein analysis of complex mixtures, supporting high sample throughput, identification and quantification from low sample quantities with outstanding repeatability. Here, we present an assessment of peptide and protein identification and quantification for the seed contents of the soybean cultivar EMBRAPA BR-16 using NanoUPLC-MSE, and compare the results with the theoretical tryptic digestion of soybean sequences from the Uniprot database.

RESULTS The NanoUPLC-MSE peptide analysis resulted in 3,400 identified peptides, 58% of which had no miscleavages. The experiment revealed that 13% of the peptides underwent in-source fragmentation, and 82% of the peptides were identified with a mass measurement accuracy of less than 5 ppm. More than 75% of the identified proteins have at least 10 matched peptides, 88% have greater than 30% coverage, and 87% occur in all four replicates. Of the identified proteins, 78% correspond to all glycinin and beta-conglycinin chains. The theoretical Uniprot peptide database has 723,749 entries, of which 548,336 peptides have molecular weights greater than 500 Da. Seed proteins represent 0.86% of the protein database entries. At the peptide level, trypsin-digested seed proteins represent only 0.3% of the theoretical Uniprot peptide database. A total of 22% of all database peptides have a pI value of less than 5, and 25% have a pI value between 5 and 8. Based on the detection range of typical NanoUPLC-MSE experiments, i.e., 500 to 5000 Da, 64 proteins will not be identified.

CONCLUSIONS NanoUPLC-MSE experiments provide good protein coverage within a peptide error of 5 ppm and a wide MW detection range from 500 to 5000 Da. A second digestion enzyme should be used depending on the tissue or proteins to be analyzed. In the case of seed tissue, trypsin protein digestion offers good databank coverage. The Uniprot database has many duplicate entries that may result in false protein homolog associations when using NanoUPLC-MSE analysis. The proteomic profile of the EMBRAPA BR-16 seed lacks certain described proteins relative to the profiles of transgenic soybeans reported in other works.
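The theoretical digestion and mass-window filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, unmodified peptides, and standard monoisotopic residue masses. The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
# Minimal sketch: in-silico tryptic digestion, monoisotopic mass,
# 500-5000 Da detection window, and ppm mass error.
# Assumptions (not from the source): K/R cleavage blocked by a following P,
# no missed cleavages, no modifications.

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide (free N- and C-termini)


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        at_end = i + 1 == len(sequence)
        if residue in "KR" and (at_end or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def monoisotopic_mass(peptide):
    """Sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def in_detection_window(peptides, low=500.0, high=5000.0):
    """Keep peptides whose mass lies within the stated 500-5000 Da range."""
    return [p for p in peptides if low <= monoisotopic_mass(p) <= high]


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    toy_sequence = "AKRPGKSAMPLERGG"  # hypothetical sequence, not a soybean protein
    peptides = tryptic_digest(toy_sequence)
    for pep in peptides:
        print(pep, round(monoisotopic_mass(pep), 4))
    print("detectable:", in_detection_window(peptides))
```

Under these assumptions, a protein whose tryptic peptides all fall outside the window would be undetectable, which is the kind of situation the abstract attributes to the 64 proteins that will not be identified.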
Similar resources
Label-Free Quantitative Proteomic Analysis of Puccinia psidii Uredospores Reveals Differences of Fungal Populations Infecting Eucalyptus and Guava
Puccinia psidii sensu lato (s.l.) is the causal agent of eucalyptus and guava rust, but it also attacks a wide range of plant species from the myrtle family, resulting in a significant genetic and physiological variability among populations accessed from different hosts. The uredospores are crucial to P. psidii dissemination in the field. Although they are important for the fungal pathogenesis,...
Gel-based and gel-free proteome data associated with controlled deterioration treatment of Glycine max seeds
Data presented here are associated with the article: "In-depth proteomic analysis of soybean (Glycine max) seeds during controlled deterioration treatment (CDT) reveals a shift in seed metabolism" (Min et al., 2017) [1]. Seed deterioration is one of the major problems, affecting the seed quality, viability, and vigor in a negative manner. Here, we display the gel-based and gel-free proteomic da...
The PIR integrated protein databases and data retrieval system
The Protein Information Resource (PIR) provides many databases and tools to support genomic and proteomic research. PIR is a member of UniProt (Universal Protein Resource), the central repository of protein sequence and function, which maintains UniProt Knowledgebase with extensively curated annotation, UniProt Reference databases to speed sequence searches, and UniProt Archive to reflect sequen...
Whole-Genome Resequencing Identifies the Molecular Genetic Cause for the Absence of a Gy5 Glycinin Protein in Soybean PI 603408
During ongoing proteomic analysis of the soybean (Glycine max (L.) Merr) germplasm collection, PI 603408 was identified as a landrace whose seeds lack accumulation of one of the major seed storage glycinin protein subunits. Whole genomic resequencing was used to identify a two-base deletion affecting glycinin 5. The newly discovered deletion was confirmed to be causative through immunological, g...
Method optimization for proteomic analysis of soybean leaf: Improvements in identification of new and low-abundance proteins
The most critical step in any proteomic study is protein extraction and sample preparation. Better solubilization increases the separation and resolution of gels, allowing identification of a higher number of proteins and more accurate quantitation of differences in gene expression. Despite the existence of published results for the optimization of proteomic analyses of soybean seeds, no compar...