Master's Thesis: Development of Maize Single Nucleotide Polymorphism Markers Based on Expressed Sequence Tags (.rar)
Original document posted by member ljjwl8321.
Abstract (translated from Chinese)
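A single nucleotide polymorphism (SNP) is a DNA sequence variation caused by a single-base transversion, transition, insertion, or deletion. Compared with commonly used molecular markers such as restriction fragment length polymorphism (RFLP) and simple sequence repeat (SSR) markers, SNPs offer high distribution density, genetic stability, strong association with phenotypic traits, simple detection, and ready automation of analysis, and are therefore known as third-generation molecular markers. SSR markers are not dense enough, and existing SNP markers are generally developed from sequence differences between only two genotypes, so they show low polymorphism when applied to materials of other genotypes; neither meets the current needs of fine mapping of maize genes. To address this, this study downloaded maize expressed sequence tags (ESTs) of diverse genetic backgrounds from public databases, developed EST-based SNP markers using a range of bioinformatics software and self-written programs, and verified the polymorphism of some of the markers by high resolution melt (HRM) analysis.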
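The two substitution classes named in the definition above can be told apart mechanically: a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), while a transversion swaps one class for the other. The following is a minimal Python sketch for illustration only; the function name `classify_substitution` is hypothetical and is not taken from the thesis, whose own scripts were written in Perl.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution.

    Hypothetical helper for illustration; not part of the thesis pipeline.
    """
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("both bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("identical bases are not a substitution")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` falls in the transition class and `classify_substitution("A", "C")` in the transversion class. Insertions and deletions are not covered by this helper.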
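Grounded in bioinformatics theory, a data analysis system was built on the Windows XP and Linux operating systems using Blast, Cross_Match, Phrap, ePrimer3, e-PCR, and self-developed Perl scripts. Scripts written in Perl with Bioperl modules automated the entire analysis, so that the EST data could be analyzed quickly and effectively. The system performed EST sequence clustering; removal of vector, low-quality, and repetitive sequences; EST sequence assembly; SNP site discovery; PCR primer design and screening; SNP marker mapping; and calculation of polymorphism information content (PIC).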
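The SNP discovery step in the pipeline above can be pictured as a column-by-column scan of the ESTs aligned within one assembled cluster. The sketch below is an assumption-laden illustration in Python, not the thesis's Perl implementation: the function name `candidate_snps` and the thresholds `min_depth` and `min_minor_count` are hypothetical, and the abstract does not state the criteria that were actually applied.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_depth=4, min_minor_count=2):
    """Report alignment columns where at least two bases are well supported.

    aligned_reads: equal-length strings from one cluster alignment;
    characters other than A/C/G/T (gaps, N) are ignored.
    The thresholds are illustrative assumptions, not values from the thesis.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    hits = []
    for pos in range(length):
        counts = Counter(
            read[pos].upper()
            for read in aligned_reads
            if read[pos].upper() in "ACGT"
        )
        if sum(counts.values()) < min_depth:
            continue  # too few reads cover this column
        ranked = counts.most_common()
        # Require the second most frequent base to be seen repeatedly,
        # so that a single sequencing error is not called as a SNP.
        if len(ranked) >= 2 and ranked[1][1] >= min_minor_count:
            hits.append((pos, dict(counts)))
    return hits
```

Requiring the minor base to appear in more than one read is a common guard against sequencing errors in single-pass EST data; stricter or looser settings would change how many sites are reported.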
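Clustering and assembly of 2018530 maize EST sequences revealed 80363 SNP sites distributed across the whole genome. PCR primers were designed on the conserved sequences flanking the SNP sites, yielding 12388 SNP markers that cover 34721 SNP sites; 12762 of these sites have a polymorphism information content (PIC) greater than 0.4 and are highly polymorphic. Of the markers, 6008 contain only a single SNP site. Based on the chromosomal positions of the genomic BAC sequences of inbred line “B73” in which the markers lie, 12117 markers could be mapped to the 10 maize chromosomes, with chromosome 1 carrying the most. In addition, a static platform for a public marker database was built on the website of the Maize Research Institute of Sichuan Agricultural University (http://www.sicau.edu.cn/web/yms/snp/snp.html) using Adobe Dreamweaver and other software under Microsoft Windows XP; all SNP markers and related information can be consulted on that page.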
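The abstract does not say which PIC formula was used. Two definitions are common in the marker literature: the full form of Botstein et al., PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j², and a simplified form, 1 − Σp_i², which is strictly the expected heterozygosity. The Python sketch below computes both from allele frequencies so that the 0.4 threshold can be read under either assumption; the function names are hypothetical.

```python
def _check(freqs):
    """Validate that the allele frequencies form a probability vector."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def pic_simplified(freqs):
    """Simplified PIC (expected heterozygosity): 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """Full PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    _check(freqs)
    n = len(freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - sum(p * p for p in freqs) - cross
```

For a site with two equally frequent alleles, the simplified form gives 0.5 and the full form gives 0.375, which is the maximum the full form can reach with two alleles. A value above 0.4 therefore implies either the simplified form or a site with more than two alleles.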
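Nine primer pairs were chosen at random from the developed SNP markers and tested on 46 maize inbred lines. Real-time quantitative PCR products amplified in the presence of a saturating fluorescent dye were genotyped by HRM, which distinguishes nucleotide differences by the shape of the high-resolution melting curve, to verify the polymorphism of the SNPs. The fluorescence difference curves showed that, among the inbred lines that gave a fluorescence signal, the PCR products of all 9 primer pairs were genotyped by HRM and were polymorphic: P5 and P6 showed two genotypes and the remaining primer pairs showed three, for an SNP detection rate of 100%. The melting peaks showed that, for the lines with a signal, the PCR products of all 9 primer pairs gave a single melting peak, with no nonspecific amplification or primer dimers and with similar peak values, indicating that the primers are highly specific. For every primer pair, a few inbred lines failed to amplify the target band and gave no fluorescence signal in HRM; P4 failed only in “丹340” (Dan340), whereas P1 failed in the most lines, 14. Because the primers were designed and screened against the BAC library of the model inbred line “B73” during marker development, the presence or absence of an amplification signal may be related to nucleotide differences between B73 and the other inbred lines in the primer-binding regions.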
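A fluorescence difference curve of the kind referred to above is obtained by normalizing each sample's melt curve and subtracting a reference sample's curve from it, so that samples of the same genotype cluster around a common shape. The Python sketch below is a simplified illustration under stated assumptions: it uses plain min-max normalization over the whole curve, whereas HRM instrument software typically normalizes against user-selected pre-melt and post-melt regions, and the function names are hypothetical.

```python
def normalize(curve):
    """Rescale raw fluorescence readings to the range 0.0 to 1.0.

    Min-max scaling is an illustrative simplification of HRM normalization.
    """
    hi, lo = max(curve), min(curve)
    if hi == lo:
        raise ValueError("flat curve: no melt transition to normalize")
    return [(f - lo) / (hi - lo) for f in curve]


def difference_curve(sample, reference):
    """Normalized sample fluorescence minus normalized reference fluorescence.

    Both curves must be read at the same temperature points.
    """
    if len(sample) != len(reference):
        raise ValueError("curves must share the same temperature points")
    s, r = normalize(sample), normalize(reference)
    return [a - b for a, b in zip(s, r)]
```

A sample with the same genotype as the reference yields a difference curve close to zero at every temperature, while a different genotype departs from zero around the melting transition. A line with no amplification has no usable melt curve, which matches the no-signal lines reported above being left ungenotyped.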
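In summary, this study developed 12388 SNP markers in total, 6008 of which contain only a single SNP site. These markers are relatively highly polymorphic and have potential value in fine mapping of genes, association analysis, and marker-assisted breeding.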
Keywords: maize; expressed sequence tag; single nucleotide polymorphism molecular marker; high resolution melting curve
Abstract
A single-nucleotide polymorphism (SNP) is a DNA sequence variation caused by the substitution, deletion, or insertion of a single nucleotide. Compared with commonly used molecular markers such as restriction fragment length polymorphism (RFLP) and simple sequence repeat (SSR) markers, SNPs are recognized as the third generation of molecular markers for their high density, stable inheritance, strong correlation with phenotype, easy detection, and suitability for automated analysis. Because SSR markers are not dense enough, and SNP markers developed from sequence differences between only two genotypes are not polymorphic enough in other genotypes, neither kind of marker meets the current demands of fine mapping of maize genes. In this study, expressed sequence tags (ESTs) of maize (Zea mays) from different genetic backgrounds were downloaded from public databases and used to develop SNP molecular markers with the help of various bioinformatics software and self-developed Perl scripts. High resolution melt (HRM) analysis was used to evaluate the predicted SNP markers.
An analysis system was built on the Windows XP and Linux operating systems using Blast, cross_match, Phrap, ePrimer3, e-PCR, and self-developed Perl scripts. Scripts written in Perl with Bioperl modules automated the whole analysis, so that the EST data could be analyzed promptly and effectively. The system covers clustering of EST sequences; removal of vector, low-quality, and repeated sequences; assembly of EST sequences; mining of SNP sites; design and filtering of PCR primers; mapping of SNP molecular markers to chromosomes; and calculation of polymorphism information content (PIC).
Clustering and assembly of 2018530 EST sequences revealed 80363 SNP loci throughout the genome. Based on the conserved sequences flanking these SNP loci, 12388 pairs of PCR primers were designed to amplify sequences containing 34721 SNP loci and are provided as SNP molecular markers, among which 6008 contain only one S..