(最后更新:2017/10/01)
Lepmap3 遗传图谱构建流程
Lepmap3基本流程
运行前数据准备
vcf文件
vcf文件必须有likelihood信息,lepmap会用likelihood信息进行分析
pedigree文件
pedigree文件包含了样本名及家系等相关信息,pedigree文件为tab分隔的文本文件,示例如下:
CHR | POS | F | F | F | F | F | F |
CHR | POS | female | male | progeny_1 | progeny_2 | progeny_3 | progeny_4 |
CHR | POS | 0 | 0 | male | male | male | male |
CHR | POS | 0 | 0 | female | female | female | female |
CHR | POS | 2 | 1 | 0 | 0 | 0 | 0 |
CHR | POS | 0 | 0 | 0 | 0 | 0 | 0 |
每行涵义如下
1 | line 1:家系(Family)名称 |
其他软件输出格式转换为Lepmap3格式
JoinMap loc files:
1 | awk -f locsingle.awk file.loc|awk -f loc2genotypes.awk|awk -f genotypes2post.awk |java -cp Lep-MAP3/bin ... data=- ... |
Genotype assay data (in tab separated affx_genotypes.txt):
1 | SNP IND1 IND2 IND3 IND4 ... |
1 | awk -f affx2post.awk affx_genotypes.txt|java -cp Lep-MAP3/bin ParentCall2 data=pedigree.txt posteriorFile=- ... |
运行方法
ParentCall2
该步骤是为了call亲本的基因型(如果已经有,可以矫正亲本基因型,如果没有,可以补缺失亲本基因型),也就是猜测亲本的基因型,并且可以查看性染色体的marker。 支持Grandparents信息,以及半同胞家系。
The module ParentCall2 is used to call (possible missing or erroneous) parental genotypes and markers on sex chromosomes (XY or ZW system, assumes known offspring sex). Grandparents and half-sib familes are also supported (especially the grandparents should be provided if there is data on them).
1 | java ParentCall2 data=pedigree.txt vcfFile=input.vcf removeNonInformative=1|gzip >data.call.gz |
Calling markers on autosomes and on X chromosome (X-)
1 | java ParentCall2 data=... XLimit=2 ... |
Calling markers on autosomes and on Z chromosome (Z-)
1 | java ParentCall2 data=... ZLimit=2 ... |
Filtering2
该步骤可以根据偏分离(dataTolerance)和缺失率过滤标记(missingLimit)
1 | java Filtering2 data=data1.call dataTolerance=0.001 >data_f.call |
SeparateChromosomes2
该步骤可以将标记进行分群
1 | java SeparateChromosomes2 data=data_f.call lodLimit=5 >map5.txt |
(parameter numThreads utilises multiple cores)
The size distribution of linkage groups can be obtained like this:
1 | sort map5.txt|uniq -c|sort -n |
JoinSingles2All
该步骤可以将上一步中未分到连锁群的标记重新插入到已有的连锁群中
The JoinSingles2All module assigns singular markers to existing LGs by computing LOD scores between each single marker and markers from the existing LGs. It needs a map file as input parameter and then it outputs a new map file with possible more markers assigned to LGs.
1 | java -cp bin/ JoinSingles2All map=map5.txt data=data_f.call lodLimit=4 >map5_js.txt |
(parameter numThreads utilises muliple cores)
(iterated joinSingles2All yields same result as iterating it until no markers can be added)
1 | java -cp bin/ JoinSingles2All map=map5.txt data=data_f.call lodLimit=4 iterate=1 >map5_js_iterated.txt |
The size distribution of linkage groups can be obtained like this:
1 | cut -f 1 map5_js.txt|sort|uniq -c|sort -n |
OrderMarkers2
对每个连锁群内的标记进行排序,以最大化likelihood
OrderMarkers2 orders the markers within each LG by maximizing the likelihood of the data given the order. Either a map file (map=map_file) or output of previous run of OrderMarkers2 (evaluateOrder=order_file) have to be provided (only the first column (the marker name or lg name) is read from these files).
1 | java -cp bin/ OrderMarkers2 map=map5_js.txt data=data_f.call chromosome=1 >order1.txt |
已知标记顺序,计算遗传距离
1 | java -cp bin/ OrderMarkers2 evaluateOrder=order2.txt data=data_f.call improveOrder=0 >order1.3.txt |
已知标记顺序,计算中性图距离
1 | java -cp bin/ OrderMarkers2 evaluateOrder=order1.txt data=data_f.call improveOrder=0 sexAveraged=1 >order1.SA.txt |
无交叉减数分裂achiasmatic meiosis (父本中无重组交换):
1 | java -cp bin/ OrderMarkers2 map=map.txt data=data_f.call recombination1=0 |
无交叉减数分裂achiasmatic meiosis (母本中无重组交换):
1 | java -cp bin/ OrderMarkers2 map=map.txt data=data_f.call recombination2=0 |
分染色体排序
1 | java -cp bin/ OrderMarkers2 map=mapBig.txt data=dataBig.call chromosome=1 >order1.1.txt |
LMPlot
绘制点图
for every chromosome :
1 | java -cp bin/ OrderMarkers2 map=map.txt data=data_f.call outputPhasedData=1 chromosome=1 >order_wp_1.txt |
or:
1 | java -cp bin/ OrderMarkers2 evaluateOrder=order1.txt data=data_f.call outputPhasedData=1 >order_wp_1.txt |