Collapse

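Collapsed contigs are common in polyploid hybrids: frequent hybridization introduces nearly identical regions that de novo assemblers currently struggle to assemble into the correct number of copies. We provide a solution that estimates copy number from contig sequencing depth (HiFi / ONT / Pore-C) to identify collapsed contigs (CN >= 2), then uses the contigs themselves and their interaction signals to duplicate each collapsed contig and insert the copies at the correct positions. This approach is suited to a small number of collapsed contigs. It does not yet handle the collapse of an entire chromosome, and extensive collapse calls for a different strategy.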

Identifying collapsed contigs

  • Custom mapping (HiFi reads)
    minimap2 -cx map-hifi -I 16g -t 40 --secondary=no draft.asm.fasta hifi.fastq.gz | pigz -p 10 -c > hifi.align.paf.gz
    
    cphasing-rs paf2depth hifi.align.paf.gz -w 5000 -s 1000 -o hifi.align.depth
    cphasing collapse from-depth hifi.align.depth
    
  • Use the hitig results directly (HiFi reads)
    output.collapsed.contigs.list
  • Custom mapping (ONT reads)
    minimap2 -cx map-ont -I 16g -t 40 --secondary=no draft.asm.fasta ont.fastq.gz | pigz -p 10 -c > ont.align.paf.gz
    
    cphasing-rs paf2depth ont.align.paf.gz -w 5000 -s 1000 -o ont.align.depth
    cphasing collapse from-depth ont.align.depth
    
  • Use the hitig results directly (ONT reads)
    output.collapsed.contigs.list

For Pore-C data, use the porec.align.paf.gz generated by cphasing mapper:

cphasing-rs paf2depth porec.align.paf.gz -w 5000 -s 1000 -o porec.align.depth
cphasing collapse from-depth porec.align.depth
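
As a rough sanity check on the depth-based calls, copy number can be approximated as a contig's mean depth divided by a single-copy baseline. The sketch below is a hypothetical helper, not part of cphasing. It assumes the depth file has four tab-separated columns (contig, start, end, depth) and uses the median of the per-contig mean depths as the baseline. Both are assumptions, not documented behaviour of `cphasing collapse from-depth`.

```shell
# Hypothetical helper (not a cphasing command): list contigs with estimated CN >= 2.
# Assumed input format: tab-separated columns contig, start, end, depth.
collapsed_from_depth() {
    local depth_file="$1"
    # Step 1: mean depth per contig, sorted numerically by that mean.
    awk -F'\t' '{ sum[$1] += $4; n[$1]++ }
        END { for (c in sum) printf "%s\t%.4f\n", c, sum[c] / n[c] }' "$depth_file" |
    LC_ALL=C sort -k2,2n |
    # Step 2: median of the means as the assumed single-copy baseline,
    # then CN = round(mean / baseline).
    awk -F'\t' '{ name[NR] = $1; mean[NR] = $2 + 0 }
        END {
            if (NR == 0) exit
            base = (NR % 2) ? mean[(NR + 1) / 2] : (mean[NR / 2] + mean[NR / 2 + 1]) / 2
            if (base <= 0) exit
            for (i = 1; i <= NR; i++) {
                cn = int(mean[i] / base + 0.5)
                if (cn >= 2) printf "%s\t%.1f\t%d\n", name[i], mean[i], cn
            }
        }'
}
```

For example, `collapsed_from_depth hifi.align.depth` prints the contig name, mean depth, and estimated copy number for each contig with CN >= 2. A median baseline is only reasonable when most contigs are single-copy, which matches the small-scale collapse this workflow targets.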

The GFA file written by hifiasm records the read number of each unitig, so collapsed contigs can also be identified from it:

cphasing collapse from-gfa hifi.p_utg.noseq.gfa.gz 
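
To see what the GFA offers before running the command above, the per-unitig values can be inspected directly. In hifiasm GFA output, `S` lines carry an `LN:i:` tag (length) and an `rd:i:` tag (read coverage). The helper below is a hypothetical sketch that tabulates them. Whether `cphasing collapse from-gfa` reads exactly these tags is an assumption.

```shell
# Hypothetical helper (not part of cphasing or hifiasm):
# print unitig name, length, and rd value from a gzipped GFA.
gfa_unitig_depth() {
    local gfa="$1"
    gzip -cd < "$gfa" |
    awk -F'\t' '$1 == "S" {
        len = "NA"; rd = "NA"
        # Optional tags start at column 4 on S lines.
        for (i = 4; i <= NF; i++) {
            if ($i ~ /^LN:i:/) len = substr($i, 6)
            if ($i ~ /^rd:i:/) rd = substr($i, 6)
        }
        print $2 "\t" len "\t" rd
    }'
}
```

Running `gfa_unitig_depth hifi.p_utg.noseq.gfa.gz` yields one row per unitig. Unitigs whose `rd` value is roughly a multiple of the typical value are candidates for collapse.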

Rescuing collapsed contigs

cphasing collapse rescue 3.hyperpartition/porec.align.porec.q1.e5m.hg draft.asm.contigsizes 3.hyperpartition/output.clusters.txt contigs.collapsed.contig.list -n 4 -at 3.hyperpartition/draft.asm.allele.table

Note

Currently, this step outputs a clusters file, collapsed.rescue.clusters.txt. Users need to run the subsequent 4.scaffolding step themselves to order and orient the contigs.

After scaffolding

  • Generate a new contig-level FASTA file and AGP file, mainly to rename the duplicated contigs (e.g. utg000001l -> utg000001_d2)
    cphasing agp2fasta groups.rescued.agp draft.asm.fasta --contigs > draft.dup.fasta
    cphasing collapse agp-dup groups.rescued.agp > groups.dup.agp
  • Generate a new pairs.gz or pairs.pqs file
    cphasing collapse pairs-dup sample.pairs.pqs collapsed.rescue.contigs.list -o sample.dup.pairs.pqs 
    
  • Juicebox input files
    cphasing utils agp2assembly groups.dup.agp > groups.dup.assembly
    cphasing pairs2mnd sample.dup.pairs.pqs -o sample.dup.mnd.txt
    
  • Rename
    cphasing rename -r ref.fa -f draft.dup.fasta -a groups.dup.agp -t 40 
    
  • Plot
    cphasing pairs2cool sample.dup.pairs.pqs sample.dup.pairs.pqs/_contigsizes sample.10k.cool 
    cphasing plot -a groups.dup.agp -m sample.10k.cool -o sample.500k.wg.png
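
After the renaming step it can be useful to confirm which duplicated contigs actually ended up in the new AGP. The sketch below is a hypothetical helper, not a cphasing command. It assumes duplicated copies carry a `_d<number>` suffix, as in the `utg000001_d2` example above, and relies on the standard AGP layout (column 5 is the component type, column 6 the component name).

```shell
# Hypothetical helper (not a cphasing command): list duplicated contig copies in an AGP.
# Assumes copies are named with a _d<number> suffix, e.g. utg000001_d2.
agp_duplicated_contigs() {
    local agp="$1"
    # Skip header comments and gap lines; keep sequence components (type W) only.
    awk -F'\t' '!/^#/ && $5 == "W" && $6 ~ /_d[0-9]+$/ { print $6 }' "$agp" |
    LC_ALL=C sort -u
}
```

For example, `agp_duplicated_contigs groups.dup.agp` prints one duplicated contig name per line. An empty result suggests that no contig was duplicated or that the naming assumption does not hold.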