Collapse
Collapsed contigs are commonly observed in polyploid hybrids due to the presence of highly similar homologous regions introduced through hybridization. These regions pose significant challenges for direct de novo assembly using current computational approaches (e.g. hifiasm). To address this limitation, we developed a strategy comprising two steps:
(1) Collapsed contigs detection: Identification of candidate contigs (copy number ≥2) through integrated analysis of sequencing depth profiles from HiFi, ONT, and Pore-C data.
(2) Collapsed contigs rescuing: Duplicating the collapsed contigs and assigning the copies to the correct groups.
Warning
This method is effective for localized collapsed regions. It cannot resolve collapsed regions that span nearly whole chromosomes.
Collapsed contigs detection#
- Custom mapping
- Directly use the hitig results
output.collapsed.contigs.list
The porec.align.paf.gz file is generated by cphasing mapper.
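As a rough illustration of depth-based detection, the sketch below estimates a copy number for each contig by dividing its mean depth by the median depth across all contigs, and keeps contigs with copy number ≥2. It is a simplified assumption for explanation only, not the procedure used by hitig or cphasing: the function name `estimate_copy_numbers`, the single depth value per contig, and the median baseline are all hypothetical, whereas the real analysis integrates HiFi, ONT, and Pore-C depth profiles.

```python
from statistics import median


def estimate_copy_numbers(contig_depths, min_copy=2):
    """Return {contig: copy_number} for contigs with copy_number >= min_copy.

    contig_depths maps a contig name to its mean sequencing depth.
    The median depth over all contigs is assumed to represent one copy.
    """
    if not contig_depths:
        return {}

    baseline = median(contig_depths.values())
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")

    collapsed = {}
    for contig, depth in contig_depths.items():
        # Round the depth ratio to the nearest integer copy number.
        copy_number = round(depth / baseline)
        if copy_number >= min_copy:
            collapsed[contig] = copy_number
    return collapsed


# Hypothetical mean depths, for illustration only.
depths = {"utg1": 30.0, "utg2": 31.5, "utg3": 61.0, "utg4": 29.0, "utg5": 92.0}
candidates = estimate_copy_numbers(depths)
```

With these made-up depths the baseline is 31.5, so `utg3` and `utg5` are flagged as candidates with estimated copy numbers 2 and 3. A median baseline only works while most contigs are not collapsed, which matches the scope of localized collapsed regions described above.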
Collapsed contigs rescuing#
cphasing collapse rescue 3.hyperpartition/porec.align.porec.q1.e5m.hg draft.asm.contigsizes 3.hyperpartition/output.clusters.txt contigs.collapsed.contig.list -n 4 -at 3.hyperpartition/draft.asm.allele.table
Note
Currently, the rescue result is written to a collapsed.rescue.clusters.txt file, and the user must run the next step, 4.scaffolding, manually with this file. The other input files can be taken directly from the previous steps.
After scaffolding#
- Generate new contig-level FASTA and AGP files in which the duplicated contigs are renamed (e.g. utg000001l -> utg000001_d2). The collapsed contigs must be renamed in the AGP for subsequent processing.
- Generate a new pairs.gz or pairs.pqs file
- Curation by juicebox
- Rename
- Plot
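The renaming in the first step can be pictured with the minimal sketch below. It walks through AGP lines and, whenever a component contig appears a second or later time, appends a `_dN` suffix to that occurrence. This is an assumption-laden illustration rather than the tool's own implementation: the function name `rename_duplicated_components` is hypothetical, the suffix is simply appended to the full original name (the example above suggests the actual convention may treat the stem differently), and the AGP is assumed to follow the standard nine-column layout with the component ID in column 6.

```python
from collections import defaultdict


def rename_duplicated_components(agp_lines):
    """Rename repeated component IDs in AGP lines with a _dN suffix.

    The first occurrence keeps its name; the second becomes <name>_d2,
    the third <name>_d3, and so on. Comment, blank, and gap lines are
    returned unchanged.
    """
    seen = defaultdict(int)
    renamed = []
    for raw in agp_lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            renamed.append(line)
            continue

        fields = line.split("\t")
        # Column 5 is the component type; N and U mark gap lines.
        if len(fields) < 9 or fields[4] in ("N", "U"):
            renamed.append(line)
            continue

        component_id = fields[5]
        seen[component_id] += 1
        if seen[component_id] > 1:
            fields[5] = f"{component_id}_d{seen[component_id]}"
        renamed.append("\t".join(fields))
    return renamed


# Hypothetical AGP records: the same contig placed in two groups.
agp = [
    "Chr01g1\t1\t1000\t1\tW\tutg000001l\t1\t1000\t+",
    "Chr01g1\t1001\t1100\t2\tU\t100\tcontig\tyes\tmap",
    "Chr01g2\t1\t1000\t1\tW\tutg000001l\t1\t1000\t-",
]
new_agp = rename_duplicated_components(agp)
```

In this example the component on the third line becomes `utg000001l_d2`, while the first occurrence and the gap line are left as they were. The contig-level FASTA would need matching records under the new names so that the AGP and the sequences stay consistent.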