Methalign¶
To improve mapping accuracy using allele-specific 5mC sites, we developed a module called Methalign. It provides a pipeline that corrects alignments using the fifth base (5mCG). The module is designed for ultra-complex polyploids, which contain many highly similar homologous regions and whose partitioning is therefore difficult and unstable.
Info
If you want to process Pore-C/HiFi-C data without 5mC information, please use the Mapper.
Note
The input BAM must contain the MM/ML tags.
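One quick way to confirm the tags are present is to count records carrying both of them in a sample of the file. The sketch below is illustrative only: `count_mm_ml_records` is a hypothetical helper name, not part of Methalign, and it only inspects SAM text passed on stdin.

```shell
# Hypothetical helper: count SAM records that carry both an MM:Z and an ML:B tag.
# Reads SAM text (no header lines) on stdin and prints the count.
count_mm_ml_records() {
    awk -F'\t' '
        {
            has_mm = 0; has_ml = 0
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^MM:Z:/) has_mm = 1
                if ($i ~ /^ML:B:/) has_ml = 1
            }
            if (has_mm && has_ml) n++
        }
        END { print n + 0 }
    '
}

# Example usage (assumes samtools is installed; reads.bam is a placeholder name):
#   samtools view reads.bam | head -n 1000 | count_mm_ml_records
```

A count of 0 on the sampled records suggests the BAM was produced without modified-base calling, or that the tags were dropped by an earlier conversion step.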
Activate the environment of methalign¶
Info
Network access is required the first time the methalign environment is activated.
Calculate the 5mCG sites of the contig assembly¶
Align the HiFi reads with pbmm2
pbmm2 index --preset CCS contigs.fasta index.mmi
pbmm2 align --preset CCS index.mmi HiFi_reads.bam | samtools view - -b -o HiFi.align.bam
samtools sort HiFi.align.bam -o HiFi.align.sorted.bam
samtools index HiFi.align.sorted.bam
Calculate the 5mC sites with pb-cpg-tools
Align the candidate reads to the contig assembly¶
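The command for this step is not shown here. As a rough sketch, pb-CpG-tools ships a binary named `aligned_bam_to_cpg_scores` that takes a sorted, indexed BAM. The output prefix and thread count below are assumptions chosen for illustration, and some releases additionally need `--model` pointing to the pileup model file distributed with the tool.

```shell
# Sketch only: the prefix and thread count are illustrative assumptions.
bam="HiFi.align.sorted.bam"
prefix="HiFi.align"
threads=8

# Run only when the tool and the input BAM are actually available.
if command -v aligned_bam_to_cpg_scores >/dev/null 2>&1 && [ -f "$bam" ]; then
    aligned_bam_to_cpg_scores \
        --bam "$bam" \
        --output-prefix "$prefix" \
        --threads "$threads"
    status="ran"
else
    echo "skipped: aligned_bam_to_cpg_scores or $bam not found" >&2
    status="skipped"
fi
```

Check `aligned_bam_to_cpg_scores --help` for the options supported by the installed version before relying on this invocation.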
Note
Replace --secondary=yes with --mm2-opts "--secondary=yes" when using dorado >= 0.8.0.
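The version rule in this note can be expressed as a small helper that prints the option form matching a given dorado version. This is a sketch: `secondary_opt` is a hypothetical name, and the version string is passed in by hand rather than parsed from dorado itself.

```shell
# Hypothetical helper: print the secondary-alignment option for a dorado version.
# $1: version string such as "0.8.0" or "0.7.3+abc123"
secondary_opt() {
    ver=${1%%+*}        # drop any "+commit" suffix
    major=${ver%%.*}    # text before the first dot
    rest=${ver#*.}      # text after the first dot
    minor=${rest%%.*}   # second version component
    if [ "$major" -gt 0 ] || [ "$minor" -ge 8 ]; then
        printf '%s\n' '--mm2-opts "--secondary=yes"'
    else
        printf '%s\n' '--secondary=yes'
    fi
}

secondary_opt 0.7.3   # prints: --secondary=yes
secondary_opt 0.8.0   # prints: --mm2-opts "--secondary=yes"
```

The printed text is meant to be pasted into the alignment command, so the inner quotes are kept literally.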
Refine alignments¶
- Refine the alignments using methylation information
Note
This step outputs methalign.refined.paf.gz and methalign.refined.porec.gz.
- After refinement
Input
porec.align.refined.porec.gz
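Before passing the refined files to later steps, it can help to confirm that they exist, are non-empty, and are intact gzip archives. The helper below is a sketch with a hypothetical name, `check_outputs`. It takes whatever file names the run actually produced.

```shell
# Hypothetical helper: verify that each given file exists, is non-empty,
# and passes a gzip integrity test. Prints "ok" when all files pass.
check_outputs() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt gzip: $f" >&2
            return 1
        fi
    done
    echo "ok"
}

# Example usage with the output names mentioned in the refine step:
#   check_outputs methalign.refined.paf.gz methalign.refined.porec.gz
```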