Methalign#
To improve mapping accuracy using information from allele-specific 5mC sites, we developed a module called Methalign. It provides a pipeline that corrects alignments using the fifth base (5mCG). The module is designed for ultra-complex polyploids, whose many highly similar homologous regions make partitioning difficult and unstable.
Info
If you want to process Pore-C/HiFi-C data without 5mC information, please use the Mapper instead.
Note
The input BAM must contain the MM/ML tags.
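A quick way to confirm this before running the pipeline is to scan a sample of records for both tags. The sketch below assumes SAM text on standard input (for example from `samtools view`). The function name `count_mod_tagged` is hypothetical and not part of Methalign.

```shell
# Hypothetical helper (not part of Methalign): read SAM records on stdin and
# print "<records with both MM and ML>/<total records>".
count_mod_tagged() {
    awk -F'\t' '
        /^@/ { next }                        # skip SAM header lines
        {
            total++
            has_mm = 0; has_ml = 0
            for (i = 12; i <= NF; i++) {     # optional tags start at column 12
                if ($i ~ /^MM:Z:/) has_mm = 1
                if ($i ~ /^ML:B:/) has_ml = 1
            }
            if (has_mm && has_ml) tagged++
        }
        END { printf "%d/%d\n", tagged, total }
    '
}
```

For example, `samtools view input.bam | head -n 1000 | count_mod_tagged` (with `input.bam` as a placeholder name) prints `1000/1000` when every sampled record carries both tags.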
Activate the environment of methalign#
Info
Network access is required the first time the methalign environment is activated.
Calculate the 5mCG sites of contig assembly#
Align the HiFi reads with pbmm2
pbmm2 index --preset CCS contigs.fasta index.mmi
pbmm2 align --preset CCS index.mmi HiFi_reads.bam | samtools view - -b -o HiFi.align.bam
samtools sort HiFi.align.bam -o HiFi.align.sorted.bam
samtools index HiFi.align.sorted.bam
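The alignment, sorting, and indexing steps above can also be collapsed, since `pbmm2 align` can write a coordinate-sorted BAM directly through its `--sort` option. The following is a minimal sketch under that assumption, not the authors' pipeline. The function names `run` and `align_hifi` are hypothetical, and setting `DRY_RUN=1` only prints the commands so they can be inspected before anything is executed.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: index the contigs, align HiFi reads into a sorted BAM,
# then (re)build the BAM index. Arguments: contigs FASTA, reads BAM, output BAM.
align_hifi() {
    local contigs="$1" reads="$2" out="$3"
    run pbmm2 index --preset CCS "$contigs" index.mmi &&
    run pbmm2 align --preset CCS --sort index.mmi "$reads" "$out" &&
    run samtools index "$out"
}
```

Running `DRY_RUN=1 align_hifi contigs.fasta HiFi_reads.bam HiFi.align.sorted.bam` lists the three commands without executing them. Dropping `DRY_RUN=1` runs them for real.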
Calculate the 5mC sites with pb-cpg-tools
Align the candidate reads to contig assembly#
Note
Replace --secondary=yes with --mm2-opts "--secondary=yes" when using dorado >= 0.8.0.
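Which form applies depends on the installed dorado version. The sketch below encodes the rule from the note in a small helper. The function name `secondary_opt` is hypothetical, and it assumes a version string of the form `MAJOR.MINOR.PATCH`.

```shell
# Hypothetical helper: print the secondary-alignment option that matches a
# dorado version string such as "0.7.3" or "0.8.0".
secondary_opt() {
    local version="$1"
    local major="${version%%.*}"     # text before the first dot
    local rest="${version#*.}"       # text after the first dot
    local minor="${rest%%.*}"        # text between the first and second dot
    if [ "$major" -gt 0 ] || [ "$minor" -ge 8 ]; then
        # dorado >= 0.8.0: pass the option through --mm2-opts
        printf '%s\n' '--mm2-opts "--secondary=yes"'
    else
        # older dorado: pass the option directly
        printf '%s\n' '--secondary=yes'
    fi
}
```

Calling `secondary_opt 0.7.3` prints `--secondary=yes`, while `secondary_opt 0.8.0` prints `--mm2-opts "--secondary=yes"`.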
Refine alignments#
- Split the BAM to speed up the refinement step
- Refine the alignments using methylation information
Note
This step outputs porec.align.refined.paf.gz and porec.align.refined.porec.gz.
- After refinement
Input porec.align.refined.porec.gz
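Before passing the refined file on, it can help to confirm that both files reported by the refinement step exist and are intact gzip archives. This is a minimal sketch. The function name `check_refined_outputs` is hypothetical, and it assumes the two files share the prefix given as its argument.

```shell
# Hypothetical helper: verify that <prefix>.paf.gz and <prefix>.porec.gz
# exist, are non-empty, and pass a gzip integrity test.
check_refined_outputs() {
    local prefix="$1" suffix f
    for suffix in paf.gz porec.gz; do
        f="${prefix}.${suffix}"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt gzip: $f" >&2
            return 1
        fi
    done
    echo "ok: ${prefix}"
}
```

With the file names used above, the call would be `check_refined_outputs porec.align.refined`.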