prepartition: Introduced a method to use a monoploid reference for guiding the initial contig clustering. This can be integrated into the pipeline using the -fc flag to bypass the 1st-round hyperpartition.
pipeline: Added the --gfa option to facilitate the removal of redundant contigs post-scaffolding.
Improved anchor rate for Pore-C data
- mapper
- Reduced h-trans artifacts in Pore-C alignments, specifically for reads with a MAPQ of 1.
- pipeline
- Set -q2 to 1 so that more hyperedges are used for haplotype phasing.
Enhanced performance of collapsed rescue
- pipeline
- Reordered the pipeline steps to integrate the collapsed rescue module.
- The collapsed rescue module can now be activated either by providing a collapsed contig list (--collapsed-contigs) or by using the --collapsed-rescue flag (the flag performs better and is mainly suited to Pore-C data).
- collapsed rescue
- Refactored the underlying code to boost overall execution speed.
rename
- Added support for renaming a chromosome-level FASTA (cphasing rename -r mono.fa -f groups.asm.fasta).
alleles
- Increased the maximum supported contig length from 134 Mb to 34 Gb (handling up to 1M contigs).
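
As a quick arithmetic check, the old and new limits sit very close to powers of two. Reading them as 27-bit and 35-bit position ranges is an assumption made here for illustration only; the changelog does not say how positions are stored.

```python
# Assumption (not stated in the changelog): the quoted limits correspond
# to 2**27 and 2**35 bases respectively.
old_limit = 2 ** 27   # 134,217,728 bases, about 134 Mb
new_limit = 2 ** 35   # 34,359,738,368 bases, about 34 Gb

print(f"old limit: {old_limit / 1e6:.0f} Mb")
print(f"new limit: {new_limit / 1e9:.0f} Gb")
print(f"ratio: {new_limit // old_limit}x")
```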
hyperpartition
- Fixed a performance bug that caused kprune to execute slowly.
- Resolved a bug reported in issue #45.
pipeline
- Fixed an issue where the second round inadvertently re-ran even if pairs.pqs already existed.
scaffolding
- Fixed a bug preventing the scaffolding module from properly processing duplicated contigs.
curation
- Fixed an issue where separating haplotypes resulted in only one chromosome appearing in the final assembly.
hic mapper
- Fixed an error in which the Minimap2 mapper reported self.prefix as unassigned.
- Resolved a bug that caused the repeated conversion of pairs files to pairs.pqs.
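
The "convert once, then reuse" behaviour described above can be sketched as a simple existence check. This is a minimal illustration, not the CPhasing implementation: the helper names `pqs_path_for` and `needs_conversion` and the output naming rule (strip `.gz`, append `.pqs`) are assumptions.

```python
from pathlib import Path


def pqs_path_for(pairs_file: str) -> Path:
    """Derive a hypothetical .pairs.pqs path from a .pairs or .pairs.gz path."""
    path = Path(pairs_file)
    name = path.name
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # sample.pairs.gz -> sample.pairs
    return path.with_name(name + ".pqs")


def needs_conversion(pairs_file: str) -> bool:
    """Return True only when no converted output exists yet."""
    # exists() is true for both files and directories, so it does not
    # matter which of the two the converted output is.
    return not pqs_path_for(pairs_file).exists()


print(pqs_path_for("data/sample.pairs.gz"))
```

Checking for the output before converting is what keeps a rerun from repeating work that has already been done.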
cphasing-rs bam2paf
- Fixed a bug that generated incorrect read positions during alignment parsing.
alleles
- Fixed a bug that caused alleles to load the FASTA twice.
mapper
- Fixed a conflict when loading the .fai index while multiple jobs are submitted.
- Fixed a bug that prevented loading multiple FASTQ files.
plot
- Fixed a bug that prevented plotting duplicated contigs (without rename).
- Fixed the "TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'" error raised when plotting each chromosome.
sort-chromosomes, normalized by length
hyperpartition
- Fixed the bug reported in issue #44, where the .hg file was not generated at the path given by --output-hg.
mapper
- Changed the default --mm2-params from "-x map-ont" to "-x lr:hq", as recommended by ONT developers for chemistry v14.
hic mapper
- Updated chromap from v0.2.5 to v0.3.2, which reduces the runtime when more than 12 CPUs are used.
- Enabled multiple Hi-C datasets as input.
methalign
- Optimized the pipeline for calling 5mC sites on the reference.
- Sped up the algorithm by using Rust and slightly increased the accuracy of the refined alignments.
pipeline
- Added --split-length to automatically split contigs before partitioning, which avoids errors where extremely long contigs are clustered together.
- Added --merge-use-allele, which uses allelic information to help cluster homologous chromosomes.
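
The idea behind --split-length can be sketched as cutting each contig into fixed-size pieces before clustering. This is a minimal illustration under assumptions: the function name `split_contig`, the fragment naming scheme, and the 0-based half-open coordinates are hypothetical and may differ from what CPhasing does internally.

```python
from typing import Iterator, Tuple


def split_contig(name: str, length: int, split_length: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (fragment_name, start, end) pieces of at most split_length bases.

    Coordinates are 0-based and half-open; the last piece may be shorter.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    for start in range(0, length, split_length):
        end = min(start + split_length, length)
        # Hypothetical naming scheme: contig name plus its coordinate range.
        yield f"{name}:{start}-{end}", start, end


for fragment in split_contig("ctg1", 2500, 1000):
    print(fragment)
```

Each fragment can then be clustered independently, so a single very long contig no longer dominates the contact signal of its group.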
hyperpartition
- Partially resolved the problem of extremely long contigs being erroneously aggregated together, by splitting contigs.
- Enabled the first-round partition to merge groups that contain slight h-trans signals (--merge-use-allele).
- Improved the accuracy of results when merging N groups to k groups.
scaffolding
- Significantly reduced the number of large-scale orientation errors.
- Haplotypes are now sorted by pairwise similarity.
plot
- Added the --avoid-overlap-yticks parameter to keep y-tick labels from overlapping.
- Enabled --add-hap-border to draw a border around each chromosome.
- Added --no-x-ticks and --no-y-ticks.
- Halved the picture size to reduce the size of the heatmap output file.
- Changed the default colormap from red1p_r to red1p_r_half.
- Added an automatic vmax for --scale log1p and --scale log.
- Only cis contacts are now balanced, and a custom iced balance function was added to accelerate balancing.
cphasing-rs pairs-intersect, fixed the bug that raised "an 'Err' value: ShapeMismatch(ErrString("filter's length: 999999 differs from that of the series: 1000000"))".
cphasing-rs splitclm, fixed a bug that caused several pairs to be lost.
chimeric, improved performance by avoiding an excessive number of false corrections.
pipeline, if the input is pairs or pairs.gz, it is first converted to pairs.pqs to speed up subsequent pairs data loading, giving about a 15 percent increase in speed.
plot, enabled plotting the border of haplotypes (--add-hap-border).
alleles, added --trim-length to trim both ends of contigs, removing the effect of overlaps from the hifiasm assembly graph.
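
The trimming done by --trim-length can be pictured as shrinking each contig to its interior interval. The sketch below is illustrative only: `trim_interval` is a hypothetical helper, and returning None for contigs too short to trim is an assumption about edge-case handling, not documented behaviour.

```python
from typing import Optional, Tuple


def trim_interval(length: int, trim_length: int) -> Optional[Tuple[int, int]]:
    """Return the 0-based half-open (start, end) kept after trimming both ends.

    Returns None when the contig is too short to keep anything (assumed policy).
    """
    if trim_length < 0:
        raise ValueError("trim_length must be non-negative")
    start = trim_length
    end = length - trim_length
    if start >= end:
        return None
    return start, end


print(trim_interval(100_000, 25_000))
print(trim_interval(40_000, 25_000))
```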
pipeline
- --preset precision: optimizes parameters to improve accuracy at the expense of anchor rate.
- --preset sensitive: intended for complex genomes that contain many fragmented contigs and low-signal contigs.
activate_cphasing, fixed a bug that closed the shell window when pixi failed to install.
pipeline
- Fixed a bug that prevented the program from exiting when an error occurred.
- Fixed a bug where alleles parameters had no effect in phasing mode.
- Fixed the bug reported in issue #14, where --chimeric-correct mode could not use the corrected FASTA in 3.hyperpartition.
plot, fixed a bug that prevented coarsening an adjusted matrix to plot a matrix at another bin size.
higig
- Added support for HiFi data, as well as identification of junk and collapsed contigs.
- correct-alignments, improved performance by filtering low-quality LIS.
pairs2cool, added min_mapq filtering.
plot, added a triangle plot.
scaffolding, sped up clm splitting by using cphasing-rs splitclm.
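
The min_mapq filter added to pairs2cool can be pictured as dropping low-confidence contacts before they are binned. This is a hedged sketch, not the tool's code: the `Pair` record, the `filter_by_mapq` helper, and the inclusive threshold (keep mapq >= min_mapq) are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Pair(NamedTuple):
    """Hypothetical minimal contact record with a single mapping quality."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    mapq: int


def filter_by_mapq(pairs: Iterable[Pair], min_mapq: int) -> Iterator[Pair]:
    """Keep only pairs whose mapping quality is at least min_mapq (assumed inclusive)."""
    for pair in pairs:
        if pair.mapq >= min_mapq:
            yield pair


contacts = [
    Pair("ctg1", 100, "ctg2", 500, 0),
    Pair("ctg1", 200, "ctg1", 900, 1),
    Pair("ctg3", 50, "ctg4", 70, 30),
]
kept = list(filter_by_mapq(contacts, min_mapq=1))
print(len(kept))
```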
HyperPartition
- Added a post check to increase the accuracy of the partition.
- Added the min-contacts parameter to remove contig pairs with low contacts.
- Added a negative allelic algorithm.
- Changed multi to incremental.
- Added whitelist and blacklist parameters.