One command pipeline of C-Phasing#

Pipeline#

The -n 8:4 parameter in the following commands means assembling a tetraploid (4) with a basic chromosome number of 8. Setting -n 0:0 makes both rounds of partition determine the group number automatically; -n 8:0 and -n 0:4 are also supported.

Note

CPhasing also supports monoploid scaffolding: when you set a single group number, e.g. -n 8, the pipeline automatically skips step 1.alleles and runs only one round of partition.

Note

If the user's genome is an allopolyploid with low subgenome similarity, the initial grouping may not be optimal. In such cases, users should adjust the -n parameter based on the ploidy and genome structure:

For allotetraploids (2n = 4x = 32), the genome can be treated as a diploid assembly. Use -n 16 if you only want to assemble the two subgenomes, or -n 16:2 if you also want to phase each subgenome.
For allohexaploids:
- AAABBB type (2n = 6x = 48): use -n 16:3.
- AABBCC type (2n = 6x = 48): use -n 24:2.
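The -n values listed above follow from simple arithmetic on the chromosome count, which can be sketched as below. This is only an illustration: `n_value` is a hypothetical helper, not part of CPhasing, and it assumes that all subgenomes share the same basic chromosome number and that the ploidy divides evenly among them.

```shell
# Hypothetical helper (not a CPhasing command): derive an -n value.
#   $1: somatic chromosome count (2n)
#   $2: ploidy level (e.g. 4 for a tetraploid)
#   $3: number of distinct subgenomes (1 for an autopolyploid)
n_value() {
    base=$(( $1 / $2 ))        # basic chromosome number x
    groups=$(( base * $3 ))    # groups in the first partition round
    haps=$(( $2 / $3 ))        # haplotypes per group in the second round
    echo "${groups}:${haps}"
}

n_value 32 4 1   # tetraploid with x = 8, as in -n 8:4
n_value 32 4 2   # allotetraploid treated as a diploid
n_value 48 6 2   # AAABBB allohexaploid
n_value 48 6 3   # AABBCC allohexaploid
```

The result can then be passed to the pipeline as, for example, -n "$(n_value 48 6 3)".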

Start from Pore-C data:#

cphasing pipeline -f draft.asm.fasta -pcd sample.fastq.gz -t 10 -n 8:4

Start from multiple Pore-C data:#

Specify the -pcd parameter once for each file.

cphasing pipeline -f draft.asm.fasta -pcd sample1.fastq.gz -pcd sample2.fastq.gz -t 10 -n 8:4
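When there are many read files, the repeated -pcd arguments can be assembled in a loop. The following is a minimal sketch, assuming placeholder file names without spaces; it only prints the resulting command so that it can be inspected before being run.

```shell
# Build one "-pcd <file>" pair per Pore-C FASTQ file.
# The file names are placeholders; replace them with your own.
pcd_args=""
for fq in sample1.fastq.gz sample2.fastq.gz sample3.fastq.gz; do
    pcd_args="$pcd_args -pcd $fq"
done

# Print the command instead of executing it.
cmd="cphasing pipeline -f draft.asm.fasta$pcd_args -t 10 -n 8:4"
echo "$cmd"
```

Once the printed command looks right, it can be copied to the terminal or run from a job script.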

Note

If you want to run on a cluster and distribute the mapping jobs across multiple nodes, you can use cphasing mapper and cphasing-rs porec-merge to generate a merged porec.gz file and pass it with the -pct parameter. Please check the doc: Mapper

Start from a Pore-C table (`.porec.gz`):#

The Pore-C table is generated by cphasing mapper.

cphasing pipeline -f draft.asm.fasta -pct sample.porec.gz -t 10 -n 8:4

Start from HiFi-C data#

Run pipeline or mapper with the --mm2-params "-x map-hifi" parameter. The output is similar to that of Pore-C data.

cphasing pipeline -f draft.asm.fasta -pcd hific.fastq.gz --mm2-params "-x map-hifi "  -t 10 -n 8:4

Note

The mapping results of HiFi-C are similar to those of Pore-C: the output carries the porec.gz suffix and can be processed with porec-merge, porec-intersect, etc.

Start from Hi-C data#

cphasing pipeline -f draft.asm.fasta -hic1 Lib_R1.fastq.gz -hic2 Lib_R2.fastq.gz -t 10 -n 8:4

Note

1. If you want to run multiple samples, you can use cphasing hic mapper and cphasing-rs pairs-merge to generate a merged pairs.pqs file and pass it with the -prs parameter.
2. If the total length of your input genome is larger than 8 Gb, specify -hic-mapper-k 27 -hic-mapper-w 14 to avoid a chromap error.
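Whether the genome exceeds the 8 Gb limit can be checked before launching the pipeline. The sketch below uses hypothetical helper functions (`genome_size` and `hic_mapper_args` are not CPhasing commands) and assumes an uncompressed FASTA and a threshold of 8,000,000,000 bp; it runs on a tiny demo file so that it works without real data.

```shell
# Sum the sequence lengths of an uncompressed FASTA (headers start with ">").
genome_size() {
    awk '!/^>/ { total += length($0) } END { printf "%.0f\n", total }' "$1"
}

# Print the extra chromap options only when the genome exceeds the threshold.
#   $1: FASTA path; $2: threshold in bp (assumed default: 8 Gb)
hic_mapper_args() {
    threshold=${2:-8000000000}
    if [ "$(genome_size "$1")" -gt "$threshold" ]; then
        echo "-hic-mapper-k 27 -hic-mapper-w 14"
    fi
}

# Tiny demo FASTA (8 bp + 4 bp) standing in for draft.asm.fasta.
demo=$(mktemp)
printf '>ctg1\nACGTACGT\n>ctg2\nACGT\n' > "$demo"
demo_size=$(genome_size "$demo")
small_args=$(hic_mapper_args "$demo")      # below 8 Gb: no extra options
large_args=$(hic_mapper_args "$demo" 10)   # artificially low threshold
rm -f "$demo"
echo "size=$demo_size small='$small_args' large='$large_args'"
```

With a real assembly, the output of hic_mapper_args draft.asm.fasta can be appended to the cphasing pipeline command line.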

Start from 4DN pairs (`pairs.pqs` or `pairs.gz`) file#

For pairs.pqs:

cphasing pipeline -f draft.asm.fasta -prs sample.pairs.pqs -t 10 -n 8:4

For pairs.gz:

cphasing pipeline -f draft.asm.fasta -prs sample.pairs.gz -t 10 -n 8:4
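The single-file input types above differ only in the option used to pass the file, so a wrapper script can pick the option from the file suffix. This is a minimal sketch: `input_flag` is a hypothetical helper, and it assumes that a lone FASTQ file holds Pore-C or HiFi-C reads (paired Hi-C reads need -hic1 and -hic2 instead).

```shell
# Hypothetical helper: choose the pipeline input option from the file suffix.
input_flag() {
    case "$1" in
        *.porec.gz)              echo "-pct" ;;  # Pore-C table from cphasing mapper
        *.pairs.pqs|*.pairs.gz)  echo "-prs" ;;  # 4DN pairs
        *.fastq.gz|*.fq.gz)      echo "-pcd" ;;  # assumed Pore-C or HiFi-C reads
        *)                       return 1 ;;     # unknown input type
    esac
}

input=sample.porec.gz
echo "cphasing pipeline -f draft.asm.fasta $(input_flag "$input") $input -t 10 -n 8:4"
```

Returning a non-zero status for unknown suffixes lets the calling script stop early instead of building an invalid command.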

Skip some steps#

## skip steps 1.alleles and 2.prepare
cphasing pipeline -f draft.asm.fasta -pct sample.porec.gz -t 10 -ss 1,2

Perform only specified steps#

## run 3.hyperpartition 
cphasing pipeline -f draft.asm.fasta -pct sample.porec.gz -t 10 -s 3

Improve performance#

Add the -hcr parameter to remove greedy contacts (a few regions that contact the whole genome) and improve the phasing quality. Also specify -p AAGCTT to normalize the depth calculation and avoid bias from the distribution of restriction sites, where AAGCTT is the recognition pattern of the restriction enzyme.

cphasing pipeline -f draft.asm.fasta -pct sample.porec.gz -t 10 -hcr -p AAGCTT

Curation by Juicebox#

Generate the .assembly and .hic files; this step depends on 3d-dna.

cphasing pairs2mnd sample.pairs.gz -o sample.mnd.txt
cphasing utils agp2assembly groups.agp > groups.assembly
bash ~/software/3d-dna/visualize/run-assembly-visualizer.sh groups.assembly sample.mnd.txt

Note

If chimeric contigs were corrected, please use groups.corrected.agp and generate a new corrected.pairs.pqs with cphasing-rs pairs-break.

After curation

## convert assembly to agp
cphasing utils assembly2agp groups.review.assembly -n 8:4 
## or, for a haploid or a single homologous group
cphasing utils assembly2agp groups.review.assembly -n 8
## extract contigs from agp 
cphasing agp2fasta groups.review.agp draft.asm.fasta --contigs > contigs.fasta
## extract chromosome-level fasta from agp
cphasing agp2fasta groups.review.agp draft.asm.fasta > groups.review.asm.fasta

Rename#

Rename and orient the chromosomes according to a monoploid reference (or the genome of a closely related species). For more details, please check Rename.

cphasing rename -r mono.fasta -f draft.asm.fasta -a groups.review.agp -t 20

Note

To reduce run time, only the first haplotype (g1) is aligned to the monoploid reference, because the orientation of the different haplotypes has already been made consistent in the scaffolding step. If it has not, you can set --unphased to align all haplotypes to the monoploid reference and adjust their orientation.

Heatmap plotting#

Please check the doc: Plot
