Rule specific to the Twist Solid pipeline that are not defined in hydra¶

bcftools_id_snps¶

Trim .fastq files by removing adapter sequences and other unwanted sequences. Adapter sequences are specified in units.tsv under the adapter column. See further ID-SNPs info.

Rule¶

rule bcftools_id_snps:
    input:
        bam="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam",
        bai="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai",
        bed=config.get("bcftools_id_snps", {}).get("snps_bed", "missing bcftools_id_snps snps_bed file"),
        ref=config.get("reference", {}).get("fasta_rna", "missing reference fasta_rna"),
    output:
        vcf=temp("snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf"),
    wildcard_constraints:
        type="[R]",
    log:
        "snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf.log",
    benchmark:
        repeat(
            "snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf.benchmark.tsv",
            config.get("bcftools_id_snps", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bcftools_id_snps", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bcftools_id_snps", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bcftools_id_snps", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("bcftools_id_snps", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bcftools_id_snps", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bcftools_id_snps", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bcftools_id_snps", {}).get("container", config["default_container"])
    message:
        "{rule}: call id SNPs in the RNA data {output.vcf}"
    shell:
        "(bcftools mpileup "
        "-R {input.bed} "
        "-O u "
        "-f {input.ref} "
        "-d 1000000 "
        "{input.bam} "
        "| bcftools call "
        "--skip-variants indels "
        "-c -O v "
        "-o {output.vcf}) "
        "&> {log}"

input / output files¶

Rule parameters	Key	Value	Description
input	bam	`"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam"`	Bam file from an split-read aware aligner like STAR
	bai	`"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai"`	Bam index file
	bed	`config.get("bcftools_id_snps", {}).get("snps_bed", "missing bcftools_id_snps snps_bed file")`	Bed file with regions or positions to pileup
	ref	`config.get("reference", {}).get("fasta_rna", "missing reference fasta_rna")`	Reference genome
output	vcf	`"snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf"`	Vcf with pileup information for all positions in input bed file

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated for msisensor_pro
container	string	name or path to docker/singularity container
snps_bed	string	path to bed file the id SNPs

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

call_small_cnv_amplifications¶

The CNVkit and GATK CNV caller often miss small amplifications of four exons or smaller. This rule analyses genes of interest and reports small amplifications in these genes when predefined criteria are met. The caller is based on two rolling window averaged over the region and finds the largest difference in log2ratio between these two windows. If the difference is large enough it is reported. See further call small cnv amplifications info.

Rule¶

rule call_small_cnv_amplifications:
    input:
        cnv_data="cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv",
        regions_file=config.get("call_small_cnv_amplifications", {}).get("regions_file", ""),
    output:
        amplifications=temp("cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv"),
    params:
        window_size=config.get("call_small_cnv_amplifications", {}).get("window_size", 4),
        region_max_size=config.get("call_small_cnv_amplifications", {}).get("region_max_size", 30),
        min_nr_stdev_diff=config.get("call_small_cnv_amplifications", {}).get("min_nr_stdev_diff", 8),
        min_log_odds_diff=config.get("call_small_cnv_amplifications", {}).get("min_log_odds_diff", 0.4),
    log:
        "cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv.log",
    benchmark:
        repeat(
            "cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv.benchmark.tsv",
            config.get("call_small_cnv_amplifications", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("call_small_cnv_amplifications", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("call_small_cnv_amplifications", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("call_small_cnv_amplifications", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("call_small_cnv_amplifications", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("call_small_cnv_amplifications", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("call_small_cnv_amplifications", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("call_small_cnv_amplifications", {}).get("container", config["default_container"])
    message:
        "{rule}: call small amplifications in cnv data into {output.amplifications}"
    script:
        "../scripts/call_small_cnv_amplifications.py"

input / output files¶

Rule parameters	Key	Value	Description
input	cnv_data	`"cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv"`	Copy number region data from GATK CNV
input	regions_file	`config.get("call_small_cnv_amplifications", {}).get("regions_file", "")`	File with genes, gene regions with surrounding cnv-probes and the actual gene region
output	amplifications	`"cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv"`	Text file with the called amplifications

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
container	string	name or path to docker/singularity container
regions_file	string	File with genes, gene regions with surrounding cnv-probes and the actual gene region. Defines both the genomic position of the genes and their surrounding regions that should be analysed for this gene. The size of the analysis region is dependent on the density of the backbone but should at least 20 backbone SNPs. 20 SNPs is used in Pipeline.
window_size	integer	Number of data points included in the sliding window
region_max_size	integer	Max number of data points of deletion to be reported
min_nr_stdev_diff	number	Min number of standard deviations difference between the deletion and surrounding data points in region
min_log_odds_diff	number	Min log2ratio difference between the deletion and the average of surrounding data points in region

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

call_small_cnv_deletions¶

The CNVkit and GATK CNV caller often miss small deletions of four exons or smaller. This rule analyses genes of interest and reports small deletions in these genes when predefined criteria are met. The caller is based on two rolling window averaged over the region and finds the largest difference in log2ratio between these two windows. If the difference is large enough it is reported. See further call small cnv deletions info.

Rule¶

rule call_small_cnv_deletions:
    input:
        cnv_data="cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv",
        regions_file=config.get("call_small_cnv_deletions", {}).get("regions_file", ""),
    output:
        deletions=temp("cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv"),
    params:
        blacklist=config.get("call_small_cnv_deletions", {}).get("blacklist", ""),
        min_nr_stdev_diff=config.get("call_small_cnv_deletions", {}).get("min_nr_stdev_diff", 2.5),
        min_log_odds_diff=config.get("call_small_cnv_deletions", {}).get("min_log_odds_diff", 0.3),
        region_max_size=config.get("call_small_cnv_deletions", {}).get("region_max_size", 30),
        window_size=config.get("call_small_cnv_deletions", {}).get("window_size", 4),
    log:
        "cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv.log",
    benchmark:
        repeat(
            "cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv.benchmark.tsv",
            config.get("call_small_cnv_deletions", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("call_small_cnv_deletions", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("call_small_cnv_deletions", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("call_small_cnv_deletions", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("call_small_cnv_deletions", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("call_small_cnv_deletions", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("call_small_cnv_deletions", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("call_small_cnv_deletions", {}).get("container", config["default_container"])
    message:
        "{rule}: call small deletions in cnv data into {output.deletions}"
    script:
        "../scripts/call_small_cnv_deletions.py"

input / output files¶

Rule parameters	Key	Value	Description
input	cnv_data	`"cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv"`	Copy number region data from GATK CNV
input	regions_file	`config.get("call_small_cnv_deletions", {}).get("regions_file", "")`	File with genes, gene regions with surrounding cnv-probes and the actual gene region
output	deletions	`"cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv"`	Text file with the called deletions

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
container	string	name or path to docker/singularity container
regions_file	string	File with genes, gene regions with surrounding cnv-probes and the actual gene region. Defines both the genomic position of the genes and their surrounding regions that should be analysed for this gene. The size of the analysis region is dependent on the density of the backbone but should at least 20 backbone SNPs. 40 SNPs is used in Pipeline.
blacklist	string	File with gene regions that should be filtered (Format is header + gene\tchr\tpos1\tpos2)
min_nr_stdev_diff	number	Min number of standard deviations difference between the deletion and surrounding data points in region
min_log_odds_diff	number	Min log2ratio difference between the deletion and the average of surrounding data points in region
region_max_size	integer	Max number of data points of deletion to be reported
window_size	integer	Number of data points included in the sliding window

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

cnv_tsv_report¶

Collect all CNV calls into an excel friendly text file. Adds potential 1p19q calls. Also add a FP flag for variants called by CNVkit that in GATK CNV does not have any signal in that region. See further CNV tsv report info.

Rule¶

rule cnv_tsv_report:
    input:
        amplifications="cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv",
        chrom_arm_size=config.get("cnv_tsv_report", {}).get("chrom_arm_size", ""),
        deletions="cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv",
        gatk_cnr="cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv",
        org_vcfs=[
            "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.fp_tag.vcf",
            "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.fp_tag.vcf",
        ],
        tc_file=get_tc_file,
        vcfs=[
            "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.filter.cnv_hard_filter_amp.fp_tag.vcf",
            "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.filter.cnv_hard_filter_loh.fp_tag.vcf",
        ],
    output:
        tsv=temp("cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv"),
        tsv_additional_only=temp("cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_additional_variants_only.tsv"),
        tsv_chrom_arms=temp("cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_chromosome_arms.tsv"),
        vcf_del="cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.filter.cnv_hard_filter_loh.fp_tag.annotate_fp.vcf",
    params:
        amp_chr_arm_cn_limit=config.get("cnv_tsv_report", {}).get("amp_chr_arm_cn_limit", ""),
        baseline_fraction_limit=config.get("cnv_tsv_report", {}).get("baseline_fraction_limit", ""),
        call_small_amplifications_cn_limit=config.get("cnv_tsv_report", {}).get("amp_cn_limit", ""),
        chr_arm_fraction=config.get("cnv_tsv_report", {}).get("chr_arm_fraction", ""),
        del_chr_arm_cn_limit=config.get("cnv_tsv_report", {}).get("del_chr_arm_cn_limit", ""),
        del_1p19q_cn_limit=config.get("cnv_tsv_report", {}).get("del_1p19q_cn_limit", ""),
        del_1p19q_chr_arm_fraction=config.get("cnv_tsv_report", {}).get("del_1p19q_chr_arm_fraction", ""),
        max_cnv_fp_size=config.get("cnv_tsv_report", {}).get("max_cnv_fp_size", ""),
        normal_cn_lower_limit=config.get("cnv_tsv_report", {}).get("normal_cn_lower_limit", ""),
        normal_cn_upper_limit=config.get("cnv_tsv_report", {}).get("normal_cn_upper_limit", ""),
        normal_baf_lower_limit=config.get("cnv_tsv_report", {}).get("normal_baf_lower_limit", ""),
        normal_baf_upper_limit=config.get("cnv_tsv_report", {}).get("normal_baf_upper_limit", ""),
        polyploidy_fraction_limit=config.get("cnv_tsv_report", {}).get("polyploidy_fraction_limit", ""),
        tc=get_tc,
    log:
        "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv.log",
    benchmark:
        repeat(
            "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv.benchmark.tsv",
            config.get("cnv_tsv_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("cnv_tsv_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("cnv_tsv_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("cnv_tsv_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("cnv_tsv_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("cnv_tsv_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("cnv_tsv_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("cnv_tsv_report", {}).get("container", config["default_container"])
    message:
        "{rule}: Convert cnv vcf to a tsv file: {output.tsv}"
    script:
        "../scripts/cnv_report.py"

input / output files¶

Rule parameters	Key	Value	Description
input	amplifications	`"cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv"`	File with small CNV amplifications generated by call_small_cnv_amplifications
	deletions	`"cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv"`	File with small CNV amplifications generated by call_small_cnv_deletions
	org_vcfs	`[ "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.fp_tag.vcf", "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.fp_tag.vcf", ]`	Unfiltered vcf-files with merged calls from CNVkit and GATK CNV
	tc_file	`get_tc_file`	Tumor content of sample
	vcfs	`[ "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.filter.cnv_hard_filter_amp.fp_tag.vcf", "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.filter.cnv_hard_filter_loh.fp_tag.vcf", ]`	Filtered vcf-files with merged calls from CNVkit and GATK CNV
output	tsv	`"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv"`	A CNV report in excel friendly format
	tsv_additional_only	`"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_additional_variants_only.tsv"`	File where only additional CNVs are reported. Used for extra table in CNV-html report.
	tsv_chrom_arms	`"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_chromosome_arms.tsv"`	File where large CNVs are reported. Used for extra table in CNV-html report.

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
amp_chr_arm_cn_limit	number	Lower limit of copy number to be included in calculation of chr_arm_fraction for duplications
baseline_fraction_limit	number	Lower limit of fraction with neutral copy segments and neutral BAF values. A warning of potentially incorrect baseline is reported otherwise.
container	string	name or path to docker/singularity container
call_small_amplifications_cn_limit	number	Upper limit of copy number to be reported as a small amplification.
chr_arm_fraction	number	Fraction of the chromosome arms that needs to be deleted
del_1p19q_cn_limit	number	Upper limit of copy number to be included in calculation of del_1p19q_chr_arm_fraction
del_1p19q_chr_arm_fraction	number	Fraction of the chromosome arms that needs to be deleted
del_chr_arm_cn_limit	number	Upper limit of copy number to be included in calculation of chr_arm_fraction for deletions
max_cnv_fp_size	number	Max size of cnvkit cnv to be tested for false positive
normal_cn_lower_limit	number	Lower limit of copy number to be counted as a normal segment
normal_cn_upper_limit	number	Upper limit of copy number to be counted as a normal segment
normal_baf_lower_limit	number	Lower limit of BAF in segment to be counted as a normal segment
normal_baf_upper_limit	number	Upper limit of BAF in segment to be counted as a normal segment
polyploidy_fraction_limit	number	Upper limit of fraction with aberrant BAF / copy number segment combination. A warning of potential polyploidy is reported otherwise.

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

estimate_ctdna_fraction¶

Estimate ctDNA fraction based on CNV and SNP information using an in-house script. For CNV based estimation the VAF-values of germline SNPs (the BAF-plot) for each segment is used to make an density calculation. If two peaks are found on opposite sides of 50% allele frequency the separation of the peak is used to calculate the ctDNA fraction. The highest ctDNA fraction is reported. If no CNV segments have peak separation 0% is reported. For SNV based estimation the ctDNA is reported as the highest variant allele frequency SNV times two.

Rule¶

rule estimate_ctdna_fraction:
    input:
        germline_vcf="snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz",
        germline_vcf_tabix="snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz.tbi",
        segments="cnv_sv/jumble_vcf/{sample}_{type}.pathology_purecn.vcf",
        vcf="snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.artifact_annotated.hotspot_annotated.background_annotated.include.exon.filter.snv_hard_filter_umi.codon_snvs.sorted.vep_annotated.qci.vcf",
    output:
        ctDNA_fraction=temp("cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction.tsv"),
        ctDNA_fraction_info=temp("cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction_info.tsv"),
    params:
        gnomAD_AF_limit=config.get("estimate_ctdna_fraction", {}).get("gnomAD_AF_limit", 0.00001),
        max_somatic_af=config.get("estimate_ctdna_fraction", {}).get("max_somatic_af", 0.4),
        min_germline_af=config.get("estimate_ctdna_fraction", {}).get("min_germline_af", 0.1),
        min_nr_SNPs_per_segment=config.get("estimate_ctdna_fraction", {}).get("min_nr_SNPs_per_segment", 35),
        min_segment_length=config.get("estimate_ctdna_fraction", {}).get("min_segment_length", 10000000),
        problematic_regions_beds=config.get("estimate_ctdna_fraction", {}).get("problematic_regions_beds", []),
        vaf_baseline=config.get("estimate_ctdna_fraction", {}).get("vaf_baseline", 0.48),
    log:
        "twist_solid/estimate_ctdna_fraction/{sample}_{type}.ctDNA_tc.tsv.log",
    benchmark:
        repeat(
            "twist_solid/estimate_ctdna_fraction/{sample}_{type}.ctDNA_tc.tsv.benchmark.tsv",
            config.get("estimate_ctdna_fraction", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("estimate_ctdna_fraction", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("estimate_ctdna_fraction", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("estimate_ctdna_fraction", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("estimate_ctdna_fraction", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("estimate_ctdna_fraction", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("estimate_ctdna_fraction", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("estimate_ctdna_fraction", {}).get("container", config["default_container"])
    message:
        "{rule}: estimate ctdna fraction based on CNV and SNV data into {output.ctDNA_fraction}"
    script:
        "../scripts/estimate_ctDNA_fraction.py"

input / output files¶

Rule parameters	Key	Value	Description
input	germline_vcf	`"snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz"`	Germline vcf only. Same file as used in BAF plots of the CNV html report
	segments	`"cnv_sv/jumble_vcf/{sample}_{type}.pathology_purecn.vcf"`	CNV segments (from Jumble)
	vcf	`"snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.artifact_annotated.hotspot_annotated.background_annotated.include.exon.filter.snv_hard_filter_umi.codon_snvs.sorted.vep_annotated.qci.vcf"`	Small variant vcf with vep annotations
output	ctDNA_fraction	`"cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction.tsv"`	Estimated ctDNA fraction values based on CNV and SNV data
output	ctDNA_fraction_info	`"cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction_info.tsv"`	CNV segment info for regions with deletion signals as well as variants passing all filters

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
gnomAD_AF_limit	number	additional filtering based on existence in GnomAD (removes germline)
max_somatic_af	number	additional filtering based on maximal allowed allele frequency (removes germline)
min_germline_af	number	minimial germline allele frequency to be included in density calculations (removes somatic)
min_nr_SNPs_per_segment	integer	minimal number of germline SNPs in segments to be used for ctDNA estimation
min_segment_length	integer	minimal length of segments to be used for ctDNA estimation
problematic_regions_beds	array	list of bedfiles
vaf_baseline	number	Sets the VAF-baseline. In theory BAF-baseline is 0.5 (50% for each SNP), however in practice this is slightly shifted downwards due to reference bias in alignment

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

exon_skipping¶

Calls exon skipping events in RNA data. Only reports MET exon 14 skipping and the EGFRvIII variant. See further exon skipping info.

Rule¶

rule exon_skipping:
    input:
        bed=config.get("exon_skipping", {}).get("design_bed", ""),
        junction="alignment/star/{sample}_{type}.SJ.out.tab",
    output:
        result=temp("fusions/exon_skipping/{sample}_{type}.results.tsv"),
    log:
        "fusions/exon_skipping/{sample}_{type}.results.tsv.log",
    benchmark:
        repeat(
            "fusions/exon_skipping/{sample}_{type}.results.tsv.benchmark.tsv",
            config.get("exon_skipping", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("exon_skipping", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("exon_skipping", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("exon_skipping", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("exon_skipping", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("exon_skipping", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("exon_skipping", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("exon_skipping", {}).get("container", config["default_container"])
    message:
        "{rule}: Find intergenic fusions and report them in {output.result}"
    script:
        "../scripts/exon_skipping.py"

input / output files¶

Rule parameters	Key	Value	Description
input	bed	`config.get("exon_skipping", {}).get("design_bed", "")`	Bed file used to extract exonic genomic regions
input	junction	`"alignment/star/{sample}_{type}.SJ.out.tab"`	Junction file from STAR based on split reads
output	result	`"fusions/exon_skipping/{sample}_{type}.results.tsv"`	Report with info regarding MET exon 14 and EGFR vIII skipping

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
design_bed	string	path to design bed file

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

fix_vcf_ad_for_qci¶

Produces a QCI compatible vcf file with SNV and INDEL calls were the AD field has been corrected to so that the numbers corresponds with the AF field. QCI uses the AD field to calculate allele frequency. See further QCI fix info.

Rule¶

rule fix_vcf_ad_for_qci:
    input:
        vcf="{file}.vcf",
    output:
        vcf="{file}.qci.vcf",
    log:
        "{file}.qci.vcf.log",
    benchmark:
        repeat(
            "{file}.qci.vcf.benchmark.tsv",
            config.get("fix_vcf_ad_for_qci", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fix_vcf_ad_for_qci", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fix_vcf_ad_for_qci", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fix_vcf_ad_for_qci", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("fix_vcf_ad_for_qci", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("fix_vcf_ad_for_qci", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fix_vcf_ad_for_qci", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fix_vcf_ad_for_qci", {}).get("container", config["default_container"])
    message:
        "{rule}: Correct AD so the correct AF is shown in QCI for vcf {input.vcf}"
    script:
        "../scripts/fix_vcf_ad_for_qci.py"

input / output files¶

Rule parameters	Key	Value	Description
input	vcf	`"{file}.vcf"`	Vcf file with SNV and INDEL calls
output	vcf	`"{file}.qci.vcf"`	QCI compatible vcf file with SNV and INDEL calls were the AD field has been corrected

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

hotspot_report¶

A excel friendly text file containing information of all hotspot postions as well as all called variants. For all hotspot positions the coverage and alternative allele information is reported as well as additional annotations. The columns reported are configured by the config file hotspot_report.yaml. See further Hotspot report info.

Rule¶

rule hotspot_report:
    input:
        hotspots=lambda wildcards: config["hotspot_report"]["hotspot_mutations"][wildcards.tag],
        vcf=lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz"),
        vcf_index=lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz.tbi"),
        vcf_file_wo_pick=lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated_wo_pick.vcf.gz"),
        vcf_file_wo_pick_index=lambda wildcards: "%s%s"
        % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated_wo_pick.vcf.gz"),
        gvcf="qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz",
        gvcf_index="qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz.tbi",
    output:
        report=temp("qc/hotspot_report/{sample}_{type}.{tag}.output.tsv"),
    params:
        levels=config.get("hotspot_report", {}).get("levels", []),
        sample_name=lambda wildcards: "%s_%s" % (wildcards.sample, wildcards.type),
        report_config=config.get("hotspot_report", {})["report_config"],
        chr_translation_file=config.get("hotspot_report", {})["chr_translation_file"],
        extra=config.get("hotspot_report", {}).get("extra", ""),
    log:
        "qc/hotspot_report/{sample}_{type}.{tag}.output.tsv.log",
    benchmark:
        repeat(
            "qc/hotspot_report/{sample}_{type}.{tag}.output.benchmark.tsv",
            config.get("hotspot_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("hotspot_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("hotspot_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("hotspot_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("hotspot_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("hotspot_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("hotspot_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("hotspot_report", {}).get("container", config["default_container"])
    message:
        "{rule}: Make a coverage and mutations report: {output.report}"
    script:
        "../scripts/hotspot_report.py"

input / output files¶

Rule parameters	Key	Value	Description
input	vcf	`lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz")`	Vcf file with filtered SNV and INDEL calls
	vcf_index	`lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz.tbi")`	Vcf tabix index file
	gvcf	`"qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz"`	Genome vcf file coverage and allele information on all positions in the panel
	gvcf_index	`"qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz.tbi"`	Genome vcf tabix index file
output	report	`"qc/hotspot_report/{sample}_{type}.{tag}.output.tsv"`	Excel friendly text report with hotspot position qc information

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
hotspot_mutations	object	List of files with hotspot positions and annotations. The names of the list items are found in the file names as {tag}.
report_config	string	file defining which fields should be extracted from the annotation
chr_translation_file	string	file used to translate chr to NC id. The format is a tab separate file with column chr and NC
levels	array	List of configurations. 1. depth above this value 2. depth level status, ex ok or low 3. depth status; ok, not analyzable Example: - [200, "ok", "yes"]
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

hotspot_report_aa_translate¶

Take the output from hotspot report (coverage_and mutations) and translate the amino acid three letter code into one letter code using an in-house script.

Rule¶

rule hotspot_report_aa_translate:
    input:
        report="qc/hotspot_report/{sample}_{type}.{tag}.output.tsv",
    output:
        report=temp("qc/hotspot_report/{sample}_{type}.{tag}.aa_translated.tsv"),
    log:
        "twist_solid/hotspot_report_aa_translate/{sample}_{type}.{tag}.aa_translated.tsv.log",
    benchmark:
        repeat(
            "twist_solid/hotspot_report_aa_translate/{sample}_{type}.{tag}.aa_translated.tsv.benchmark.tsv",
            config.get("hotspot_report_aa_translate", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("hotspot_report_aa_translate", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("hotspot_report_aa_translate", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("hotspot_report_aa_translate", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("hotspot_report_aa_translate", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("hotspot_report_aa_translate", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("hotspot_report_aa_translate", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("hotspot_report_aa_translate", {}).get("container", config["default_container"])
    message:
        "{rule}: Translate aa three letter codes into one letter code in {input.report}"
    script:
        "../scripts/aa_translate.py"

input / output files¶

Rule parameters	Key	Value	Description
input	report	`"qc/hotspot_report/{sample}_{type}.{tag}.output.tsv"`	coverage and mutations file from hotspot report
output	report	`"qc/hotspot_report/{sample}_{type}.{tag}.aa_translated.tsv"`	coverage and mutations file from hotspot report with aa change in one letter code

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

jumble_gis_score¶

Extract the predicted Genomic Instability Score (GIS) from Jumble's output for the current sample's tumor content (TC).

Rule¶

rule jumble_gis_score:
    input:
        gis_csv="cnv_sv/jumble_run/{sample}_{type}/{sample}_{type}.jumble_gis.csv",
    output:
        gis_score=temp("cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.txt"),
    params:
        tc=get_tc,
    log:
        "cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.log",
    benchmark:
        repeat(
            "cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.benchmark.tsv",
            config.get("jumble_gis_score", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("jumble_gis_score", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("jumble_gis_score", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("jumble_gis_score", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("jumble_gis_score", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("jumble_gis_score", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("jumble_gis_score", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("jumble_gis_score", {}).get("container", config["default_container"])
    message:
        "{rule}: Extract predicted GIS score for {wildcards.sample}_{wildcards.type} at {params.tc} TC"
    script:
        "../scripts/extract_gis_score.py"

input / output files¶

Rule parameters	Key	Value	Description
input	gis_csv	`"cnv_sv/jumble_run/{sample}_{type}/{sample}_{type}.jumble_gis.csv"`	GIS csv file from jumble_run
output	gis_score	`"cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.txt"`	Extracted predicted GIS score for current TC

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
tc	number	tumor content

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

house_keeping_gene_coverage¶

List the average coverage of the housekeeping genes GAPDH, GUSB, OAZ1, and POLR2A. See further house keeping gene coverage info.

Rule¶

rule house_keeping_gene_coverage:
    input:
        bam="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam",
        bai="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai",
        bed=config.get("reference", {}).get("design_bed_rna", ""),
    output:
        result=temp("qc/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv"),
    log:
        "fusions/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv.log",
    benchmark:
        repeat(
            "fusions/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv.benchmark.tsv",
            config.get("house_keeping_gene_coverage", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("house_keeping_gene_coverage", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("house_keeping_gene_coverage", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("house_keeping_gene_coverage", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("house_keeping_gene_coverage", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("house_keeping_gene_coverage", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("house_keeping_gene_coverage", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("house_keeping_gene_coverage", {}).get("container", config["default_container"])
    message:
        "{rule}: Find intergenic fusions and report them in {output.result}"
    script:
        "../scripts/house_keeping_gene_coverage.py"

input / output files¶

Rule parameters	Key	Value	Description
input	bam	`"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam"`	Bam file
	bai	`"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai"`	Bam index file
	bed	`config.get("reference", {}).get("design_bed_rna", "")`	Bed file with gene regions used to obtain genomic coordinates
output	result	`"qc/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv"`	Excel frienly text report with gene coverage information

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

report_fusions¶

Make a combined report of filtered fusion calls from all RNA fusion callers. Applies caller specific thresholds based on read support and fusion type. Annotates breakpoints to genes and exons when available. See further RNA fusions report info.

Rule¶

rule report_fusions:
    input:
        arriba="fusions/arriba/{sample}_{type}.fusions.tsv",
        bam="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam",
        bai="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai",
        bed=config.get("reference", {}).get("design_bed_rna", ""),
        bed_extra_annotation=config.get("report_fusions", {}).get("annotation_bed", ""),
        dedup_coverage="qc/mosdepth/{sample}_{type}.regions.bed.gz",
        fusioncatcher="fusions/fusioncatcher/{sample}_{type}/final-list_candidate-fusion-genes.hg19.txt",
        star_fusion="fusions/star_fusion/{sample}_{type}/star-fusion.fusion_predictions.abridged.coding_effect.tsv",
    output:
        fusions=temp("fusions/report_fusions/{sample}_{type}.fusion_report.tsv"),
    params:
        fp_fusions=config.get("report_fusions", {}).get("fp_fusions", ""),
        fusioncatcher_flag_low_support=config.get("report_fusions", {}).get("fusioncatcher_flag_low_support", 15),
        fusioncatcher_low_support=config.get("report_fusions", {}).get("fusioncatcher_low_support", 3),
        fusioncatcher_low_support_inframe=config.get("report_fusions", {}).get("fusioncatcher_low_support_inframe", 6),
        star_fusion_flag_low_support=config.get("report_fusions", {}).get("star_fusion_flag_low_support", 15),
        star_fusion_low_support=config.get("report_fusions", {}).get("star_fusion_low_support", 2),
        star_fusion_low_support_inframe=config.get("report_fusions", {}).get("star_fusion_low_support_inframe", 6),
    log:
        "qc/report_fusions/{sample}_{type}.fusion_report.tsv.log",
    threads: config.get("report_fusions", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("report_fusions", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("report_fusions", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("report_fusions", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("report_fusions", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("report_fusions", {}).get("time", config["default_resources"]["time"]),
    benchmark:
        repeat(
            "qc/report_fusions/{sample}_{type}.fusion_report.tsv.benchmark.tsv",
            config.get("report_fusions", {}).get("benchmark_repeats", 1),
        )
    container:
        config.get("report_fusions", {}).get("container", config["default_container"])
    message:
        "{rule}: Collect and filter fusions and create report: {output.fusions}"
    script:
        "../scripts/report_fusions.py"

input / output files¶

Rule parameters	Key	Value	Description
input	arriba	`"fusions/arriba/{sample}_{type}.fusions.tsv"`	Called fusions by Arriba
	bam	`"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam"`	Bam file
	bai	`"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai"`	Bam index file
	bed	`config.get("reference", {}).get("design_bed_rna", "")`	Bed file with regions of the design
	bed_extra_annotation	`config.get("report_fusions", {}).get("annotation_bed", "")`	Bed file with gene annotations of regions outide of the design
	fusioncatcher	`"fusions/fusioncatcher/{sample}_{type}/final-list_candidate-fusion-genes.hg19.txt"`	Called fusions by FusionCatcher
	star_fusion	`"fusions/star_fusion/{sample}_{type}/star-fusion.fusion_predictions.abridged.coding_effect.tsv"`	Called fusions by StarFusion
output	fusions	`"fusions/report_fusions/{sample}_{type}.fusion_report.tsv"`	Excel friendly text report with filtered fusion calls from respective caller

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
annotation_bed	string	extra bed file for annotation of fusion partner gene exons not in design
fusioncather_flag_low_support	integer	lower limit of supporting reads to flag for low support for FusionCatcher
fusioncather_low_support	integer	lower limit of supporting reads to filter fusion for FusionCatcher
fusioncather_low_support_fp_genes	integer	lower limit of supporting reads to filter fusion in common fp genes for FusionCatcher
fusioncather_low_support_inframe	integer	lower limit of supporting reads to filter fusion that are inframe for FusionCatcher
star_fusion_flag_low_support	integer	lower limit of supporting reads to filter fusion for StarFusion
star_fusion_low_support	integer	lower limit of supporting reads to filter fusion for StarFusion
star_fusion_low_support_fp_genes	integer	lower limit of supporting reads to flag filter fusion in common fp genes for StarFusion
star_fusion_low_support_inframe	integer	lower limit of supporting reads to flag filter fusion that are inframe for StarFusion

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

report_gene_fuse¶

Make a report of filtered fusion calls from the DNA fusion caller Gene Fuse. Applies configurable thresholds for supporting reads and flags noisy genes as well as filters artifact genes. See further DNA fusions report info.

Rule¶

rule report_gene_fuse:
    input:
        fusions="fusions/gene_fuse/{sample}_{type}_gene_fuse_fusions.txt",
    output:
        report=temp("fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv"),
    params:
        filter_fusions=config.get("report_gene_fuse", {}).get("filter_fusions", ""),
        min_unique_reads=config.get("report_gene_fuse", {}).get("min_unique_reads", 6),
    log:
        "fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv.log",
    threads: config.get("report_gene_fuse", {}).get("threads", config["default_resources"]["threads"])
    resources:
        threads=config.get("report_gene_fuse", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("report_gene_fuse", {}).get("time", config["default_resources"]["time"]),
        mem_mb=config.get("report_gene_fuse", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("report_gene_fuse", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("report_gene_fuse", {}).get("partition", config["default_resources"]["partition"]),
    benchmark:
        repeat(
            "fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv.benchmark.tsv",
            config.get("report_gene_fuse", {}).get("benchmark_repeats", 1),
        )
    container:
        config.get("report_gene_fuse", {}).get("container", config["default_container"])
    message:
        "{rule}: Collect and filter gene fuse dna fusions and create report: {output.report}"
    script:
        "../scripts/report_gene_fuse.py"

input / output files¶

Rule parameters	Key	Value	Description
input	fusions	`"fusions/gene_fuse/{sample}_{type}_gene_fuse_fusions.txt"`	Called fusions by GeneFuse
output	report	`"fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv"`	Excel friendly text report with filtered fusion calls from GeneFuse

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
filter_fusions	string	file specifying fusions that should be filtered completely (value 0 in column 2) or have higher limit (value >0 in column 2)
min_unique_reads	integer	lower limit of uniquely supporting reads to report fusion

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

purecn_modify_vcf¶

Increases the MQB (mean base quality) value by 5 as the qualities are so bad for our samples.

Rule¶

rule purecn_modify_vcf:
    input:
        vcf="snv_indels/gatk_mutect2/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.vcf",
    output:
        vcf="cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf",
    params:
        extra=config.get("purecn_modify_vcf", {}).get("extra", ""),
    log:
        "cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf.benchmark.tsv",
            config.get("purecn_modify_vcf", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("purecn_modify_vcf", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("purecn_modify_vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("purecn_modify_vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("purecn_modify_vcf", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("purecn_modify_vcf", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("purecn_modify_vcf", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("purecn_modify_vcf", {}).get("container", config["default_container"])
    message:
        "{rule}: modify mbq in {input.vcf}"
    shell:
        # new lines are used to tell the linter that it is not an absolut path
        '(sed -r \'s/(.*)(\\MBQ=[0-9]+,)([0-9]+)(.*)/echo "\\1\\2$((\\3+5))\\4"/'
        "ge' {input.vcf} | "
        'sed -r \'s/(.*)(\\MBQ=)([0-9]+)(.*)/echo "\\1\\2$((\\3+5))\\4"/'
        "ge' > "
        "{output.vcf}) &> {log}"

input / output files¶

Rule parameters	Key	Value	Description
input	vcf	`"snv_indels/gatk_mutect2/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.vcf"`	hard filtered vcf with variants from mutect2 for purecn
output	vcf	`"cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf"`	hard filtered vcf with corrected MBQ values for purecn

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

sample_mixup_check¶

Compare ID-SNPs in the RNA samples to the DNA samples in the same analysis and report sample similarities to be able to discern sample mixups

Rule¶

rule sample_mixup_check:
    input:
        id_snp_vcf_dna=[
            "snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type)
            for sample in samples.index
            for unit_type in get_unit_types(units, sample)
            if unit_type == "T"
        ],
        id_snp_vcf_rna=[
            "snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type)
            for sample in samples.index
            for unit_type in get_unit_types(units, sample)
            if unit_type == "R"
        ],
    output:
        mixup_report="qc/sample_mixup_check/sample_mixup_check.tsv",
    params:
        extra=config.get("sample_mixup_check", {}).get("extra", ""),
        match_cutoff=config.get("sample_mixup_check", {}).get("match_cutoff", "0.7"),
    log:
        "twist_solid/sample_mixup_check/sample_mixup_check.tsv.log",
    benchmark:
        repeat(
            "twist_solid/sample_mixup_check/sample_mixup_check.tsv.benchmark.tsv",
            config.get("sample_mixup_check", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("sample_mixup_check", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("sample_mixup_check", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("sample_mixup_check", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("sample_mixup_check", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("sample_mixup_check", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("sample_mixup_check", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("sample_mixup_check", {}).get("container", config["default_container"])
    message:
        "{rule}: find most similar sample in rna ID-snps compared to dna ID-snps"
    script:
        "../scripts/sample_mixup_check.py"

input / output files¶

Rule parameters	Key	Value	Description
input	id_snp_vcf_dna	`[ "snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type) for sample in samples.index for unit_type in get_unit_types(units, sample) if unit_type == "T" ]`	list of vcf files with ID-SNP positions called by bcftools for the dna samples
input	id_snp_vcf_rna	`[ "snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type) for sample in samples.index for unit_type in get_unit_types(units, sample) if unit_type == "R" ]`	list of vcf files with ID-SNP positions called by bcftools for the rna samples
output	mixup_report	`"qc/sample_mixup_check/sample_mixup_check.tsv"`	text report where the most similar RNA and DNA sample pairs in one analysis are reported together with the percent similarity

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded
match_cutoff	number	minimum fraction of matching SNP calls for an RNA and DNA matched sample

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

somalier_best_match_report¶

Takes the somalier relate pairs output and finds the best match for each sample (DNA and RNA). The script handles the bidirectional nature of the pairs file to ensure every sample is reported with its top relatedness match.

Rule¶

rule somalier_best_match_report:
    input:
        pairs="qc/somalier_ungrouped/somalier_relate.pairs.tsv",
    output:
        report=temp("qc/somalier_ungrouped/somalier_best_match.tsv"),
    params:
        extra=config.get("somalier_best_match_report", {}).get("extra", ""),
        match_cutoff=config.get("somalier_best_match_report", {}).get("match_cutoff", 0.7),
    log:
        "qc/somalier_ungrouped/somalier_best_match.tsv.log",
    benchmark:
        repeat(
            "qc/somalier_ungrouped/somalier_best_match.tsv.benchmark.tsv",
            config.get("somalier_best_match_report", {}).get("benchmark_repeats", 1),
        )
    container:
        config.get("somalier_best_match_report", {}).get("container", config["default_container"])
    threads: config.get("somalier_best_match_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("somalier_best_match_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("somalier_best_match_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("somalier_best_match_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("somalier_best_match_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("somalier_best_match_report", {}).get("time", config["default_resources"]["time"]),
    message:
        "{rule}: generating best match report from somalier pairs"
    script:
        "../scripts/somalier_best_match.py"

input / output files¶

Rule parameters	Key	Value	Description
input	pairs	`"qc/somalier_ungrouped/somalier_relate.pairs.tsv"`	Somalier relate pairs file
output	report	`"qc/somalier_ungrouped/somalier_best_match.tsv"`	Somalier best match report

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded
match_cutoff	number	relatedness cutoff for considering two samples a match (default 0.7)

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

somalier_dna_dna_report¶

Takes the somalier relate pairs output and finds the best DNA-DNA match for each DNA sample. This is used to detect potential sample mixups between DNA samples.

Rule¶

rule somalier_dna_dna_report:
    input:
        pairs="qc/somalier_ungrouped/somalier_relate.pairs.tsv",
    output:
        report=temp("qc/somalier_ungrouped/somalier_dna_dna_match.tsv"),
    params:
        extra=config.get("somalier_dna_dna_report", {}).get("extra", ""),
        match_cutoff=config.get("somalier_dna_dna_report", {}).get("match_cutoff", 0.85),
    log:
        "qc/somalier_ungrouped/somalier_dna_dna_match.tsv.log",
    benchmark:
        repeat(
            "qc/somalier_ungrouped/somalier_dna_dna_match.tsv.benchmark.tsv",
            config.get("somalier_dna_dna_report", {}).get("benchmark_repeats", 1),
        )
    container:
        config.get("somalier_dna_dna_report", {}).get("container", config["default_container"])
    threads: config.get("somalier_dna_dna_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("somalier_dna_dna_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("somalier_dna_dna_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("somalier_dna_dna_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("somalier_dna_dna_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("somalier_dna_dna_report", {}).get("time", config["default_resources"]["time"]),
    message:
        "{rule}: generating DNA-DNA best match report from somalier pairs"
    script:
        "../scripts/somalier_dna_dna_match.py"

input / output files¶

Rule parameters	Key	Value	Description
input	pairs	`"qc/somalier_ungrouped/somalier_relate.pairs.tsv"`	Somalier relate pairs file
output	report	`"qc/somalier_ungrouped/somalier_dna_dna_match.tsv"`	Somalier DNA-DNA best match report

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
match_cutoff	number	relatedness cutoff for match (default 0.85)

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	memory in MB used per cpu
mem_per_cpu	integer	memory used per cpu
partition	string	partition to use on cluster
time	string	max execution time
threads	integer	number of threads to be available

Rule specific to the Twist Solid pipeline that are not defined in hydra¶

bcftools_id_snps¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

call_small_cnv_amplifications¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

call_small_cnv_deletions¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

cnv_tsv_report¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

estimate_ctdna_fraction¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

exon_skipping¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

fix_vcf_ad_for_qci¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

hotspot_report¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

hotspot_report_aa_translate¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

jumble_gis_score¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

house_keeping_gene_coverage¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

report_fusions¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

report_gene_fuse¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

purecn_modify_vcf¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶