Rule specific to the Twist Solid pipeline that are not defined in hydra
Trim .fastq files by removing adapter sequences and other unwanted sequences. Adapter sequences are specified in units.tsv under the adapter column. See further ID-SNPs info.
Rule
rule bcftools_id_snps:
input:
bam="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam",
bai="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai",
bed=config.get("bcftools_id_snps", {}).get("snps_bed", "missing bcftools_id_snps snps_bed file"),
ref=config.get("reference", {}).get("fasta_rna", "missing reference fasta_rna"),
output:
vcf=temp("snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf"),
wildcard_constraints:
type="[R]",
log:
"snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf.log",
benchmark:
repeat(
"snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf.benchmark.tsv",
config.get("bcftools_id_snps", {}).get("benchmark_repeats", 1),
)
threads: config.get("bcftools_id_snps", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("bcftools_id_snps", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("bcftools_id_snps", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("bcftools_id_snps", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("bcftools_id_snps", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("bcftools_id_snps", {}).get("time", config["default_resources"]["time"]),
container:
config.get("bcftools_id_snps", {}).get("container", config["default_container"])
message:
"{rule}: call id SNPs in the RNA data {output.vcf}"
shell:
"(bcftools mpileup "
"-R {input.bed} "
"-O u "
"-f {input.ref} "
"-d 1000000 "
"{input.bam} "
"| bcftools call "
"--skip-variants indels "
"-c -O v "
"-o {output.vcf}) "
"&> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam" |
Bam file from an split-read aware aligner like STAR |
| bai |
"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai" |
Bam index file |
| bed |
config.get("bcftools_id_snps", {}).get("snps_bed", "missing bcftools_id_snps snps_bed file") |
Bed file with regions or positions to pileup |
| ref |
config.get("reference", {}).get("fasta_rna", "missing reference fasta_rna") |
Reference genome |
| output |
vcf |
"snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf" |
Vcf with pileup information for all positions in input bed file |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated for msisensor_pro |
| container |
string |
name or path to docker/singularity container |
| snps_bed |
string |
path to bed file the id SNPs |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
call_small_cnv_amplifications
The CNVkit and GATK CNV caller often miss small amplifications of four exons or smaller. This rule analyses genes of interest and reports small amplifications in these genes when predefined criteria are met. The caller is based on two rolling window averaged over the region and finds the largest difference in log2ratio between these two windows. If the difference is large enough it is reported. See further call small cnv amplifications info.
Rule
rule call_small_cnv_amplifications:
input:
cnv_data="cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv",
regions_file=config.get("call_small_cnv_amplifications", {}).get("regions_file", ""),
output:
amplifications=temp("cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv"),
params:
window_size=config.get("call_small_cnv_amplifications", {}).get("window_size", 4),
region_max_size=config.get("call_small_cnv_amplifications", {}).get("region_max_size", 30),
min_nr_stdev_diff=config.get("call_small_cnv_amplifications", {}).get("min_nr_stdev_diff", 8),
min_log_odds_diff=config.get("call_small_cnv_amplifications", {}).get("min_log_odds_diff", 0.4),
log:
"cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv.log",
benchmark:
repeat(
"cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv.benchmark.tsv",
config.get("call_small_cnv_amplifications", {}).get("benchmark_repeats", 1),
)
threads: config.get("call_small_cnv_amplifications", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("call_small_cnv_amplifications", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("call_small_cnv_amplifications", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("call_small_cnv_amplifications", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("call_small_cnv_amplifications", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("call_small_cnv_amplifications", {}).get("time", config["default_resources"]["time"]),
container:
config.get("call_small_cnv_amplifications", {}).get("container", config["default_container"])
message:
"{rule}: call small amplifications in cnv data into {output.amplifications}"
script:
"../scripts/call_small_cnv_amplifications.py"
| Rule parameters |
Key |
Value |
Description |
| input |
cnv_data |
"cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv" |
Copy number region data from GATK CNV |
| regions_file |
config.get("call_small_cnv_amplifications", {}).get("regions_file", "") |
File with genes, gene regions with surrounding cnv-probes and the actual gene region |
| output |
amplifications |
"cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv" |
Text file with the called amplifications |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| container |
string |
name or path to docker/singularity container |
| regions_file |
string |
File with genes, gene regions with surrounding cnv-probes and the actual gene region. Defines both the genomic position of the genes and their surrounding regions that should be analysed for this gene. The size of the analysis region is dependent on the density of the backbone but should at least 20 backbone SNPs. 20 SNPs is used in Pipeline.
|
| window_size |
integer |
Number of data points included in the sliding window |
| region_max_size |
integer |
Max number of data points of deletion to be reported |
| min_nr_stdev_diff |
number |
Min number of standard deviations difference between the deletion and surrounding data points in region |
| min_log_odds_diff |
number |
Min log2ratio difference between the deletion and the average of surrounding data points in region |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
call_small_cnv_deletions
The CNVkit and GATK CNV caller often miss small deletions of four exons or smaller. This rule analyses genes of interest and reports small deletions in these genes when predefined criteria are met. The caller is based on two rolling window averaged over the region and finds the largest difference in log2ratio between these two windows. If the difference is large enough it is reported. See further call small cnv deletions info.
Rule
rule call_small_cnv_deletions:
input:
cnv_data="cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv",
regions_file=config.get("call_small_cnv_deletions", {}).get("regions_file", ""),
output:
deletions=temp("cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv"),
params:
blacklist=config.get("call_small_cnv_deletions", {}).get("blacklist", ""),
min_nr_stdev_diff=config.get("call_small_cnv_deletions", {}).get("min_nr_stdev_diff", 2.5),
min_log_odds_diff=config.get("call_small_cnv_deletions", {}).get("min_log_odds_diff", 0.3),
region_max_size=config.get("call_small_cnv_deletions", {}).get("region_max_size", 30),
window_size=config.get("call_small_cnv_deletions", {}).get("window_size", 4),
log:
"cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv.log",
benchmark:
repeat(
"cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv.benchmark.tsv",
config.get("call_small_cnv_deletions", {}).get("benchmark_repeats", 1),
)
threads: config.get("call_small_cnv_deletions", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("call_small_cnv_deletions", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("call_small_cnv_deletions", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("call_small_cnv_deletions", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("call_small_cnv_deletions", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("call_small_cnv_deletions", {}).get("time", config["default_resources"]["time"]),
container:
config.get("call_small_cnv_deletions", {}).get("container", config["default_container"])
message:
"{rule}: call small deletions in cnv data into {output.deletions}"
script:
"../scripts/call_small_cnv_deletions.py"
| Rule parameters |
Key |
Value |
Description |
| input |
cnv_data |
"cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv" |
Copy number region data from GATK CNV |
| regions_file |
config.get("call_small_cnv_deletions", {}).get("regions_file", "") |
File with genes, gene regions with surrounding cnv-probes and the actual gene region |
| output |
deletions |
"cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv" |
Text file with the called deletions |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| container |
string |
name or path to docker/singularity container |
| regions_file |
string |
File with genes, gene regions with surrounding cnv-probes and the actual gene region. Defines both the genomic position of the genes and their surrounding regions that should be analysed for this gene. The size of the analysis region is dependent on the density of the backbone but should at least 20 backbone SNPs. 40 SNPs is used in Pipeline.
|
| blacklist |
string |
File with gene regions that should be filtered (Format is header + gene\tchr\tpos1\tpos2) |
| min_nr_stdev_diff |
number |
Min number of standard deviations difference between the deletion and surrounding data points in region |
| min_log_odds_diff |
number |
Min log2ratio difference between the deletion and the average of surrounding data points in region |
| region_max_size |
integer |
Max number of data points of deletion to be reported |
| window_size |
integer |
Number of data points included in the sliding window |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
cnv_tsv_report
Collect all CNV calls into an excel friendly text file. Adds potential 1p19q calls. Also add a FP flag for variants called by CNVkit that in GATK CNV does not have any signal in that region. See further CNV tsv report info.
Rule
rule cnv_tsv_report:
input:
amplifications="cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv",
chrom_arm_size=config.get("cnv_tsv_report", {}).get("chrom_arm_size", ""),
deletions="cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv",
gatk_cnr="cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv",
org_vcfs=[
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.fp_tag.vcf",
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.fp_tag.vcf",
],
tc_file=get_tc_file,
vcfs=[
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.filter.cnv_hard_filter_amp.fp_tag.vcf",
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.filter.cnv_hard_filter_loh.fp_tag.vcf",
],
output:
tsv=temp("cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv"),
tsv_additional_only=temp("cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_additional_variants_only.tsv"),
tsv_chrom_arms=temp("cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_chromosome_arms.tsv"),
vcf_del="cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.filter.cnv_hard_filter_loh.fp_tag.annotate_fp.vcf",
params:
amp_chr_arm_cn_limit=config.get("cnv_tsv_report", {}).get("amp_chr_arm_cn_limit", ""),
baseline_fraction_limit=config.get("cnv_tsv_report", {}).get("baseline_fraction_limit", ""),
call_small_amplifications_cn_limit=config.get("cnv_tsv_report", {}).get("amp_cn_limit", ""),
chr_arm_fraction=config.get("cnv_tsv_report", {}).get("chr_arm_fraction", ""),
del_chr_arm_cn_limit=config.get("cnv_tsv_report", {}).get("del_chr_arm_cn_limit", ""),
del_1p19q_cn_limit=config.get("cnv_tsv_report", {}).get("del_1p19q_cn_limit", ""),
del_1p19q_chr_arm_fraction=config.get("cnv_tsv_report", {}).get("del_1p19q_chr_arm_fraction", ""),
max_cnv_fp_size=config.get("cnv_tsv_report", {}).get("max_cnv_fp_size", ""),
normal_cn_lower_limit=config.get("cnv_tsv_report", {}).get("normal_cn_lower_limit", ""),
normal_cn_upper_limit=config.get("cnv_tsv_report", {}).get("normal_cn_upper_limit", ""),
normal_baf_lower_limit=config.get("cnv_tsv_report", {}).get("normal_baf_lower_limit", ""),
normal_baf_upper_limit=config.get("cnv_tsv_report", {}).get("normal_baf_upper_limit", ""),
polyploidy_fraction_limit=config.get("cnv_tsv_report", {}).get("polyploidy_fraction_limit", ""),
tc=get_tc,
log:
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv.log",
benchmark:
repeat(
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv.benchmark.tsv",
config.get("cnv_tsv_report", {}).get("benchmark_repeats", 1),
)
threads: config.get("cnv_tsv_report", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("cnv_tsv_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("cnv_tsv_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("cnv_tsv_report", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("cnv_tsv_report", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("cnv_tsv_report", {}).get("time", config["default_resources"]["time"]),
container:
config.get("cnv_tsv_report", {}).get("container", config["default_container"])
message:
"{rule}: Convert cnv vcf to a tsv file: {output.tsv}"
script:
"../scripts/cnv_report.py"
| Rule parameters |
Key |
Value |
Description |
| input |
amplifications |
"cnv_sv/call_small_cnv_amplifications/{sample}_{type}.amplifications.tsv" |
File with small CNV amplifications generated by call_small_cnv_amplifications |
| deletions |
"cnv_sv/call_small_cnv_deletions/{sample}_{type}.deletions.tsv" |
File with small CNV amplifications generated by call_small_cnv_deletions |
| org_vcfs |
[ "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.fp_tag.vcf", "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.fp_tag.vcf", ] |
Unfiltered vcf-files with merged calls from CNVkit and GATK CNV |
| tc_file |
get_tc_file |
Tumor content of sample |
| vcfs |
[ "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.filter.cnv_hard_filter_amp.fp_tag.vcf", "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.{tag}.filter.cnv_hard_filter_loh.fp_tag.vcf", ] |
Filtered vcf-files with merged calls from CNVkit and GATK CNV |
| output |
tsv |
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_report.tsv" |
A CNV report in excel friendly format |
| tsv_additional_only |
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_additional_variants_only.tsv" |
File where only additional CNVs are reported. Used for extra table in CNV-html report. |
| tsv_chrom_arms |
"cnv_sv/svdb_query/{sample}_{type}.{tc_method}.{tag}.cnv_chromosome_arms.tsv" |
File where large CNVs are reported. Used for extra table in CNV-html report. |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| amp_chr_arm_cn_limit |
number |
Lower limit of copy number to be included in calculation of chr_arm_fraction for duplications |
| baseline_fraction_limit |
number |
Lower limit of fraction with neutral copy segments and neutral BAF values. A warning of potentially incorrect baseline is reported otherwise. |
| container |
string |
name or path to docker/singularity container |
| call_small_amplifications_cn_limit |
number |
Upper limit of copy number to be reported as a small amplification. |
| chr_arm_fraction |
number |
Fraction of the chromosome arms that needs to be deleted |
| del_1p19q_cn_limit |
number |
Upper limit of copy number to be included in calculation of del_1p19q_chr_arm_fraction |
| del_1p19q_chr_arm_fraction |
number |
Fraction of the chromosome arms that needs to be deleted |
| del_chr_arm_cn_limit |
number |
Upper limit of copy number to be included in calculation of chr_arm_fraction for deletions |
| max_cnv_fp_size |
number |
Max size of cnvkit cnv to be tested for false positive |
| normal_cn_lower_limit |
number |
Lower limit of copy number to be counted as a normal segment |
| normal_cn_upper_limit |
number |
Upper limit of copy number to be counted as a normal segment |
| normal_baf_lower_limit |
number |
Lower limit of BAF in segment to be counted as a normal segment |
| normal_baf_upper_limit |
number |
Upper limit of BAF in segment to be counted as a normal segment |
| polyploidy_fraction_limit |
number |
Upper limit of fraction with aberrant BAF / copy number segment combination. A warning of potential polyploidy is reported otherwise. |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
estimate_ctdna_fraction
Estimate ctDNA fraction based on CNV and SNP information using an in-house script. For CNV based estimation the VAF-values of germline SNPs (the BAF-plot) for each segment is used to make an density calculation. If two peaks are found on opposite sides of 50% allele frequency the separation of the peak is used to calculate the ctDNA fraction. The highest ctDNA fraction is reported. If no CNV segments have peak separation 0% is reported. For SNV based estimation the ctDNA is reported as the highest variant allele frequency SNV times two.
Rule
rule estimate_ctdna_fraction:
input:
germline_vcf="snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz",
germline_vcf_tabix="snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz.tbi",
segments="cnv_sv/jumble_vcf/{sample}_{type}.pathology_purecn.vcf",
vcf="snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.artifact_annotated.hotspot_annotated.background_annotated.include.exon.filter.snv_hard_filter_umi.codon_snvs.sorted.vep_annotated.qci.vcf",
output:
ctDNA_fraction=temp("cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction.tsv"),
ctDNA_fraction_info=temp("cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction_info.tsv"),
params:
gnomAD_AF_limit=config.get("estimate_ctdna_fraction", {}).get("gnomAD_AF_limit", 0.00001),
max_somatic_af=config.get("estimate_ctdna_fraction", {}).get("max_somatic_af", 0.4),
min_germline_af=config.get("estimate_ctdna_fraction", {}).get("min_germline_af", 0.1),
min_nr_SNPs_per_segment=config.get("estimate_ctdna_fraction", {}).get("min_nr_SNPs_per_segment", 35),
min_segment_length=config.get("estimate_ctdna_fraction", {}).get("min_segment_length", 10000000),
problematic_regions_beds=config.get("estimate_ctdna_fraction", {}).get("problematic_regions_beds", []),
vaf_baseline=config.get("estimate_ctdna_fraction", {}).get("vaf_baseline", 0.48),
log:
"twist_solid/estimate_ctdna_fraction/{sample}_{type}.ctDNA_tc.tsv.log",
benchmark:
repeat(
"twist_solid/estimate_ctdna_fraction/{sample}_{type}.ctDNA_tc.tsv.benchmark.tsv",
config.get("estimate_ctdna_fraction", {}).get("benchmark_repeats", 1),
)
threads: config.get("estimate_ctdna_fraction", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("estimate_ctdna_fraction", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("estimate_ctdna_fraction", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("estimate_ctdna_fraction", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("estimate_ctdna_fraction", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("estimate_ctdna_fraction", {}).get("time", config["default_resources"]["time"]),
container:
config.get("estimate_ctdna_fraction", {}).get("container", config["default_container"])
message:
"{rule}: estimate ctdna fraction based on CNV and SNV data into {output.ctDNA_fraction}"
script:
"../scripts/estimate_ctDNA_fraction.py"
| Rule parameters |
Key |
Value |
Description |
| input |
germline_vcf |
"snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz" |
Germline vcf only. Same file as used in BAF plots of the CNV html report |
| segments |
"cnv_sv/jumble_vcf/{sample}_{type}.pathology_purecn.vcf" |
CNV segments (from Jumble) |
| vcf |
"snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.artifact_annotated.hotspot_annotated.background_annotated.include.exon.filter.snv_hard_filter_umi.codon_snvs.sorted.vep_annotated.qci.vcf" |
Small variant vcf with vep annotations |
| output |
ctDNA_fraction |
"cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction.tsv" |
Estimated ctDNA fraction values based on CNV and SNV data |
| ctDNA_fraction_info |
"cnv_sv/estimate_ctdna_fraction/{sample}_{type}.ctDNA_fraction_info.tsv" |
CNV segment info for regions with deletion signals as well as variants passing all filters |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| gnomAD_AF_limit |
number |
additional filtering based on existence in GnomAD (removes germline) |
| max_somatic_af |
number |
additional filtering based on maximal allowed allele frequency (removes germline) |
| min_germline_af |
number |
minimial germline allele frequency to be included in density calculations (removes somatic) |
| min_nr_SNPs_per_segment |
integer |
minimal number of germline SNPs in segments to be used for ctDNA estimation |
| min_segment_length |
integer |
minimal length of segments to be used for ctDNA estimation |
| problematic_regions_beds |
array |
list of bedfiles |
| vaf_baseline |
number |
Sets the VAF-baseline. In theory BAF-baseline is 0.5 (50% for each SNP), however in practice this is slightly shifted downwards due to reference bias in alignment |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
exon_skipping
Calls exon skipping events in RNA data. Only reports MET exon 14 skipping and the EGFRvIII variant. See further exon skipping info.
Rule
rule exon_skipping:
input:
bed=config.get("exon_skipping", {}).get("design_bed", ""),
junction="alignment/star/{sample}_{type}.SJ.out.tab",
output:
result=temp("fusions/exon_skipping/{sample}_{type}.results.tsv"),
log:
"fusions/exon_skipping/{sample}_{type}.results.tsv.log",
benchmark:
repeat(
"fusions/exon_skipping/{sample}_{type}.results.tsv.benchmark.tsv",
config.get("exon_skipping", {}).get("benchmark_repeats", 1),
)
threads: config.get("exon_skipping", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("exon_skipping", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("exon_skipping", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("exon_skipping", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("exon_skipping", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("exon_skipping", {}).get("time", config["default_resources"]["time"]),
container:
config.get("exon_skipping", {}).get("container", config["default_container"])
message:
"{rule}: Find intergenic fusions and report them in {output.result}"
script:
"../scripts/exon_skipping.py"
| Rule parameters |
Key |
Value |
Description |
| input |
bed |
config.get("exon_skipping", {}).get("design_bed", "") |
Bed file used to extract exonic genomic regions |
| junction |
"alignment/star/{sample}_{type}.SJ.out.tab" |
Junction file from STAR based on split reads |
| output |
result |
"fusions/exon_skipping/{sample}_{type}.results.tsv" |
Report with info regarding MET exon 14 and EGFR vIII skipping |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| design_bed |
string |
path to design bed file |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
fix_vcf_ad_for_qci
Produces a QCI compatible vcf file with SNV and INDEL calls were the AD field has been corrected to so that the numbers corresponds with the AF field. QCI uses the AD field to calculate allele frequency. See further QCI fix info.
Rule
rule fix_vcf_ad_for_qci:
input:
vcf="{file}.vcf",
output:
vcf="{file}.qci.vcf",
log:
"{file}.qci.vcf.log",
benchmark:
repeat(
"{file}.qci.vcf.benchmark.tsv",
config.get("fix_vcf_ad_for_qci", {}).get("benchmark_repeats", 1),
)
threads: config.get("fix_vcf_ad_for_qci", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("fix_vcf_ad_for_qci", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("fix_vcf_ad_for_qci", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("fix_vcf_ad_for_qci", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("fix_vcf_ad_for_qci", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("fix_vcf_ad_for_qci", {}).get("time", config["default_resources"]["time"]),
container:
config.get("fix_vcf_ad_for_qci", {}).get("container", config["default_container"])
message:
"{rule}: Correct AD so the correct AF is shown in QCI for vcf {input.vcf}"
script:
"../scripts/fix_vcf_ad_for_qci.py"
| Rule parameters |
Key |
Value |
Description |
| input |
vcf |
"{file}.vcf" |
Vcf file with SNV and INDEL calls |
| output |
vcf |
"{file}.qci.vcf" |
QCI compatible vcf file with SNV and INDEL calls were the AD field has been corrected |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
hotspot_report
A excel friendly text file containing information of all hotspot postions as well as all called variants. For all hotspot positions the coverage and alternative allele information is reported as well as additional annotations. The columns reported are configured by the config file hotspot_report.yaml. See further Hotspot report info.
Rule
rule hotspot_report:
input:
hotspots=lambda wildcards: config["hotspot_report"]["hotspot_mutations"][wildcards.tag],
vcf=lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz"),
vcf_index=lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz.tbi"),
vcf_file_wo_pick=lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated_wo_pick.vcf.gz"),
vcf_file_wo_pick_index=lambda wildcards: "%s%s"
% (get_hotspot_report_vcf_input(wildcards), ".vep_annotated_wo_pick.vcf.gz"),
gvcf="qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz",
gvcf_index="qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz.tbi",
output:
report=temp("qc/hotspot_report/{sample}_{type}.{tag}.output.tsv"),
params:
levels=config.get("hotspot_report", {}).get("levels", []),
sample_name=lambda wildcards: "%s_%s" % (wildcards.sample, wildcards.type),
report_config=config.get("hotspot_report", {})["report_config"],
chr_translation_file=config.get("hotspot_report", {})["chr_translation_file"],
extra=config.get("hotspot_report", {}).get("extra", ""),
log:
"qc/hotspot_report/{sample}_{type}.{tag}.output.tsv.log",
benchmark:
repeat(
"qc/hotspot_report/{sample}_{type}.{tag}.output.benchmark.tsv",
config.get("hotspot_report", {}).get("benchmark_repeats", 1),
)
threads: config.get("hotspot_report", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("hotspot_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("hotspot_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("hotspot_report", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("hotspot_report", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("hotspot_report", {}).get("time", config["default_resources"]["time"]),
container:
config.get("hotspot_report", {}).get("container", config["default_container"])
message:
"{rule}: Make a coverage and mutations report: {output.report}"
script:
"../scripts/hotspot_report.py"
| Rule parameters |
Key |
Value |
Description |
| input |
vcf |
lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz") |
Vcf file with filtered SNV and INDEL calls |
| vcf_index |
lambda wildcards: "%s%s" % (get_hotspot_report_vcf_input(wildcards), ".vep_annotated.vcf.gz.tbi") |
Vcf tabix index file |
| gvcf |
"qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz" |
Genome vcf file coverage and allele information on all positions in the panel |
| gvcf_index |
"qc/add_mosdepth_coverage_to_gvcf/{sample}_{type}.mosdepth.g.vcf.gz.tbi" |
Genome vcf tabix index file |
| output |
report |
"qc/hotspot_report/{sample}_{type}.{tag}.output.tsv" |
Excel friendly text report with hotspot position qc information |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| hotspot_mutations |
object |
List of files with hotspot positions and annotations. The names of the list items are found in the file names as {tag}. |
| report_config |
string |
file defining which fields should be extracted from the annotation |
| chr_translation_file |
string |
file used to translate chr to NC id. The format is a tab separate file with column chr and NC |
| levels |
array |
List of configurations. 1. depth above this value 2. depth level status, ex ok or low 3. depth status; ok, not analyzable Example: - [200, "ok", "yes"]
|
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
hotspot_report_aa_translate
Take the output from hotspot report (coverage_and mutations) and translate the amino acid three letter code into one letter code using an in-house script.
Rule
rule hotspot_report_aa_translate:
input:
report="qc/hotspot_report/{sample}_{type}.{tag}.output.tsv",
output:
report=temp("qc/hotspot_report/{sample}_{type}.{tag}.aa_translated.tsv"),
log:
"twist_solid/hotspot_report_aa_translate/{sample}_{type}.{tag}.aa_translated.tsv.log",
benchmark:
repeat(
"twist_solid/hotspot_report_aa_translate/{sample}_{type}.{tag}.aa_translated.tsv.benchmark.tsv",
config.get("hotspot_report_aa_translate", {}).get("benchmark_repeats", 1),
)
threads: config.get("hotspot_report_aa_translate", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("hotspot_report_aa_translate", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("hotspot_report_aa_translate", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("hotspot_report_aa_translate", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("hotspot_report_aa_translate", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("hotspot_report_aa_translate", {}).get("time", config["default_resources"]["time"]),
container:
config.get("hotspot_report_aa_translate", {}).get("container", config["default_container"])
message:
"{rule}: Translate aa three letter codes into one letter code in {input.report}"
script:
"../scripts/aa_translate.py"
| Rule parameters |
Key |
Value |
Description |
| input |
report |
"qc/hotspot_report/{sample}_{type}.{tag}.output.tsv" |
coverage and mutations file from hotspot report |
| output |
report |
"qc/hotspot_report/{sample}_{type}.{tag}.aa_translated.tsv" |
coverage and mutations file from hotspot report with aa change in one letter code |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
jumble_gis_score
Extract the predicted Genomic Instability Score (GIS) from Jumble's output for the current sample's tumor content (TC).
Rule
rule jumble_gis_score:
input:
gis_csv="cnv_sv/jumble_run/{sample}_{type}/{sample}_{type}.jumble_gis.csv",
output:
gis_score=temp("cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.txt"),
params:
tc=get_tc,
log:
"cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.log",
benchmark:
repeat(
"cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.benchmark.tsv",
config.get("jumble_gis_score", {}).get("benchmark_repeats", 1),
)
threads: config.get("jumble_gis_score", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("jumble_gis_score", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("jumble_gis_score", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("jumble_gis_score", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("jumble_gis_score", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("jumble_gis_score", {}).get("time", config["default_resources"]["time"]),
container:
config.get("jumble_gis_score", {}).get("container", config["default_container"])
message:
"{rule}: Extract predicted GIS score for {wildcards.sample}_{wildcards.type} at {params.tc} TC"
script:
"../scripts/extract_gis_score.py"
| Rule parameters |
Key |
Value |
Description |
| input |
gis_csv |
"cnv_sv/jumble_run/{sample}_{type}/{sample}_{type}.jumble_gis.csv" |
GIS csv file from jumble_run |
| output |
gis_score |
"cnv_sv/jumble_gis_score/{sample}_{type}.{tc_method}.predicted_gis.txt" |
Extracted predicted GIS score for current TC |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| tc |
number |
tumor content |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
house_keeping_gene_coverage
List the average coverage of the housekeeping genes GAPDH, GUSB, OAZ1, and POLR2A. See further house keeping gene coverage info.
Rule
rule house_keeping_gene_coverage:
input:
bam="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam",
bai="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai",
bed=config.get("reference", {}).get("design_bed_rna", ""),
output:
result=temp("qc/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv"),
log:
"fusions/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv.log",
benchmark:
repeat(
"fusions/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv.benchmark.tsv",
config.get("house_keeping_gene_coverage", {}).get("benchmark_repeats", 1),
)
threads: config.get("house_keeping_gene_coverage", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("house_keeping_gene_coverage", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("house_keeping_gene_coverage", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("house_keeping_gene_coverage", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("house_keeping_gene_coverage", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("house_keeping_gene_coverage", {}).get("time", config["default_resources"]["time"]),
container:
config.get("house_keeping_gene_coverage", {}).get("container", config["default_container"])
message:
"{rule}: Find intergenic fusions and report them in {output.result}"
script:
"../scripts/house_keeping_gene_coverage.py"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam" |
Bam file |
| bai |
"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai" |
Bam index file |
| bed |
config.get("reference", {}).get("design_bed_rna", "") |
Bed file with gene regions used to obtain genomic coordinates |
| output |
result |
"qc/house_keeping_gene_coverage/{sample}_{type}.house_keeping_gene_coverage.tsv" |
Excel frienly text report with gene coverage information |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
report_fusions
Make a combined report of filtered fusion calls from all RNA fusion callers. Applies caller specific thresholds based on read support and fusion type. Annotates breakpoints to genes and exons when available. See further RNA fusions report info.
Rule
rule report_fusions:
input:
arriba="fusions/arriba/{sample}_{type}.fusions.tsv",
bam="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam",
bai="fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai",
bed=config.get("reference", {}).get("design_bed_rna", ""),
bed_extra_annotation=config.get("report_fusions", {}).get("annotation_bed", ""),
dedup_coverage="qc/mosdepth/{sample}_{type}.regions.bed.gz",
fusioncatcher="fusions/fusioncatcher/{sample}_{type}/final-list_candidate-fusion-genes.hg19.txt",
star_fusion="fusions/star_fusion/{sample}_{type}/star-fusion.fusion_predictions.abridged.coding_effect.tsv",
output:
fusions=temp("fusions/report_fusions/{sample}_{type}.fusion_report.tsv"),
params:
fp_fusions=config.get("report_fusions", {}).get("fp_fusions", ""),
fusioncatcher_flag_low_support=config.get("report_fusions", {}).get("fusioncatcher_flag_low_support", 15),
fusioncatcher_low_support=config.get("report_fusions", {}).get("fusioncatcher_low_support", 3),
fusioncatcher_low_support_inframe=config.get("report_fusions", {}).get("fusioncatcher_low_support_inframe", 6),
star_fusion_flag_low_support=config.get("report_fusions", {}).get("star_fusion_flag_low_support", 15),
star_fusion_low_support=config.get("report_fusions", {}).get("star_fusion_low_support", 2),
star_fusion_low_support_inframe=config.get("report_fusions", {}).get("star_fusion_low_support_inframe", 6),
log:
"qc/report_fusions/{sample}_{type}.fusion_report.tsv.log",
threads: config.get("report_fusions", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("report_fusions", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("report_fusions", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("report_fusions", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("report_fusions", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("report_fusions", {}).get("time", config["default_resources"]["time"]),
benchmark:
repeat(
"qc/report_fusions/{sample}_{type}.fusion_report.tsv.benchmark.tsv",
config.get("report_fusions", {}).get("benchmark_repeats", 1),
)
container:
config.get("report_fusions", {}).get("container", config["default_container"])
message:
"{rule}: Collect and filter fusions and create report: {output.fusions}"
script:
"../scripts/report_fusions.py"
| Rule parameters |
Key |
Value |
Description |
| input |
arriba |
"fusions/arriba/{sample}_{type}.fusions.tsv" |
Called fusions by Arriba |
| bam |
"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam" |
Bam file |
| bai |
"fusions/star_fusion/{sample}_{type}/Aligned.out.sorted.bam.bai" |
Bam index file |
| bed |
config.get("reference", {}).get("design_bed_rna", "") |
Bed file with regions of the design |
| bed_extra_annotation |
config.get("report_fusions", {}).get("annotation_bed", "") |
Bed file with gene annotations of regions outide of the design |
| fusioncatcher |
"fusions/fusioncatcher/{sample}_{type}/final-list_candidate-fusion-genes.hg19.txt" |
Called fusions by FusionCatcher |
| star_fusion |
"fusions/star_fusion/{sample}_{type}/star-fusion.fusion_predictions.abridged.coding_effect.tsv" |
Called fusions by StarFusion |
| output |
fusions |
"fusions/report_fusions/{sample}_{type}.fusion_report.tsv" |
Excel friendly text report with filtered fusion calls from respective caller |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| annotation_bed |
string |
extra bed file for annotation of fusion partner gene exons not in design |
| fusioncather_flag_low_support |
integer |
lower limit of supporting reads to flag for low support for FusionCatcher |
| fusioncather_low_support |
integer |
lower limit of supporting reads to filter fusion for FusionCatcher |
| fusioncather_low_support_fp_genes |
integer |
lower limit of supporting reads to filter fusion in common fp genes for FusionCatcher |
| fusioncather_low_support_inframe |
integer |
lower limit of supporting reads to filter fusion that are inframe for FusionCatcher |
| star_fusion_flag_low_support |
integer |
lower limit of supporting reads to filter fusion for StarFusion |
| star_fusion_low_support |
integer |
lower limit of supporting reads to filter fusion for StarFusion |
| star_fusion_low_support_fp_genes |
integer |
lower limit of supporting reads to flag filter fusion in common fp genes for StarFusion |
| star_fusion_low_support_inframe |
integer |
lower limit of supporting reads to flag filter fusion that are inframe for StarFusion |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
report_gene_fuse
Make a report of filtered fusion calls from the DNA fusion caller Gene Fuse. Applies configurable thresholds for supporting reads and flags noisy genes as well as filters artifact genes. See further DNA fusions report info.
Rule
rule report_gene_fuse:
input:
fusions="fusions/gene_fuse/{sample}_{type}_gene_fuse_fusions.txt",
output:
report=temp("fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv"),
params:
filter_fusions=config.get("report_gene_fuse", {}).get("filter_fusions", ""),
min_unique_reads=config.get("report_gene_fuse", {}).get("min_unique_reads", 6),
log:
"fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv.log",
threads: config.get("report_gene_fuse", {}).get("threads", config["default_resources"]["threads"])
resources:
threads=config.get("report_gene_fuse", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("report_gene_fuse", {}).get("time", config["default_resources"]["time"]),
mem_mb=config.get("report_gene_fuse", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("report_gene_fuse", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("report_gene_fuse", {}).get("partition", config["default_resources"]["partition"]),
benchmark:
repeat(
"fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv.benchmark.tsv",
config.get("report_gene_fuse", {}).get("benchmark_repeats", 1),
)
container:
config.get("report_gene_fuse", {}).get("container", config["default_container"])
message:
"{rule}: Collect and filter gene fuse dna fusions and create report: {output.report}"
script:
"../scripts/report_gene_fuse.py"
| Rule parameters |
Key |
Value |
Description |
| input |
fusions |
"fusions/gene_fuse/{sample}_{type}_gene_fuse_fusions.txt" |
Called fusions by GeneFuse |
| output |
report |
"fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv" |
Excel friendly text report with filtered fusion calls from GeneFuse |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| filter_fusions |
string |
file specifying fusions that should be filtered completely (value 0 in column 2) or have higher limit (value >0 in column 2) |
| min_unique_reads |
integer |
lower limit of uniquely supporting reads to report fusion |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
purecn_modify_vcf
Increases the MQB (mean base quality) value by 5 as the qualities are so bad for our samples.
Rule
rule purecn_modify_vcf:
input:
vcf="snv_indels/gatk_mutect2/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.vcf",
output:
vcf="cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf",
params:
extra=config.get("purecn_modify_vcf", {}).get("extra", ""),
log:
"cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf.log",
benchmark:
repeat(
"cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf.benchmark.tsv",
config.get("purecn_modify_vcf", {}).get("benchmark_repeats", 1),
)
threads: config.get("purecn_modify_vcf", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("purecn_modify_vcf", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("purecn_modify_vcf", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("purecn_modify_vcf", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("purecn_modify_vcf", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("purecn_modify_vcf", {}).get("time", config["default_resources"]["time"]),
container:
config.get("purecn_modify_vcf", {}).get("container", config["default_container"])
message:
"{rule}: modify mbq in {input.vcf}"
shell:
# new lines are used to tell the linter that it is not an absolut path
'(sed -r \'s/(.*)(\\MBQ=[0-9]+,)([0-9]+)(.*)/echo "\\1\\2$((\\3+5))\\4"/'
"ge' {input.vcf} | "
'sed -r \'s/(.*)(\\MBQ=)([0-9]+)(.*)/echo "\\1\\2$((\\3+5))\\4"/'
"ge' > "
"{output.vcf}) &> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
vcf |
"snv_indels/gatk_mutect2/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.vcf" |
hard filtered vcf with variants from mutect2 for purecn |
| output |
vcf |
"cnv_sv/purecn_modify_vcf/{sample}_{type}.normalized.sorted.vep_annotated.filter.snv_hard_filter_purecn.bcftools_annotated_purecn.mbq.vcf" |
hard filtered vcf with corrected MBQ values for purecn |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
sample_mixup_check
Compare ID-SNPs in the RNA samples to the DNA samples in the same analysis and report sample similarities to be able to discern sample mixups
Rule
rule sample_mixup_check:
input:
id_snp_vcf_dna=[
"snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type)
for sample in samples.index
for unit_type in get_unit_types(units, sample)
if unit_type == "T"
],
id_snp_vcf_rna=[
"snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type)
for sample in samples.index
for unit_type in get_unit_types(units, sample)
if unit_type == "R"
],
output:
mixup_report="qc/sample_mixup_check/sample_mixup_check.tsv",
params:
extra=config.get("sample_mixup_check", {}).get("extra", ""),
match_cutoff=config.get("sample_mixup_check", {}).get("match_cutoff", "0.7"),
log:
"twist_solid/sample_mixup_check/sample_mixup_check.tsv.log",
benchmark:
repeat(
"twist_solid/sample_mixup_check/sample_mixup_check.tsv.benchmark.tsv",
config.get("sample_mixup_check", {}).get("benchmark_repeats", 1),
)
threads: config.get("sample_mixup_check", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("sample_mixup_check", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("sample_mixup_check", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("sample_mixup_check", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("sample_mixup_check", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("sample_mixup_check", {}).get("time", config["default_resources"]["time"]),
container:
config.get("sample_mixup_check", {}).get("container", config["default_container"])
message:
"{rule}: find most similar sample in rna ID-snps compared to dna ID-snps"
script:
"../scripts/sample_mixup_check.py"
| Rule parameters |
Key |
Value |
Description |
| input |
id_snp_vcf_dna |
[ "snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type) for sample in samples.index for unit_type in get_unit_types(units, sample) if unit_type == "T" ] |
list of vcf files with ID-SNP positions called by bcftools for the dna samples |
| id_snp_vcf_rna |
[ "snv_indels/bcftools_id_snps/%s_%s.id_snps.vcf" % (sample, unit_type) for sample in samples.index for unit_type in get_unit_types(units, sample) if unit_type == "R" ] |
list of vcf files with ID-SNP positions called by bcftools for the rna samples |
| output |
mixup_report |
"qc/sample_mixup_check/sample_mixup_check.tsv" |
text report where the most similar RNA and DNA sample pairs in one analysis are reported together with the percent similarity |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
| match_cutoff |
number |
minimum fraction of matching SNP calls for an RNA and DNA matched sample |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
somalier_best_match_report
Takes the somalier relate pairs output and finds the best match for each sample (DNA and RNA). The script handles the bidirectional nature of the pairs file to ensure every sample is reported with its top relatedness match.
Rule
rule somalier_best_match_report:
input:
pairs="qc/somalier_ungrouped/somalier_relate.pairs.tsv",
output:
report=temp("qc/somalier_ungrouped/somalier_best_match.tsv"),
params:
extra=config.get("somalier_best_match_report", {}).get("extra", ""),
match_cutoff=config.get("somalier_best_match_report", {}).get("match_cutoff", 0.7),
log:
"qc/somalier_ungrouped/somalier_best_match.tsv.log",
benchmark:
repeat(
"qc/somalier_ungrouped/somalier_best_match.tsv.benchmark.tsv",
config.get("somalier_best_match_report", {}).get("benchmark_repeats", 1),
)
container:
config.get("somalier_best_match_report", {}).get("container", config["default_container"])
threads: config.get("somalier_best_match_report", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("somalier_best_match_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("somalier_best_match_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("somalier_best_match_report", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("somalier_best_match_report", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("somalier_best_match_report", {}).get("time", config["default_resources"]["time"]),
message:
"{rule}: generating best match report from somalier pairs"
script:
"../scripts/somalier_best_match.py"
| Rule parameters |
Key |
Value |
Description |
| input |
pairs |
"qc/somalier_ungrouped/somalier_relate.pairs.tsv" |
Somalier relate pairs file |
| output |
report |
"qc/somalier_ungrouped/somalier_best_match.tsv" |
Somalier best match report |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
| match_cutoff |
number |
relatedness cutoff for considering two samples a match (default 0.7) |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |
somalier_dna_dna_report
Takes the somalier relate pairs output and finds the best DNA-DNA match for each DNA sample. This is used to detect potential sample mixups between DNA samples.
Rule
rule somalier_dna_dna_report:
input:
pairs="qc/somalier_ungrouped/somalier_relate.pairs.tsv",
output:
report=temp("qc/somalier_ungrouped/somalier_dna_dna_match.tsv"),
params:
extra=config.get("somalier_dna_dna_report", {}).get("extra", ""),
match_cutoff=config.get("somalier_dna_dna_report", {}).get("match_cutoff", 0.85),
log:
"qc/somalier_ungrouped/somalier_dna_dna_match.tsv.log",
benchmark:
repeat(
"qc/somalier_ungrouped/somalier_dna_dna_match.tsv.benchmark.tsv",
config.get("somalier_dna_dna_report", {}).get("benchmark_repeats", 1),
)
container:
config.get("somalier_dna_dna_report", {}).get("container", config["default_container"])
threads: config.get("somalier_dna_dna_report", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("somalier_dna_dna_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("somalier_dna_dna_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("somalier_dna_dna_report", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("somalier_dna_dna_report", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("somalier_dna_dna_report", {}).get("time", config["default_resources"]["time"]),
message:
"{rule}: generating DNA-DNA best match report from somalier pairs"
script:
"../scripts/somalier_dna_dna_match.py"
| Rule parameters |
Key |
Value |
Description |
| input |
pairs |
"qc/somalier_ungrouped/somalier_relate.pairs.tsv" |
Somalier relate pairs file |
| output |
report |
"qc/somalier_ungrouped/somalier_dna_dna_match.tsv" |
Somalier DNA-DNA best match report |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| match_cutoff |
number |
relatedness cutoff for match (default 0.85) |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
memory in MB used per cpu |
| mem_per_cpu |
integer |
memory used per cpu |
| partition |
string |
partition to use on cluster |
| time |
string |
max execution time |
| threads |
integer |
number of threads to be available |