Some systems do not allow internet access, which makes it impossible to run a pipeline that depends on resources hosted on the web, such as Docker Hub and GitHub. This is solved by packing the pipeline together with all of its dependencies:

Pipeline code and environment
Singularities
Reference/design files

Preparations

Fetch the pipeline and install requirements

# Set Twist Solid version
TAG_OR_BRANCH="vX.Y.X"

# Clone selected version
git clone --branch ${TAG_OR_BRANCH} https://github.com/genomic-medicine-sweden/Twist_Solid.git
cd Twist_Solid

Environment

On a computer or server with internet access, create an environment that can be moved to Bianca.

Requires:

conda
conda-pack

# Build a compressed file named Twist_Solid_{TAG_OR_BRANCH}.tar.gz containing:
# - Twist Solid Pipeline
# - snakemake-wrappers
# - hydra-genetics modules
# - conda env
# - config files
TAG_OR_BRANCH="vX.Y.X" bash build/build_conda.sh

The script build/build_conda.sh performs the following steps:

1. Clones the pipeline repository.
2. Creates a conda environment and installs requirements.
3. Packs the conda environment.
4. Clones snakemake-wrappers and hydra-genetics modules.
5. Downloads configuration files.
6. Updates configuration paths using envsubst.
7. Downloads containers (optional).
8. Packs everything into a tarball Twist_Solid_{TAG_OR_BRANCH}.tar.gz.
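Before moving the tarball, it can be worth confirming that it holds the parts the server-side steps rely on. The following is a minimal sketch, not part of the pipeline: `check_bundle` is a hypothetical helper, and the expected layout (a top-level `Twist_Solid_{TAG_OR_BRANCH}` folder containing `env.tar.gz`, `hydra-genetics`, `Twist_Solid` and `snakemake-wrappers`) is an assumption inferred from the paths used in the server-side steps of this guide, not read from `build/build_conda.sh`.

```shell
# check_bundle TARBALL TAG: report any expected entry missing from the archive.
# The expected layout is an assumption based on the paths used on the server.
check_bundle() {
    local tarball="$1" tag="$2" root listing entry missing=0
    root="Twist_Solid_${tag}"
    # List the archive once; tar detects gzip compression when reading.
    listing="$(tar -tf "${tarball}")" || return 2
    for entry in env.tar.gz hydra-genetics Twist_Solid snakemake-wrappers; do
        # Match a file or a directory entry, with or without a leading "./".
        if ! grep -q -E "^(\./)?${root}/${entry}(/|$)" <<< "${listing}"; then
            echo "missing: ${root}/${entry}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Usage:
# check_bundle "Twist_Solid_${TAG_OR_BRANCH}.tar.gz" "${TAG_OR_BRANCH}"
```

A non-zero exit status means at least one expected entry was not found, or that the archive could not be read.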

Download containers

# NOTE: the singularity command needs to be available for this step
hydra-genetics prepare-environment create-singularity-files -c config/config.yaml -o singularity_cache

Download reference files

# NextSeq
hydra-genetics --debug references download -o design_and_ref_files -v config/references/design_files.hg19.yaml -v config/references/nextseq.hg19.pon.yaml -v config/references/references.hg19.yaml

# NovaSeq (not all files are prepared for NovaSeq)
hydra-genetics references download -o design_and_ref_files -v config/references/design_files.hg19.yaml -v config/references/novaseq.hg19.pon.yaml -v config/references/references.hg19.yaml

# Compress data
tar -czvf design_and_ref_files.tar.gz design_and_ref_files

Files/Folders

The following files/folders have been created and need to be moved to your server:

file: design_and_ref_files.tar.gz
file: Twist_Solid_{TAG_OR_BRANCH}.tar.gz
folder: singularity_cache
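Large archives can be truncated or corrupted during transfer. One way to detect that, sketched here with the standard `sha256sum` tool, is to record checksums before the move and verify them on the server. The manifest name `transfer.sha256` and the two helper names are arbitrary choices made for this illustration.

```shell
# On the machine with internet access: record checksums of everything to move.
# Run from the folder that holds the two tarballs and singularity_cache.
make_manifest() {
    {
        sha256sum design_and_ref_files.tar.gz Twist_Solid_*.tar.gz
        # Container images live in a folder, so checksum each file inside it.
        find singularity_cache -type f -exec sha256sum {} +
    } > transfer.sha256
}

# On the server, from the folder the files were copied to:
# returns non-zero if any listed file is missing or differs.
verify_manifest() {
    sha256sum --check --quiet transfer.sha256
}
```

The manifest file has to be transferred along with the data. Because it stores relative paths, `verify_manifest` should be run before `singularity_cache` is moved elsewhere.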

On Server

Setup environment

Unpack environment and activate

# Extract tar.
TAG_OR_BRANCH="vX.Y.X"
tar -xvf Twist_Solid_${TAG_OR_BRANCH}.tar.gz
cd Twist_Solid_${TAG_OR_BRANCH}
mkdir venv && tar xvf env.tar.gz -C venv/
source venv/bin/activate

# Variables that will be used later
PATH_TO_ENV=${PWD}
PATH_TO_HYDRA_MODULES=${PWD}/hydra-genetics
PATH_TO_FOLDER_WITH_PIPELINE=${PWD}/Twist_Solid
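These variables only exist in the current shell session, so they have to be set again after a new login. A small sanity check can catch a mistyped or stale path before it shows up as a confusing error later on. The sketch below uses a hypothetical helper, `check_dirs`, that is not part of hydra-genetics or the pipeline.

```shell
# check_dirs DIR...: print every argument that is not an existing directory
# and return non-zero if any of them was missing.
check_dirs() {
    local dir status=0
    for dir in "$@"; do
        if [ ! -d "${dir}" ]; then
            echo "not a directory: ${dir}" >&2
            status=1
        fi
    done
    return "${status}"
}

# Usage, after setting the variables above:
# check_dirs "${PATH_TO_ENV}/venv" "${PATH_TO_HYDRA_MODULES}" "${PATH_TO_FOLDER_WITH_PIPELINE}"
```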

Decompress reference files

tar -xvf design_and_ref_files.tar.gz

Singularities

Move the singularity cache to an appropriate location.

Modify config and profile

Resource

Make sure that config/resource.yaml matches your system setup, for example:

partition
number of cores
memory

config.data.hg19.yaml files

Point to uploaded reference files

# config/config.data.hg19.yaml
# Update the following lines:
# Adjust config so that all reference files have the {{REFERENCE_DATA}} variable
REFERENCE_DATA: "{EXTRACT_PATH}/design_and_ref_files"
PROJECT_DESIGN_DATA: "{{REFERENCE_DATA}}"
PROJECT_PON_DATA: "{{REFERENCE_DATA}}"
PROJECT_REF_DATA: "{{REFERENCE_DATA}}"
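A wrong `REFERENCE_DATA` path is easier to spot if the value is read back from the file and tested directly. The sketch below assumes the value sits on a single top-level line in double quotes, as in the example above. `reference_data_dir` is a hypothetical helper, and a YAML-aware tool would be more robust than this `sed` expression.

```shell
# reference_data_dir CONFIG: print the quoted value of the top-level
# REFERENCE_DATA key (assumes the one-line, double-quoted form shown above).
reference_data_dir() {
    sed -n 's/^REFERENCE_DATA:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# Usage:
# ref_dir="$(reference_data_dir config/config.data.hg19.yaml)"
# [ -d "${ref_dir}" ] || echo "REFERENCE_DATA does not exist: ${ref_dir}" >&2
```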

Config.yaml files

Set path for hydra-genetics modules

# Update the following line
hydra_local_path: "{PATH_TO_EXTRACTED_ENV}/hydra-genetics"

Add path to local singularities

# config/config.yaml
# Make sure the environment is active
cp config/config.yaml config/config.yaml.copy
hydra-genetics prepare-environment container-path-update -c config/config.yaml.copy -n config/config.yaml -p ${PATH_TO_singularity_cache}

The path to the apptainer cache can also be given once at the top of the config, much like the REFERENCE_DATA variable.

Profile

Copy a profile and modify it to match your system, e.g. Twist_Solid_${TAG_OR_BRANCH}/Twist_Solid/profiles/bianca/config.yaml

# Found at Twist_Solid_{TAG_OR_BRANCH}/snakemake-wrappers; use an absolute path prefixed with 'git+file:/'
wrapper-prefix: "PATH_TO_WRAPPERS"
# ex: wrapper-prefix: "git+file://proj/sens2022566/nobackup/patriksm/Twist_Solid_add-{TAG_OR_BRANCH}/snakemake-wrappers/"

# Update account info, change ADD_YOUR_ACCOUNT to your Bianca project id
drmaa: " -A ADD_YOUR_ACCOUNT -N 1-1 -t {resources.time} -n {resources.threads} --mem={resources.mem_mb} --mem-per-cpu={resources.mem_per_cpu} --partition={resources.partition} -J {rule} -e slurm_out/{rule}_%j.err -o slurm_out/{rule}_%j.out"
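The `-e` and `-o` options in the `drmaa` string above send job logs to `slurm_out/`. Slurm does not create missing directories for its output files, so a job whose log folder does not exist can fail without leaving a log behind. Assuming these relative paths resolve against the folder the pipeline is launched from, creating the folder up front avoids the problem:

```shell
# Run from the folder where snakemake will be started (the analysis folder).
# -p makes the command safe to repeat if the folder already exists.
mkdir -p slurm_out
```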

Validate config files

# This will make sure that all design and reference files exist and haven't changed
# Warnings for possible file PATH/hydra-genetics and missing tbi files in config can be ignored
hydra-genetics --debug references validate -c config/config.yaml -c config/config.data.hg19.yaml -v config/references/design_files.hg19.yaml -v config/references/nextseq.hg19.pon.yaml -v config/references/references.hg19.yaml -p ${PATH_TO_design_and_ref_files}

Run Pipeline

# Create analysis
mkdir analysis
# Enter folder
cd analysis
# Copy config files
cp -r PATH_TO_UPDATED_CONFIGS/config .

# Create samples.tsv and units.tsv
# https://hydra-genetics.readthedocs.io/en/latest/create_sample_files/
# Remember to update the tumor content value (TC) in samples.tsv for DNA samples
hydra-genetics create-input-files -d PATH_TO_FASTQ_FILE -p NovaSeq6000 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

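# Make sure slurm-drmaa is available
# PATH_TO_ENV and PATH_TO_FOLDER_WITH_PIPELINE are the variables set when unpacking the environment
source "${PATH_TO_ENV}/venv/bin/activate"
snakemake -s "${PATH_TO_FOLDER_WITH_PIPELINE}/workflow/Snakefile" --profile "${PATH_TO_UPDATED_PROFILE}/Twist_Solid/profiles/bianca"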
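Before submitting jobs, a dry run can show whether the workflow resolves with the updated configs and sample files. Snakemake's `-n` (`--dry-run`) flag lists the jobs that would be run without executing them. The sketch below wraps the command in a hypothetical helper, `dry_run_pipeline`, with the Snakefile and profile folder passed as arguments rather than assumed.

```shell
# dry_run_pipeline SNAKEFILE PROFILE_DIR: list the jobs that would be run,
# without submitting anything (-n is snakemake's dry-run flag).
dry_run_pipeline() {
    local snakefile="$1" profile="$2"
    snakemake -n -s "${snakefile}" --profile "${profile}"
}

# Usage, from the analysis folder with the environment activated:
# dry_run_pipeline "${PATH_TO_FOLDER_WITH_PIPELINE}/workflow/Snakefile" \
#     "${PATH_TO_UPDATED_PROFILE}/Twist_Solid/profiles/bianca"
```

If the dry run fails on a missing input file or config key, that is usually quicker to fix at this point than after jobs have been queued.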