Pipeline setup and configuration

The main files that govern how the pipeline is executed are listed below:

  • Snakefile
  • common.smk
  • config.yaml
  • resources.yaml
  • profile/uppsala/config.yaml
  • samples.tsv and units.tsv

More general information about the content of these files is available in the hydra-genetics documentation, under code standards, config and Snakefile.

Snakefile

The Snakefile is located in workflow/. It imports hydra-genetics modules and rules and modifies these rules when needed. It also imports pipeline-specific rules and defines rule orders. Finally, this is where the rule all is defined.

common.smk

The common.smk file is located under workflow/rules/. It is a general rules file that takes care of any actions not directly connected with running a specific program. This includes version checks, import of the config, resources and tsv-files, and validation using schemas. Functions used by pipeline-specific rules are also defined here, as are the output files: the function compile_output_list programmatically generates a list of all necessary output files for the pipeline, which are then targeted by the rule all defined in the Snakefile. See Result files for further details.
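As an illustration of the kind of expansion compile_output_list performs, the sketch below expands output path templates over the (sample, type) pairs found in units.tsv. It is a hypothetical stand-in, not the function in common.smk: the name compile_output_list_sketch, the shape of the output specification (a path template mapped to the unit types it applies to) and the use of the csv module are all assumptions. The two templates are borrowed from the multiqc qc_files in config.yaml.

```python
import csv
import io


def compile_output_list_sketch(output_spec, units_tsv_text):
    """Hypothetical sketch, NOT the compile_output_list from common.smk.

    output_spec maps an output path template to the unit types (T, N, R)
    it applies to (an assumed format). Each template is expanded over the
    unique (sample, type) pairs listed in units.tsv.
    """
    rows = csv.DictReader(io.StringIO(units_tsv_text), delimiter="\t")
    # A sample sequenced on several lanes has several rows in units.tsv,
    # so deduplicate (sample, type) pairs while keeping file order.
    pairs = list(dict.fromkeys((row["sample"], row["type"]) for row in rows))
    outputs = []
    for template, unit_types in output_spec.items():
        for sample, unit_type in pairs:
            if unit_type in unit_types:
                outputs.append(template.format(sample=sample, type=unit_type))
    return outputs


units = (
    "sample\ttype\tflowcell\tlane\n"
    "NA12878\tT\tHKTG2BGXG\tL001\n"
    "NA12878\tT\tHKTG2BGXG\tL002\n"
    "NA12880\tR\tHKTG2BGXG\tL002\n"
)
spec = {
    "qc/samtools_stats/{sample}_{type}.samtools-stats.txt": {"T", "N", "R"},
    "qc/gatk_calculate_contamination/{sample}_{type}.contamination.table": {"T", "N"},
}
for path in compile_output_list_sketch(spec, units):
    print(path)
```

The two lanes of NA12878 collapse into one target per template, and the contamination table is only requested for DNA units.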

config.yaml

The config.yaml is located under config/. It ties together all file and other dependencies, as well as the parameters for the different rules. See pipeline configuration for further details.


Current config.yaml:
resources: "config/resources.yaml"
samples: "samples.tsv"
units: "units.tsv"

#hydra_local_path: "PATH_TO_REPO"

modules:
  alignment: "v0.7.0"
  annotation: "v1.4.0"
  biomarker: "v0.8.0"
  cnv_sv: "v3.0.0"
  filtering: "v0.3.0"
  fusions: "v0.3.1"
  prealignment: "v1.3.0"
  qc: "v0.7.0"
  reports: "v1.1.1"
  snv_indels: "v2.1.0"

output: "config/output_files_FFPE.yaml"
default_container: "docker://hydragenetics/common:3.1.1.1"

trimmer_software: "fastp_pe"
subsample: "None"

arriba:
  container: "docker://hydragenetics/arriba:2.3.0"

arriba_draw_fusion:
  container: "docker://hydragenetics/arriba:2.3.0"

add_multi_snv_in_codon:
  af_limit: 0.00
  artifact_limit: 10000

bcbio_variation_recall_ensemble:
  container: "docker://hydragenetics/bcbio-vc:0.2.6"
  callers:
    - vardict
    - gatk_mutect2

bcftools_annotate:
  output_type: "z"
  annotation_string: "-m DB"

bwa_mem:
  container: "docker://hydragenetics/bwa_mem:0.7.17"

bwa_mem_merge:
  extra: "-c -p"

bwa_mem_realign_consensus_reads:
  container: "docker://hydragenetics/fgbio:2.1.0"

call_small_cnv_deletions:
  window_size: 4
  region_max_size: 30
  min_nr_stdev_diff: 2.5
  min_log_odds_diff: 0.3

call_small_cnv_amplifications:
  window_size: 3
  region_max_size: 15
  min_nr_stdev_diff: 8
  min_log_odds_diff: 0.4

cnvkit_batch:
  container: "docker://hydragenetics/cnvkit:0.9.11"
  extra: "--drop-low-coverage"
  method: "hybrid"

cnvkit_batch_hrd:
  container: "docker://hydragenetics/cnvkit:0.9.11"
  method: "hybrid"

cnvkit_call:
  container: "docker://hydragenetics/cnvkit:0.9.11"

cnvkit_diagram:
  container: "docker://hydragenetics/cnvkit:0.9.11"

cnvkit_export_seg:
  container: "docker://hydragenetics/cnvkit:0.9.11"

cnvkit_scatter:
  container: "docker://hydragenetics/cnvkit:0.9.11"

cnv_html_report:
  show_table: true
  cytobands: true
  extra_tables:
    - name: Small CNVs and 1p19q
      description: >
        Additional small amplifications and deletions as well as 1p19q co-deletions called by Twist Solid
        in-house scripts. Can have overlaps with called regions from other callers.
      path: "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.cnv_loh_genes_all.cnv_additional_variants_only.tsv"
    - name: Large chromosomal aberrations
      description: >
        Large chromosomal aberrations in the form of deletions, duplications and copy neutral loss of heterozygosity.
        Also warnings of baseline skewness and detection of polyploidy in the sample.
      path: "cnv_sv/svdb_query/{sample}_{type}.{tc_method}.cnv_loh_genes_all.cnv_chromosome_arms.tsv"

cnv_tsv_report:
  amp_cn_limit: 6.0
  baseline_fraction_limit: 0.2
  del_1p19q_cn_limit: 1.4
  del_1p19q_chr_arm_fraction: 0.3
  chr_arm_fraction: 0.3
  del_chr_arm_cn_limit: 1.4
  amp_chr_arm_cn_limit: 2.6
  normal_baf_lower_limit: 0.3
  normal_baf_upper_limit: 0.7
  normal_cn_lower_limit: 1.7
  normal_cn_upper_limit: 2.25
  polyploidy_fraction_limit: 0.2
  max_cnv_fp_size: 15000000

ctat_splicing_call:
  container: "docker://hydragenetics/ctat-splicing:0.0.3"

estimate_ctdna_fraction:
  gnomAD_AF_limit: 0.00001
  max_somatic_af: 0.4
  min_germline_af: 0.1
  min_nr_SNPs_per_segment: 35
  min_segment_length: 10000000
  vaf_baseline: 0.48

fastp_pe:
  container: "docker://hydragenetics/fastp:0.20.1"
  # Default enabled trimming parameters for fastp. Specified for clarity.
  extra: "--trim_poly_g --qualified_quality_phred 15 --unqualified_percent_limit 40 --n_base_limit 5 --length_required 15"

fastp_pe_arriba:
  container: "docker://hydragenetics/fastp:0.20.1"
  extra: "--max_len1 100"

fastqc:
  container: "docker://hydragenetics/fastqc:0.11.9"

fgbio_call_and_filter_consensus_reads:
  container: "docker://hydragenetics/fgbio:2.1.0"
  max_base_error_rate: "0.2"
  min_reads_call: "1 0 0"
  min_reads_filter: "1 0 0"
  min_input_base_quality_call: 20
  min_input_base_quality_filter: 30

fgbio_copy_umi_from_read_name:
  container: "docker://hydragenetics/fgbio:2.1.0"

fgbio_group_reads_by_umi:
  container: "docker://hydragenetics/fgbio:2.1.0"
  umi_strategy: paired

filter_vcf:
  snv_soft_filter: "config/filters/config_soft_filter_uppsala_vep105.yaml"
  snv_soft_filter_umi: "config/filters/config_soft_filter_umi_vep105.yaml"
  snv_hard_filter: "config/filters/config_hard_filter_uppsala_vep105.yaml"
  snv_hard_filter_umi: "config/filters/config_hard_filter_umi_vep105.yaml"
  snv_hard_filter_purecn: "config/filters/config_hard_filter_purecn.yaml"
  cnv_hard_filter_amp: "config/filters/config_hard_filter_cnv_amp.yaml"
  cnv_hard_filter_loh: "config/filters/config_hard_filter_cnv_loh.yaml"
  germline: "config/filters/config_hard_filter_germline_vep105.yaml"
  itd_hard_filter: "config/filters/config_hard_filter_scanitd.yaml"

filter_fuseq_wes:
  min_support: 50
  filter_on_fusiondb: True

filter_fuseq_wes_umi:
  min_support: 15
  filter_on_fusiondb: True

finaletoolkit_end_motifs:
  container: "docker://hydragenetics/finaletoolkit:0.10.7"

finaletoolkit_frag_length_bins:
  container: "docker://hydragenetics/finaletoolkit:0.10.7"

finaletoolkit_interval_end_motifs:
  container: "docker://hydragenetics/finaletoolkit:0.10.7"

finaletoolkit_interval_mds:
  container: "docker://hydragenetics/finaletoolkit:0.10.7"

finaletoolkit_mds:
  container: "docker://hydragenetics/finaletoolkit:0.10.7"


fragmentomics_fragment_length_patient_score:
  container: "docker://hydragenetics/fragmentomics:0.1.0"

fuseq_wes:
  container: "docker://hydragenetics/fuseq_wes:1.0.1"

fusioncatcher:
  container: "docker://hydragenetics/fusioncatcher:1.33"
  extra: ""

gatk_calculate_contamination:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_call_copy_ratio_segments:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_collect_allelic_counts:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_collect_read_counts:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_denoise_read_counts:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_model_segments:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_get_pileup_summaries:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_mutect2:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_mutect2_filter:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_mutect2_gvcf:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gatk_mutect2_merge_stats:
  container: "docker://hydragenetics/gatk4:4.1.9.0"

gene_fuse:
  container: "docker://hydragenetics/genefuse:0.6.1"

general_html_report:
  final_directory_depth: 3
  multiqc_config: "config/reports/multiqc_config_dna.yaml"

juli_annotate:
  container: "docker://hydragenetics/juli:0.1.6.2"

juli_call:
  container: "docker://hydragenetics/juli:0.1.6.2"

jumble_cnvkit_call:
  container: "docker://hydragenetics/cnvkit:0.9.9"

jumble_run:
  container: "docker://hydragenetics/jumble:260310"

jumble_vcf:
  dup_limit: 2.5
  het_del_limit: 1.5
  hom_del_limit: 0.5

hotspot_report:
  report_config: "config/reports/hotspot_report.yaml"
  levels:
    - [200, "ok", "yes"]
    - [30, "low", "yes"]
    - [0, "low", "not analyzable"]

manta_config_t:
  container: "docker://hydragenetics/manta:1.6.0"

manta_run_workflow_t:
  container: "docker://hydragenetics/manta:1.6.0"

merge_af_complex_variants:
  merge_method: "sum"

merge_cnv_json:
  cancer_genes: "config/reports/cancer_genes.csv"
  filtered_cnv_vcfs:
    - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.filter.cnv_hard_filter_amp.fp_tag.vcf
    - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_loh_genes_all.filter.cnv_hard_filter_loh.fp_tag.annotate_fp.vcf
  germline_vcf: snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz
  unfiltered_cnv_vcfs:
    - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.fp_tag.vcf.gz
    - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_loh_genes_all.fp_tag.vcf.gz
  unfiltered_cnv_vcfs_tbi:
    - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.fp_tag.vcf.gz.tbi
    - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_loh_genes_all.fp_tag.vcf.gz.tbi

mosdepth:
  container: "docker://hydragenetics/mosdepth:0.3.2"
  extra: "--no-per-base --fast-mode"

mosdepth_bed:
  container: "docker://hydragenetics/mosdepth:0.3.2"

msisensor_pro:
  container: "docker://hydragenetics/msisensor_pro:1.1.a"
  extra: "-c 50"

multiqc:
  container: "docker://hydragenetics/multiqc:1.21"
  reports:
    DNA:
      config: "config/reports/multiqc_config_dna.yaml"
      included_unit_types: ["N", "T"]
      deduplication: ["mark_duplicates"]
      qc_files:
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastq1_fastqc.zip"
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastq2_fastqc.zip"
        - "qc/picard_collect_alignment_summary_metrics/{sample}_{type}.alignment_summary_metrics.txt"
        - "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt"
        - "qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt"
        - "qc/picard_collect_insert_size_metrics/{sample}_{type}.insert_size_metrics.txt"
        - "qc/samtools_stats/{sample}_{type}.samtools-stats.txt"
        - "qc/gatk_calculate_contamination/{sample}_{type}.contamination.table"
    DNA_umi:
      config: "config/reports/multiqc_config_dna.yaml"
      included_unit_types: ["N", "T"]
      deduplication: ["umi"]
      qc_files:
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastq1_fastqc.zip"
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastq2_fastqc.zip"
        - "qc/picard_collect_alignment_summary_metrics/{sample}_{type}.alignment_summary_metrics.txt"
        - "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt"
        - "qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt"
        - "qc/picard_collect_insert_size_metrics/{sample}_{type}.insert_size_metrics.txt"
        - "qc/samtools_stats/{sample}_{type}.samtools-stats.txt"
        - "qc/gatk_calculate_contamination/{sample}_{type}.contamination.table"
        - "alignment/fgbio_group_reads_by_umi/{sample}_{type}.umi.histo.tsv"
    RNA:
      config: "config/reports/multiqc_config_rna.yaml"
      included_unit_types: ["R"]
      qc_files:
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastq1_fastqc.zip"
        - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastq2_fastqc.zip"
        - "qc/mosdepth/{sample}_{type}.mosdepth.global.dist.txt"
        - "qc/mosdepth/{sample}_{type}.mosdepth.region.dist.txt"
        - "qc/picard_collect_alignment_summary_metrics/{sample}_{type}.alignment_summary_metrics.txt"
        - "qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt"
        - "qc/picard_collect_hs_metrics/{sample}_{type}.HsMetrics.txt"
        - "qc/samtools_stats/{sample}_{type}.samtools-stats.txt"
        - "qc/mosdepth/{sample}_{type}.regions.bed.gz"

optitype:
  #container: "docker://hydragenetics/optitype:1.3.5"
  container: "docker://fred2/optitype:release-v1.3.1"
  sample_type: "-d"
  enumeration: 4

picard_collect_alignment_summary_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_hs_metrics:
  container: "docker://hydragenetics/picard:2.25.0"
  extra: "COVERAGE_CAP=50000"

picard_collect_duplication_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_collect_insert_size_metrics:
  container: "docker://hydragenetics/picard:2.25.0"

picard_mark_duplicates:
  container: "docker://hydragenetics/picard:2.25.0"

purecn:
  container: "docker://hydragenetics/purecn:2.2.0"
  genome: "hg19"
  interval_padding: 100
  segmentation_method: "internal"
  fun_segmentation: "PSCBS"

purecn_coverage:
  container: "docker://hydragenetics/purecn:2.2.0"

purecn_purity_file:
  min_purity: 0.01

report_fusions:
  fusioncatcher_flag_low_support: 15
  fusioncatcher_low_support: 3
  fusioncatcher_low_support_inframe: 6
  star_fusion_flag_low_support: 15
  star_fusion_low_support: 2
  star_fusion_low_support_inframe: 6

report_gene_fuse:
  min_unique_reads: 6

sample_mixup_check:
  match_cutoff: 0.7

samtools_merge_bam:
  extra: "-c -p"

samtools_merge_bam_umi:
  extra: "-c -p"

scarhrd:
  container: "docker://hydragenetics/scarhrd:20200825"
  seqz: FALSE

scanitd:
  container: "docker://hydragenetics/scanitd:0.9.2"

seqtk_subsample:
  container: "docker://hydragenetics/seqtk:1.4"
  nr_reads: 100000000
  nr_reads_rna: 100000000

somalier_ungrouped_extract:
  container: "docker://hydragenetics/somalier:0.2.18"
  env: SOMALIER_SAMPLE_NAME={sample}_{type}

somalier_ungrouped_relate:
  container: "docker://hydragenetics/somalier:0.2.18"

somalier_best_match_report:
  match_cutoff: 0.7

star:
  container: "docker://hydragenetics/star:2.7.10a"

star_fusion:
  container: "docker://hydragenetics/star-fusion:1.10.1"
  extra: "--examine_coding_effect"

svdb_merge:
  container: "docker://hydragenetics/svdb:2.6.0"
  tc_method:
    - name: pathology_purecn
      cnv_caller:
        - cnvkit
        - gatk
        - jumble
      priority: "cnvkit,gatk,jumble"
    - name: purecn
      cnv_caller:
        - cnvkit
        - gatk
        - jumble
      priority: "cnvkit,gatk,jumble"
    - name: pathology
      cnv_caller:
        - cnvkit
        - gatk
        - jumble
      priority: "cnvkit,gatk,jumble"
  overlap: 1 #Just merge the vcf-files without merging regions
  extra: "--pass_only" #Only merge variants with FILTER set to PASS

svdb_query:
  container: "docker://hydragenetics/svdb:2.6.0"

tmb:
  af_lower_limit: 0.05
  af_upper_limit: 0.95
  af_germline_lower_limit: 0.47
  af_germline_upper_limit: 0.53
  artifacts: ""
  background_panel: ""
  db1000g_limit: 0.0001
  dp_limit: 100
  gnomad_limit: 0.0001
  vd_limit: 10
  nr_avg_germline_snvs: 2.0
  nssnv_tmb_correction: 0.84
  variant_type_list: ["missense_variant", "stop_gained", "stop_lost"]

tmb_umi:
  af_lower_limit: 0.003
  af_upper_limit: 0.997
  af_germline_lower_limit: 0.40
  af_germline_upper_limit: 0.60
  background_panel: ""
  db1000g_limit: 0.0001
  dp_limit: 100
  gnomad_limit: 0.0001
  vd_limit: 10
  nr_avg_germline_snvs: 0.0
  nssnv_tmb_correction: 1.09
  filter_nr_observations: 3
  variant_type_list: ["missense_variant", "stop_gained", "stop_lost", "synonymous_variant", "stop_retained_variant"]

vardict:
  container: "docker://hydragenetics/vardict:1.8.3"
  allele_frequency_threshold: "0.01"
  allele_frequency_threshold_umi: "0.001"
  bed_columns: "-c 1 -S 2 -E 3 -g 4"
  extra: "-Q 1"

vep:
  container: "docker://hydragenetics/vep:105"
  mode: --offline --cache --refseq

vep_wo_pick:
  mode: --offline --cache --refseq

vt_decompose:
  container: "docker://hydragenetics/vt:2015.11.10"

vt_normalize:
  container: "docker://hydragenetics/vt:2015.11.10"

whatshap_phase:
  container: "docker://hydragenetics/whatshap:2.8"
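Many entries above only set a container, while others, such as bcftools_annotate, set no container at all. A minimal sketch of how such a lookup could resolve, assuming a rule falls back to default_container when its own section has no container key (resolve_container is a hypothetical helper, and the dict is a hand-copied excerpt of the file above rather than the loaded file):

```python
def resolve_container(config, rule_name):
    """Return the container image a rule should run in.

    Assumed lookup (hypothetical helper): use the rule's own 'container'
    entry if present, otherwise fall back to the top-level default_container.
    """
    # An empty section in YAML (e.g. "rule:") loads as None, hence "or {}".
    rule_config = config.get(rule_name) or {}
    return rule_config.get("container", config["default_container"])


# A small excerpt of the config.yaml above, written as a Python dict.
config = {
    "default_container": "docker://hydragenetics/common:3.1.1.1",
    "vt_normalize": {"container": "docker://hydragenetics/vt:2015.11.10"},
    "bcftools_annotate": {"output_type": "z", "annotation_string": "-m DB"},
}

print(resolve_container(config, "vt_normalize"))       # docker://hydragenetics/vt:2015.11.10
print(resolve_container(config, "bcftools_annotate"))  # docker://hydragenetics/common:3.1.1.1
```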

resources.yaml

The resources.yaml is located under config/. It declares the default resources used by rules, as well as the resources for specific rules that need more than the default allocation. See pipeline configuration for further details.

# ex, default resources
default_resources:
  threads: 1
  time: "4:00:00"
  mem_mb: 6144
  mem_per_cpu: 6144
  partition: "low"

# ex, rule override
vardict:
  time: "48:00:00"


Current resources.yaml:
default_resources:
  threads: 1
  time: "6:00:00"
  mem_mb: 6144
  mem_per_cpu: 6144
  partition: "low"

arriba:
  threads: 5
  time: "8:00:00"
  mem_mb: 30720
  mem_per_cpu: 6144

bwa_mem:
  mem_mb: 61440
  mem_per_cpu: 6144
  threads: 10
  time: "24:00:00"

bwa_mem_realign_consensus_reads:
  mem_mb: 61440
  mem_per_cpu: 6144
  threads: 10
  time: "24:00:00"

fastp_pe:
  threads: 5
  mem_mb: 30720
  mem_per_cpu: 6144

fastp_pe_arriba:
  threads: 5
  mem_mb: 30720
  mem_per_cpu: 6144

fgbio_call_and_filter_consensus_reads:
  threads: 3
  mem_mb: 18432
  mem_per_cpu: 6144
  time: "24:00:00"

fgbio_group_reads_by_umi:
  time: "24:00:00"

fuseq_wes:
  threads: 2
  mem_mb: 12288
  time: "48:00:00"

fusioncatcher:
  threads: 10
  time: "16:00:00"
  mem_mb: 61440
  mem_per_cpu: 6144

fastqc:
  threads: 2
  mem_mb: 12288
  mem_per_cpu: 6144

juli_call:
  threads: 10
  mem_mb: 61440
  mem_per_cpu: 6144
  time: "48:00:00"

jumble_run:
  threads: 10

manta_run_workflow_t:
  threads: 4
  time: "8:00:00"
  mem_mb: 24576
  mem_per_cpu: 6144

gatk_mutect2_gvcf:
  time: "8:00:00"

gatk_mutect2:
  time: "8:00:00"

gene_fuse:
  threads: 6
  time: "8:00:00"
  mem_mb: 36864
  mem_per_cpu: 6144

optitype:
  threads: 6
  time: "4:00:00"
  mem_mb: 36864
  mem_per_cpu: 6144

star:
  threads: 8
  time: "8:00:00"
  mem_mb: 49152
  mem_per_cpu: 6144

star_fusion:
  threads: 8
  time: "16:00:00"
  mem_mb: 49152
  mem_per_cpu: 6144

vardict:
  time: "48:00:00"

vep:
  threads: 5
  time: "6:00:00"
  mem_mb: 30720
  mem_per_cpu: 6144
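The sketch below shows one way to read the two layers in resources.yaml: a rule's effective resources are default_resources with the rule's own keys laid on top, so vardict keeps every default apart from its 48-hour time limit. The helper names are hypothetical and the pipeline's own loading code may differ. The second helper checks a convention visible in most overrides above, that mem_mb equals threads * mem_per_cpu; applied to the current file it would flag only jumble_run, which requests 10 threads but inherits the default mem_mb of 6144.

```python
def resources_for_rule(resources, rule_name):
    """Hypothetical helper: default_resources overlaid with rule-specific keys."""
    merged = dict(resources["default_resources"])
    merged.update(resources.get(rule_name) or {})
    return merged


def mem_mismatches(resources):
    """Hypothetical check: rules whose mem_mb != threads * mem_per_cpu."""
    mismatches = []
    for rule_name in resources:
        if rule_name == "default_resources":
            continue
        res = resources_for_rule(resources, rule_name)
        if res["mem_mb"] != res["threads"] * res["mem_per_cpu"]:
            mismatches.append(rule_name)
    return mismatches


# Excerpt of the current resources.yaml as a Python dict.
resources = {
    "default_resources": {
        "threads": 1,
        "time": "6:00:00",
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "low",
    },
    "vardict": {"time": "48:00:00"},
    "star": {"threads": 8, "time": "8:00:00", "mem_mb": 49152, "mem_per_cpu": 6144},
    "jumble_run": {"threads": 10},
}

print(resources_for_rule(resources, "vardict"))
print(mem_mismatches(resources))  # ['jumble_run']
```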

profile yaml

Profiles are saved in yaml files and control how snakemake is executed: whether jobs are submitted to a cluster, whether singularity is used, whether failed jobs are restarted, and so forth. The profile also forwards the requested resources to drmaa through the drmaa setting.

# ex, snakemake settings
jobs: 100
keep-going: True
restart-times: 2
rerun-incomplete: True
use-singularity: True
configfile: "config/config.yaml"
singularity-args: "-e --cleanenv -B /projects -B /data -B /beegfs"
# ex, drmaa settings
drmaa: " -A wp1 -N 1-1 -t {resources.time} -n {resources.threads} --mem={resources.mem_mb} --mem-per-cpu={resources.mem_per_cpu} --partition={resources.partition} -J {rule} -e slurm_out/{rule}_%j.err -o slurm_out/{rule}_%j.out"
drmaa-log-dir: "slurm_out"
default-resources: [threads=1, time="04:00:00", partition="low", mem_mb="3074", mem_per_cpu="3074"]
singularity-prefix: "/path/to/singularity_cache/"
wrapper-prefix: "https://github.com/hydra-genetics/snakemake-wrappers/raw/"
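Placeholders such as {resources.time} and {rule} in the drmaa string are filled in per job before submission. The sketch below imitates that substitution with Python's str.format and a SimpleNamespace so the resulting scheduler arguments can be inspected; it is an illustration under that assumption, not Snakemake's own code, and render_drmaa is a hypothetical name. The template mirrors the drmaa example above with --mem-per-cpu given once.

```python
from types import SimpleNamespace

# Mirrors the drmaa setting in the profile example (leading space dropped).
DRMAA_TEMPLATE = (
    "-A wp1 -N 1-1 -t {resources.time} -n {resources.threads} "
    "--mem={resources.mem_mb} --mem-per-cpu={resources.mem_per_cpu} "
    "--partition={resources.partition} -J {rule} "
    "-e slurm_out/{rule}_%j.err -o slurm_out/{rule}_%j.out"
)


def render_drmaa(template, rule, resources):
    """Hypothetical helper: fill {rule} and {resources.*} via str.format.

    SimpleNamespace gives attribute access, so "{resources.time}" resolves
    to resources["time"]. "%j" is not a format field and is left as-is for
    SLURM to replace with the job id.
    """
    return template.format(rule=rule, resources=SimpleNamespace(**resources))


vardict_resources = {
    "threads": 1,
    "time": "48:00:00",
    "mem_mb": 6144,
    "mem_per_cpu": 6144,
    "partition": "low",
}
print(render_drmaa(DRMAA_TEMPLATE, "vardict", vardict_resources))
```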

samples.tsv and units.tsv

The samples.tsv and units.tsv files are input files that must be generated before running the pipeline. They should in general be located in the base of the analysis folder; this location can be changed in config.yaml. See running the pipeline and create input files for further details.

Example samples.tsv

sample tumor_content
NA12878 0.5
NA12879 0
NA12880 1

Example units.tsv

sample type machine platform flowcell lane barcode fastq1 fastq2 adapter
NA12878 T NDX550407_RUO NextSeq HKTG2BGXG L001 ACGGAACA+ACGAGAAC fastq/NA12878_fastq1.fastq.gz fastq/NA12878_fastq2.fastq.gz ACGT,ACGT
NA12879 N NDX550407_RUO NextSeq HKTG2BGXG L001 TCGGAACT+TCGAGAAT fastq/NA12879_fastq1.fastq.gz fastq/NA12879_fastq2.fastq.gz ACGT,ACGT
NA12880 R NDX550407_RUO NextSeq HKTG2BGXG L002 GCGGAACG+GCGAGAAG fastq/NA12880_fastq1.fastq.gz fastq/NA12880_fastq2.fastq.gz ACGT,ACGT
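Before starting a run it can help to sanity-check the two files. The pipeline itself validates them against schemas in common.smk; the sketch below is not that schema, only a few hand-written checks under stated assumptions: the files are tab-separated, tumor_content lies between 0 and 1, every sample in units.tsv is listed in samples.tsv, and type is one of T, N or R (the values used in the examples and in the multiqc configuration). check_inputs is a hypothetical helper.

```python
import csv
import io

# Assumed set of unit types, taken from the examples and multiqc config.
KNOWN_TYPES = {"T", "N", "R"}


def check_inputs(samples_text, units_text):
    """Hypothetical sanity checks; returns a list of problems (empty = OK)."""
    samples = list(csv.DictReader(io.StringIO(samples_text), delimiter="\t"))
    units = list(csv.DictReader(io.StringIO(units_text), delimiter="\t"))
    problems = []
    known_samples = {row["sample"] for row in samples}
    for row in samples:
        tc = float(row["tumor_content"])
        if not 0.0 <= tc <= 1.0:
            problems.append(f"{row['sample']}: tumor_content {tc} outside [0, 1]")
    for row in units:
        if row["sample"] not in known_samples:
            problems.append(f"{row['sample']}: in units.tsv but not in samples.tsv")
        if row["type"] not in KNOWN_TYPES:
            problems.append(f"{row['sample']}: unexpected type {row['type']!r}")
    return problems


samples_tsv = "sample\ttumor_content\nNA12878\t0.5\nNA12879\t0\nNA12880\t1\n"
units_tsv = (
    "sample\ttype\tflowcell\tlane\n"
    "NA12878\tT\tHKTG2BGXG\tL001\n"
    "NA12879\tN\tHKTG2BGXG\tL001\n"
    "NA12880\tR\tHKTG2BGXG\tL002\n"
)
print(check_inputs(samples_tsv, units_tsv))  # []
```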