8 changes: 8 additions & 0 deletions .test/config/config.yml
@@ -37,3 +37,11 @@ fastani:
 rgi:
   skip: False
   extra: "--clean --alignment_tool DIAMOND"
+
+synteny:
+  skip: False
+  divergence: 1
+  prefix: "ntSynt"
+  extra: ""
+  viz_scale: "1e6"
+  viz_extra: "--normalize --format pdf"
6 changes: 5 additions & 1 deletion README.md
@@ -117,4 +117,8 @@ snakemake --cores 2 --sdm conda apptainer --directory .test

> Tonkin-Hill G, MacAlasdair N, Ruis C, Weimann A, Horesh G, Lees JA, Gladstone RA, Lo S, Beaudoin C, Floto RA, Frost SDW, Corander J, Bentley SD, Parkhill J. _Producing polished prokaryotic pangenomes with the Panaroo pipeline_. Genome Biol. 21(1):180, **2020**. PMID: 32698896. https://doi.org/10.1186/s13059-020-02090-4.

-> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. _Sustainable data analysis with Snakemake_. F1000Research, 10:33, 10, 33, **2021**. https://doi.org/10.12688/f1000research.29032.2.
+> Köster J., Mölder F., Jablonski K. P., Letcher B., Hall M. B., Tomkins-Tinch C. H., Sochat V., Forster J., Lee S., Twardziok S. O., Kanitz A., Wilm A., Holtgrewe M., Rahmann S., & Nahnsen S. _Sustainable data analysis with Snakemake_. F1000Research, 10:33, 10, 33, **2021**. https://doi.org/10.12688/f1000research.29032.2.
+
+> Coombe L, Kazemi P, Wong J, Birol I, Warren RL. _ntSynt: multi-genome synteny detection using minimizer graph mappings_. BMC Biology. 23:367, **2025**. https://doi.org/10.1186/s12915-025-02455-w
+
+> Coombe L, Warren RL, Birol I. _ntSynt-viz: Visualizing synteny patterns across multiple genomes_. bioRxiv 2025.01.15.633221. https://doi.org/10.1101/2025.01.15.633221
8 changes: 8 additions & 0 deletions config/config.yml
@@ -37,3 +37,11 @@ fastani:
 rgi:
   skip: False
   extra: "--clean --alignment_tool DIAMOND"
+
+synteny:
+  skip: False
+  divergence: 1
+  prefix: "ntSynt"
+  extra: ""
+  viz_scale: "1e6"
+  viz_extra: "--normalize --format pdf"
28 changes: 28 additions & 0 deletions config/schemas/config.schema.yml
@@ -144,6 +144,33 @@ properties:
         type: string
         description: Extra command-line arguments for RGI
         default: "--clean --alignment_tool DIAMOND"
+  synteny:
+    type: object
+    properties:
+      skip:
+        type: boolean
+        description: Whether to skip synteny analysis
+        default: false
+      divergence:
+        type: number
+        description: Maximum divergence range between genomes for synteny analysis
+        default: 1
+      prefix:
+        type: string
+        description: Prefix for synteny output files
+        default: "ntSynt"
+      extra:
+        type: string
+        description: Extra command-line arguments for synteny analysis
+        default: ""
+      viz_scale:
+        type: string
+        description: Scale for synteny visualization (e.g., "1e6")
+        default: "1e6"
+      viz_extra:
+        type: string
+        description: Extra command-line arguments for synteny visualization
+        default: "--normalize --format pdf"
 required:
   - samplesheet
   - tool
@@ -155,3 +182,4 @@ required:
   - panaroo
   - fastani
   - rgi
+  - synteny
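Every key in the new `synteny` schema block carries a default. As a rough illustration of how a partial user config would resolve against those defaults, here is a minimal sketch in plain Python. The names `SYNTENY_DEFAULTS`, `SYNTENY_TYPES`, and `resolve_synteny_config` are hypothetical and not part of the workflow; the workflow's own schema validation is not shown in this diff and may behave differently.

```python
# Hypothetical helper: values copied from the `default` fields of the schema.
SYNTENY_DEFAULTS = {
    "skip": False,
    "divergence": 1,
    "prefix": "ntSynt",
    "extra": "",
    "viz_scale": "1e6",
    "viz_extra": "--normalize --format pdf",
}

# Expected Python types per key, following the schema's `type` fields.
SYNTENY_TYPES = {
    "skip": bool,
    "divergence": (int, float),
    "prefix": str,
    "extra": str,
    "viz_scale": str,
    "viz_extra": str,
}


def resolve_synteny_config(user_cfg):
    """Merge user settings over the defaults and check basic types."""
    merged = {**SYNTENY_DEFAULTS, **user_cfg}
    for key, expected in SYNTENY_TYPES.items():
        value = merged[key]
        # bool is a subclass of int, so reject it explicitly for the number field
        if key == "divergence" and isinstance(value, bool):
            raise TypeError("synteny.divergence must be a number, got bool")
        if not isinstance(value, expected):
            raise TypeError(
                f"synteny.{key} has unexpected type {type(value).__name__}"
            )
    return merged
```

Note that `viz_scale` is declared as a string in the schema, so a bare YAML number such as `1e6` without quotes would fail the type check in this sketch.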
190 changes: 83 additions & 107 deletions resources/images/dag.svg
2 changes: 1 addition & 1 deletion workflow/envs/bakta.yml
@@ -4,4 +4,4 @@ channels:
   - bioconda
   - nodefaults
 dependencies:
-  - bakta=1.11.4
+  - bakta=1.12
8 changes: 8 additions & 0 deletions workflow/envs/ntsynt.yml
@@ -0,0 +1,8 @@
+name: ntsynt
+channels:
+  - conda-forge
+  - bioconda
+  - nodefaults
+dependencies:
+  - ntsynt=1.0.5
+  - ntsynt-viz=1.0.4
12 changes: 6 additions & 6 deletions workflow/rules/annotate.smk
@@ -1,4 +1,4 @@
-rule get_fasta:
+rule get_pgap_fasta:
     input:
         get_fasta,
     output:
@@ -15,7 +15,7 @@ rule get_fasta:

 rule prepare_yaml_files:
     input:
-        fasta=rules.get_fasta.output.fasta,
+        fasta=rules.get_pgap_fasta.output.fasta,
     output:
         input_yaml="results/annotation/pgap/prepare_files/{sample}/input.yaml",
         submol_yaml="results/annotation/pgap/prepare_files/{sample}/submol.yaml",
@@ -39,7 +39,7 @@ rule annotate_pgap:
         branch(
             lookup(dpath="pgap/use_yaml_config", within=config),
             then=rules.prepare_yaml_files.output.input_yaml,
-            otherwise=rules.get_fasta.output.fasta,
+            otherwise=rules.get_pgap_fasta.output.fasta,
         ),
     output:
         gff="results/annotation/pgap/{sample}/{sample}.gff",
@@ -78,7 +78,7 @@ rule annotate_pgap:

 rule annotate_prokka:
     input:
-        fasta=rules.get_fasta.output.fasta,
+        fasta=get_fasta,
     output:
         gff="results/annotation/prokka/{sample}/{sample}.gff",
         fasta="results/annotation/prokka/{sample}/{sample}.fna",
@@ -142,15 +142,15 @@ rule get_bakta_db:
         else
             echo 'Using Bakta DB from supplied input dir: {params.existing_db}' > {log};
             ln -s {params.existing_db} {output.db};
-            echo 'Update ARMFinderPlus DB using supplied input dir: {params.existing_db}' >> {log};
+            echo 'Update AMRFinderPlus DB using supplied input dir: {params.existing_db}' >> {log};
             amrfinder_update --force_update --database {params.existing_db}/amrfinderplus-db &>> {log}
         fi
         """


 rule annotate_bakta:
     input:
-        fasta=rules.get_fasta.output.fasta,
+        fasta=get_fasta,
         db=rules.get_bakta_db.output.db,
     output:
         gff="results/annotation/bakta/{sample}/{sample}.gff",
9 changes: 9 additions & 0 deletions workflow/rules/common.smk
@@ -1,4 +1,6 @@
 # import basic packages
+import glob
+import os
 import pandas as pd
 import re
 from snakemake import logging
@@ -75,6 +77,13 @@ def get_final_input(wildcards):
             sample=samples.index,
             ext=["txt", "json"],
         )
+    if not config["synteny"]["skip"]:
+        if len(samples.index) > 1 or (
+            len(samples.index) == 1 and config["reference"]["fasta"] != ""
+        ):
+            inputs += expand(
+                f"results/qc/genome_synteny/{config['synteny'].get('prefix', 'ntSynt')}_ribbon-plot.pdf",
+            )
     return inputs
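The condition added to `get_final_input` requests the ribbon plot only when there are at least two genomes to compare: either several samples, or a single sample plus a reference FASTA. The standalone sketch below restates that logic outside Snakemake. The function names `wants_synteny_plot` and `ribbon_plot_path` are hypothetical and exist only for illustration.

```python
def wants_synteny_plot(n_samples, reference_fasta, skip=False):
    """Mirror the gating condition from get_final_input."""
    if skip:
        return False
    # several samples, or exactly one sample plus a non-empty reference path
    return n_samples > 1 or (n_samples == 1 and reference_fasta != "")


def ribbon_plot_path(prefix="ntSynt"):
    """Build the target path the same way the f-string in the diff does."""
    return f"results/qc/genome_synteny/{prefix}_ribbon-plot.pdf"
```

Using single quotes inside the f-string in the workflow code, as in `config['synteny']`, keeps the expression valid on Python versions older than 3.12, where reusing the outer quote character inside an f-string is a syntax error.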


97 changes: 96 additions & 1 deletion workflow/rules/qc.smk
@@ -128,7 +128,7 @@ rule panaroo:

 rule rgi_detection:
     input:
-        fasta=rules.get_fasta.output.fasta,
+        fasta=get_fasta,
     output:
         multiext("results/qc/rgi/{sample}/result", ".txt", ".json"),
     log:
@@ -141,3 +141,98 @@ rule rgi_detection:
         """--- Running RGI to detect antibiotic resistance genes ---"""
     wrapper:
         "https://raw.githubusercontent.com/MPUSP/mpusp-snakemake-wrappers/refs/heads/main/rgi"


+rule synteny_detection:
+    input:
+        fastas=get_all_fasta,
+    output:
+        tsv=f"results/qc/genome_synteny/{config['synteny'].get('prefix', 'ntSynt')}.synteny_blocks.tsv",
+        fai=directory("results/qc/genome_synteny/fai"),
+    log:
+        "results/qc/genome_synteny/logs/ntSynt.log",
+    conda:
+        "../envs/ntsynt.yml"
+    threads: workflow.cores
+    params:
+        outdir=lambda wc, output: os.path.dirname(output.tsv),
+        divergence=config["synteny"]["divergence"],
+        prefix=config["synteny"].get("prefix", "ntSynt"),
+    message:
+        """--- Running ntSynt for multi-genome macrosynteny detection ---"""
+    shell:
+        """
+        ntSynt {input.fastas} \
+            -d {params.divergence} \
+            -t {threads} \
+            --force \
+            -p {params.prefix} \
+            > {log} 2>&1;
+        echo "Synteny detection completed. Moving results to output directory." >> {log};
+        rsync ./{params.prefix}.* {params.outdir}/;
+        echo "Create fai output directory." >> {log};
+        mkdir -p {output.fai};
+        rsync ./*.fai {output.fai}/;
+        echo "Remove intermediate files." >> {log};
+        rm -f ./*.fai ./*.tsv ./*.bf ./*.dot
+        """


+rule prepare_ntsynt_names:
+    output:
+        "results/qc/genome_synteny/ntSynt-viz_name_conversion.tsv",
+    log:
+        "results/qc/genome_synteny/logs/prepare_ntSynt-viz_names.log",
+    conda:
+        "../envs/base.yml"
+    threads: 1
+    params:
+        sample_sheet=config["samplesheet"],
+    message:
+        """--- Preparing name mapping file for ntSynt visualization ---"""
+    script:
+        "../scripts/prepare_ntSynt_viz_names.py"


+rule viz_synteny:
+    input:
+        blocks=rules.synteny_detection.output.tsv,
+        fai=rules.synteny_detection.output.fai,
+        names=rules.prepare_ntsynt_names.output,
+    output:
+        pdf=f"results/qc/genome_synteny/{config['synteny'].get('prefix', 'ntSynt')}_ribbon-plot.pdf",
+    log:
+        "results/qc/genome_synteny/logs/ntSynt-viz.log",
+    conda:
+        "../envs/ntsynt.yml"
+    threads: 1
+    params:
+        outdir=lambda wc, output: os.path.dirname(output.pdf),
+        fais=lambda wc, input: " ".join(glob.glob(os.path.join(input.fai, "*.fai"))),
+        scale=config["synteny"]["viz_scale"],
+        ref_fasta=(
+            " ".join(["--target-genome", config["reference"]["fasta"]])
+            if config["reference"]["fasta"]
+            else ""
+        ),
+        prefix=config["synteny"].get("prefix", "ntSynt"),
+        extra=config["synteny"]["viz_extra"],
+    message:
+        """--- Running ntSynt-viz to generate multi-genome ribbon plots ---"""
+    shell:
+        """
+        ntsynt_viz.py \
+            --blocks {input.blocks} \
+            --fais {params.fais} \
+            --name_conversion {input.names} \
+            {params.ref_fasta} \
+            --scale {params.scale} \
+            --prefix {params.prefix} \
+            {params.extra} \
+            > {log} 2>&1;
+        echo "Synteny-viz completed. Moving results to output directory." >> {log};
+        rsync ./{params.prefix}.* {params.outdir}/;
+        rsync ./{params.prefix}_* {params.outdir}/;
+        echo "Clean intermediate files." >> {log};
+        rm -f ./{params.prefix}.*.tsv ./{params.prefix}_*;
+        """
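The two shell blocks above assemble their command lines from config values, with `--target-genome` added only when a reference FASTA is configured. A minimal sketch of the same assembly as Python argument lists follows. The helper names `build_ntsynt_cmd` and `build_viz_cmd` are hypothetical; the flags are copied verbatim from the rules in this diff and have not been checked against the ntSynt or ntSynt-viz documentation.

```python
import shlex


def build_ntsynt_cmd(fastas, divergence, threads, prefix="ntSynt"):
    """Argument list equivalent to the ntSynt call in rule synteny_detection."""
    return [
        "ntSynt", *fastas,
        "-d", str(divergence),
        "-t", str(threads),
        "--force",
        "-p", prefix,
    ]


def build_viz_cmd(blocks, fais, names, scale, prefix="ntSynt",
                  reference_fasta="", extra=""):
    """Argument list equivalent to the ntsynt_viz.py call in rule viz_synteny."""
    cmd = ["ntsynt_viz.py", "--blocks", blocks, "--fais", *fais,
           "--name_conversion", names]
    if reference_fasta:
        # only pass a target genome when a reference is configured
        cmd += ["--target-genome", reference_fasta]
    cmd += ["--scale", str(scale), "--prefix", prefix]
    # split the free-form extra string the way a POSIX shell would
    cmd += shlex.split(extra)
    return cmd
```

Building a list rather than a string keeps an empty reference from leaving a stray blank argument, which is the same reason the `ref_fasta` parameter in the rule falls back to an empty value.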
33 changes: 33 additions & 0 deletions workflow/scripts/prepare_ntSynt_viz_names.py
@@ -0,0 +1,33 @@
+# PREPARE NTSYNT-VIZ NAME MAPPING
+# -----------------------------------------------------------------------------
+#
+# This script maps the basename of each input file to its sample name.
+# ntSynt-viz uses this mapping so that the ribbon plot is labeled with the
+# sample names from the sample sheet instead of the file names.
+
+import os
+import sys
+import pandas as pd
+
+sys.stderr = open(snakemake.log[0], "w", buffering=1)
+sample_sheet = snakemake.params["sample_sheet"]
+outfile = snakemake.output[0]
+
+# read sample sheet
+try:
+    df_samples = pd.read_csv(sample_sheet)
+    sys.stderr.write(f"Read sample sheet from {sample_sheet}\n")
+except Exception as e:
+    sys.stderr.write(f"Error reading sample sheet from {sample_sheet}: {e}\n")
+    # re-raise so the job fails instead of continuing without a data frame
+    raise
+
+df_samples["file"] = df_samples["file"].apply(lambda x: os.path.basename(x))
+
+try:
+    df_samples[["file", "sample"]].to_csv(outfile, sep="\t", index=False, header=False)
+    sys.stderr.write(f"Wrote ntSynt-viz sample name mapping to {outfile}\n")
+except Exception as e:
+    sys.stderr.write(f"Error writing ntSynt-viz sample name mapping to {outfile}: {e}\n")
+    # re-raise so Snakemake does not treat a failed write as success
+    raise
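To make the shape of the name-conversion file concrete, here is a small self-contained sketch of the same transformation. It uses the standard-library `csv` module instead of pandas so that it runs without extra dependencies, which is a substitution for illustration and not what the script does. The function names `name_mapping_rows` and `to_tsv` and the sample sheet contents in the usage note are hypothetical; only the `sample` and `file` column names come from the script.

```python
import csv
import io
import os


def name_mapping_rows(sample_sheet_text):
    """Return (file basename, sample name) pairs from CSV sample sheet text."""
    reader = csv.DictReader(io.StringIO(sample_sheet_text))
    return [(os.path.basename(row["file"]), row["sample"]) for row in reader]


def to_tsv(rows):
    """Serialize the pairs as headerless, tab-separated lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

For a sheet with the rows `S1,/data/a.fasta` and `S2,genomes/b.fna` under a `sample,file` header, the resulting file has one line per genome, with the file basename in the first column and the sample name in the second.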