To perform hybrid assembly:
aviary assemble -1 *.1.fq.gz -2 *.2.fq.gz --longreads *.nanopore.fastq.gz --long_read_type ont -t 24 -n 48
Aviary is compatible with both Nanopore and PacBio long read technologies. Note: Aviary can also perform assembly using just short or long reads as well.
aviary assemble -1 *.1.fq.gz -2 *.2.fq.gz -t 24 -n 48
OR
aviary assemble --longreads *.nanopore.fastq.gz --long_read_type ont -t 24 -n 48
To perform mag recovery:
aviary recover --assembly scaffolds.fasta -1 sr1.1.fq sr2.1.fq.gz -2 sr1.2.fq sr2.2.fq.gz --longreads nanopore.fastq.gz -z ont --output output_dir/ --max_threads 12 --n_cores 24 --gtdb_path /path/to/gtdb/release/
If no assembly file is provided, then aviary will first perform the assembly pipeline to produce an assembly using the input reads.
If at any point the Aviary workflow is interrupted, the pipeline can be restarted and pick up from the last completed step.
Aviary allows users to supply a batch file to the aviary batch
command. This will cause aviary to run on every line within
the input batch file individually. Example batch files can be found at here and here.
Often users are required to send long running jobs off on to high performance clusters. Aviary and snakemake are
perfectly compatible with clusters and can be sent off as either a single pipeline via PBS script or equivalent.
Alternatively, snakemake can send individual jobs in a pipeline off into a cluster to share the load across nodes.
You can make use of this feature in Aviary via the --snakemake-cmds
parameter, E.g.
aviary assemble -1 *.1.fq.gz -2 *.2.fq.gz --longreads *.nanopore.fastq.gz --long_read_type ont -t 24 -p 24 -n 24 --snakemake-cmds '--cluster qsub '
NOTE: The space after --cluster qsub
is required due to a strange quirk in how python's argparse
module works.
Often users are required to send long running jobs off on to high performance clusters. Aviary and snakemake are
perfectly compatible with clusters and can be sent off as either a single pipeline via PBS script or equivalent.
Alternatively, snakemake can send individual jobs in a pipeline off into a cluster to share the load across nodes.
You can make use of this feature in Aviary by providing a --snakemake-profile cluster
, where cluster
refers to
a Snakemake profile at ~/.config/snakemake/cluster/config.yaml
. An example config.yaml
is provided below.
Additionally, --cluster-retries
can be used to set the # of retries per job, with automatically increasing time
and memory requirements. CPU and memory bounds provided to Aviary are used as a hard-cap for job submission.
cluster: qsub
cluster-status: qstat
jobs: 10000
cluster-cancel: qdel
Aviary will produce a variety of different outputs depending on the parameters provided. The following is a list of the expected outputs from the different subcommands.
In general, you will find rule benchmarks in the benchmarks/
folder. Logs and error messages will be found in the logs/
folder.
The data/
folder contains various output from the different programs that have run. Aviary attempts to keep this folder clean, but there can be a lot of superfluous content in here. The data/
folder is also where the final outputs from the pipeline will be found, they will be symlinked out of this folder into their respective output folders.
assembly/final_contigs.fasta
- The assembled genome in FASTA formatwww/assembly_stats.txt
- A summary of the assembly statisticswww/fastqc
- A folder containing the output from fastqcwww/nanoplot
- A folder containing the output from NanoPlotIf assembly is performed during the recover process, then the outputs from the assembly step will also be present.
bins/bin_info.tsv
- A file containing all of the information about the recovered MAGs. Taxonomy, QC, size, N50, etc.bins/checkm_minimal.tsv
- A minimal version of the bin_info.tsv
file.bins/coverm_abundances.tsv
- The abundance of each MAG in each sample.bins/final_bins
- A folder containing the final MAGs in FASTA format.diversity/singlem_out
- A folder containing the output from singlem.taxonomy/
- A folder containing the output from GTDB-tkannotation/
- A folder containing the output from EggNOG-mapper.Upon first running Aviary, you will be prompted to input the location for several database folders if
they haven't already been provided. If at any point the location of these folders change you can
use the the aviary configure
module to update the environment variables used by aviary.
These environment variables can also be configured manually, just set the following variables in your .bashrc
file:
GTDBTK_DATA_PATH
EGGNOG_DATA_DIR
SINGLEM_METAPACKAGE_PATH
CHECKM2DB
CONDA_ENV_PATH
Aviary has three thread contol options:
-t, --threads
-n, --n-cores, --n_cores
--threads
, then potentially
multiple programs will run concurrently providing a great boost in performance. If this value is not set then it defaults
to being the same value as --threads
-p, --pplacer-threads, --pplacer_threads
When performing assembly, users are required to estimate how much RAM they will need to use via -m, --max-memory, --max_memory
.
With HPC cluster submission (see above), requested job memory is increased with each rerun and capped at max_memory
.
By default, Aviary will use /tmp
to store temporary files during many processes throughtout assembly and MAG recovery.
If you would like to use a different directory, you can specify this by using the flexible --tmp
parameter.
If you would permanently like to change the temporary directory, you can use aviary configure --tmp /new/tmp
to
change the TMPDIR
environment variable within your current conda environment.
Often users may not want to run a complete aviary module, as such specific rules can be targeted via the -w, --workflow
parameter. For example, if a user wanted to only run a specific binning algorithm then that rule can be specified directly:
aviary recover -w rosella --assembly scaffolds.fasta -1 sr1.1.fq sr2.1.fq.gz -2 sr1.2.fq sr2.2.fq.gz --longreads nanopore.fastq.gz --output output_dir/ --max_threads 12
NOTE: Every step up to the targeted rule still has to be run if it hasn't been run before. The specific rules that can be used can be found within each modules specific snakemake file.
Powered by Doctave