To perform hybrid assembly:
aviary assemble -1 *.1.fq.gz -2 *.2.fq.gz --longreads *.nanopore.fastq.gz --long_read_type ont -t 24 -n 48
Aviary is compatible with both Nanopore and PacBio long-read technologies. Note: Aviary can also perform assembly using only short reads or only long reads:
aviary assemble -1 *.1.fq.gz -2 *.2.fq.gz -t 24 -n 48
OR
aviary assemble --longreads *.nanopore.fastq.gz --long_read_type ont -t 24 -n 48
To perform MAG recovery:
aviary recover --assembly scaffolds.fasta -1 sr1.1.fq sr2.1.fq.gz -2 sr1.2.fq sr2.2.fq.gz --longreads nanopore.fastq.gz -z ont --output output_dir/ --max_threads 12 --n_cores 24 --gtdb_path /path/to/gtdb/release/
If no assembly file is provided, Aviary first runs the assembly pipeline on the input reads to produce one.
If the Aviary workflow is interrupted at any point, the pipeline can be restarted and will pick up from the last completed step.
Aviary allows users to supply a batch file to the aviary batch command. Aviary then runs on each line of the input batch file individually. Example batch files can be found here and here.
Users often need to submit long-running jobs to high-performance computing clusters. Aviary and snakemake are
compatible with clusters: a run can be submitted as a single pipeline via a PBS script or equivalent.
Alternatively, snakemake can submit the individual jobs in a pipeline to the cluster to share the load across nodes.
You can make use of this feature in Aviary via the --snakemake-cmds parameter, e.g.
aviary assemble -1 *.1.fq.gz -2 *.2.fq.gz --longreads *.nanopore.fastq.gz --long_read_type ont -t 24 -p 24 -n 24 --snakemake-cmds '--cluster qsub '
NOTE: The space after --cluster qsub is required due to a quirk in how Python's argparse module works.
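For the single-pipeline option mentioned above, a job script might look roughly like the sketch below. It writes a hypothetical file named aviary_job.pbs; the job name, resource request, walltime and conda environment name are all assumptions, and the resource syntax shown is PBS Pro style, which differs between sites and schedulers.

```shell
# Write a minimal PBS job script (all #PBS values below are assumptions)
cat > aviary_job.pbs <<'EOF'
#!/bin/bash
#PBS -N aviary_assemble
#PBS -l select=1:ncpus=48
#PBS -l walltime=48:00:00

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Assumed environment name; activate conda however your site requires
conda activate aviary

# Same hybrid assembly command as in the example at the top of this page
aviary assemble -1 *.1.fq.gz -2 *.2.fq.gz --longreads *.nanopore.fastq.gz --long_read_type ont -t 24 -n 48
EOF

# Submit the whole pipeline as one job with:
#   qsub aviary_job.pbs
```

The requested core count is chosen to match the -n value so that snakemake does not schedule more work than the job was allocated.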
Upon first running Aviary, you will be prompted to input the location of several database folders if
they haven't already been provided. If the location of these folders changes at any point, you can
use the aviary configure module to update the environment variables used by Aviary.
These environment variables can also be configured manually by setting the following variables in your .bashrc file:
GTDBTK_DATA_PATH
BUSCO_DB
CONDA_ENV_PATH
Make sure to reactivate your conda environment or re-source your .bashrc
for aviary to be able to access these variables.
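As a sketch of the manual route, the entries in .bashrc could look like the following. All three paths are placeholders to be replaced with the real locations on your system; the final loop is just a convenience check and is not part of Aviary.

```shell
# Example ~/.bashrc entries; every path here is a placeholder
export GTDBTK_DATA_PATH="/path/to/gtdb/release/"
export BUSCO_DB="/path/to/busco/db/"
export CONDA_ENV_PATH="/path/to/conda/envs/"

# Optional sanity check: warn about any variable that is still unset
for var in GTDBTK_DATA_PATH BUSCO_DB CONDA_ENV_PATH; do
    printenv "$var" >/dev/null || echo "$var is not set" >&2
done
```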
Aviary has three thread control options:
-t, --threads
-n, --n-cores, --n_cores
If this value is greater than --threads, then potentially multiple programs will run concurrently, providing a great boost in performance. If this value is not set, it defaults to the same value as --threads.
-p, --pplacer-threads, --pplacer_threads
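A rough way to reason about the first two options, assuming --n-cores is the total core budget handed to snakemake and each program uses up to --threads of it, is to divide one by the other. The values below are the ones used in the assembly examples on this page.

```shell
threads=24   # value passed to -t
n_cores=48   # value passed to -n

# Number of programs that can run side by side if each uses all of its threads
max_concurrent=$(( n_cores / threads ))
echo "Up to ${max_concurrent} programs at ${threads} threads each"
```

Steps that request fewer threads than the maximum leave room for more jobs to run at once, so this is a lower bound on concurrency rather than a fixed number.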
When performing assembly, users are required to estimate how much RAM they will need via -m, --max-memory, --max_memory.
By default, Aviary uses /tmp to store temporary files during many processes throughout assembly and MAG recovery.
If you would like to use a different directory, you can specify it with the --tmp parameter.
To change the temporary directory permanently, use aviary configure --tmp /new/tmp to change the TMPDIR environment variable within your current conda environment.
Users may not always want to run a complete Aviary module, so specific rules can be targeted via the -w, --workflow
parameter. For example, to run only a specific binning algorithm, specify that rule directly:
aviary recover -w rosella --assembly scaffolds.fasta -1 sr1.1.fq sr2.1.fq.gz -2 sr1.2.fq sr2.2.fq.gz --longreads nanopore.fastq.gz --output output_dir/ --max_threads 12
NOTE: Every step up to the targeted rule still has to be run if it hasn't been run before. The rules that can be targeted are listed in each module's snakemake file.
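One way to list candidate rule names is to pull the `rule <name>:` lines out of a snakemake file. The sketch below runs against a small made-up file, since the location of the real module files inside an Aviary installation is not given here; the rule named example_rule and the file contents are purely illustrative.

```shell
# Hypothetical stand-in for a module's snakemake file
snakefile="$(mktemp)"
cat > "$snakefile" <<'EOF'
rule rosella:
    input: "assembly.fasta"
    output: "bins/done"

rule example_rule:
    input: "bins/done"
    output: "summary.txt"
EOF

# Print the name of every top-level "rule <name>:" declaration
rule_names="$(sed -n -E 's/^rule[[:space:]]+([A-Za-z_][A-Za-z0-9_]*):.*/\1/p' "$snakefile")"
echo "$rule_names"

rm -f "$snakefile"
```

Pointing the same sed command at a real module file should give the names that can be passed to -w, though it will miss anything not declared with a plain top-level rule statement.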