title: rosella refine usage

====

rosella refine - Refine MAGs from contigs using UMAP and HDBSCAN clustering. (version 0.5.3)

SYNOPSIS

rosella refine [FLAGS]

DESCRIPTION

rosella refine is a tool for recovering MAGs from contigs using UMAP and HDBSCAN clustering.
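
As an illustration of how the flags documented below fit together, the following Python sketch assembles a `rosella refine` argument list without running it. The helper name `build_refine_command`, the file names, and the particular combination of flags are assumptions made for this example; only the flag names themselves come from this page.

```python
import shlex


def build_refine_command(assembly, genome_dir, read_pairs,
                         output_dir="rosella_output", threads=10,
                         extension="fna"):
    """Build (but do not execute) an argument list for `rosella refine`.

    Hypothetical helper: every flag used here is documented on this page,
    but whether this exact combination suits a given dataset is an assumption.
    """
    cmd = [
        "rosella", "refine",
        "--assembly", assembly,
        "--genome-fasta-directory", genome_dir,
        "--genome-fasta-extension", extension,
        "--output-directory", output_dir,
        "--threads", str(threads),
    ]
    if read_pairs:
        # --coupled expects forward/reverse files interleaved per sample.
        cmd.append("--coupled")
        for forward, reverse in read_pairs:
            cmd.extend([forward, reverse])
    return cmd


# Hypothetical input paths, for illustration only.
cmd = build_refine_command(
    "assembly.fna", "bins/", [("sample1_R1.fq.gz", "sample1_R2.fq.gz")]
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.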

FLAGS

-v, --verbose

: Print extra debugging information. [default: not set]

<!-- -->

-q, --quiet

: Unless there is an error, do not print log messages. [default: not set]

BINNING PARAMETERS

-o, --output-directory PATH

: Output directory for binning results. [default: rosella_output]

<!-- -->

-C, --coverage-file PATH

: Coverage output produced by running CoverM contig in MetaBAT mode on the provided assembly and short read samples. If not provided, rosella will calculate coverage values.

<!-- -->

-K, --kmer-frequency-file PATH

: The kmer frequency table created by rosella.

<!-- -->

--min-contig-size INT

: Minimum contig size in base pairs to be considered for binning. Contigs between 1000 bp and this value will be recovered in the contig rescue stage if multiple samples are available. [default: 1500]
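
The size rule described above can be sketched in Python as follows. This is not rosella's implementation: the function name `partition_contigs`, the inclusive boundaries, and the treatment of contigs shorter than 1000 bp as ignored are assumptions made for illustration.

```python
def partition_contigs(lengths, min_contig_size=1500, rescue_floor=1000):
    """Split contigs into binned, rescue-candidate, and ignored groups.

    `lengths` maps contig name -> length in bp. Whether each boundary is
    inclusive or exclusive in rosella itself is an assumption here.
    """
    binned, rescue, ignored = [], [], []
    for name, length in lengths.items():
        if length >= min_contig_size:
            binned.append(name)    # large enough to be binned directly
        elif length >= rescue_floor:
            rescue.append(name)    # candidate for the contig rescue stage
        else:
            ignored.append(name)   # below the 1000 bp floor
    return binned, rescue, ignored


groups = partition_contigs({"contig_1": 2000, "contig_2": 1200, "contig_3": 500})
```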

<!-- -->

--min-bin-size INT

: Minimum bin size in base pairs for a MAG to be reported. If a bin is smaller than this, it will be split apart and its contigs will be added to other bins. [default: 100000]

<!-- -->

-k, --kmer-size INT

: Kmer size to use for kmer frequency table. [default: 4]
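
To make the idea of a kmer frequency table concrete, here is a minimal Python sketch that computes relative kmer frequencies for one sequence with the default size of 4. It only illustrates the concept: the function name `kmer_frequencies` is hypothetical, and rosella's own table may differ (for example in how it handles reverse complements, ambiguous bases, or normalization).

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return the relative frequency of every A/C/G/T kmer of length k.

    Windows containing other characters (such as N) are skipped.
    This is an illustrative sketch, not rosella's table format.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


freqs = kmer_frequencies("ACGTACGT")
```

With `k=4` the table has 256 columns per contig, which is why small kmer sizes are the usual choice for composition-based binning.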

<!-- -->

-n, --n-neighbours INT

: Number of neighbours used in the UMAP algorithm. [default: 100]

<!-- -->

--max-retries INT

: Maximum number of times to retry refining a genome if it fails. [default: 5]

INPUT ASSEMBLY OPTION

-r,-f, --assembly,--reference PATH

: FASTA files of contigs e.g. concatenated genomes or metagenome assembly [required unless -d/--genome-fasta-directory is specified]

THREADING OPTIONS

-t, --threads INT

: Maximum number of threads used. [default: 10]

GENOME REFINING OPTIONS

-d, --genome-fasta-directory PATH

: Directory containing FASTA files of contigs [required unless -f/--genome-fasta-files is specified]

<!-- -->

-f, --genome-fasta-files PATH

: FASTA files of contigs e.g. concatenated genomes or metagenome assembly [required unless -d/--genome-fasta-directory is specified]

<!-- -->

-x, --genome-fasta-extension STR

: FASTA file extension in --genome-fasta-directory [default: "fna"]

<!-- -->

--checkm-results PATH

: CheckM 1 or 2 results that contain information on the completeness and contamination of the input genomes.

<!-- -->

--max-contamination FLOAT

: Maximum contamination of a genome to be included in the refining if CheckM results are provided. [default: 15.0]

<!-- -->

--bin-tag STR

: Tag to use for the refined bins. [default: refined_1]

MAPPING ALGORITHM OPTIONS

--mapper NAME

: Underlying mapping software used for short reads [default: minimap2-sr]. One of:

| name | description |
|------|-------------|
| minimap2-sr | minimap2 with '-x sr' option |
| bwa-mem | bwa mem using default parameters |
| bwa-mem2 | bwa-mem2 using default parameters |
| minimap2-ont | minimap2 with '-x map-ont' option |
| minimap2-pb | minimap2 with '-x map-pb' option |
| minimap2-hifi | minimap2 with '-x map-hifi' option |
| minimap2-no-preset | minimap2 with no '-x' option |

--longread-mapper NAME

: Underlying mapping software used for long reads [default: minimap2-ont]. One of:

| name | description |
|------|-------------|
| minimap2-ont | minimap2 with '-x map-ont' option |
| minimap2-pb | minimap2 with '-x map-pb' option |
| minimap2-hifi | minimap2 with '-x map-hifi' option |
| minimap2-no-preset | minimap2 with no '-x' option |

--minimap2-params PARAMS

: Extra parameters to provide to minimap2, both indexing command (if used) and for mapping. Note that usage of this parameter has security implications if untrusted input is specified. '-a' is always specified to minimap2. [default: none]

<!-- -->

--minimap2-reference-is-index

: Treat reference as a minimap2 database, not as a FASTA file. [default: not set]

<!-- -->

--bwa-params PARAMS

: Extra parameters to provide to BWA or BWA-MEM2. Note that usage of this parameter has security implications if untrusted input is specified. [default: none]

READ MAPPING PARAMETERS

-1 PATH ..

: Forward FASTA/Q file(s) for mapping. These may be gzipped or not.

<!-- -->

-2 PATH ..

: Reverse FASTA/Q file(s) for mapping. These may be gzipped or not.

<!-- -->

-c, --coupled PATH ..

: One or more pairs of forward and reverse possibly gzipped FASTA/Q files for mapping in order <sample1_R1.fq.gz> <sample1_R2.fq.gz> <sample2_R1.fq.gz> <sample2_R2.fq.gz> ..
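
Because `--coupled` relies purely on argument order, a quick check that a file list really alternates forward and reverse reads can catch mistakes before a long run. The Python sketch below groups such a list into pairs; the helper name `pair_coupled` and the file names are hypothetical.

```python
def pair_coupled(paths):
    """Group a flat --coupled style list into (forward, reverse) tuples.

    Raises ValueError when the list has an odd number of entries, since
    every sample must contribute exactly two files.
    """
    if len(paths) % 2 != 0:
        raise ValueError("--coupled needs an even number of files")
    return list(zip(paths[0::2], paths[1::2]))


pairs = pair_coupled([
    "sample1_R1.fq.gz", "sample1_R2.fq.gz",
    "sample2_R1.fq.gz", "sample2_R2.fq.gz",
])
```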

<!-- -->

--interleaved PATH ..

: Interleaved FASTA/Q file(s) for mapping. These may be gzipped or not.

<!-- -->

--single PATH ..

: Unpaired FASTA/Q file(s) for mapping. These may be gzipped or not.

<!-- -->

--longreads PATH ..

: Long-read FASTA/Q file(s) for mapping. These may be gzipped or not.

<!-- -->

-b, --bam-files PATH

: Path to BAM file(s). These must be reference sorted (e.g. with samtools sort) unless --sharded is specified, in which case they must be read name sorted (e.g. with samtools sort -n). When specified, no read mapping algorithm is undertaken.

<!-- -->

-l, --longread-bam-files PATH

: Path to longread BAM file(s). These must be reference sorted (e.g. with samtools sort) unless --sharded is specified, in which case they must be read name sorted (e.g. with samtools sort -n). When specified, no read mapping algorithm is undertaken.

ALIGNMENT THRESHOLDING

--min-read-aligned-length INT

: Exclude reads with smaller numbers of aligned bases. [default: 0]

<!-- -->

--min-read-percent-identity FLOAT

: Exclude reads by overall percent identity e.g. 95 for 95%. [default: 0]

<!-- -->

--min-read-aligned-percent FLOAT

: Exclude reads by percent aligned bases e.g. 95 means 95% of the read's bases must be aligned. [default: 0]
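
The three per-read thresholds above can be pictured as independent tests that a read must all pass. The Python sketch below shows one way they might combine. The `Alignment` record, the function name, and the exact definitions used (identity as matching bases over aligned bases, aligned percent as aligned bases over read length) are assumptions for illustration and may not match how rosella computes them.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one read alignment."""
    read_length: int      # total bases in the read
    aligned_bases: int    # bases of the read that are aligned
    matching_bases: int   # aligned bases identical to the reference


def passes_thresholds(aln, min_aligned_length=0,
                      min_percent_identity=0.0, min_aligned_percent=0.0):
    """Return True when the alignment meets all three thresholds.

    Defaults of 0 keep every read, mirroring the documented defaults.
    """
    if aln.aligned_bases < min_aligned_length:
        return False
    identity = (100 * aln.matching_bases / aln.aligned_bases
                if aln.aligned_bases else 0.0)
    aligned_percent = (100 * aln.aligned_bases / aln.read_length
                       if aln.read_length else 0.0)
    return (identity >= min_percent_identity
            and aligned_percent >= min_aligned_percent)


aln = Alignment(read_length=150, aligned_bases=140, matching_bases=133)
```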

<!-- -->

--min-read-aligned-length-pair INT

: Exclude pairs with smaller numbers of aligned bases. Implies --proper-pairs-only. [default: 0]

<!-- -->

--min-read-percent-identity-pair FLOAT

: Exclude pairs by overall percent identity e.g. 95 for 95%. Implies --proper-pairs-only. [default: 0]

<!-- -->

--min-read-aligned-percent-pair FLOAT

: Exclude pairs by percent aligned bases e.g. 95 means 95% of the pair's bases must be aligned. Implies --proper-pairs-only. [default: 0]

<!-- -->

--proper-pairs-only

: Require reads to be mapped as proper pairs. [default: not set]

<!-- -->

--exclude-supplementary

: Exclude supplementary alignments. [default: not set]

<!-- -->

--include-secondary

: Include secondary alignments. [default: not set]

<!-- -->

--contig-end-exclusion INT

: Exclude bases at the ends of reference sequences from calculation [default: 75]
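
As a rough picture of what excluding contig ends means, the Python sketch below averages per-base depth after dropping a fixed number of positions from each end. The function name `mean_depth_excluding_ends` is hypothetical, and returning 0.0 for contigs too short to have any remaining positions is an assumption, not documented behaviour.

```python
def mean_depth_excluding_ends(depths, end_exclusion=75):
    """Mean per-base depth, ignoring `end_exclusion` bases at each end.

    `depths` is a list with one depth value per reference position.
    """
    if len(depths) <= 2 * end_exclusion:
        return 0.0  # assumption: nothing left to average
    core = depths[end_exclusion:len(depths) - end_exclusion]
    return sum(core) / len(core)


# Toy contig: poorly covered ends, evenly covered middle.
depths = [0] * 75 + [10] * 100 + [0] * 75
core_mean = mean_depth_excluding_ends(depths)
```

Read mapping tends to be unreliable near contig ends, so trimming them gives a depth estimate that better reflects the interior of the contig.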

--trim-min FLOAT

: Remove this smallest fraction of positions when calculating trimmed_mean [default: 5]

--trim-max FLOAT

: Maximum fraction for trimmed_mean calculations [default: 95]
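
A trimmed mean of this kind can be sketched in Python as below, treating --trim-min and --trim-max as percentages of the sorted per-base depths. The function name and the rounding of the cut points are assumptions; the actual calculation may pick slightly different boundaries.

```python
def trimmed_mean(depths, trim_min=5.0, trim_max=95.0):
    """Mean of per-base depths after discarding both tails.

    Positions are sorted by depth; those below the `trim_min` percent
    mark and at or above the `trim_max` percent mark are dropped.
    """
    ordered = sorted(depths)
    n = len(ordered)
    low = int(n * trim_min / 100)    # index of first kept position
    high = int(n * trim_max / 100)   # index just past last kept position
    kept = ordered[low:high]
    return sum(kept) / len(kept) if kept else 0.0


value = trimmed_mean(list(range(100)))
```

Dropping both tails makes the estimate less sensitive to collapsed repeats (very high depth) and unmapped gaps (zero depth).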

EXIT STATUS

0

: Successful program execution.

<!-- -->

1

: Unsuccessful program execution.

<!-- -->

101

: The program panicked.
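
When rosella is driven from a script, the exit statuses above can be translated into messages as in this Python sketch. The helper names are hypothetical, and `run_and_report` is shown only as one possible way to wrap a call; it is defined here but not executed.

```python
import subprocess

# Meanings taken from the EXIT STATUS list on this page.
EXIT_MEANINGS = {
    0: "Successful program execution.",
    1: "Unsuccessful program execution.",
    101: "The program panicked.",
}


def describe_exit_status(code):
    """Map an exit status to its documented meaning."""
    return EXIT_MEANINGS.get(code, "Undocumented exit status.")


def run_and_report(cmd):
    """Run a command (given as an argument list) and describe its exit status."""
    result = subprocess.run(cmd)
    return describe_exit_status(result.returncode)
```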

AUTHOR

Rhys J. P. Newell, Centre for Microbiome Research, School of Biomedical Sciences, Faculty of Health, Queensland University of Technology <rhys.newell94 near gmail.com>
