vireadb package

ViReaDB is a user-friendly database for storing reference-compressed viral sequence data and computing consensus genome sequences.

Module contents

class vireadb.ViReaDB(db_fn, bufsize=1048576)

Bases: object

ViReaDB database class

add_all_entries(other, check_meta=True, check_unique=True, commit=True)

Add all entries from another ViReaDB database into this one

Args:

other (vireadb.ViReaDB): The other database from which to add all entries

check_meta (bool): Check that the metadata are identical across the two databases. Skip this check only if the metadata are already known to match

check_unique (bool): Check that every ID is unique (i.e., no IDs in other already exist in the calling object). Skip this check only if the two databases are already known to share no IDs

commit (bool): Commit database after adding all entries
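
To make the two checks concrete, the following is a minimal pure-Python sketch that models each database as a metadata dict plus a dict of entries keyed by ID. The ToyDB class, its fields, and the example metadata are hypothetical stand-ins chosen for illustration; this is not the ViReaDB implementation, and the real error types and messages may differ.

```python
class ToyDB:
    """Hypothetical in-memory stand-in: metadata plus entries keyed by ID."""

    def __init__(self, meta, entries=None):
        self.meta = dict(meta)
        self.entries = dict(entries or {})

    def add_all_entries(self, other, check_meta=True, check_unique=True):
        # check_meta: refuse to mix databases whose metadata differ
        if check_meta and self.meta != other.meta:
            raise ValueError("metadata differ between the two databases")
        # check_unique: refuse to add any ID that is already present
        if check_unique:
            duplicates = self.entries.keys() & other.entries.keys()
            if duplicates:
                raise ValueError(f"duplicate IDs: {sorted(duplicates)}")
        # In this toy model, skipping check_unique silently overwrites
        # duplicates; a real database may behave differently.
        self.entries.update(other.entries)


a = ToyDB({"ref": "ref1"}, {"s1": b"cram-bytes-1"})
b = ToyDB({"ref": "ref1"}, {"s2": b"cram-bytes-2"})
a.add_all_entries(b)
print(sorted(a.entries))  # ['s1', 's2']
```

Calling a.add_all_entries(b) a second time raises ValueError in this sketch, because s2 is now present in both objects.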

add_entry(ID, reads_fn, filetype=None, lossy_names=True, include_unmapped=False, check_unique=True, bufsize=1048576, threads=1, commit=True, verbose=False)

Add a CRAM/BAM/SAM/FASTQ entry to this database. CRAM inputs are added exactly as-is.

Args:

ID (str): The unique ID of the entry to add

reads_fn (str): The input reads file. For FASTQ input, a list of multiple files can be provided

filetype (str): The format of the input reads file (CRAM, BAM, SAM, or FASTQ), or None to infer from reads_fn

lossy_names (bool): True to discard read names when both reads of a read-pair are in the same CRAM slice (results in better compression), otherwise False to keep all read names

include_unmapped (bool): Include unmapped reads when converting from non-CRAM formats

check_unique (bool): Check that ID doesn’t already exist. Skip this check only if ID is already known not to be a duplicate

bufsize (int): Buffer size for reading from file

threads (int): Number of threads to use for compression

commit (bool): Commit database after adding this entry

verbose (bool): True to enable verbose messages (e.g. samtools and minimap2 commands), otherwise False
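
When filetype is None, the format is inferred from reads_fn. One plausible way to do that is by file extension, as in the sketch below. The extension table and the infer_filetype helper are assumptions made for illustration; the rules ViReaDB actually applies may differ.

```python
from pathlib import Path

# Hypothetical extension table; the real inference rules may differ.
_EXT_TO_FORMAT = {
    ".cram": "CRAM",
    ".bam": "BAM",
    ".sam": "SAM",
    ".fastq": "FASTQ",
    ".fq": "FASTQ",
}


def infer_filetype(reads_fn):
    """Guess CRAM/BAM/SAM/FASTQ from a filename, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in Path(reads_fn).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()  # look through gzip compression
    if suffixes and suffixes[-1] in _EXT_TO_FORMAT:
        return _EXT_TO_FORMAT[suffixes[-1]]
    raise ValueError(f"cannot infer file type from {reads_fn!r}")


print(infer_filetype("run1/sample.R1.fastq.gz"))  # FASTQ
print(infer_filetype("sample.BAM"))               # BAM
```

Passing filetype explicitly avoids relying on any such guess when filenames do not follow the usual conventions.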

clear()

Remove all entries from this database

commit()

Commit the SQLite3 database
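
Many methods of this class take a commit flag, and the database itself is an SQLite3 file. The sketch below uses only Python's standard sqlite3 module to illustrate what deferring a commit means at the SQLite level: rows written by one connection are not visible to a second connection until the writer commits. The table name and schema are hypothetical and are not ViReaDB's schema.

```python
import os
import sqlite3
import tempfile


def count_rows(con):
    """Return the number of rows currently visible to this connection."""
    return con.execute("SELECT COUNT(*) FROM seqs").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp:
    db_fn = os.path.join(tmp, "toy.db")
    writer = sqlite3.connect(db_fn)
    writer.execute("CREATE TABLE seqs (ID TEXT PRIMARY KEY)")
    writer.commit()
    reader = sqlite3.connect(db_fn)

    # Several inserts are batched inside one open transaction
    writer.executemany("INSERT INTO seqs VALUES (?)", [(f"s{i}",) for i in range(3)])
    visible_before = count_rows(reader)  # uncommitted rows are not visible yet

    writer.commit()                      # one commit makes the whole batch durable
    visible_after = count_rows(reader)

    reader.close()
    writer.close()

print(visible_before, visible_after)  # 0 3
```

By analogy, passing commit=False to several calls and then calling commit() once would batch the writes, at the cost of losing uncommitted work if the process stops early.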

compute_consensus(ID, min_depth=10, min_freq=0.5, ambig='N', remove_gaps=True, overwrite=False, commit=True)

Compute the consensus sequence for a given entry. The position and insertion counts must have already been computed

Args:

ID (str): The unique ID of the entry whose consensus to compute

min_depth (int): Minimum depth to call base/insertion in consensus

min_freq (float): Minimum frequency (in the range [0, 1]) to call base/insertion in consensus

ambig (str): Symbol to use for ambiguous bases in consensus

remove_gaps (bool): Remove gap characters (-) from consensus

overwrite (bool): True to recompute (and overwrite) the consensus if it already exists

commit (bool): Commit database after updating this entry
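
The interaction of min_depth, min_freq, ambig, and remove_gaps can be sketched in a few lines of pure Python. This is a simplified model under stated assumptions: position counts are represented as one dict of symbol counts per reference position (with "-" for deletions), insertions are ignored, and the thresholds are treated as inclusive. The call_consensus function is hypothetical and is not ViReaDB's implementation.

```python
def call_consensus(pos_counts, min_depth=10, min_freq=0.5, ambig="N", remove_gaps=True):
    """Call one symbol per position from per-position symbol counts."""
    called = []
    for counts in pos_counts:
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            called.append(ambig)  # too little coverage to call anything
            continue
        symbol, count = max(counts.items(), key=lambda kv: kv[1])
        # Call the most frequent symbol only if it is frequent enough
        called.append(symbol if count / depth >= min_freq else ambig)
    seq = "".join(called)
    return seq.replace("-", "") if remove_gaps else seq


pos_counts = [
    {"A": 30, "C": 1},         # depth 31, A dominates: call A
    {"A": 4, "G": 3},          # depth 7 < min_depth: ambiguous
    {"C": 5, "G": 5, "T": 6},  # depth 16, top frequency 0.375 < min_freq: ambiguous
    {"-": 20, "T": 2},         # deletion dominates: gap
    {"T": 12},                 # call T
]
print(call_consensus(pos_counts))                     # ANNT
print(call_consensus(pos_counts, remove_gaps=False))  # ANN-T
```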

compute_counts(ID, min_qual=20, bufsize=1048576, overwrite=False, commit=True)

Compute position and insertion counts for a given entry

Args:

ID (str): The unique ID of the entry whose counts to compute

min_qual (int): Minimum base quality to count base

bufsize (int): Buffer size for reading from file

overwrite (bool): True to recompute (and overwrite) counts if they already exist

commit (bool): Commit database after updating this entry
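
As a rough illustration of what min_qual does, the sketch below tallies bases per reference position and skips any base whose quality is below the threshold. It assumes gapless reads given as (start, sequence, qualities) tuples with 0-based starts, and it omits insertion counts entirely. The count_bases function and its input format are hypothetical; the real method reads the stored CRAM data.

```python
def count_bases(ref_len, reads, min_qual=20):
    """Tally A/C/G/T per position, ignoring bases with quality below min_qual."""
    counts = [dict.fromkeys("ACGT", 0) for _ in range(ref_len)]
    for start, seq, quals in reads:
        for offset, (base, qual) in enumerate(zip(seq, quals)):
            pos = start + offset
            if 0 <= pos < ref_len and qual >= min_qual and base in counts[pos]:
                counts[pos][base] += 1
    return counts


reads = [
    (0, "ACG", [30, 30, 10]),  # the G at position 2 has quality 10: skipped
    (1, "CGT", [25, 19, 40]),  # the G at position 2 has quality 19: skipped
]
counts = count_bases(4, reads)
print(counts[1]["C"], counts[2]["G"])  # 2 0
```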

del_entry(ID)

Remove an entry from this database

Args:

ID (str): The unique ID of the entry to remove

del_reads(ID, confirm=True, commit=True)

Remove the reads from a given entry in this database in order to save space. This should only be done if the counts have already been computed (and even then, this is strongly discouraged).

Args:

ID (str): The unique ID of the entry whose reads to remove

confirm (bool): True to prompt the user for confirmation before removing the reads, otherwise False to remove reads silently (e.g. for automation)

commit (bool): Commit database after removing the reads

export_cram(ID, out_fn, overwrite=False)

Export the CRAM file of a given entry

Args:

ID (str): The unique ID of the entry whose CRAM to export

out_fn (str): The path of the output CRAM file

overwrite (bool): Overwrite output file if it exists

export_fasta(out_fn, IDs=None, overwrite=False)

Export multiple consensus sequences as a FASTA file

Args:

out_fn (str): The path of the output FASTA file

IDs (list): List of IDs whose consensus sequences to export, or None to export all consensus sequences in the database

overwrite (bool): Overwrite output file if it exists

get_IDs()

Return the IDs in this database

Returns:

list object containing all of the IDs in this database

get_consensus(ID)

Return the consensus sequence for a given entry

Args:

ID (str): The unique ID of the entry whose consensus to return

Returns:

The consensus sequence for ID as a FASTA string (or None if not yet computed)

get_counts(ID)

Return the position and insertion counts for a given entry

Args:

ID (str): The unique ID of the entry whose counts to return

Returns:

The position counts for ID (or None if not yet computed)

The insertion counts for ID (or None if not yet computed)

get_entry(ID)

Return the data of an entry associated with a given ID in this database

Args:

ID (str): The unique ID of the entry to retrieve

Returns:

bytes object containing the CRAM data of the reads

numpy.array object containing the position counts

dict object containing the insertion counts

str object containing the consensus sequence

get_meta()

Get the metadata from this ViReaDB database

Returns:

dict object containing the metadata of this ViReaDB database

rename_entry(old_ID, new_ID, commit=True, vacuum=False)

Rename an entry in this database

Args:

old_ID (str): The original ID of the entry to rename

new_ID (str): The new ID to assign to the entry

commit (bool): Commit database after renaming this entry

vacuum (bool): Vacuum database after renaming this entry (to minimize database filesize)

vacuum()

Rebuild the database file, repacking it into the minimal amount of disk space
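
The effect of vacuuming can be seen with Python's standard sqlite3 module alone. In the sketch below, deleting rows frees pages inside the file, and only the SQLite VACUUM statement shrinks the file on disk. The table name and schema are hypothetical and are not ViReaDB's schema; the sketch assumes SQLite's default settings (auto-vacuum disabled).

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_fn = os.path.join(tmp, "toy.db")
    con = sqlite3.connect(db_fn)
    con.execute("CREATE TABLE seqs (ID TEXT PRIMARY KEY, data BLOB)")
    con.executemany(
        "INSERT INTO seqs VALUES (?, ?)",
        ((f"s{i}", os.urandom(4096)) for i in range(200)),
    )
    con.commit()
    size_full = os.path.getsize(db_fn)

    # Deleting rows marks pages as free but keeps them in the file
    con.execute("DELETE FROM seqs")
    con.commit()
    size_after_delete = os.path.getsize(db_fn)

    # VACUUM rebuilds the file without the free pages
    con.execute("VACUUM")
    size_after_vacuum = os.path.getsize(db_fn)
    con.close()

print(size_full, size_after_delete, size_after_vacuum)
```

Because vacuuming rewrites the whole file, it is usually worth running once after a batch of deletions or renames rather than after each one.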

vireadb.create_db(db_fn, ref_fn, overwrite=False, bufsize=1048576)

Create a new ViReaDB database

Args:

db_fn (str): The filename of the SQLite3 database file representing this database

ref_fn (str): The filename of the viral reference genome to use for this database

overwrite (bool): Overwrite db_fn if it already exists

bufsize (int): Buffer size for reading from file

Returns:

ViReaDB object
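
A typical end-to-end workflow, pieced together from the signatures documented on this page, might look like the sketch below. The build_consensus_fasta wrapper and its parameter names are hypothetical, the calls keep the documented default arguments, and the sketch has not been checked against a particular ViReaDB release.

```python
def build_consensus_fasta(db_fn, ref_fn, reads_by_id, out_fasta_fn):
    """Create a database, add each sample, and export all consensus sequences.

    reads_by_id maps each unique entry ID to its reads file (CRAM/BAM/SAM/FASTQ).
    """
    # Imported lazily so that defining this function does not require the package
    import vireadb

    db = vireadb.create_db(db_fn, ref_fn, overwrite=False)
    for ID, reads_fn in reads_by_id.items():
        db.add_entry(ID, reads_fn)    # file type inferred from reads_fn
        db.compute_counts(ID)         # counts must exist before the consensus
        db.compute_consensus(ID, min_depth=10, min_freq=0.5)
    # IDs=None (the default) exports every consensus sequence in the database
    db.export_fasta(out_fasta_fn, overwrite=False)
    return db
```

For example, build_consensus_fasta("example.db", "reference.fas", {"sample1": "sample1.bam"}, "consensus.fas") would create example.db and write consensus.fas, where all four filenames are placeholders.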

vireadb.load_db(db_fn)

Load a ViReaDB database from file

Args:

db_fn (str): The filename of the SQLite3 database file representing this database

Returns:

ViReaDB object
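
Once a database exists, it can be reopened and inspected. The sketch below uses load_db, get_IDs, and get_consensus as documented on this page to list entries that do not yet have a consensus sequence, relying on get_consensus returning None in that case. The ids_without_consensus helper is hypothetical.

```python
def ids_without_consensus(db_fn):
    """Return the IDs in the database whose consensus has not been computed yet."""
    # Imported lazily so that defining this function does not require the package
    import vireadb

    db = vireadb.load_db(db_fn)
    # get_consensus is documented to return None when no consensus exists
    return [ID for ID in db.get_IDs() if db.get_consensus(ID) is None]
```

The returned list could then be passed, one ID at a time, to compute_counts and compute_consensus.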

vireadb.merge_dbs(out_db_fn, in_db_fns, check_meta=True, overwrite=False)

Merge multiple ViReaDB databases

Args:

out_db_fn (str): The filename of the SQLite3 database file representing the output database

in_db_fns (list): The filenames of the SQLite3 databases representing the input databases

check_meta (bool): Check that the metadata are identical across the databases. Skip this check only if the metadata are already known to match

overwrite (bool): Overwrite out_db_fn if it already exists

Returns:

ViReaDB object