vireadb package
ViReaDB is a user-friendly database for storing reference-compressed viral sequence data and computing consensus genome sequences.
Module contents
- class vireadb.ViReaDB(db_fn, bufsize=1048576)
Bases: object
ViReaDB database class
- add_all_entries(other, check_meta=True, check_unique=True, commit=True)
Add all entries from another ViReaDB database into this one
- Args:
other (vireadb.ViReaDB): The other database from which to add all entries
check_meta (bool): Check that the metadata are identical across the two databases. Skip this check only if the metadata are already known to match
check_unique (bool): Check that every ID is unique (i.e., no IDs in other already exist in the calling object). Skip this check only if there are already known to be no duplicates
commit (bool): Commit database after adding the entries
- add_entry(ID, reads_fn, filetype=None, lossy_names=True, include_unmapped=False, check_unique=True, bufsize=1048576, threads=1, commit=True, verbose=False)
Add a CRAM/BAM/SAM/FASTQ entry to this database. CRAM inputs are added exactly as-is.
- Args:
ID (str): The unique ID of the entry to add
reads_fn (str): The input reads file. A list of multiple files can be provided if the input is FASTQ
filetype (str): The format of the input reads file (CRAM, BAM, SAM, or FASTQ), or None to infer from reads_fn
lossy_names (bool): True to discard read names when both reads of a read-pair are in the same CRAM slice (results in better compression), otherwise False to keep all read names
include_unmapped (bool): Include unmapped reads when converting from non-CRAM formats
check_unique (bool): Check that ID doesn’t already exist. Skip this check only if there are already known to be no duplicates
bufsize (int): Buffer size for reading from file
threads (int): Number of threads to use for compression
commit (bool): Commit database after adding this entry
verbose (bool): True to enable verbose messages (e.g. samtools and minimap2 commands), otherwise False
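As an illustration of the add_entry signature above, the following is a minimal sketch of adding several read files in one pass. The helper name add_samples and the samples dict (entry ID mapped to reads filename) are hypothetical; db is assumed to be a ViReaDB object, e.g. one returned by vireadb.create_db or vireadb.load_db. Only get_IDs and add_entry, as documented on this page, are called.

```python
def add_samples(db, samples, threads=1):
    """Add several read files to an open ViReaDB-like database.

    db      : assumed to be a vireadb.ViReaDB object
    samples : hypothetical dict mapping entry ID -> reads filename
    Returns the list of IDs that were actually added.
    """
    existing = set(db.get_IDs())
    added = []
    for ID, reads_fn in samples.items():
        if ID in existing:
            # Skip IDs that are already stored instead of failing on them
            continue
        # filetype is left at its default (None) so it is inferred from reads_fn
        db.add_entry(ID, reads_fn, threads=threads)
        existing.add(ID)
        added.append(ID)
    return added
```

With a real database this might be called as add_samples(db, {"sample1": "sample1.bam"}, threads=4); the filenames here are placeholders.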
- compute_consensus(ID, min_depth=10, min_freq=0.5, ambig='N', remove_gaps=True, overwrite=False, commit=True)
Compute the consensus sequence for a given entry. The position and insertion counts must have already been computed
- Args:
ID (str): The unique ID of the entry whose consensus to compute
min_depth (int): Minimum depth to call base/insertion in consensus
min_freq (float): Minimum frequency [0,1] to call base/insertion in consensus
ambig (str): Symbol to use for ambiguous bases in consensus
remove_gaps (bool): Remove gap characters (-) from consensus
overwrite (bool): True to recompute (and overwrite) the consensus if it already exists
commit (bool): Commit database after updating this entry
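To make the roles of min_depth, min_freq, ambig, and remove_gaps concrete, here is a rough sketch of threshold-based consensus calling. It is not ViReaDB's actual implementation: the function name call_consensus, the input layout (one dict of symbol counts per reference position), the use of >= for the frequency comparison, and the omission of insertions are all assumptions made for illustration.

```python
def call_consensus(counts, min_depth=10, min_freq=0.5, ambig="N", remove_gaps=True):
    """Call a consensus from per-position symbol counts (illustrative only).

    counts : list of dicts, one per reference position, mapping a symbol
             (e.g. "A", "C", "G", "T", "-") to the number of reads supporting it
    """
    called = []
    for pos_counts in counts:
        depth = sum(pos_counts.values())
        if depth == 0 or depth < min_depth:
            # Too little coverage to call anything at this position
            called.append(ambig)
            continue
        # Most frequent symbol and its count at this position
        symbol, count = max(pos_counts.items(), key=lambda kv: kv[1])
        called.append(symbol if count / depth >= min_freq else ambig)
    seq = "".join(called)
    return seq.replace("-", "") if remove_gaps else seq
```

Under these assumptions, a position is reported as ambig either when coverage is below min_depth or when no single symbol reaches min_freq.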
- compute_counts(ID, min_qual=20, bufsize=1048576, overwrite=False, commit=True)
Compute position and insertion counts for a given entry
- Args:
ID (str): The unique ID of the entry whose counts to compute
min_qual (int): Minimum base quality to count base
bufsize (int): Buffer size for reading from file
overwrite (bool): True to recompute (and overwrite) counts if they already exist
commit (bool): Commit database after updating this entry
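Because compute_consensus requires counts to exist, the two calls are naturally chained. The sketch below shows one way to do that while skipping work that has already been done. The helper name ensure_consensus is hypothetical, db is assumed to be a ViReaDB object, and get_counts is assumed to return the position counts and insertion counts as a pair, which is how the Returns section of get_counts reads.

```python
def ensure_consensus(db, ID, min_qual=20, min_depth=10, min_freq=0.5):
    """Compute counts and consensus for ID only if they are missing.

    db is assumed to be a vireadb.ViReaDB object. Returns whatever
    db.get_consensus(ID) returns after the computation.
    """
    # Assumption: get_counts returns (position counts, insertion counts)
    pos_counts, ins_counts = db.get_counts(ID)
    if pos_counts is None or ins_counts is None:
        db.compute_counts(ID, min_qual=min_qual)
    if db.get_consensus(ID) is None:
        db.compute_consensus(ID, min_depth=min_depth, min_freq=min_freq)
    return db.get_consensus(ID)
```

Checking for None first avoids relying on how the overwrite parameter behaves when results already exist.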
- del_entry(ID)
Remove an entry from this database
- Args:
ID (str): The unique ID of the entry to remove
- del_reads(ID, confirm=True, commit=True)
Remove the reads from a given entry in this database in order to save space. This should only be done if the counts have already been computed (and even then, this is strongly discouraged).
- Args:
ID (str): The unique ID of the entry whose reads to remove
confirm (bool): True to prompt the user for confirmation before removing the reads, otherwise False to remove reads silently (e.g. for automation)
commit (bool): Commit database after removing the reads
- export_cram(ID, out_fn, overwrite=False)
Export the CRAM file of a given entry
- Args:
ID (str): The unique ID of the entry whose CRAM to export
out_fn (str): The path of the output CRAM file
overwrite (bool): Overwrite output file if it exists
- export_fasta(out_fn, IDs=None, overwrite=False)
Export multiple consensus sequences as a FASTA file
- Args:
out_fn (str): The path of the output FASTA file
IDs (list): List of IDs whose consensus sequences to export, or None to export all consensus sequences in the database
overwrite (bool): Overwrite output file if it exists
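The IDs argument can be used to export only the entries that already have a consensus. The sketch below is one possible way to build that list; the helper name export_ready is hypothetical and db is assumed to be a ViReaDB object. It relies only on get_IDs, get_consensus (documented as returning None when no consensus has been computed), and export_fasta.

```python
def export_ready(db, out_fn, overwrite=False):
    """Export the consensus of every entry that already has one.

    db is assumed to be a vireadb.ViReaDB object. Returns the exported IDs.
    """
    # Keep only entries whose consensus has been computed
    ready = [ID for ID in db.get_IDs() if db.get_consensus(ID) is not None]
    if ready:
        db.export_fasta(out_fn, IDs=ready, overwrite=overwrite)
    return ready
```

Nothing is written when no entry has a consensus, so the caller can inspect the returned list to see what was exported.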
- get_IDs()
Return the IDs in this database
- Returns:
list object containing all of the IDs in this database
- get_consensus(ID)
Return the consensus sequence for a given entry
- Args:
ID (str): The unique ID of the entry whose consensus to return
- Returns:
The consensus sequence for ID as a FASTA string (or None if not yet computed)
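Since the consensus is returned as a FASTA string rather than a bare sequence, callers that need only the bases have to split off the header. The following is a small sketch of that step, assuming a single-record FASTA string with a header line starting with ">"; the helper name split_fasta_record is hypothetical and not part of vireadb.

```python
def split_fasta_record(fasta):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = fasta.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    header = lines[0][1:].strip()
    # Join wrapped sequence lines into one string
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence
```

Remember to handle the None case before calling it, since get_consensus returns None when no consensus has been computed.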
- get_counts(ID)
Return the position and insertion counts for a given entry
- Args:
ID (str): The unique ID of the entry whose counts to return
- Returns:
The position counts for ID (or None if not yet computed)
The insertion counts for ID (or None if not yet computed)
- get_entry(ID)
Return the data of an entry associated with a given ID in this database
- Args:
ID (str): The unique ID of the entry to retrieve
- Returns:
bytes object containing the CRAM data of the reads
numpy.array object containing the position counts
dict object containing the insertion counts
str object containing the consensus sequence
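The four return values listed above can be unpacked in one statement. The sketch below summarizes what is stored for an entry; it assumes that get_entry returns the four objects in the listed order and that parts not yet computed (or removed, as with del_reads) come back as None. The helper name summarize_entry is hypothetical and db is assumed to be a ViReaDB object.

```python
def summarize_entry(db, ID):
    """Report which parts of an entry are present (illustrative only)."""
    # Assumption: get_entry returns (CRAM bytes, position counts,
    # insertion counts, consensus) in the documented order
    cram, pos_counts, ins_counts, consensus = db.get_entry(ID)
    return {
        "ID": ID,
        "cram_bytes": None if cram is None else len(cram),
        "has_counts": pos_counts is not None and ins_counts is not None,
        "has_consensus": consensus is not None,
    }
```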
- get_meta()
Get the metadata from this ViReaDB database
- Returns:
dict object containing the metadata of this ViReaDB database
- rename_entry(old_ID, new_ID, commit=True, vacuum=False)
Rename an entry in this database
- Args:
old_ID (str): The original ID of the entry to rename
new_ID (str): The new ID for the entry
commit (bool): Commit database after renaming this entry
vacuum (bool): Vacuum database after renaming this entry (to minimize database filesize)
- vireadb.create_db(db_fn, ref_fn, overwrite=False, bufsize=1048576)
Create a new ViReaDB database
- Args:
db_fn (str): The filename of the SQLite3 database file representing this database
ref_fn (str): The filename of the viral reference genome to use for this database
overwrite (bool): Overwrite db_fn if it already exists
bufsize (int): Buffer size for reading from file
- Returns:
ViReaDB object
- vireadb.load_db(db_fn)
Load a ViReaDB database from file
- Args:
db_fn (str): The filename of the SQLite3 database file representing this database
- Returns:
ViReaDB object
- vireadb.merge_dbs(out_db_fn, in_db_fns, check_meta=True, overwrite=False)
Merge multiple ViReaDB databases
- Args:
out_db_fn (str): The filename of the SQLite3 database file representing the output database
in_db_fns (list): The filenames of the SQLite3 databases representing the input databases
check_meta (bool): Check that the metadata are identical across the databases. Skip this check only if the metadata are already known to match
overwrite (bool): Overwrite out_db_fn if it already exists
- Returns:
ViReaDB object
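Before merging, it can be useful to confirm by hand the two conditions that check_meta and check_unique refer to. The following is a rough sketch of such a pre-merge check, not the checks vireadb itself performs. The helper name merge_preflight is hypothetical, each item in dbs is assumed to be a ViReaDB object (e.g. from vireadb.load_db), and the metadata dicts are assumed to be comparable with ==.

```python
def merge_preflight(dbs):
    """Return (metadata_match, duplicate_ids) for a list of open databases.

    dbs is assumed to be a list of vireadb.ViReaDB objects.
    """
    metas = [db.get_meta() for db in dbs]
    # Assumption: metadata dicts can be compared directly with ==
    metadata_match = all(meta == metas[0] for meta in metas[1:])
    seen = set()
    duplicates = set()
    for db in dbs:
        for ID in db.get_IDs():
            if ID in seen:
                duplicates.add(ID)  # ID appears in more than one database
            else:
                seen.add(ID)
    return metadata_match, sorted(duplicates)
```

If the metadata match and the duplicate list is empty, the same input filenames could then be passed to vireadb.merge_dbs.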