fastaparser.FastaSequence
Represents one single FASTA sequence.
Parameters
The FastaSequence class can be instantiated with the following parameters
fastaparser.FastaSequence(sequence, id_='', description='', sequence_type=None, infer_type=False)
Parameter | Type / Value | Default | Description |
---|---|---|---|
sequence | str | | String of characters representing a DNA, RNA or aminoacid sequence. Cannot be empty. Must be provided |
id_ | str | '' | ID portion of the definition line (header). '>' and newlines will be removed, if any. Spaces will be converted to '_'. Can be an empty string. Optional |
description | str | '' | Description portion of the definition line (header). Newlines will be removed, if any. Can be an empty string. Optional |
sequence_type | 'nucleotide', 'aminoacid' or None | None | Indicates the sequence type. Can be None if not known. Optional |
infer_type | bool | False | Indicates whether FastaSequence should try to infer the sequence type. If True, FastaSequence may have to analyse the whole sequence in the worst case, and it can only identify aminoacid sequences. Optional |
Raises
TypeError
- If sequence, id_, description, sequence_type or infer_type are of the wrong type.
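The clean-up rules described for id_ and description can be sketched as below. This is a minimal illustration, not the library's code: normalize_id and normalize_description are hypothetical helper names, and removing every '>' (rather than only a leading one) is an assumption.

```python
def normalize_id(raw_id: str) -> str:
    """Hypothetical helper mirroring the documented id_ clean-up."""
    if not isinstance(raw_id, str):
        raise TypeError("id_ must be str")
    # Assumption: every '>' and newline is dropped, wherever it occurs.
    cleaned = raw_id.replace(">", "").replace("\n", "")
    # Spaces are converted to underscores, as the parameter table states.
    return cleaned.replace(" ", "_")


def normalize_description(raw_description: str) -> str:
    """Hypothetical helper mirroring the documented description clean-up."""
    if not isinstance(raw_description, str):
        raise TypeError("description must be str")
    # Only newlines are removed; spaces are kept.
    return raw_description.replace("\n", "")


print(normalize_id(">seq 1\n"))                  # seq_1
print(normalize_description("Homo sapiens\n"))   # Homo sapiens
```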
Attributes
Instances of the FastaSequence class have the following attributes
Attribute | Type / Value | Editable | Description |
---|---|---|---|
id | str | Yes | ID portion of the definition line (header). Can be empty |
description | str | Yes | Description portion of the definition line (header). Can be empty |
sequence | list(LetterCode) | No | Sequence |
sequence_type | 'nucleotide', 'aminoacid' or None | Yes | Indicates the sequence type. Can be None if not known |
inferred_type | bool | No | True if FastaSequence inferred the sequence type, False otherwise. |
Editable attributes can be set by standard variable assignment and deleted/reset with the del keyword:
fastasequence_object.id = 'new_id'
del fastasequence_object.description
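One common way to get this set/delete behaviour in Python is a property with a setter and a deleter. The sketch below uses a hypothetical stand-in class named Record. It assumes that del resets the attribute to its default empty string, and it does not claim to be how FastaSequence is implemented.

```python
class Record:
    """Hypothetical stand-in class; not the FastaSequence implementation."""

    def __init__(self, id_: str = ""):
        self._id = id_

    @property
    def id(self) -> str:
        return self._id

    @id.setter
    def id(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError("id must be str")
        self._id = value

    @id.deleter
    def id(self) -> None:
        # Assumption: deleting resets the value to the default ('').
        self._id = ""


record = Record("old_id")
record.id = "new_id"   # standard assignment
del record.id          # reset through the deleter
print(repr(record.id))  # ''
```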
Methods
Instances of the FastaSequence class have the following methods
complement
Returns a new FastaSequence object containing the complementary sequence (ideally, of a nucleotide sequence).
Description is updated to mention the changes relative to the original sequence.
Non-nucleotide letter codes don't have a complement and, therefore, stay the same.
In order not to impose the setting of sequence_type as 'nucleotide', this method will work for any sequence and LetterCode (as long as sequence_type is not 'aminoacid'), which has the side effect of returning nonsensical results when letter codes are not nucleotides.
Ex: For aminoacid letter codes that overlap with nucleotide letter codes, the output will be the complement of the nucleotide represented by the same letter code, which makes no sense.
FastaSequence.complement(reverse=False)
Parameter | Type / Value | Default | Description |
---|---|---|---|
reverse | bool | False | If sequence should be reversed. Optional |
Returns
FastaSequence
Complement of the current nucleotide FastaSequence. Non-nucleotide LetterCode objects will stay the same.
Raises
TypeError
- If sequence_type is 'aminoacid'.
- If reverse is not bool.
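The behaviour described for complement can be illustrated on plain strings. This is a sketch under assumptions: complement_sketch and the COMPLEMENT table are hypothetical names, the table is the standard IUPAC nucleotide complement mapping, and the complement of A is always written as T, so RNA input is not round-tripped.

```python
# Standard IUPAC nucleotide complements. U is complemented to A;
# A is always complemented to T in this sketch.
COMPLEMENT = {
    "A": "T", "T": "A", "U": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D",
    "S": "S", "W": "W", "N": "N",
}


def complement_sketch(sequence: str, reverse: bool = False) -> str:
    """Hypothetical string-based version of the documented behaviour."""
    if not isinstance(reverse, bool):
        raise TypeError("reverse must be bool")
    # Letters without a table entry (non-nucleotide codes) stay the same.
    result = [COMPLEMENT.get(letter, letter) for letter in sequence.upper()]
    if reverse:
        result.reverse()
    return "".join(result)


print(complement_sketch("ATGC"))                # TACG
print(complement_sketch("ATGC", reverse=True))  # GCAT
# Aminoacid letters that overlap with nucleotide codes give nonsense:
print(complement_sketch("MKV"))                 # KMB
```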
gc_content
Calculates and returns the GC content of nucleotide sequence (as a ratio, by default).
Ignores degenerate letter codes besides S (G or C).
GC content is calculated the first time the method is called. Later calls will retrieve the same value.
GC content is also calculated (and saved) when at_gc_ratio is called.
If sequence_type is not 'nucleotide' (or the sequence is not inherently a nucleotide sequence) the GC content might be nonsensical.
FastaSequence.gc_content(as_percentage=False)
Parameter | Type / Value | Default | Description |
---|---|---|---|
as_percentage | bool | False | Indicates whether the computed value should be returned as a percentage instead of the default ratio. Optional |
Returns
float
GC content of sequence.
Raises
TypeError
- If sequence_type is 'aminoacid'.
- If as_percentage is not bool.
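A minimal sketch of the calculation, assuming the denominator counts only A, T, W, G, C and S, so that every other degenerate code is ignored. The function name gc_content_sketch and the 0.0 result for input with no countable letters are assumptions, not documented library behaviour.

```python
def gc_content_sketch(sequence: str, as_percentage: bool = False) -> float:
    """Hypothetical string-based GC content calculation."""
    if not isinstance(as_percentage, bool):
        raise TypeError("as_percentage must be bool")
    letters = sequence.upper()
    # S (G or C) is the only degenerate code counted as GC.
    gc = sum(letters.count(code) for code in "GCS")
    # Assumption: A, T and W (A or T) form the rest of the denominator.
    at = sum(letters.count(code) for code in "ATW")
    total = gc + at
    # Assumption: no countable letters yields 0.0 instead of an error.
    ratio = gc / total if total else 0.0
    return ratio * 100 if as_percentage else ratio


print(gc_content_sketch("ATGC"))                      # 0.5
print(gc_content_sketch("AATG", as_percentage=True))  # 25.0
print(gc_content_sketch("GGSN"))                      # 1.0 (N is ignored)
```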
at_gc_ratio
Calculates and returns the AT/GC ratio of nucleotide sequence.
Ignores degenerate letter codes besides W (A or T) and S (G or C).
AT/GC ratio is calculated the first time the method is called. Later calls will retrieve the same value.
Also uses previously calculated GC content or calculates and saves it if it hasn't been calculated yet.
If sequence_type is not 'nucleotide' (or the sequence is not inherently a nucleotide sequence) the AT/GC ratio might be nonsensical.
FastaSequence.at_gc_ratio()
Returns
float
AT/GC ratio of sequence.
Raises
TypeError
- If sequence_type is 'aminoacid'.
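The compute-once behaviour, and the reuse of a previously calculated GC content, can be sketched with a small caching class. RatioCache is a hypothetical stand-in. Deriving the ratio as (1 - gc) / gc is an assumption that follows from counting only A, T, W, G, C and S, and a sequence without any G, C or S would raise ZeroDivisionError in this sketch.

```python
class RatioCache:
    """Hypothetical stand-in showing compute-once caching."""

    def __init__(self, sequence: str):
        self._letters = sequence.upper()
        self._gc_content = None
        self._at_gc_ratio = None

    def gc_content(self) -> float:
        if self._gc_content is None:  # computed on the first call only
            gc = sum(self._letters.count(code) for code in "GCS")
            at = sum(self._letters.count(code) for code in "ATW")
            self._gc_content = gc / (gc + at)
        return self._gc_content

    def at_gc_ratio(self) -> float:
        if self._at_gc_ratio is None:
            # Reuses the saved GC content, or calculates and saves it now.
            gc = self.gc_content()
            self._at_gc_ratio = (1 - gc) / gc
        return self._at_gc_ratio


cache = RatioCache("ATGC")
print(cache.at_gc_ratio())  # 1.0
print(cache.gc_content())   # 0.5, already saved by the call above
```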
count_letter_codes
Returns a dictionary of letter code counts. By default shows counts for all existing letter codes in the sequence, but specific letter codes can be specified.
FastaSequence.count_letter_codes(letter_codes=None)
Parameter | Type / Value | Default | Description |
---|---|---|---|
letter_codes | iterable or None | None | Iterable of all letter codes to count. Optional |
Returns
dict
Counts for every letter code in letter_codes, or for all letter codes in the sequence if letter_codes is not specified.
Raises
TypeError
- If letter_codes is neither an iterable nor None.
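The counting behaviour can be sketched on a plain string with collections.Counter. The function name count_letter_codes_sketch is hypothetical, and reporting a count of 0 for requested codes that never occur is an assumption rather than documented behaviour.

```python
from collections import Counter
from collections.abc import Iterable


def count_letter_codes_sketch(sequence: str, letter_codes=None) -> dict:
    """Hypothetical string-based version of the documented counting."""
    counts = Counter(sequence)
    if letter_codes is None:
        # Default: every letter code present in the sequence.
        return dict(counts)
    if not isinstance(letter_codes, Iterable):
        raise TypeError("letter_codes must be an iterable or None")
    # Assumption: codes absent from the sequence are reported as 0.
    return {code: counts.get(code, 0) for code in letter_codes}


print(count_letter_codes_sketch("AACG"))              # {'A': 2, 'C': 1, 'G': 1}
print(count_letter_codes_sketch("AACG", ["A", "T"]))  # {'A': 2, 'T': 0}
```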
count_letter_codes_degenerate
Returns a dictionary of degenerate letter code counts. sequence_type must be explicitly defined.
FastaSequence.count_letter_codes_degenerate()
Returns
dict
Counts for every degenerate letter code in the sequence.
Raises
TypeError
- If sequence_type is not explicitly defined.
formatted_definition_line
Returns a formatted FASTA definition line (header).
FastaSequence.formatted_definition_line()
Returns
str
FASTA definition line properly formatted.
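By the usual FASTA convention a definition line is '>' followed by the ID and, optionally, a space and the description. The sketch below assumes exactly that layout. The function name definition_line_sketch is hypothetical, and the library's actual output may differ in detail.

```python
def definition_line_sketch(id_: str, description: str = "") -> str:
    """Hypothetical helper building a FASTA definition line."""
    header = ">" + id_
    # Assumption: the description is appended after a single space,
    # and omitted entirely when empty.
    if description:
        header += " " + description
    return header


print(definition_line_sketch("seq1", "Homo sapiens"))  # >seq1 Homo sapiens
print(definition_line_sketch("seq1"))                  # >seq1
```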
formatted_sequence
Returns the formatted FASTA sequence (only the sequence, without the definition line). Lines are separated by '\n'.
FastaSequence.formatted_sequence(max_characters_per_line=70)
Parameter | Type / Value | Default | Description |
---|---|---|---|
max_characters_per_line | int | 70 | Maximum number of characters per line. This value should not go above 80, as per the FASTA specification. A very low value is also not recommended. Optional |
Returns
str
Returns a FASTA sequence properly formatted.
Raises
TypeError
- If max_characters_per_line is not an int.
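The line wrapping can be sketched with simple slicing. The function name format_sequence_sketch is hypothetical, and rejecting values below 1 with a ValueError is an assumption, since the documentation only lists TypeError.

```python
def format_sequence_sketch(sequence: str, max_characters_per_line: int = 70) -> str:
    """Hypothetical string-based version of the documented wrapping."""
    if not isinstance(max_characters_per_line, int):
        raise TypeError("max_characters_per_line must be int")
    # Assumption: non-positive widths are rejected explicitly.
    if max_characters_per_line < 1:
        raise ValueError("max_characters_per_line must be at least 1")
    width = max_characters_per_line
    # Cut the sequence into consecutive chunks of at most `width` characters.
    lines = [sequence[start:start + width]
             for start in range(0, len(sequence), width)]
    return "\n".join(lines)


print(format_sequence_sketch("ACGTACGTAC", max_characters_per_line=4))
# ACGT
# ACGT
# AC
```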
formatted_fasta
Returns a formatted FASTA (definition line and sequence).
FastaSequence.formatted_fasta()
Returns
str
FASTA properly formatted.
sequence_as_string
Returns the sequence as a string. Converts the list of LetterCode objects to a single string.
FastaSequence.sequence_as_string()
Returns
str
Sequence as string.
reverse
Iterates over the sequence in reverse order (same as calling reversed() on a FastaSequence object).
Returns a new reverse iterator of the sequence every time reverse is called.
FastaSequence.reverse()
Returns
iterator
Iterator over the reversed sequence.
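The relationship between reverse and reversed() can be sketched with a hypothetical stand-in class named Letters, which defines __reversed__ and lets reverse delegate to it. This shows the general Python pattern only, not the library's code.

```python
class Letters:
    """Hypothetical stand-in class; not the FastaSequence implementation."""

    def __init__(self, sequence: str):
        self._sequence = list(sequence)

    def __reversed__(self):
        # A fresh reverse iterator over the stored letters.
        return reversed(self._sequence)

    def reverse(self):
        # Same result as calling reversed() on the object.
        return reversed(self)


letters = Letters("ACG")
first = letters.reverse()
second = letters.reverse()
print(next(first))              # G
print(list(second))             # ['G', 'C', 'A'] (independent of `first`)
print("".join(reversed(letters)))  # GCA
```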
Class Methods
The FastaSequence class has the following class method
from_fastasequence
Initializes with the given FastaSequence object (alternate __init__ method).
FastaSequence.from_fastasequence(fastasequence)
Parameter | Type / Value | Default | Description |
---|---|---|---|
fastasequence | FastaSequence | | FastaSequence object. Must be provided |
Returns
FastaSequence
Copy of fastasequence (FastaSequence object).
Raises
TypeError
- If fastasequence is not a FastaSequence.
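An alternate constructor of this kind is usually written as a classmethod that type-checks its argument and builds a new, independent object. The sketch below uses a hypothetical stand-in class named Record with a from_record method. Which attributes FastaSequence actually copies is not stated above, so the copied fields here are an assumption.

```python
class Record:
    """Hypothetical stand-in class; not the FastaSequence implementation."""

    def __init__(self, sequence: str, id_: str = "", description: str = ""):
        self.sequence = list(sequence)
        self.id = id_
        self.description = description

    @classmethod
    def from_record(cls, record: "Record") -> "Record":
        if not isinstance(record, cls):
            raise TypeError("record must be a Record")
        # Build a new object, so the copy does not share the letter list.
        return cls("".join(record.sequence), record.id, record.description)


original = Record("ACGT", "seq1", "example")
copy = Record.from_record(original)
print(copy is original)                    # False
print(copy.sequence == original.sequence)  # True
```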
Special Methods
- __iter__
- __reversed__
- __next__
- __getitem__
- __eq__
- __len__
- __repr__
- __str__