fastaparser.FastaSequence

Represents one single FASTA sequence.

Parameters

The FastaSequence class can be instantiated with the following parameters:

fastaparser.FastaSequence(sequence, id_='', description='', sequence_type=None, infer_type=False)
| Parameter | Type / Value | Default | Description |
|---|---|---|---|
| sequence | str | | String of characters representing a DNA, RNA or aminoacid sequence. Cannot be empty. Must be provided. |
| id_ | str | '' | ID portion of the definition line (header). Any '>' and newlines are removed, and spaces are converted to '_'. Can be an empty string. Optional. |
| description | str | '' | Description portion of the definition line (header). Any newlines are removed. Can be an empty string. Optional. |
| sequence_type | 'nucleotide', 'aminoacid' or None | None | Indicates the sequence type. Use None if the type is not defined. Optional. |
| infer_type | bool | False | Indicates whether FastaSequence should try to infer the sequence type. If True, FastaSequence analyses the sequence (the whole sequence, in the worst case) and can only identify aminoacid sequences. Optional. |
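
The clean-up rules listed for id_ and description can be illustrated with a minimal standalone sketch. The helper names clean_id and clean_description are hypothetical and are not part of fastaparser; the sketch only mirrors the rules stated in the table and may differ from what the library does internally.

```python
def clean_id(raw_id: str) -> str:
    """Hypothetical helper mirroring the documented id_ clean-up rules."""
    if not isinstance(raw_id, str):
        raise TypeError("id_ must be a str")
    # Remove '>' and newlines, then convert spaces to underscores.
    return raw_id.replace(">", "").replace("\n", "").replace(" ", "_")


def clean_description(raw_description: str) -> str:
    """Hypothetical helper mirroring the documented description clean-up rule."""
    if not isinstance(raw_description, str):
        raise TypeError("description must be a str")
    # Only newlines are removed; spaces are kept.
    return raw_description.replace("\n", "")


print(clean_id(">sp P12345\n"))                 # sp_P12345
print(clean_description("first\nsecond line"))  # firstsecond line
```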

Raises

TypeError

  • If sequence, id_, description, sequence_type or infer_type are of the wrong type.

Attributes

Instances of the FastaSequence class have the following attributes:

| Attribute | Type / Value | Editable | Description |
|---|---|---|---|
| id | str | Yes | ID portion of the definition line (header). Can be empty. |
| description | str | Yes | Description portion of the definition line (header). Can be empty. |
| sequence | list(LetterCode) | No | Sequence. |
| sequence_type | 'nucleotide', 'aminoacid' or None | Yes | Indicates the sequence type. Can be None if not known. |
| inferred_type | bool | No | True if FastaSequence inferred the sequence type, False otherwise. |

Editable attributes can be set by standard variable assignment and deleted/reset with the del keyword:

fastasequence_object.id = 'new_id'
del fastasequence_object.description
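
An attribute that supports both assignment and del is typically built in Python as a property with a setter and a deleter. The sketch below uses a hypothetical Record class, not the FastaSequence class itself, and assumes that del resets the value to an empty string; the actual reset value used by the library is not stated here.

```python
class Record:
    """Hypothetical stand-in used only to illustrate set/delete semantics."""

    def __init__(self, id_: str = ""):
        self._id = id_

    @property
    def id(self) -> str:
        return self._id

    @id.setter
    def id(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError("id must be a str")
        self._id = value

    @id.deleter
    def id(self) -> None:
        # Assumption: deleting resets the attribute to an empty string.
        self._id = ""


record = Record("seq1")
record.id = "new_id"
print(record.id)        # new_id
del record.id
print(repr(record.id))  # ''
```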

Methods

Instances of the FastaSequence class have the following methods:

complement

Returns a new FastaSequence object containing the complementary sequence (ideally, of a nucleotide sequence). Description is updated to mention the changes relative to the original sequence.

Non-nucleotide letter codes have no complement and therefore stay the same. So that sequence_type does not have to be set to 'nucleotide', this method works for any sequence and LetterCode, as long as sequence_type is not 'aminoacid'. As a side effect, the results are nonsensical when the letter codes are not nucleotides.

For example, an aminoacid letter code that overlaps with a nucleotide letter code is replaced by the complement of the nucleotide with that same letter code, which is meaningless for an aminoacid sequence.
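
The behaviour described above can be sketched on plain strings. This is an illustration only: the complement table below uses the standard IUPAC nucleotide pairs, which is an assumption about the mapping, and the function is not the library's implementation.

```python
# Standard IUPAC nucleotide complement pairs (assumed; the library's table may differ).
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "U": "A",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "S": "S", "W": "W", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def complement(sequence: str, reverse: bool = False) -> str:
    """Illustrative complement of a sequence given as a plain string."""
    if not isinstance(reverse, bool):
        raise TypeError("reverse must be a bool")
    # Letters without an entry (non-nucleotide codes) are kept unchanged.
    result = "".join(COMPLEMENT.get(letter, letter) for letter in sequence.upper())
    return result[::-1] if reverse else result


print(complement("ATGC"))                # TACG
print(complement("ATGC", reverse=True))  # GCAT
# 'MKV' read as aminoacids (Met-Lys-Val) still gets "complemented",
# because M, K and V are also nucleotide codes: the result is meaningless.
print(complement("MKV"))                 # KMB
# 'E' and 'L' are not nucleotide codes, so they stay the same.
print(complement("EL"))                  # EL
```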

FastaSequence.complement(reverse=False)
| Parameter | Type / Value | Default | Description |
|---|---|---|---|
| reverse | bool | False | Whether the sequence should be reversed. Optional. |

Returns

FastaSequence

Complement of the current nucleotide FastaSequence. Non-nucleotide LetterCode will stay the same.

Raises

TypeError

  • If sequence_type is 'aminoacid'.
  • If reverse is not bool.

gc_content

Calculates and returns the GC content of a nucleotide sequence (as a ratio, by default). Degenerate letter codes are ignored, except S (G or C). The GC content is calculated the first time the method is called; later calls return the stored value. It is also calculated and stored when at_gc_ratio is called. If sequence_type is not 'nucleotide' (or the sequence is not inherently a nucleotide sequence), the GC content may be nonsensical.
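
The "calculate once, reuse later" behaviour can be sketched with a small hypothetical class (GCExample is not part of fastaparser). The sketch assumes the ratio is taken over the full sequence length; the source does not state which denominator the library uses.

```python
class GCExample:
    """Hypothetical stand-in illustrating a cached GC content calculation."""

    def __init__(self, sequence: str):
        self._sequence = sequence.upper()
        self._gc_content = None  # filled in on the first call

    def gc_content(self, as_percentage: bool = False) -> float:
        if not isinstance(as_percentage, bool):
            raise TypeError("as_percentage must be a bool")
        if self._gc_content is None:
            # Count G, C and the degenerate code S (G or C).
            gc = sum(1 for letter in self._sequence if letter in "GCS")
            # Assumption: the ratio is taken over the full sequence length.
            self._gc_content = gc / len(self._sequence)
        return self._gc_content * 100 if as_percentage else self._gc_content


example = GCExample("ATGC")
print(example.gc_content())                    # 0.5
print(example.gc_content(as_percentage=True))  # 50.0 (reuses the stored ratio)
```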

FastaSequence.gc_content(as_percentage=False)
| Parameter | Type / Value | Default | Description |
|---|---|---|---|
| as_percentage | bool | False | Indicates whether the computed value should be returned as a percentage instead of the default ratio. Optional. |

Returns

float

GC content of sequence.

Raises

TypeError

  • If sequence_type is 'aminoacid'.
  • If as_percentage is not bool.

at_gc_ratio

Calculates and returns the AT/GC ratio of a nucleotide sequence. Degenerate letter codes are ignored, except W (A or T) and S (G or C). The AT/GC ratio is calculated the first time the method is called; later calls return the stored value. The method reuses a previously calculated GC content, or calculates and stores it if it has not been calculated yet. If sequence_type is not 'nucleotide' (or the sequence is not inherently a nucleotide sequence), the AT/GC ratio may be nonsensical.
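
As a rough illustration of the ratio itself, the sketch below counts A, T and W against G, C and S on a plain string. It is a standalone function written for this example, not the library's code, and it does not handle sequences that contain no G, C or S.

```python
def at_gc_ratio(sequence: str) -> float:
    """Illustrative AT/GC ratio of a nucleotide sequence given as a string."""
    sequence = sequence.upper()
    # W (A or T) counts towards AT; S (G or C) counts towards GC.
    at = sum(1 for letter in sequence if letter in "ATW")
    gc = sum(1 for letter in sequence if letter in "GCS")
    # Other degenerate codes (N, R, Y, ...) are ignored by both counts.
    return at / gc


print(at_gc_ratio("AATTGC"))  # 2.0
print(at_gc_ratio("AWGSN"))   # 1.0
```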

FastaSequence.at_gc_ratio()

Returns

float

AT/GC ratio of sequence.

Raises

TypeError

  • If sequence_type is 'aminoacid'.

count_letter_codes

Returns a dictionary of letter code counts. By default shows counts for all existing letter codes in the sequence, but specific letter codes can be specified.
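
The two modes (count everything, or count only requested letter codes) can be sketched with collections.Counter on a plain string. The function below is an illustration, and it assumes that requested codes absent from the sequence are reported with a count of 0, which the source does not confirm.

```python
from collections import Counter


def count_letter_codes(sequence: str, letter_codes=None) -> dict:
    """Illustrative letter code counter working on a plain string."""
    counts = Counter(sequence)
    if letter_codes is None:
        # Default: every letter code present in the sequence.
        return dict(counts)
    try:
        requested = list(letter_codes)
    except TypeError:
        raise TypeError("letter_codes must be an iterable or None") from None
    # Assumption: requested codes that are absent are reported as 0.
    return {code: counts[code] for code in requested}


print(count_letter_codes("AACGT"))              # {'A': 2, 'C': 1, 'G': 1, 'T': 1}
print(count_letter_codes("AACGT", ["A", "N"]))  # {'A': 2, 'N': 0}
```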

FastaSequence.count_letter_codes(letter_codes=None)
| Parameter | Type / Value | Default | Description |
|---|---|---|---|
| letter_codes | iterable or None | None | Iterable of all letter codes to count. Optional. |

Returns

dict

Counts for every letter code in letter_codes or all letter codes in the sequence if letter_codes is not specified.

Raises

TypeError

  • If letter_codes is neither an iterable nor None.

count_letter_codes_degenerate

Returns a dictionary of degenerate letter code counts. sequence_type must be explicitly defined.
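
One likely reason sequence_type must be defined is that the same letter can be degenerate in one alphabet and not in the other (N is "any nucleotide" but also asparagine). The sketch below illustrates this with assumed sets of degenerate codes based on the usual IUPAC conventions; the sets used by the library may differ.

```python
from collections import Counter

# Assumed degenerate (ambiguity) codes per sequence type.
DEGENERATE = {
    "nucleotide": set("RYKMSWBDHVN"),
    "aminoacid": set("BZJX"),
}


def count_degenerate(sequence: str, sequence_type) -> dict:
    """Illustrative count of degenerate letter codes in a plain string."""
    if sequence_type not in DEGENERATE:
        raise TypeError("sequence_type must be explicitly defined")
    degenerate = DEGENERATE[sequence_type]
    return dict(Counter(letter for letter in sequence.upper() if letter in degenerate))


print(count_degenerate("ATNNR", "nucleotide"))  # {'N': 2, 'R': 1}
# As aminoacids, N is asparagine and is not degenerate.
print(count_degenerate("MNNK", "aminoacid"))    # {}
```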

FastaSequence.count_letter_codes_degenerate()

Returns

dict

Counts for every degenerate letter code in the sequence.

Raises

TypeError

  • If sequence_type is not explicitly defined.

formatted_definition_line

Returns a formatted FASTA definition line (header).

FastaSequence.formatted_definition_line()

Returns

str

FASTA definition line properly formatted.
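
A FASTA definition line conventionally starts with '>' followed by the ID and then the description. The sketch below assumes the two parts are separated by a single space and that nothing is appended when the description is empty; the exact output of the library method is not shown in this document.

```python
def definition_line(id_: str, description: str = "") -> str:
    """Illustrative FASTA definition line builder (not the library's code)."""
    # Assumption: '>' + ID, then a space and the description when one is present.
    line = ">" + id_
    if description:
        line += " " + description
    return line


print(definition_line("seq1", "test sequence"))  # >seq1 test sequence
print(definition_line("seq1"))                   # >seq1
```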

formatted_sequence

Returns the formatted FASTA sequence (only the sequence, without the definition line). Lines are separated by '\n'.
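
Line wrapping of this kind amounts to slicing the sequence into fixed-size chunks and joining them with '\n'. The function below is a minimal sketch on a plain string; the ValueError for values below 1 is a choice made for this example and is not documented behaviour of the library.

```python
def wrap_sequence(sequence: str, max_characters_per_line: int = 70) -> str:
    """Illustrative line wrapping of a sequence given as a plain string."""
    if not isinstance(max_characters_per_line, int):
        raise TypeError("max_characters_per_line must be an int")
    if max_characters_per_line < 1:
        # Guard added for this sketch only.
        raise ValueError("max_characters_per_line must be at least 1")
    step = max_characters_per_line
    # Slice the sequence into chunks of at most 'step' characters.
    chunks = [sequence[start:start + step] for start in range(0, len(sequence), step)]
    return "\n".join(chunks)


print(wrap_sequence("ACGTACGTAC", 4))
# ACGT
# ACGT
# AC
```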

FastaSequence.formatted_sequence(max_characters_per_line=70)
| Parameter | Type / Value | Default | Description |
|---|---|---|---|
| max_characters_per_line | int | 70 | Maximum number of characters per line. This value should not go above 80, as per the FASTA specification. A very low value is also not recommended. Optional. |

Returns

str

FASTA sequence properly formatted.

Raises

TypeError

  • If max_characters_per_line is not an int.

formatted_fasta

Returns a formatted FASTA (definition line and sequence).

FastaSequence.formatted_fasta()

Returns

str

FASTA properly formatted.

sequence_as_string

Returns the sequence as a string. Converts the list of LetterCode objects to a single string.

FastaSequence.sequence_as_string()

Returns

str

Sequence as string.

reverse

Iterates over the sequence in reverse order (same as calling reversed() on a FastaSequence object). Returns a new reverse iterator of the sequence every time reverse is called.

FastaSequence.reverse()

Returns

iterator

Iterator over the reversed sequence.
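
The relationship between reverse() and reversed() can be sketched with a small hypothetical class (Letters is not part of fastaparser). It shows how a __reversed__ method backs both calls and why each call yields a fresh iterator.

```python
class Letters:
    """Hypothetical stand-in illustrating reverse() and reversed()."""

    def __init__(self, sequence: str):
        self._sequence = list(sequence)

    def __reversed__(self):
        # A new reverse iterator is created on every call.
        return reversed(self._sequence)

    def reverse(self):
        # Same as calling reversed() on the object.
        return reversed(self)


letters = Letters("ACG")
print(list(letters.reverse()))  # ['G', 'C', 'A']
print(list(reversed(letters)))  # ['G', 'C', 'A']

iterator = letters.reverse()
print(list(iterator))           # ['G', 'C', 'A']
print(list(iterator))           # [] (this iterator is exhausted)
print(list(letters.reverse()))  # ['G', 'C', 'A'] (a new iterator)
```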

Class Methods

The FastaSequence class has the following class method:

from_fastasequence

Initializes with the given FastaSequence object (alternate __init__ method).

FastaSequence.from_fastasequence(fastasequence)
| Parameter | Type / Value | Default | Description |
|---|---|---|---|
| fastasequence | FastaSequence | | FastaSequence object. Must be provided. |

Returns

FastaSequence

Copy of fastasequence (FastaSequence object).

Raises

TypeError

  • If fastasequence is not a FastaSequence.
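
An alternate constructor of this kind is usually written as a classmethod that validates its argument and builds a new instance from the attributes of the given one. The Entry class and its from_entry method below are hypothetical names used only to illustrate the pattern.

```python
class Entry:
    """Hypothetical stand-in illustrating a copy-style alternate constructor."""

    def __init__(self, sequence: str, id_: str = "", description: str = ""):
        self.sequence = sequence
        self.id = id_
        self.description = description

    @classmethod
    def from_entry(cls, entry):
        if not isinstance(entry, cls):
            raise TypeError("entry must be an Entry")
        # Build a new, independent object from the attributes of the given one.
        return cls(entry.sequence, entry.id, entry.description)


original = Entry("ACGT", "seq1", "demo")
copy = Entry.from_entry(original)
print(copy is original)                 # False
print(copy.id, copy.sequence)           # seq1 ACGT
```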

Special Methods

  • __iter__
  • __reversed__
  • __next__
  • __getitem__
  • __eq__
  • __len__
  • __repr__
  • __str__