Interface to the Human Genome Reference (reference)
exception gepyto.reference.InvalidMapping(value)
Exception representing an invalid mapping that we can't fix automatically.
This can happen if the provided allele is incorrect for a non-SNP variant. In that case, we cannot tell whether the locus or the sequence is wrong. Because this is ambiguous, we raise this exception and leave the fix to the user.
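A minimal sketch of how a caller might handle this exception. It defines a local stand-in `InvalidMapping` class so it runs without gepyto installed, and the helpers `require_matching_ref` and `partition_variants` are hypothetical, not part of gepyto. The genome is a plain string with 1-based positions (assumption). The point is the pattern: variants that cannot be mapped unambiguously are set aside for manual review instead of being guessed at.

```python
class InvalidMapping(Exception):
    """Local stand-in for gepyto.reference.InvalidMapping (assumption)."""


def require_matching_ref(seq, pos, ref):
    """Hypothetical helper: raise InvalidMapping if a multi-base `ref`
    disagrees with the genome. `seq` is one chromosome, `pos` is 1-based."""
    observed = seq[pos - 1:pos - 1 + len(ref)]
    if observed != ref:
        # Wrong locus or wrong sequence? We cannot tell, so we do not guess.
        raise InvalidMapping(
            "ref {!r} at position {} does not match genome {!r}".format(
                ref, pos, observed))


def partition_variants(seq, variants):
    """Keep variants that map cleanly; set the rest aside for review."""
    ok, needs_review = [], []
    for variant in variants:
        pos, ref, _alt = variant
        try:
            require_matching_ref(seq, pos, ref)
        except InvalidMapping as exc:
            needs_review.append((variant, str(exc)))
        else:
            ok.append(variant)
    return ok, needs_review
```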
class gepyto.reference.Reference(remote=False)
Interface to the human genome reference file.
This class uses pyfaidx to parse the genome reference file referenced by settings.REFERENCE_PATH. This can only be a single plain FASTA file.
Also note that if the path is not in the ~/.gtconfig/gtrc.ini file, gepyto will look for an environment variable named REFERENCE_PATH. If the genome file can't be found, this class falls back to the Ensembl remote API to get the sequences.
This behaviour can also be forced by using the remote=True argument.
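The lookup order described above can be sketched as follows. This is not gepyto's implementation: the function name `resolve_reference_source` and the `[gepyto]` section name in the ini file are hypothetical, and the sketch only decides between a local file and the remote fallback rather than opening either.

```python
import configparser
import os


def resolve_reference_source(config_path="~/.gtconfig/gtrc.ini",
                             remote=False, section="gepyto"):
    """Return ("local", path) or ("remote", None).

    Sketched order: remote=True forces the remote API; otherwise use
    REFERENCE_PATH from the ini file (the `section` name is hypothetical),
    then the REFERENCE_PATH environment variable. If no existing file is
    found, fall back to the remote (Ensembl) API.
    """
    if remote:
        return ("remote", None)

    path = None
    parser = configparser.ConfigParser()
    # read() skips missing files and returns the list of files it parsed.
    if parser.read(os.path.expanduser(config_path)):
        path = parser.get(section, "REFERENCE_PATH", fallback=None)
    if path is None:
        path = os.environ.get("REFERENCE_PATH")

    if path is not None and os.path.isfile(path):
        return ("local", path)
    return ("remote", None)
```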
check_variant_reference(variant, flip=False)
Given a variant, makes sure that the 'ref' allele is consistent with the human genome reference.
Parameters:
- variant (gepyto.structures.variants.Variant subclass) – The variant to verify.
- flip (bool) – If True, incorrect (ref, alt) pairs will be flipped (Default: False).
Returns: If flip is True, it returns the correct variant or raises a ValueError in case it is not salvageable. If flip is False, a bool is simply returned.
gepyto.reference.check_indel_reference(indel, ref, fix)
Check and/or fix alleles for indels.
Parameters: ref (Reference) – A reference object.
In fix mode, this function will try to standardise the alleles for the given indel. This means that the VCF format will be enforced: no "-" alleles are allowed.
For example, ref: 'TC', alt: '-' will become ref: 'CTC', alt: 'C', given that the previous nucleotide in the reference is a 'C'. The position will be adjusted accordingly (shifted one base to the left).
In regular mode, the only check is that the ref allele is consistent with the reference, that is, the ref allele equals the genome sequence of the same length starting at pos.
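A minimal sketch of both modes, working on a single chromosome stored as a plain string with 1-based positions (assumption). The names `indel_ref_matches` and `to_vcf_alleles` are hypothetical, and the insertion case assumes the '-' notation places the inserted bases before `pos`; gepyto's own handling of insertions is not described here.

```python
def indel_ref_matches(seq, pos, ref):
    """Regular mode: does `ref` equal the genome slice of the same length
    starting at the 1-based position `pos`?"""
    start = pos - 1
    return seq[start:start + len(ref)] == ref


def to_vcf_alleles(seq, pos, ref, alt):
    """Fix mode (sketch): replace '-' alleles by anchoring on the previous
    reference base. Returns (new_pos, new_ref, new_alt)."""
    if "-" not in (ref, alt):
        return pos, ref, alt  # already VCF-style, nothing to anchor
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    if ref and not indel_ref_matches(seq, pos, ref):
        raise ValueError("ref allele does not match the genome at this position")
    if pos < 2:
        raise ValueError("no preceding base to anchor the indel on")
    anchor = seq[pos - 2]  # base immediately before `pos` (1-based)
    # Prepend the anchor to both alleles and move one base to the left.
    return pos - 1, anchor + ref, anchor + alt
```

With the example from the text, a deletion of 'TC' at position 4 of "GACTCA" (where the previous base is 'C') becomes position 3, ref 'CTC', alt 'C'.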
gepyto.reference.check_snp_reference(snp, ref, flip)
Utility function to check whether a SNP has the correct reference allele.
Parameters:
- snp (gepyto.structures.variants.SNP) – The gepyto.structures.variants.SNP object.
- ref (Reference) – The Reference object.
- flip (bool) – A flag. If True, the return value is a variant with alleles flipped if necessary. If False, a bool is returned: True if the alleles are correct.
Returns: Either a gepyto.structures.variants.SNP with flipped alleles or a bool.
This function is used internally by Reference, but it is also available to users; in that case, you need to provide a pyfaidx Fasta object.