Documentation
The geanno.Annotator module
-
class geanno.Annotator.GenomicRegionAnnotator
Bases: object
-
annotate()
Annotates the base region table against the ROI tables in the database.
- Returns
Nothing to be returned.
- Return type
None
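Because annotate() returns None, the result has to be read back from the object afterwards. The following is a minimal sketch of one possible call sequence, using only the methods documented for this class. The helper name run_annotation is hypothetical, and two things are assumptions rather than documented behaviour: that the base is loaded before the database, and that get_base() returns the annotated table once annotate() has run.

```python
def run_annotation(annotator, base_path, database_path):
    """Load both inputs, annotate, and return a copy of the base table.

    `annotator` is expected to be a GenomicRegionAnnotator instance.
    Assumption: get_base() reflects the annotation after annotate().
    """
    annotator.load_base_from_file(base_path)
    annotator.load_database_from_file(database_path)
    annotator.annotate()  # returns None, so the result is fetched separately
    return annotator.get_base()
```

Passing the annotator in as an argument keeps the sketch independent of how the object is constructed.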
-
get_base()
Returns a copy of the base table, self.__base.
- Returns
Copy of the pandas.DataFrame object self.__base
- Return type
pandas.DataFrame
-
load_base_from_dataframe(base_dataframe)
Loads the base table, which will be annotated against the annotation database, from a pandas.DataFrame.
- Parameters
base_dataframe (pandas.DataFrame) – The pandas.DataFrame object to be annotated. The first three columns must be bed-like, i.e. contain chromosome, start position, and end position, and must be named “#chrom”, “start”, “end”.
- Returns
Nothing to be returned
- Return type
None
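The column requirement can be checked before a table is handed to the annotator. A minimal sketch follows; check_base_columns is a hypothetical helper and not part of geanno. It accepts any sequence of column names, for example list(df.columns) for a DataFrame df.

```python
REQUIRED_BASE_COLUMNS = ("#chrom", "start", "end")


def check_base_columns(columns):
    """Raise ValueError unless the first three column names are bed-like."""
    first_three = tuple(columns[:3])
    if first_three != REQUIRED_BASE_COLUMNS:
        raise ValueError(
            f"expected first columns {REQUIRED_BASE_COLUMNS}, got {first_three}"
        )


# Passes silently: the required columns come first, extra columns may follow.
check_base_columns(["#chrom", "start", "end", "name"])
```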
-
load_base_from_file(base_filename)
Loads the base file, which will be annotated against the annotation database.
- Parameters
base_filename (str) – Path to the bed-like file to be annotated. The first three columns must be bed-like, i.e. contain chromosome, start position, and end position. The file must contain a header whose first three entries are “#chrom”, “start”, “end”.
- Returns
Nothing to be returned
- Return type
None
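A base file with the required header can be produced with the standard library. This is a minimal sketch, assuming the file is tab-separated as in standard BED. The helper name write_base_file and the coordinates are made up for illustration.

```python
import csv
import os
import tempfile


def write_base_file(path, regions):
    """Write (chrom, start, end) tuples as a tab-separated file with a header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["#chrom", "start", "end"])  # required header entries
        writer.writerows(regions)


# Illustrative regions only; real input would come from an experiment.
base_path = os.path.join(tempfile.mkdtemp(), "base.bed")
write_base_file(base_path, [("chr1", 100, 200), ("chr2", 500, 900)])
```

The resulting base_path is the kind of value load_base_from_file(base_filename) expects.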
-
load_database_from_dataframe(database_dataframe)
Loads a database from a pandas.DataFrame. The database lists all files against which the annotation is performed.
- Parameters
database_dataframe (pandas.DataFrame) – The database table. It lists all files against which the annotation is performed. Required columns are
FILENAME: Absolute path to the file (must be a bed-like file)
REGION.TYPE: E.g. protein.coding.genes, Enhancers, …
SOURCE: E.g., Cell type from which regions are derived
ANNOTATION.BY: SOURCE | NAME
MAX.DISTANCE: Maximal distance between a base interval and a database interval at which the database interval is still annotated to the base interval.
DISTANCE.TO: If ANNOTATION.TYPE is distance, the location to which the distance is computed must be defined. Can be START | END | MID | REGION.
N.HITS: Either ALL | CLOSEST
NAME.COL: If ANNOTATION.BY == NAME, the column (0-based) in which the name is stored. If NAME.COL == NA, the 4th column is assumed to contain the name.
- Returns
Nothing to be returned
- Return type
None
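The required columns and their enumerated values can be checked before a DataFrame is built. The sketch below works on plain dictionaries, one per database entry. check_database_rows is a hypothetical helper, not part of geanno, and the example path and values are invented for illustration.

```python
REQUIRED_COLUMNS = (
    "FILENAME", "REGION.TYPE", "SOURCE", "ANNOTATION.BY",
    "MAX.DISTANCE", "DISTANCE.TO", "N.HITS", "NAME.COL",
)

# Enumerated values as listed in the column descriptions above.
ALLOWED_VALUES = {
    "ANNOTATION.BY": {"SOURCE", "NAME"},
    "DISTANCE.TO": {"START", "END", "MID", "REGION"},
    "N.HITS": {"ALL", "CLOSEST"},
}


def check_database_rows(rows):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            problems.append(f"row {index}: missing columns {missing}")
        for column, allowed in ALLOWED_VALUES.items():
            if column in row and row[column] not in allowed:
                problems.append(f"row {index}: {column}={row[column]!r} not allowed")
    return problems


# Hypothetical entry; the path does not refer to a real file.
example_row = {
    "FILENAME": "/data/annotations/genes.bed",
    "REGION.TYPE": "protein.coding.genes",
    "SOURCE": "liver",
    "ANNOTATION.BY": "NAME",
    "MAX.DISTANCE": 10000,
    "DISTANCE.TO": "START",
    "N.HITS": "CLOSEST",
    "NAME.COL": "NA",
}
problems = check_database_rows([example_row])
```

Rows that pass the check can be turned into a table with pandas.DataFrame(rows) and passed to load_database_from_dataframe.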
-
load_database_from_file(database_filename)
Loads a database from a tab-separated file. The database lists all files against which the annotation is performed.
- Parameters
database_filename (str) –
Path to the tab-separated database file. The database lists all files against which the annotation is performed. Required columns are
FILENAME: Absolute path to the file (must be a bed-like file)
REGION.TYPE: E.g. protein.coding.genes, Enhancers, …
SOURCE: E.g., Cell type from which regions are derived
ANNOTATION.BY: SOURCE | NAME
MAX.DISTANCE: Maximal distance between a base interval and a database interval at which the database interval is still annotated to the base interval.
DISTANCE.TO: If ANNOTATION.TYPE is distance, the location to which the distance is computed must be defined. Can be START | END | MID | REGION.
N.HITS: Either ALL | CLOSEST
NAME.COL: If ANNOTATION.BY == NAME, the column (0-based) in which the name is stored. If NAME.COL == NA, the 4th column is assumed to contain the name.
- Returns
Nothing to be returned
- Return type
None
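A database file with the required columns can be written with the csv module. This is a minimal sketch: write_database_file is a hypothetical helper, and the single entry uses an invented path and invented values purely to show the layout.

```python
import csv
import os
import tempfile

DATABASE_COLUMNS = [
    "FILENAME", "REGION.TYPE", "SOURCE", "ANNOTATION.BY",
    "MAX.DISTANCE", "DISTANCE.TO", "N.HITS", "NAME.COL",
]


def write_database_file(path, rows):
    """Write row dictionaries as a tab-separated file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=DATABASE_COLUMNS,
            delimiter="\t",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical entry; the FILENAME path does not refer to a real file.
database_path = os.path.join(tempfile.mkdtemp(), "database.tsv")
write_database_file(database_path, [{
    "FILENAME": "/data/annotations/enhancers.bed",
    "REGION.TYPE": "Enhancers",
    "SOURCE": "liver",
    "ANNOTATION.BY": "SOURCE",
    "MAX.DISTANCE": 50000,
    "DISTANCE.TO": "MID",
    "N.HITS": "ALL",
    "NAME.COL": "NA",
}])
```

The resulting database_path is the kind of value load_database_from_file(database_filename) expects.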
-
print_base()
Prints the base table.
- Returns
Nothing to be returned.
- Return type
None
-
print_database()
Prints the database.
- Returns
Nothing to be returned.
- Return type
None
-
set_tempdir(dirpath)
Sets the temp directory for pybedtools objects.
- Parameters
dirpath (str) – Path to temp directory.
- Returns
Nothing to be returned.
- Return type
None
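A dedicated scratch directory can be created with the standard library before it is passed on. The sketch below is illustrative: configure_annotator is a hypothetical helper, the directory prefix is arbitrary, and the check that the directory exists is a precaution added here, not documented behaviour of set_tempdir.

```python
import os
import tempfile

# Create a fresh scratch directory; the prefix is arbitrary.
scratch_dir = tempfile.mkdtemp(prefix="geanno_tmp_")


def configure_annotator(annotator, dirpath):
    """Point an annotator at dirpath, refusing paths that do not exist."""
    if not os.path.isdir(dirpath):
        raise NotADirectoryError(dirpath)
    annotator.set_tempdir(dirpath)
    return annotator
```

Here annotator stands for a GenomicRegionAnnotator instance. The caller is responsible for removing scratch_dir when it is no longer needed, for example with shutil.rmtree.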
-