Documentation

The geanno.Annotator module

class geanno.Annotator.GenomicRegionAnnotator

Bases: object

annotate()

Method that annotates the base region table against the ROI tables in the database.

Returns

Nothing to be returned.

Return type

None

get_base()

Method that returns a copy of self.__base.

Returns

Copy of pandas.DataFrame object self.__base

Return type

pandas.DataFrame

load_base_from_dataframe(base_dataframe)

Method that loads the base from a pandas.DataFrame. The base will be annotated against the annotation database.

Parameters

base_dataframe (pandas.DataFrame) – pandas.DataFrame object to be annotated. The first three columns must be BED-like, i.e. contain the chromosome, start position, and end position. It must contain the columns “#chrom”, “start”, and “end”.

Returns

Nothing to be returned

Return type

None

load_base_from_file(base_filename)

Method that loads the base file that will be annotated against the annotation database.

Parameters

base_filename (str) – Path to a BED-like file to be annotated. The first three columns must be BED-like, i.e. contain the chromosome, start position, and end position. The file must contain a header whose first three entries are “#chrom”, “start”, and “end”.

Returns

Nothing to be returned

Return type

None
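
As a minimal sketch of the input described above, the following writes a small BED-like base file with the required header, using only the standard library. The file name, the coordinates, and the extra "name" column are made up for illustration, and tab separation is assumed because it is the usual BED convention. The commented-out call assumes an existing GenomicRegionAnnotator instance named annotator.

```python
import csv
import os
import tempfile

# Hypothetical example regions: (chromosome, start, end, name).
regions = [
    ("chr1", 1000, 2000, "region_a"),
    ("chr2", 500, 1500, "region_b"),
]

# Write into a fresh temporary directory so nothing existing is overwritten.
base_filename = os.path.join(tempfile.mkdtemp(), "base.bed")

with open(base_filename, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # The first three header entries must be "#chrom", "start", "end".
    writer.writerow(["#chrom", "start", "end", "name"])
    writer.writerows(regions)

# With an annotator instance at hand (assumed, not created here):
# annotator.load_base_from_file(base_filename)
```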

load_database_from_dataframe(database_dataframe)

Method for loading a database from a pandas.DataFrame. The database contains all files against which the annotation shall be performed.

Parameters

database_dataframe (pandas.DataFrame) –

pandas.DataFrame object. The database contains all files against which the annotation shall be performed. Required columns are:

  • FILENAME: Absolute path to the file (must be a BED-like file)

  • REGION.TYPE: E.g. protein.coding.genes, Enhancers, …

  • SOURCE: E.g. the cell type from which the regions are derived

  • ANNOTATION.BY: SOURCE | NAME

  • MAX.DISTANCE: Maximal distance between a base interval and a database interval at which the database interval is still annotated to the base interval.

  • DISTANCE.TO: If ANNOTATION.TYPE is distance, this defines the location to which the distance is computed. One of START | END | MID | REGION.

  • N.HITS: Either ALL | CLOSEST

  • NAME.COL: If ANNOTATION.BY == NAME, the column (0-based) in which the name is stored. If NAME.COL == NA, the 4th column is assumed to contain the name.

Returns

Nothing to be returned

Return type

None
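
Because the database table needs several specific columns, it can help to check them before loading. The sketch below is not part of geanno: missing_database_columns is a hypothetical helper, and the example row uses invented values (path, cell type, distance). It works on any iterable of column names, so the columns attribute of a pandas.DataFrame could be passed in the same way.

```python
# Required columns as listed in the parameter description above.
REQUIRED_DATABASE_COLUMNS = (
    "FILENAME",
    "REGION.TYPE",
    "SOURCE",
    "ANNOTATION.BY",
    "MAX.DISTANCE",
    "DISTANCE.TO",
    "N.HITS",
    "NAME.COL",
)


def missing_database_columns(columns):
    """Return the required column names absent from `columns`, in documented order."""
    present = set(columns)
    return [name for name in REQUIRED_DATABASE_COLUMNS if name not in present]


# One hypothetical database entry; all values are illustrative only.
rows = [
    {
        "FILENAME": "/path/to/genes.bed",
        "REGION.TYPE": "protein.coding.genes",
        "SOURCE": "HepG2",
        "ANNOTATION.BY": "NAME",
        "MAX.DISTANCE": 1000000,
        "DISTANCE.TO": "START",
        "N.HITS": "CLOSEST",
        "NAME.COL": "NA",
    },
]

missing = missing_database_columns(rows[0].keys())
if missing:
    raise ValueError("database is missing columns: " + ", ".join(missing))

# pandas.DataFrame(rows) would turn these rows into a frame that could then be
# passed to load_database_from_dataframe on an existing annotator instance.
```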

load_database_from_file(database_filename)

Method for loading a database from a tab-separated file. The database contains all files against which the annotation shall be performed.

Parameters

database_filename (str) –

Path to a tab-separated database file. The database contains all files against which the annotation shall be performed. Required columns are:

  • FILENAME: Absolute path to the file (must be a BED-like file)

  • REGION.TYPE: E.g. protein.coding.genes, Enhancers, …

  • SOURCE: E.g. the cell type from which the regions are derived

  • ANNOTATION.BY: SOURCE | NAME

  • MAX.DISTANCE: Maximal distance between a base interval and a database interval at which the database interval is still annotated to the base interval.

  • DISTANCE.TO: If ANNOTATION.TYPE is distance, this defines the location to which the distance is computed. One of START | END | MID | REGION.

  • N.HITS: Either ALL | CLOSEST

  • NAME.COL: If ANNOTATION.BY == NAME, the column (0-based) in which the name is stored. If NAME.COL == NA, the 4th column is assumed to contain the name.

Returns

Nothing to be returned

Return type

None
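
The following sketch writes a tab-separated database file with the required columns, using only the standard library. The paths, cell type, and distances are invented placeholders, and the combinations of ANNOTATION.BY, DISTANCE.TO, and N.HITS are chosen only to show the permitted values. The commented-out call assumes an existing GenomicRegionAnnotator instance named annotator.

```python
import csv
import os
import tempfile

columns = [
    "FILENAME", "REGION.TYPE", "SOURCE", "ANNOTATION.BY",
    "MAX.DISTANCE", "DISTANCE.TO", "N.HITS", "NAME.COL",
]

# Hypothetical entries; each row describes one BED-like file to annotate against.
entries = [
    {
        "FILENAME": "/path/to/genes.bed",
        "REGION.TYPE": "protein.coding.genes",
        "SOURCE": "HepG2",
        "ANNOTATION.BY": "NAME",
        "MAX.DISTANCE": 1000000,
        "DISTANCE.TO": "START",
        "N.HITS": "CLOSEST",
        "NAME.COL": "NA",  # NA: the 4th column is assumed to hold the name
    },
    {
        "FILENAME": "/path/to/enhancers.bed",
        "REGION.TYPE": "Enhancers",
        "SOURCE": "HepG2",
        "ANNOTATION.BY": "SOURCE",
        "MAX.DISTANCE": 0,
        "DISTANCE.TO": "REGION",
        "N.HITS": "ALL",
        "NAME.COL": "NA",
    },
]

database_filename = os.path.join(tempfile.mkdtemp(), "database.tsv")

with open(database_filename, "w", newline="") as handle:
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)

# With an annotator instance at hand (assumed, not created here):
# annotator.load_database_from_file(database_filename)
```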

print_base()

Method that prints base.

Returns

Nothing to be returned.

Return type

None

print_database()

Method that prints the database.

Returns

Nothing to be returned.

Return type

None

set_tempdir(dirpath)

Method that sets the temp directory for pybedtools objects.

Parameters

dirpath (str) – Path to temp directory.

Returns

Nothing to be returned.

Return type

None
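
Taken together, the methods above suggest a simple workflow: set the temp directory, load the database, load the base, annotate, then fetch the result. The sketch below shows one plausible call order; the order itself, the run_annotation helper, and the expectation that get_base() returns the annotated table after annotate() are assumptions, not statements from this reference. To keep the example runnable without geanno installed, it uses a small stand-in class that only records calls; in real use a GenomicRegionAnnotator instance would be passed instead.

```python
def run_annotation(annotator, database_filename, base_filename, tempdir=None):
    """Call the documented methods in one plausible order (assumed, see above)."""
    if tempdir is not None:
        annotator.set_tempdir(tempdir)
    annotator.load_database_from_file(database_filename)
    annotator.load_base_from_file(base_filename)
    annotator.annotate()
    return annotator.get_base()


class RecordingAnnotator:
    """Stand-in that mimics the documented method names and records each call."""

    def __init__(self):
        self.calls = []

    def set_tempdir(self, dirpath):
        self.calls.append(("set_tempdir", dirpath))

    def load_database_from_file(self, database_filename):
        self.calls.append(("load_database_from_file", database_filename))

    def load_base_from_file(self, base_filename):
        self.calls.append(("load_base_from_file", base_filename))

    def annotate(self):
        self.calls.append(("annotate",))

    def get_base(self):
        self.calls.append(("get_base",))
        return "annotated base placeholder"


# Hypothetical file names; nothing is read from disk by the stand-in.
stand_in = RecordingAnnotator()
result = run_annotation(stand_in, "database.tsv", "base.bed", tempdir="/tmp")
```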