How can I use the Theseus Python library to do cohort analysis?

Posted by (Questions: 42, Answers: 111)
Asked on January 12, 2020 6:30 pm

This QuantMar thread contains the full documentation for the Theseus Python library. Theseus is an open-source Python library that provides straightforward tools for cohort analysis and general marketing performance analysis. Theseus was created by Eric Benjamin Seufert of Heracles.

For usage examples, see the Theseus GitHub page. The purpose of this thread is to provide documentation on the various functions that are available in the Theseus class.

Importing Theseus

import theseus_growth as th

Instantiating Theseus

th = th.theseus()

Method References

theseus.theseus.create_profile

theseus.create_profile( self, days, retention_values, form = 'best_fit', profile_max = None )

Generate a retention profile from day-indexed retention data.

Parameters:

  • days: list. A list of integer day values that correspond to (i.e. have the same index as) the values in the retention_values parameter. The list does not need to be sorted, but it must be the same length as retention_values. Values cannot be less than 0. Day 0 retention is always assumed to be 100, so Day 0 does not need to be supplied.
  • retention_values: list. A list of integer retention values that correspond to (i.e. have the same index as) the values in the days parameter. The values represent percentages but are provided as integers (e.g. 80% retention is provided as 80, not .8). Values cannot be less than 0 or greater than 100. Day 0 retention is always assumed to be 100, so a Day 0 value does not need to be supplied.
  • form: string (optional). The function that should be fit to the retention data to produce the retention profile. If form is not supplied, it takes the default 'best_fit' shown in the signature and the best fit function will be fit to the retention data. Otherwise, form must be one of: [ 'log', 'exp', 'linear', 'quad', 'weibull', 'power' ].
  • profile_max: int (optional). The timeline over which the retention profile will be projected. If profile_max is not provided, it is set to the maximum value from the days parameter (i.e. the retention profile curve is only fit across the data provided and is not projected past it).

Returns: a dict containing the retention profile and various other meta data for the retention data provided.
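
The input rules listed for days and retention_values can be checked before create_profile is called. The following is a minimal sketch: check_retention_inputs and build_profile are hypothetical helper names that are not part of Theseus, and the only Theseus calls used are the import, instantiation and create_profile signature documented in this thread.

```python
def check_retention_inputs(days, retention_values):
    """Validate inputs against the documented create_profile constraints.

    Hypothetical helper, not part of Theseus. Returns the (day, retention)
    pairs sorted by day so they are easier to inspect.
    """
    if len(days) != len(retention_values):
        raise ValueError("days and retention_values must have the same length")
    if any(d < 0 for d in days):
        raise ValueError("day values cannot be less than 0")
    if any(r < 0 or r > 100 for r in retention_values):
        raise ValueError("retention values must be between 0 and 100")
    return sorted(zip(days, retention_values))


def build_profile(days, retention_values, profile_max=None):
    """Call create_profile as documented above (requires theseus_growth)."""
    import theseus_growth  # imported lazily so the check above works without it

    check_retention_inputs(days, retention_values)
    model = theseus_growth.theseus()
    return model.create_profile(
        days=days,
        retention_values=retention_values,
        form="best_fit",
        profile_max=profile_max,
    )


# Percentages are passed as integers (55 means 55%); order does not matter.
pairs = check_retention_inputs([7, 1, 30, 14], [30, 55, 12, 20])
```

Day 0 is left out of the sample data because Theseus assumes it is always 100.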


theseus.theseus.plot_retention

theseus.plot_retention( self, profile, show_average_values = True )

Plot a line graph of a retention profile

Parameters:

  • profile: dict. This is the retention profile generated with create_profile.
  • show_average_values: bool (optional). When this is set to True, the average day-indexed retention values are graphed. The default value is True.

Returns: None


theseus.theseus.project_cohorted_DAU

theseus.project_cohorted_DAU( self, profile, periods, cohorts, DAU_target = None, DAU_target_timeline = None, start_date = 1 )

Build a forward DAU projection based on a retention profile created via the create_profile method

Parameters:

  • profile: dict. This is the retention profile generated with create_profile.
  • periods: int. The number of periods that should be projected. Minimum value is 2.
  • cohorts: list. A list of integers representing the size of cohorts being projected. Cohort values must be integers greater than 0.
  • DAU_target: int (optional). A DAU value that the projection should reach over the course of DAU_target_timeline, starting with the last cohort. When DAU_target is set, Theseus will run a linear regression from the value of the DAU at the time of the last cohort through to the DAU target over the course of DAU_target_timeline and insert those regression values as new cohorts into the forward DAU projection. Note that if DAU_target is set, then DAU_target_timeline must also be set.
  • DAU_target_timeline: int (optional). The timeline over which the DAU_target value should be reached. Since the DAU_target_timeline only starts from the day of the last cohort, the value of DAU_target_timeline must be less than or equal to the number of periods minus the number of cohorts. Note that if DAU_target is set, then DAU_target_timeline must also be set.
  • start_date: int (optional). The day from which the projection should start. The default value is 1.

Returns: a DataFrame object containing the forward DAU projection.
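
To make the shape of a cohorted projection concrete, here is a plain-Python sketch of the idea: one row per cohort, one column per period, and each cell equal to the cohort size multiplied by retention at that cohort's age. This is not Theseus's implementation. It assumes each cohort is acquired one period after the previous one, uses a made-up retention curve in place of a fitted profile, and returns nested lists rather than a DataFrame. The function names are hypothetical.

```python
def cohort_matrix(retention_pct, periods, cohorts):
    """Rows are cohorts, columns are periods; cells are surviving users.

    retention_pct(age) is a hypothetical stand-in for a fitted retention
    profile and returns a percentage (100 at age 0).
    """
    if periods < 2:
        raise ValueError("periods must be at least 2")
    if any(size <= 0 for size in cohorts):
        raise ValueError("cohort sizes must be integers greater than 0")
    rows = []
    for start, size in enumerate(cohorts):
        row = []
        for day in range(periods):
            age = day - start
            # A cohort contributes nothing before it is acquired.
            row.append(size * retention_pct(age) / 100 if age >= 0 else 0.0)
        rows.append(row)
    return rows


def timeline_is_valid(periods, cohorts, dau_target_timeline):
    """Documented limit: timeline <= periods minus the number of cohorts."""
    return dau_target_timeline <= periods - len(cohorts)


def halving_curve(age):
    """Toy curve for illustration only: 100% on day 0, halving each day."""
    return 100 / (2 ** age)


matrix = cohort_matrix(halving_curve, periods=4, cohorts=[1000, 800])
```

With four periods and two cohorts, the longest DAU_target_timeline that satisfies the documented limit is 2.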


theseus.theseus.DAU_total

theseus.DAU_total( self, forward_DAU )

Sums the DAU values in a forward DAU projection into totals

Parameters:

  • forward_DAU: DataFrame. The forward DAU projection that totals should be generated from; each day (column) is summed into a total.

Returns: a DataFrame containing a single row with the total DAU for each day of the forward DAU projection that was passed in as forward_DAU.
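
The totalling step is a column-wise sum. A minimal sketch with nested lists standing in for the DataFrame (the function name and sample numbers are made up for illustration; this is not the library's code):

```python
def total_by_day(cohort_rows):
    """Sum each day (column) across all cohorts (rows)."""
    return [sum(column) for column in zip(*cohort_rows)]


# Three cohorts over three days; zeros mark days before a cohort exists.
forward_dau = [
    [1000, 500, 250],
    [0, 800, 400],
    [0, 0, 600],
]
totals = total_by_day(forward_dau)
```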


theseus.theseus.plot_forward_DAU_stacked

theseus.plot_forward_DAU_stacked( self, forward_DAU, forward_DAU_labels, forward_DAU_dates, show_values = False, show_totals_values = False )

Generates and displays a stacked bar graph representing a forward DAU projection, with each cohort being visualized as a bar (Y axis) and each stack summing to the total DAU for that given day (X axis)

Parameters:

  • forward_DAU: DataFrame. The forward DAU projection being visualized.
  • forward_DAU_labels: list. A list of labels that will be used for the cohorts as represented by stacked bars in the chart. A sensible value to supply for this parameter is list( forward_DAU.index ).
  • forward_DAU_dates: list. A list of labels that will be used for the days on the X axis. A sensible value to supply for this parameter is list( forward_DAU.columns ).
  • show_values: bool (optional). A True / False setting that determines whether the value of each bar is printed onto it. Default value is False. For forward DAU projections with very many days, setting this value to True will render the graph very difficult to read.
  • show_totals_values: bool (optional). A True / False setting that determines whether the total value for each day of the forward DAU projection will be printed at the top of the stacked bar. Default value is False.

Returns: None
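
For readers who want to see how the stacking works, the sketch below computes where each cohort's bar segment starts on each day, which is the running total of the cohorts beneath it. It is an illustration in plain Python with made-up numbers and a hypothetical function name. It does no plotting and says nothing about how Theseus draws the chart.

```python
def stack_bottoms(cohort_rows):
    """Return (bottoms, totals) for a stacked bar chart.

    bottoms[i][d] is the height at which cohort i's segment starts on day d;
    totals[d] is the full height of the stack on day d.
    """
    running = [0] * len(cohort_rows[0])
    bottoms = []
    for row in cohort_rows:
        bottoms.append(list(running))
        running = [base + value for base, value in zip(running, row)]
    return bottoms, running


forward_dau = [
    [1000, 500, 250],
    [0, 800, 400],
    [0, 0, 600],
]
bottoms, totals = stack_bottoms(forward_dau)
```

The totals list holds the values that show_totals_values would print above each stack.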


theseus.theseus.combine_DAU

theseus.combine_DAU( self, DAU_totals, labels = None )

Combines two or more forward DAU projections into one

Parameters:

  • DAU_totals: list. A list containing two or more forward DAU projections, ideally that have been totalled using DAU_total to only include one row each.
  • labels: list (optional). A list of labels to use for each forward DAU projection.

Returns: a DataFrame containing the combined forward DAU projections provided in DAU_totals
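
Conceptually, combining means placing several one-row totals side by side under their labels. The sketch below shows that idea with plain lists and a dict. The function name is hypothetical, and the fallback labels used when labels is None are an assumption made for this example; the thread does not say what Theseus uses in that case.

```python
def combine_totals(dau_totals, labels=None):
    """Map each label to one totalled projection (a list of daily values)."""
    if len(dau_totals) < 2:
        raise ValueError("provide two or more projections")
    if labels is None:
        # Assumption for this sketch only: positional fallback labels.
        labels = [f"projection_{i + 1}" for i in range(len(dau_totals))]
    if len(labels) != len(dau_totals):
        raise ValueError("labels must match the number of projections")
    return dict(zip(labels, dau_totals))


combined = combine_totals(
    [[1000, 1300, 1250], [400, 620, 700]],
    labels=["paid", "organic"],
)
```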


theseus.theseus.project_aged_DAU

theseus.project_aged_DAU( self, profile, periods, cohorts, ages, start_date = 1 )

Creates a forward DAU projection that contains, for each day, the number of DAU who are at least X days old, where X is an age provided in the ages parameter. Each value from ages is represented as a row.

Parameters:

  • profile: dict. This is the retention profile generated with create_profile.
  • periods: int. The number of periods that should be projected. Minimum value is 2.
  • cohorts: list. A list of integers representing the size of cohorts being projected. Cohort values must be integers greater than 0.
  • ages: list. A list of positive integers corresponding to the minimum ages that should be segmented out from overall DAU values.
  • start_date: int (optional). The day from which the projection should start. The default value is 1.

Returns: a DataFrame containing one row per value submitted in ages with column values corresponding to the number of DAU each day that are at least that old
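
The "at least X days old" segmentation can be sketched from a cohort-by-day table. This is an illustration in plain Python, not the library's code. It assumes cohort i is acquired on period i (counting from 0), so a cohort's age on a given day is the day index minus its row index. The function name and numbers are made up.

```python
def aged_dau(cohort_rows, ages):
    """One entry per age: daily DAU from users at least that many days old."""
    periods = len(cohort_rows[0])
    result = {}
    for min_age in ages:
        result[min_age] = [
            sum(
                row[day]
                for start, row in enumerate(cohort_rows)
                if day - start >= min_age  # cohort is old enough on this day
            )
            for day in range(periods)
        ]
    return result


forward_dau = [
    [1000, 500, 250],
    [0, 800, 400],
    [0, 0, 600],
]
at_least = aged_dau(forward_dau, ages=[1, 2])
```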


theseus.theseus.project_exact_aged_DAU

theseus.project_exact_aged_DAU( self, profile, periods, cohorts, ages, start_date = 1 )

Creates a forward DAU projection that contains, for each day, the number of DAU who are exactly X days old, where X is an age provided in the ages parameter. Each value from ages is represented as a row.

Parameters:

  • profile: dict. This is the retention profile generated with create_profile.
  • periods: int. The number of periods that should be projected. Minimum value is 2.
  • cohorts: list. A list of integers representing the size of cohorts being projected. Cohort values must be integers greater than 0.
  • ages: list. A list of positive integers corresponding to the exact ages that should be segmented out from overall DAU values.
  • start_date: int (optional). The day from which the projection should start. The default value is 1.

Returns: a DataFrame containing one row per value submitted in ages with column values corresponding to the number of DAU each day that are exactly that old
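
For the exact-age variant, only one cohort can be exactly X days old on any given day. A plain-Python sketch under the same assumption (cohort i is acquired on period i, counting from 0), with a hypothetical function name and made-up numbers:

```python
def exact_aged_dau(cohort_rows, ages):
    """One entry per age: daily DAU from users exactly that many days old."""
    periods = len(cohort_rows[0])
    result = {}
    for age in ages:
        values = []
        for day in range(periods):
            start = day - age  # index of the only cohort with this exact age
            if 0 <= start < len(cohort_rows):
                values.append(cohort_rows[start][day])
            else:
                values.append(0)
        result[age] = values
    return result


forward_dau = [
    [1000, 500, 250],
    [0, 800, 400],
    [0, 0, 600],
]
exactly = exact_aged_dau(forward_dau, ages=[1, 2])
```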


theseus.theseus.to_excel

theseus.to_excel( self, df, file_name = None, sheet_name = None )

Saves the contents of a forward DAU projection as a .xlsx Excel file

Parameters: 

  • df: DataFrame. The forward DAU projection DataFrame to be exported.
  • file_name: str (optional). The name of the file that should be saved. If it is not supplied, the file is named 'theseus_output.xlsx'. Note that this should not include a file path: the file will be saved into the current working directory.
  • sheet_name: str (optional). The name of the worksheet into which the data from the forward DAU projection will be copied. The default value is 'sheet1'.

Returns: None.
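
Because file_name should be a bare name with no path, a caller may want to guard against that before exporting. The helper below is a hypothetical sketch and not part of Theseus. Whether Theseus itself rejects a path is not stated in this thread.

```python
import os


def resolve_output_name(file_name=None, default="theseus_output.xlsx"):
    """Return a bare file name, rejecting anything that contains a path."""
    name = file_name or default
    if os.path.basename(name) != name:
        raise ValueError("file_name must not include a directory path")
    return name


default_name = resolve_output_name()
custom_name = resolve_output_name("q1_projection.xlsx")
```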


theseus.theseus.to_json

theseus.to_json( self, df, file_name = None )

Converts the contents of a forward DAU projection to JSON and saves it as a .json file

Parameters: 

  • df: DataFrame. The forward DAU projection DataFrame to be exported.
  • file_name: str (optional). The name of the file that should be saved. If it is not supplied, the file is named 'theseus_output.json'. Note that this should not include a file path: the file will be saved into the current working directory.
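
As a rough picture of what a JSON export of a projection involves, the sketch below writes labelled rows to a file in the current working directory using only the standard library. The function name and the label-to-values layout are assumptions for illustration. The JSON structure that Theseus actually produces is not described in this thread.

```python
import json


def projection_to_json(rows, labels, file_name="theseus_output.json"):
    """Write {label: [daily values]} to file_name in the current directory."""
    payload = dict(zip(labels, rows))
    with open(file_name, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return file_name
```

Calling projection_to_json( [[1000, 500], [0, 800]], ["cohort_1", "cohort_2"] ) would create theseus_output.json alongside the running script.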
Answered on January 12, 2020 7:55 pm