pdstools.valuefinder.ValueFinder

Module Contents

Classes

ValueFinder

Class to analyze Value Finder datasets.

class ValueFinder(path: str | None = None, df: pandas.DataFrame | polars.DataFrame | polars.LazyFrame | None = None, verbose: bool = True, import_strategy: Literal['eager', 'lazy'] = 'eager', ncust: int = None, **kwargs)

Class to analyze Value Finder datasets.

Relies heavily on polars for faster reading and transformations. See https://pola-rs.github.io/polars/py-polars/html/index.html

Requires either df or path to be supplied. If path is supplied, the ‘filename’ keyword argument is optional: when no filename is given, the most recent file found in path is used.

Parameters:
  • path (Optional[str]) – Path to the ValueFinder data files

  • df (Optional[DataFrame]) – Override to supply a dataframe instead of a file. Supports pandas or polars dataframes

  • import_strategy (Literal['eager', 'lazy'], default = 'eager') – Whether to import the file fully into memory or to scan the file. When the data fits into memory, ‘eager’ is typically more efficient. When it does not fit, the lazy methods typically still allow you to use the data.

  • verbose (bool) – Whether to print out information during importing

  • ncust (int)

Keyword Arguments:
  • th (float) – An optional keyword argument to override the propensity threshold

  • filename (Optional[str]) – The name of the file, or an extended file path to it

  • subset (bool) – Whether to select only a subset of columns. Will speed up analysis and reduce unused information
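
As a usage sketch based only on the signature and parameter list above: the constructor takes either path or df, plus optional keyword arguments such as th. The helper names build_value_finder_kwargs and load_value_finder are hypothetical and not part of pdstools; the import is deferred so the first helper can be tried without pdstools installed.

```python
from typing import Any, Dict, Optional


def build_value_finder_kwargs(
    path: Optional[str] = None,
    df: Any = None,
    lazy: bool = False,
    th: Optional[float] = None,
) -> Dict[str, Any]:
    """Collect constructor arguments, enforcing that path or df is given.

    Hypothetical helper: it only mirrors the documented signature.
    """
    if path is None and df is None:
        raise ValueError("Supply either `path` or `df`.")
    kwargs: Dict[str, Any] = {
        "path": path,
        "df": df,
        "verbose": False,
        "import_strategy": "lazy" if lazy else "eager",
    }
    if th is not None:
        kwargs["th"] = th  # optional propensity-threshold override
    return kwargs


def load_value_finder(**kwargs: Any):
    """Construct a ValueFinder; requires pdstools to be installed."""
    # Deferred import: module path taken from the page title above.
    from pdstools.valuefinder.ValueFinder import ValueFinder

    return ValueFinder(**build_value_finder_kwargs(**kwargs))
```

Choosing lazy=True here corresponds to import_strategy='lazy', which is the documented option for data that does not fit into memory.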

save_data(path: str = '.') -> os.PathLike

Cache the ValueFinder dataset to a file

Parameters:

path (str) – Where to place the file

Returns:

The path to the cached file

Return type:

PathLike

getCustomerSummary(th: float | None = None) -> polars.DataFrame

Computes the summary of propensities for all customers

Parameters:

th (Optional[float]) – The threshold to consider an action ‘good’. If a customer has actions with a propensity above this value, the customer has at least one relevant action. If not given, defaults to the 5th quantile.

Return type:

polars.DataFrame
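
The threshold rule described above can be illustrated with a minimal sketch in plain Python. This is not the library's implementation (which operates on polars frames); the function name and the (customer ID, propensity) row layout are assumptions made for illustration, and "above" is read as a strict comparison.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def customers_with_relevant_action(
    rows: Iterable[Tuple[str, float]], th: float
) -> Dict[str, bool]:
    """Map each customer ID to whether any of its propensities exceeds `th`."""
    # Track the highest propensity seen per customer (propensities are >= 0).
    best: Dict[str, float] = defaultdict(float)
    for customer_id, propensity in rows:
        best[customer_id] = max(best[customer_id], propensity)
    return {cid: p > th for cid, p in best.items()}


# Hypothetical input: one row per (customer, action propensity).
rows = [("c1", 0.02), ("c1", 0.30), ("c2", 0.01), ("c3", 0.05)]
summary = customers_with_relevant_action(rows, th=0.05)
```

With th=0.05, only c1 has a relevant action: c3 sits exactly on the threshold and is excluded under the strict comparison assumed here.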

getCountsPerStage(customersummary: polars.DataFrame | None = None) -> polars.DataFrame

Generates an aggregated view per stage.

Parameters:

customersummary (Optional[pl.DataFrame]) – Optional override of the customer summary, which can be generated by getCustomerSummary().

Return type:

polars.DataFrame

getThFromQuantile(quantile: float) -> float

Return the propensity threshold corresponding to a given quantile

If the threshold is already in self._thMap, it is simply read from there. Otherwise, the threshold is computed and then added to the map.

Parameters:

quantile (float) – The quantile to get the threshold for

Return type:

float
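
The look-up-or-compute behaviour described above is a standard memoization pattern. The sketch below shows it in plain Python; ThresholdCache is a hypothetical stand-in, and the nearest-rank quantile used here is an assumption that may differ from how the library computes quantiles.

```python
from typing import Dict, List


class ThresholdCache:
    """Hypothetical stand-in for a quantile-to-threshold map."""

    def __init__(self, propensities: List[float]) -> None:
        if not propensities:
            raise ValueError("at least one propensity is required")
        self._sorted = sorted(propensities)
        self._th_map: Dict[float, float] = {}
        self.computations = 0  # number of cache misses, for illustration

    def get_th_from_quantile(self, quantile: float) -> float:
        if not 0.0 <= quantile <= 1.0:
            raise ValueError("quantile must be between 0 and 1")
        if quantile not in self._th_map:
            self.computations += 1
            # Nearest-rank lookup; the library may interpolate differently.
            idx = round(quantile * (len(self._sorted) - 1))
            self._th_map[quantile] = self._sorted[idx]
        return self._th_map[quantile]
```

Repeated calls with the same quantile return the stored value without recomputing, which matters when the underlying data is large.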

getCountsPerThreshold(th, return_df=False) -> polars.LazyFrame | None

Return type:

Optional[polars.LazyFrame]

addCountsForThresholdRange(start, stop, step, method=Literal['threshold, quantile']) -> None

Adds the counts per stage for a range of quantiles or thresholds.

Once computed, the values are added to .countsPerThreshold so we only need to compute each value once.

Parameters:
  • start (float) – The start of the range

  • stop (float) – The end of the range

  • step (float) – The step between consecutive values from start to stop

  • method (Literal["threshold", "quantile"]) – Whether to get a range of thresholds directly or compute the thresholds from their quantiles

Return type:

None
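
The compute-once behaviour for a range of values can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, step is treated as a step size, and the range is treated as half-open (stop excluded), in the style of Python's range.

```python
from typing import Callable, Dict, List


def threshold_range(start: float, stop: float, step: float) -> List[float]:
    """Evenly spaced values from start up to, but not including, stop.

    Each value is start + i * step rather than a running sum, which
    avoids accumulating floating-point error.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    n = max(0, int(round((stop - start) / step)))
    return [round(start + i * step, 10) for i in range(n)]


def add_counts_for_range(
    cache: Dict[float, int],
    values: List[float],
    count_fn: Callable[[float], int],
) -> None:
    """Compute counts only for values that are not in the cache yet."""
    for value in values:
        if value not in cache:
            cache[value] = count_fn(value)
```

Here count_fn stands in for whatever computes the counts per stage at a single threshold; overlapping ranges passed in later calls only trigger work for values not seen before.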

plotPropensityDistribution(sampledN: int = 10000) -> plotly.graph_objects.Figure

Plots the distribution of the different propensities.

For optimization reasons (storage for all points in a boxplot and time complexity for computing the distribution plot), we have to sample to a reasonable amount of data points.

Parameters:

sampledN (int, default = 10_000) – The number of datapoints to sample

Return type:

plotly.graph_objects.Figure
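
The down-sampling step mentioned above can be sketched with the standard library. The function name sample_points and the seed parameter are hypothetical; the library itself samples from a polars frame, so this only illustrates the idea of capping the number of plotted points.

```python
import random
from typing import List, Optional, Sequence


def sample_points(
    values: Sequence[float], sampled_n: int = 10_000, seed: Optional[int] = None
) -> List[float]:
    """Return at most `sampled_n` values, drawn without replacement."""
    if sampled_n < 0:
        raise ValueError("sampled_n must be non-negative")
    if len(values) <= sampled_n:
        # Nothing to reduce: keep every point.
        return list(values)
    return random.Random(seed).sample(list(values), sampled_n)
```

Passing a seed makes the sample reproducible, which helps when comparing plots across runs.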

plotPropensityThreshold(sampledN=10000, stage='Eligibility') -> plotly.graph_objects.Figure

Plots the propensity threshold vs the different propensities.

Parameters:

sampledN (int, default = 10_000) – The number of datapoints to sample

Return type:

plotly.graph_objects.Figure

plotPieCharts(start: float = None, stop: float = None, step: float = None, *, method: Literal['threshold', 'quantile'] = 'threshold', rounding: int = 3, th: float | None = None) -> plotly.graph_objects.FigureWidget

Plots pie charts showing the distribution of customers

The pie charts each represent the fraction of customers with the color indicating whether they have sufficient relevant actions in that stage of the NBAD arbitration.

If no values are provided for start, stop or step, the pie charts are shown using the default propensity threshold, as part of the Value Finder class.

Parameters:
  • start (float) – The start of the range

  • stop (float) – The end of the range

  • step (float) – The step between consecutive values from start to stop

  • method (Literal['threshold', 'quantile'])

  • rounding (int)

  • th (Optional[float])

Keyword Arguments:
  • method (Literal['threshold', 'quantile'], default='threshold') – Whether the range is computed based on the threshold directly or based on the quantile of the propensity

  • rounding (int) – The number of digits to round the values by

  • th (Optional[float]) – Choose a specific propensity threshold to plot

Return type:

plotly.graph_objects.FigureWidget

plotDistributionPerThreshold(start: float = None, stop: float = None, step: float = None, *, method: Literal['threshold', 'quantile'] = 'threshold', rounding=3) -> plotly.graph_objects.FigureWidget

Plots the distribution of customers per threshold, per stage.

Based on the precomputed data in self.countsPerThreshold, this function will plot the distribution per stage.

To add more data points between a given range, simply pass all three arguments to this function: start, stop and step.

Parameters:
  • start (float) – The start of the range

  • stop (float) – The end of the range

  • step (float) – The step between consecutive values from start to stop

  • method (Literal['threshold', 'quantile'])

Keyword Arguments:
  • method (Literal['threshold', 'quantile'], default='threshold') – Whether the range is computed based on the threshold directly or based on the quantile of the propensity

  • rounding (int) – The number of digits to round the values by

Return type:

plotly.graph_objects.FigureWidget

plotFunnelChart(level: str = 'Action', query=None, return_df=False, **kwargs)

Plots the funnel of actions or issues per stage.

Parameters:

level (str, default = 'Actions') – Which element to plot:
  • If ‘Actions’, plots the distribution of actions.
  • If ‘Issues’, plots the distribution of issues.