pdstools.valuefinder.ValueFinder
Module Contents
Classes
ValueFinder – Class to analyze Value Finder datasets.
- class ValueFinder(path: str | None = None, df: pandas.DataFrame | polars.DataFrame | polars.LazyFrame | None = None, verbose: bool = True, import_strategy: Literal['eager', 'lazy'] = 'eager', ncust: int = None, **kwargs)
Class to analyze Value Finder datasets.
Relies heavily on polars for faster reading and transformations. See https://pola-rs.github.io/polars/py-polars/html/index.html
Requires either df or a path to be supplied. If a path is supplied, the ‘filename’ argument is optional; if a path is given without a filename, the most recent file is used.
- Parameters:
path (Optional[str]) – Path to the ValueFinder data files
df (Optional[DataFrame]) – Override to supply a dataframe instead of a file. Supports pandas and polars DataFrames, as well as polars LazyFrames.
import_strategy (Literal['eager', 'lazy'], default = 'eager') – Whether to import the file fully into memory or to scan it. When the data fits into memory, ‘eager’ is typically more efficient; when it does not, the lazy methods typically still let you use the data.
verbose (bool) – Whether to print out information during importing
ncust (int)
- Keyword Arguments:
th (float) – An optional keyword argument to override the propensity threshold
filename (Optional[str]) – The name of, or extended path to, the file
subset (bool) – Whether to select only a subset of columns. This speeds up analysis and drops unused information.
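The input rules above (either df or path, an optional filename, otherwise the most recent file) can be sketched with the standard library. `resolve_source` is a hypothetical helper, not part of pdstools. It assumes that a supplied df takes precedence over path, and it interprets "most recent" as the latest modification time.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_source(
    path: Optional[str] = None,
    df: Optional[object] = None,
    filename: Optional[str] = None,
) -> Union[object, Path]:
    """Hypothetical helper: choose the data source as the docstring describes."""
    if df is not None:
        # Assumption: an explicitly supplied dataframe wins over a path.
        return df
    if path is None:
        raise ValueError("Supply either `df` or `path`.")
    if filename is not None:
        # A bare name is resolved inside `path`; an absolute path stays absolute.
        return Path(path) / filename
    files = [p for p in Path(path).iterdir() if p.is_file()]
    if not files:
        raise FileNotFoundError(f"No files found in {path!r}")
    # Assumption: "most recent" means the latest modification time.
    return max(files, key=lambda p: p.stat().st_mtime)
```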
- save_data(path: str = '.') → os.PathLike
Cache the ValueFinder dataset to a file
- Parameters:
path (str) – Where to place the file
- Returns:
The path to the file
- Return type:
PathLike
- getCustomerSummary(th: float | None = None) → polars.DataFrame
Computes the summary of propensities for all customers
- Parameters:
th (Optional[float]) – The threshold above which an action is considered ‘good’. A customer with at least one action whose propensity exceeds this threshold has at least one relevant action. If not given, defaults to the 5th quantile.
- Return type:
polars.DataFrame
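To illustrate what the threshold does, the sketch below counts, per customer and stage, how many actions exceed `th`. It works on plain `(customer, stage, propensity)` tuples instead of a polars DataFrame, and the function and field names are hypothetical. The real summary's columns may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (customer_id, stage, propensity) triples: an assumed layout for illustration.
Row = Tuple[str, str, float]


def customer_summary(rows: Iterable[Row], th: float) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count all actions and the 'good' ones (propensity > th) per (customer, stage)."""
    summary: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"actions": 0, "relevant_actions": 0}
    )
    for customer, stage, propensity in rows:
        entry = summary[(customer, stage)]
        entry["actions"] += 1
        # "Above the threshold" is read as a strict comparison.
        if propensity > th:
            entry["relevant_actions"] += 1
    return dict(summary)
```

A customer has at least one relevant action in a stage when `relevant_actions` is greater than zero for that pair.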
- getCountsPerStage(customersummary: polars.DataFrame | None = None) → polars.DataFrame
Generates an aggregated view per stage.
- Parameters:
customersummary (Optional[pl.DataFrame]) – Optional override of the customer summary, which can be generated by getCustomerSummary().
- Return type:
polars.DataFrame
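One plausible way to aggregate such a summary per stage is to place every customer into one of three buckets. The bucket names (`relevant_actions`, `only_irrelevant_actions`, `no_actions`) and the input layout are assumptions for this sketch, not the columns pdstools produces.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def counts_per_stage(
    summary: Dict[Tuple[str, str], int],
    customers: Iterable[str],
    stages: Iterable[str],
) -> Dict[str, Counter]:
    """Hypothetical per-stage aggregation of a customer summary.

    `summary` maps (customer, stage) to that customer's number of relevant
    actions; a missing key means the customer had no actions in that stage.
    """
    customers = list(customers)
    result: Dict[str, Counter] = {}
    for stage in stages:
        counts: Counter = Counter()
        for customer in customers:
            n_relevant = summary.get((customer, stage))
            if n_relevant is None:
                counts["no_actions"] += 1
            elif n_relevant > 0:
                counts["relevant_actions"] += 1
            else:
                counts["only_irrelevant_actions"] += 1
        result[stage] = counts
    return result
```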
- getThFromQuantile(quantile: float) → float
Return the propensity threshold corresponding to a given quantile
If the threshold is already in self._thMap, it is returned from there. Otherwise, the threshold is computed and then added to the map.
- Parameters:
quantile (float) – The quantile to get the threshold for
- Return type:
float
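The lookup-then-compute behaviour can be sketched as a small memoising class. `ThresholdCache` and its `_th_map` attribute are hypothetical stand-ins for the class and `self._thMap`. The sketch assumes linear interpolation between ranks as the quantile method. Polars supports several interpolation strategies, so real thresholds may differ slightly.

```python
from typing import Dict, List


class ThresholdCache:
    """Hypothetical quantile -> threshold lookup with a cache."""

    def __init__(self, propensities: List[float]):
        if not propensities:
            raise ValueError("need at least one propensity")
        self._values = sorted(propensities)
        self._th_map: Dict[float, float] = {}  # plays the role of self._thMap

    def _quantile(self, q: float) -> float:
        # Assumed method: linear interpolation between the two nearest ranks.
        if not 0.0 <= q <= 1.0:
            raise ValueError("quantile must be in [0, 1]")
        pos = q * (len(self._values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(self._values) - 1)
        return self._values[lo] + (self._values[hi] - self._values[lo]) * (pos - lo)

    def th_from_quantile(self, quantile: float) -> float:
        if quantile not in self._th_map:
            # Computed only on the first request; later calls hit the cache.
            self._th_map[quantile] = self._quantile(quantile)
        return self._th_map[quantile]
```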
- getCountsPerThreshold(th, return_df=False) → polars.LazyFrame | None
- Return type:
Optional[polars.LazyFrame]
- addCountsForThresholdRange(start, stop, step, method=Literal['threshold', 'quantile']) → None
Adds the counts per stage for a range of quantiles or thresholds.
Computed values are stored in self.countsPerThreshold, so each value only needs to be computed once.
- Parameters:
start (float) – The start of the range
stop (float) – The end of the range
step (float) – The step size between consecutive values from start to stop
method (Literal["threshold", "quantile"]) – Whether to get a range of thresholds directly or to compute the thresholds from their quantiles
- Return type:
None
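Computing each value only once amounts to memoising on the threshold. The helpers below are hypothetical and cover only the ‘threshold’ method. They assume the range includes stop, and they round generated thresholds so that, for example, 0.1 + 0.2 and 0.3 map to the same cache key.

```python
from typing import Callable, Dict, List


def threshold_range(start: float, stop: float, step: float, rounding: int = 6) -> List[float]:
    """Inclusive range; rounding avoids float drift such as 0.30000000000000004."""
    if step <= 0:
        raise ValueError("step must be positive")
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, rounding) for i in range(n_steps + 1)]


def add_counts_for_range(
    cache: Dict[float, int],
    start: float,
    stop: float,
    step: float,
    count_fn: Callable[[float], int],
) -> None:
    """Hypothetical memoised fill: compute counts only for uncached thresholds."""
    for th in threshold_range(start, stop, step):
        if th not in cache:
            cache[th] = count_fn(th)
```

Calling it again with an overlapping range only evaluates `count_fn` for the new thresholds.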
- plotPropensityDistribution(sampledN: int = 10000) → plotly.graph_objects.Figure
Plots the distribution of the different propensities.
For performance reasons (the storage needed for every point in a box plot, and the time needed to compute the distribution plot), the data is sampled down to a manageable number of data points.
- Parameters:
sampledN (int, default = 10_000) – The number of datapoints to sample
- Return type:
plotly.graph_objects.Figure
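Down-sampling before plotting can be as simple as drawing `sampledN` values uniformly without replacement. The sketch below shows this assumed strategy with the standard library. The text does not say how pdstools draws its sample, and `sample_for_plot` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence


def sample_for_plot(
    values: Sequence[float], sampled_n: int = 10_000, seed: Optional[int] = None
) -> List[float]:
    """Return at most `sampled_n` values, drawn uniformly without replacement."""
    if len(values) <= sampled_n:
        # Small inputs are plotted in full.
        return list(values)
    rng = random.Random(seed)  # seeded for reproducible plots
    return rng.sample(list(values), sampled_n)
```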
- plotPropensityThreshold(sampledN=10000, stage='Eligibility') → plotly.graph_objects.Figure
Plots the propensity threshold vs the different propensities.
- Parameters:
sampledN (int, default = 10_000) – The number of datapoints to sample
- Return type:
plotly.graph_objects.Figure
- plotPieCharts(start: float = None, stop: float = None, step: float = None, *, method: Literal['threshold', 'quantile'] = 'threshold', rounding: int = 3, th: float | None = None) → plotly.graph_objects.FigureWidget
Plots pie charts showing the distribution of customers
The pie charts each represent the fraction of customers with the color indicating whether they have sufficient relevant actions in that stage of the NBAD arbitration.
If no values are provided for start, stop or step, the pie charts are shown for the default propensity threshold of the ValueFinder instance.
- Parameters:
start (float) – The start of the range
stop (float) – The end of the range
step (float) – The step size between consecutive values from start to stop
- Keyword Arguments:
method (Literal['threshold', 'quantile'], default='threshold') – Whether the range is computed based on the threshold directly or based on the quantile of the propensity
rounding (int) – The number of decimal places to round the values to
th (Optional[float]) – Choose a specific propensity threshold to plot
- Return type:
plotly.graph_objects.FigureWidget
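The sketch below prepares the data behind one row of pie charts: one pie per stage, where each slice is the fraction of customers in a category. It assumes `rounding` only affects how the threshold is displayed. The function name `pie_chart_data` and the input shape are hypothetical.

```python
from typing import Dict, Tuple

StageCounts = Dict[str, Dict[str, int]]  # stage -> category -> number of customers


def pie_chart_data(
    counts: StageCounts, th: float, rounding: int = 3
) -> Tuple[str, Dict[str, Dict[str, float]]]:
    """Hypothetical data prep for one row of pie charts (one pie per stage)."""
    # Assumption: the threshold is rounded only for display in the title.
    title = f"Propensity threshold: {round(th, rounding)}"
    fractions: Dict[str, Dict[str, float]] = {}
    for stage, by_category in counts.items():
        total = sum(by_category.values())
        fractions[stage] = {
            category: (n / total if total else 0.0) for category, n in by_category.items()
        }
    return title, fractions
```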
- plotDistributionPerThreshold(start: float = None, stop: float = None, step: float = None, *, method: Literal['threshold', 'quantile'] = 'threshold', rounding=3) → plotly.graph_objects.FigureWidget
Plots the distribution of customers per threshold, per stage.
Based on the precomputed data in self.countsPerThreshold, this function will plot the distribution per stage.
To add more data points within a given range, pass all three arguments to this function: start, stop and step.
- Parameters:
start (float) – The start of the range
stop (float) – The end of the range
step (float) – The step size between consecutive values from start to stop
- Keyword Arguments:
method (Literal['threshold', 'quantile'], default='threshold') – Whether the range is computed based on the threshold directly or based on the quantile of the propensity
rounding (int) – The number of decimal places to round the values to
- Return type:
plotly.graph_objects.FigureWidget
- plotFunnelChart(level: str = 'Action', query=None, return_df=False, **kwargs)
Plots the funnel of actions or issues per stage.
- Parameters:
level (str, default = 'Action') – Which element to plot: ‘Action’ plots the distribution of actions; ‘Issue’ plots the distribution of issues.
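A funnel like this reduces to counting the distinct actions (or issues) still present at each stage. The record layout, the stage names and the `level` values accepted by the hypothetical `funnel_counts` helper are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (stage, issue, action) triples: an assumed record layout.
Record = Tuple[str, str, str]


def funnel_counts(records: Iterable[Record], stages: List[str], level: str = "Action") -> Dict[str, int]:
    """Count distinct actions or issues per stage, in the given funnel order."""
    if level not in ("Action", "Issue"):
        raise ValueError("level must be 'Action' or 'Issue'")
    seen: Dict[str, Set[str]] = {stage: set() for stage in stages}
    for stage, issue, action in records:
        if stage in seen:
            seen[stage].add(action if level == "Action" else issue)
    return {stage: len(seen[stage]) for stage in stages}
```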