pdstools.ih.IH

Interaction History analysis for Pega CDH.

Classes

IH

Analyze Interaction History data from Pega CDH.

Module Contents

class IH(data: polars.LazyFrame)

Analyze Interaction History data from Pega CDH.

The IH class provides analysis and visualization capabilities for customer interaction data from Pega’s Customer Decision Hub. It supports engagement, conversion, and open rate metrics through customizable outcome label mappings.

Parameters:

data (polars.LazyFrame)

data

The underlying interaction history data.

Type:

pl.LazyFrame

aggregates

Aggregation methods accessor.

Type:

Aggregates

plot

Plot accessor for visualization methods.

Type:

Plots

positive_outcome_labels

Mapping of metric types to positive outcome labels.

Type:

dict

negative_outcome_labels

Mapping of metric types to negative outcome labels.

Type:

dict
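As an illustration of how the two mappings could combine into a rate metric, here is a minimal sketch in plain Python. The label values and the `rate` helper are assumptions made for this example, not values read from the library; inspect `positive_outcome_labels` and `negative_outcome_labels` on a real instance for the actual defaults.

```python
from collections import Counter

# Hypothetical mappings, shaped like Dict[str, List[str]].
positive_outcome_labels = {
    "Engagement": ["Clicked", "Accepted"],
    "Conversion": ["Conversion"],
}
negative_outcome_labels = {
    "Engagement": ["Impression", "Pending"],
    "Conversion": ["Impression", "Pending"],
}


def rate(outcomes, metric):
    """Share of positives among outcomes labelled positive or negative for `metric`."""
    counts = Counter(outcomes)
    positives = sum(counts[label] for label in positive_outcome_labels[metric])
    negatives = sum(counts[label] for label in negative_outcome_labels[metric])
    total = positives + negatives
    return positives / total if total else float("nan")


outcomes = ["Impression", "Clicked", "Impression", "Impression", "Conversion", "Pending"]
print(rate(outcomes, "Engagement"))  # 1 positive out of 5 counted outcomes
print(rate(outcomes, "Conversion"))  # 1 positive out of 5 counted outcomes
```

Outcomes that appear in neither list for a metric are ignored by this sketch, which is why changing the mappings changes the resulting rate.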

See also

pdstools.adm.ADMDatamart

For ADM model analysis.

pdstools.impactanalyzer.ImpactAnalyzer

For Impact Analyzer experiments.

Examples

>>> from pdstools import IH
>>> ih = IH.from_ds_export("interaction_history.zip")
>>> ih.aggregates.summary_by_channel().collect()
>>> ih.plot.response_count_trend()
data: polars.LazyFrame
positive_outcome_labels: Dict[str, List[str]]

Mapping of metric types to positive outcome labels.

negative_outcome_labels: Dict[str, List[str]]

Mapping of metric types to negative outcome labels.

aggregates
plot
classmethod from_ds_export(ih_filename: os.PathLike | str, query: pdstools.utils.types.QUERY | None = None) → IH

Create an IH instance from a Pega Dataset Export.

Parameters:
  • ih_filename (Union[os.PathLike, str]) – Path to the dataset export file (parquet, csv, ndjson, or zip).

  • query (Optional[QUERY], optional) – Polars expression to filter the data. Default is None.

Returns:

Initialized IH instance.

Return type:

IH

Examples

>>> ih = IH.from_ds_export("Data-pxStrategyResult_pxInteractionHistory.zip")
>>> ih.data.collect_schema()
classmethod from_s3() → IH
Abstractmethod:

Create an IH instance from S3 data.

Note

Not implemented yet. Please let us know if you would like this!

Raises:

NotImplementedError – This method is not yet implemented.

Return type:

IH

classmethod from_mock_data(days: int = 90, n: int = 100000) → IH

Create an IH instance with synthetic sample data.

Generates realistic interaction history data for testing and demonstration purposes. Includes inbound (Web) and outbound (Email) channels with configurable propensities and model noise.

Parameters:
  • days (int, default 90) – Number of days of data to generate.

  • n (int, default 100000) – Number of interaction records to generate.

Returns:

IH instance with synthetic data.

Return type:

IH
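To make the idea of synthetic interaction data concrete, the following is a rough standard-library sketch of channel-dependent outcome sampling. It is not the library's generator: the `mock_interactions` function, the column names, the propensity values, the fixed start date, and the `seed` parameter are all hypothetical.

```python
import random
from datetime import datetime, timedelta


def mock_interactions(days=90, n=100_000, seed=0):
    """Generate n synthetic records spread over `days` days from a fixed start date."""
    rng = random.Random(seed)
    # Hypothetical per-channel click propensities; not the library's values.
    propensity = {"Web": 0.05, "Email": 0.02}
    channels = list(propensity)
    start = datetime(2024, 1, 1)
    records = []
    for _ in range(n):
        channel = rng.choice(channels)
        clicked = rng.random() < propensity[channel]
        records.append(
            {
                "OutcomeTime": start + timedelta(seconds=rng.uniform(0, days * 86400)),
                "Channel": channel,
                "Outcome": "Clicked" if clicked else "Impression",
            }
        )
    return records


sample = mock_interactions(days=30, n=1_000, seed=1)
print(len(sample))
print(sorted({record["Channel"] for record in sample}))
```

Fixing the seed keeps the sample reproducible, which is convenient when the data feeds tests or demos.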

Examples

>>> ih = IH.from_mock_data(days=30, n=10000)
>>> ih.data.select("pyChannel").collect().unique()
get_sequences(positive_outcome_label: str, level: str, outcome_column: str, customerid_column: str) → Tuple[List[Tuple[str, Ellipsis]], List[Tuple[int, Ellipsis]], List[collections.defaultdict], List[collections.defaultdict]]

Extract customer action sequences for PMI analysis.

Processes customer interaction data to produce action sequences, outcome labels, and frequency counts needed for Pointwise Mutual Information (PMI) calculations.

Parameters:
  • positive_outcome_label (str) – Outcome label marking the target event (e.g., “Conversion”).

  • level (str) – Column name containing the action/offer/treatment.

  • outcome_column (str) – Column name containing the outcome label.

  • customerid_column (str) – Column name identifying unique customers.

Returns:

  • customer_sequences (List[Tuple[str, …]]) – Action sequences per customer.

  • customer_outcomes (List[Tuple[int, …]]) – Binary outcomes (1=positive, 0=other) per sequence position.

  • count_actions (List[defaultdict]) – Action frequency counts:

      • [0]: First element counts in bigrams

      • [1]: Second element counts in bigrams

  • count_sequences (List[defaultdict]) – Sequence frequency counts:

      • [0]: All bigrams

      • [1]: ≥3-grams ending with positive outcome

      • [2]: Bigrams ending with positive outcome

      • [3]: Unique n-grams per customer

Return type:

Tuple[List[Tuple[str, Ellipsis]], List[Tuple[int, Ellipsis]], List[collections.defaultdict], List[collections.defaultdict]]
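The shape of these return values can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the method's implementation: the `rows` data, the `build_sequences` and `count_bigrams` helpers, and the dictionary key shapes are hypothetical, and only the all-bigram counts are reproduced (not the counts restricted to positive outcomes).

```python
from collections import defaultdict

# Hypothetical rows: (customer id, action, outcome), already in time order.
rows = [
    ("c1", "OfferA", "Impression"),
    ("c1", "OfferB", "Clicked"),
    ("c1", "OfferC", "Conversion"),
    ("c2", "OfferA", "Impression"),
    ("c2", "OfferB", "Conversion"),
]


def build_sequences(rows, positive_outcome_label):
    """Group rows per customer into action tuples and 0/1 outcome tuples."""
    actions, outcomes = defaultdict(list), defaultdict(list)
    for customer, action, outcome in rows:
        actions[customer].append(action)
        outcomes[customer].append(int(outcome == positive_outcome_label))
    return (
        [tuple(seq) for seq in actions.values()],
        [tuple(out) for out in outcomes.values()],
    )


def count_bigrams(customer_sequences):
    """Count bigrams, plus how often each action is the first or second element."""
    first, second, bigrams = defaultdict(int), defaultdict(int), defaultdict(int)
    for seq in customer_sequences:
        for a, b in zip(seq, seq[1:]):
            first[a] += 1
            second[b] += 1
            bigrams[(a, b)] += 1
    return [first, second], bigrams


customer_sequences, customer_outcomes = build_sequences(rows, "Conversion")
count_actions, all_bigrams = count_bigrams(customer_sequences)
print(customer_sequences)
print(customer_outcomes)
print(dict(all_bigrams))
```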

See also

calculate_pmi

Compute PMI scores from sequence counts.

pmi_overview

Generate PMI analysis summary.

static calculate_pmi(count_actions: List[collections.defaultdict], count_sequences: List[collections.defaultdict]) → Dict[Tuple[str, Ellipsis], float | Dict[str, float | Dict]]

Compute PMI scores for action sequences.

Calculates Pointwise Mutual Information scores for bigrams and higher-order n-grams. Higher values indicate more informative or surprising action sequences.

Parameters:
  • count_actions (List[defaultdict]) – Action frequency counts from get_sequences().

  • count_sequences (List[defaultdict]) – Sequence frequency counts from get_sequences().

Returns:

PMI scores for sequences:

  • Bigrams: Direct PMI value (float)

  • N-grams (n≥3): Dict with ‘average_pmi’ and ‘links’ (constituent bigram PMIs)

Return type:

Dict[Tuple[str, …], Union[float, Dict]]

See also

get_sequences

Extract sequences for PMI analysis.

pmi_overview

Generate PMI analysis summary.

Notes

Bigram PMI is calculated as:

\[PMI(a, b) = \log_2 \frac{P(a, b)}{P(a) \cdot P(b)}\]

N-gram PMI is the average of constituent bigram PMIs.
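A small worked example of the formula, written as a standalone sketch. The helper names and the way probabilities are estimated (raw counts divided by the total number of bigrams) are assumptions for illustration; the library may normalize differently.

```python
import math


def bigram_pmi(count_ab, count_a, count_b, total):
    """PMI(a, b) = log2(P(a, b) / (P(a) * P(b))) with probabilities taken as counts / total."""
    p_ab = count_ab / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_ab / (p_a * p_b))


def ngram_pmi(ngram, bigram_scores):
    """Average the PMI of the consecutive bigrams that make up `ngram`."""
    links = [bigram_scores[pair] for pair in zip(ngram, ngram[1:])]
    return sum(links) / len(links)


# Hypothetical counts out of 8 observed bigrams.
scores = {
    ("A", "B"): bigram_pmi(count_ab=2, count_a=2, count_b=4, total=8),  # log2(2.0)
    ("B", "C"): bigram_pmi(count_ab=1, count_a=4, count_b=4, total=8),  # log2(0.5)
}
print(scores)
print(ngram_pmi(("A", "B", "C"), scores))
```

A positive value means the pair occurs together more often than independence would predict; a negative value means less often.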

static pmi_overview(ngrams_pmi: Dict[Tuple[str, Ellipsis], float | Dict], count_sequences: List[collections.defaultdict], customer_sequences: List[Tuple[str, Ellipsis]], customer_outcomes: List[Tuple[int, Ellipsis]]) → polars.DataFrame

Generate PMI analysis summary DataFrame.

Creates a summary of action sequences ranked by their significance in predicting positive outcomes.

Parameters:
  • ngrams_pmi (Dict[Tuple[str, ...], Union[float, Dict]]) – PMI scores from calculate_pmi().

  • count_sequences (List[defaultdict]) – Sequence frequency counts from get_sequences().

  • customer_sequences (List[Tuple[str, ...]]) – Customer action sequences from get_sequences().

  • customer_outcomes (List[Tuple[int, ...]]) – Customer outcome sequences from get_sequences().

Returns:

Summary DataFrame with columns:

  • Sequence: Action sequence tuple

  • Length: Number of actions in sequence

  • Avg PMI: Average PMI value

  • Frequency: Total occurrence count

  • Unique freq: Unique customer count

  • Score: PMI × log(Frequency), sorted descending

Return type:

pl.DataFrame
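The ranking by Score can be illustrated with a few hand-made rows. This sketch assumes the natural logarithm for log(Frequency) and uses hypothetical values; it only mirrors the PMI × log(Frequency) ordering described above, not the full DataFrame construction.

```python
import math

# Hypothetical summary rows using the column names listed above.
rows = [
    {"Sequence": ("OfferB", "OfferC"), "Avg PMI": 2.0, "Frequency": 5},
    {"Sequence": ("OfferA", "OfferC"), "Avg PMI": 0.5, "Frequency": 1},
    {"Sequence": ("OfferA", "OfferB"), "Avg PMI": 1.0, "Frequency": 100},
]

for row in rows:
    # Assumption: natural log; a sequence seen once gets a score of zero.
    row["Score"] = row["Avg PMI"] * math.log(row["Frequency"])

ranked = sorted(rows, key=lambda row: row["Score"], reverse=True)
for row in ranked:
    print(row["Sequence"], round(row["Score"], 3))
```

Weighting by log(Frequency) lets a frequent sequence with moderate PMI outrank a rare sequence with high PMI, which dampens noise from sequences observed only a handful of times.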

See also

get_sequences

Extract sequences for analysis.

calculate_pmi

Compute PMI scores.

Examples

>>> seqs, outs, actions, counts = ih.get_sequences(
...     "Conversion", "pyName", "pyOutcome", "pxInteractionID"
... )
>>> pmi = IH.calculate_pmi(actions, counts)
>>> IH.pmi_overview(pmi, counts, seqs, outs)