pdstools.ih.IH

Interaction History analysis for Pega CDH.

Attributes

logger

Classes

IH

Analyze Interaction History data from Pega CDH.

Module Contents

logger
class IH(data: polars.LazyFrame)

Analyze Interaction History data from Pega CDH.

The IH class provides analysis and visualization capabilities for customer interaction data from Pega’s Customer Decision Hub. It supports engagement, conversion, and open rate metrics through customizable outcome label mappings.

Parameters:

data (polars.LazyFrame)

data

The underlying interaction history data.

Type:

pl.LazyFrame

aggregates

Aggregation methods accessor.

Type:

Aggregates

plot

Plot accessor for visualization methods.

Type:

Plots

positive_outcome_labels

Mapping of metric types to positive outcome labels.

Type:

dict

negative_outcome_labels

Mapping of metric types to negative outcome labels.

Type:

dict

See also

pdstools.adm.ADMDatamart

For ADM model analysis.

pdstools.impactanalyzer.ImpactAnalyzer

For Impact Analyzer experiments.

Examples

>>> from pdstools import IH
>>> ih = IH.from_ds_export("interaction_history.zip")
>>> ih.aggregates.summary_by_channel().collect()
>>> ih.plot.response_count_trend()
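The outcome label mappings described above can be pictured as two dictionaries keyed by metric type. The sketch below is illustrative only: the label values ("Clicked", "Impression", "Conversion") and the helper `success_rate` are assumptions made for this example, not the library's defaults, and the rate formula (positives divided by positives plus negatives) is one plausible reading of how such a mapping yields a metric.

```python
from collections import Counter

# Hypothetical label mappings, for illustration only. The real values live
# on IH.positive_outcome_labels and IH.negative_outcome_labels.
positive_labels = {"Engagement": ["Clicked"], "Conversion": ["Conversion"]}
negative_labels = {"Engagement": ["Impression"], "Conversion": ["Impression"]}


def success_rate(outcomes, metric):
    """Return positives / (positives + negatives) for one metric type."""
    counts = Counter(outcomes)
    pos = sum(counts[label] for label in positive_labels[metric])
    neg = sum(counts[label] for label in negative_labels[metric])
    total = pos + neg
    return pos / total if total else float("nan")


outcomes = ["Impression", "Clicked", "Impression", "Impression", "Conversion"]
engagement = success_rate(outcomes, "Engagement")  # 1 positive, 3 negatives
conversion = success_rate(outcomes, "Conversion")  # 1 positive, 3 negatives
```

Changing which labels count as positive or negative for a metric changes the resulting rate without touching the underlying data.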

data: polars.LazyFrame

outcome_labels_used: dict | None

positive_outcome_labels: ClassVar[dict[str, list[str]]]

Mapping of metric types to positive outcome labels.

negative_outcome_labels: ClassVar[dict[str, list[str]]]

Mapping of metric types to negative outcome labels.

aggregates

plot

classmethod from_ds_export(ih_filename: os.PathLike | str, *, query: pdstools.utils.types.QUERY | None = None) → IH

Create an IH instance from a Pega Dataset Export.

Parameters:
  • ih_filename (os.PathLike or str) – Path to the dataset export file (parquet, csv, ndjson, or zip).

  • query (QUERY, optional) – Polars expression to filter the data. Default is None.

Returns:

Initialized IH instance.

Return type:

IH

Examples

>>> ih = IH.from_ds_export("Data-pxStrategyResult_pxInteractionHistory.zip")
>>> ih.data.collect_schema()

classmethod from_s3(bucket: str, key: str, *, region: str | None = None, boto3_client=None, query: pdstools.utils.types.QUERY | None = None) → IH

Create an IH instance from a single object stored in S3.

Downloads the interaction-history export from the given S3 bucket to a temporary directory, then delegates to from_ds_export() for parsing.

Parameters:
  • bucket (str) – Name of the S3 bucket holding the export file.

  • key (str) – S3 object key for the interaction-history export file.

  • region (str or None, optional) – AWS region name. Ignored if boto3_client is provided.

  • boto3_client (optional) – Pre-configured boto3 S3 client. Use this to inject custom credentials, endpoints, or sessions. When omitted, a default client is created via boto3.client("s3", region_name=region).

  • query (QUERY, optional) – Polars expression to filter the data. Default is None.

Returns:

Initialized IH instance.

Return type:

IH

Examples

>>> from pdstools import IH
>>> ih = IH.from_s3(
...     bucket="my-pega-exports",
...     key="ih/Data-pxStrategyResult_pxInteractionHistory.zip",
... )

Note

boto3 is an optional dependency; install the pega_io extra (or install boto3 directly) before calling this method.

See also

IH.from_ds_export

Underlying parser for downloaded files.
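The download-then-delegate flow and the role of an injected client can be sketched without AWS access. This is a minimal illustration, not the library's implementation: `FakeS3Client` and `fetch_then_parse` are hypothetical names, and the only boto3 behaviour mimicked is the S3 client's `download_file(Bucket, Key, Filename)` call.

```python
import tempfile
from pathlib import Path


class FakeS3Client:
    """Stand-in for a boto3 S3 client; only download_file is mimicked."""

    def download_file(self, Bucket, Key, Filename):
        # Write placeholder contents instead of contacting S3.
        Path(Filename).write_text(f"contents of s3://{Bucket}/{Key}")


def fetch_then_parse(bucket, key, client, parse):
    """Download one object to a temporary directory, then hand it to parse."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = Path(tmp) / Path(key).name
        client.download_file(Bucket=bucket, Key=key, Filename=str(local_path))
        # Parse while the temporary directory still exists.
        return parse(local_path)


result = fetch_then_parse(
    "my-pega-exports",
    "ih/export.zip",
    FakeS3Client(),
    parse=lambda path: (path.name, path.read_text()),
)
```

Passing the client in as an argument is what makes this testable offline; the same pattern lets callers supply custom credentials or endpoints through `boto3_client`.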

classmethod from_mock_data(days: int = 90, n: int = 100000, seed: int | None = None) → IH

Create an IH instance with synthetic sample data.

Generates realistic interaction history data for testing and demonstration purposes. Includes inbound (Web) and outbound (Email) channels with configurable propensities and model noise.

Parameters:
  • days (int, default 90) – Number of days of data to generate.

  • n (int, default 100000) – Number of interaction records to generate.

  • seed (int or None, default None) – Optional seed for the random number generator. When provided, data generation is deterministic across runs, which is useful for tests and reproducible notebooks. When None (the default), results vary between invocations.

Returns:

IH instance with synthetic data.

Return type:

IH

Examples

>>> ih = IH.from_mock_data(days=30, n=10000, seed=42)
>>> ih.data.select("pyChannel").collect().unique()
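
The effect of the seed parameter can be illustrated with a small stand-alone generator. This is a sketch under stated assumptions: `mock_outcomes`, the propensity values, and the outcome labels are hypothetical and do not reflect how from_mock_data() produces its data. It only shows why a fixed seed gives repeatable output.

```python
import random


def mock_outcomes(n, seed=None):
    """Generate n (channel, outcome) pairs; a hypothetical stand-in generator."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    propensity = {"Web": 0.10, "Email": 0.02}  # assumed values, illustration only
    rows = []
    for _ in range(n):
        channel = rng.choice(["Web", "Email"])
        outcome = "Clicked" if rng.random() < propensity[channel] else "Impression"
        rows.append((channel, outcome))
    return rows


run_a = mock_outcomes(1000, seed=42)
run_b = mock_outcomes(1000, seed=42)  # identical to run_a
```

Using a private `random.Random` instance rather than the module-level functions keeps the seeded stream isolated from other code that draws random numbers.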

get_sequences(positive_outcome_label: str, level: str, outcome_column: str, customerid_column: str) → tuple[list[tuple[str, ...]], list[tuple[int, ...]], list[collections.defaultdict], list[collections.defaultdict]]

Extract customer action sequences for PMI analysis.

Processes customer interaction data to produce action sequences, outcome labels, and frequency counts needed for Pointwise Mutual Information (PMI) calculations.

Parameters:
  • positive_outcome_label (str) – Outcome label marking the target event (e.g., “Conversion”).

  • level (str) – Column name containing the action/offer/treatment.

  • outcome_column (str) – Column name containing the outcome label.

  • customerid_column (str) – Column name identifying unique customers.

Returns:

  • customer_sequences (list[tuple[str, …]]) – Action sequences per customer.

  • customer_outcomes (list[tuple[int, …]]) – Binary outcomes (1=positive, 0=other) per sequence position.

  • count_actions (list[defaultdict]) – Action frequency counts:

      - [0]: First element counts in bigrams
      - [1]: Second element counts in bigrams

  • count_sequences (list[defaultdict]) – Sequence frequency counts:

      - [0]: All bigrams
      - [1]: ≥3-grams ending with positive outcome
      - [2]: Bigrams ending with positive outcome
      - [3]: Unique n-grams per customer

Return type:

tuple[list[tuple[str, ...]], list[tuple[int, ...]], list[collections.defaultdict], list[collections.defaultdict]]

See also

calculate_pmi

Compute PMI scores from sequence counts.

pmi_overview

Generate PMI analysis summary.
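
The bigram counts described in the return values above can be pictured with a few lines of plain Python. This is a simplified sketch, not the method's implementation: it ignores outcomes and higher-order n-grams, and the dictionary keys (plain strings and 2-tuples) are an assumption about layout made for readability.

```python
from collections import defaultdict

# Toy per-customer action sequences (hypothetical action names).
customer_sequences = [
    ("Card", "Loan", "Card"),
    ("Loan", "Card"),
]

first_counts = defaultdict(int)   # how often an action opens a bigram
second_counts = defaultdict(int)  # how often an action closes a bigram
bigram_counts = defaultdict(int)  # how often each adjacent pair occurs

for sequence in customer_sequences:
    # zip(seq, seq[1:]) yields every pair of adjacent actions.
    for a, b in zip(sequence, sequence[1:]):
        first_counts[a] += 1
        second_counts[b] += 1
        bigram_counts[(a, b)] += 1
```

Here the pair ("Loan", "Card") occurs twice across the two customers and ("Card", "Loan") once, which is the kind of input the PMI calculation works from.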

static calculate_pmi(count_actions: list[collections.defaultdict], count_sequences: list[collections.defaultdict]) → dict[tuple[str, ...], float | dict[str, float | dict]]

Compute PMI scores for action sequences.

Calculates Pointwise Mutual Information scores for bigrams and higher-order n-grams. Higher values indicate more informative or surprising action sequences.

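Parameters:
  • count_actions (list[collections.defaultdict]) – Action frequency counts, as returned by get_sequences().

  • count_sequences (list[collections.defaultdict]) – Sequence frequency counts, as returned by get_sequences().
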
Returns:

PMI scores for sequences:

  • Bigrams: Direct PMI value (float)

  • N-grams (n≥3): dict with ‘average_pmi’ and ‘links’ (constituent bigram PMIs)

Return type:

dict[tuple[str, …], float | dict]

See also

get_sequences

Extract sequences for PMI analysis.

pmi_overview

Generate PMI analysis summary.

Notes

Bigram PMI is calculated as:

\[PMI(a, b) = \log_2 \frac{P(a, b)}{P(a) \cdot P(b)}\]

N-gram PMI is the average of constituent bigram PMIs.
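
As a worked illustration of the formula above, the sketch below computes bigram PMI from toy counts and averages the constituent bigrams for a trigram. The way probabilities are estimated here (counts divided by the total number of bigrams) is an assumption for the example, as are the helper names `bigram_pmi` and `ngram_pmi`; the library may normalise differently.

```python
import math
from collections import defaultdict

# Toy bigram counts (hypothetical actions A, B, C).
bigram_counts = {("A", "B"): 2, ("A", "C"): 2, ("B", "C"): 4}

total = sum(bigram_counts.values())
first_counts = defaultdict(int)   # counts of each action in first position
second_counts = defaultdict(int)  # counts of each action in second position
for (a, b), count in bigram_counts.items():
    first_counts[a] += count
    second_counts[b] += count


def bigram_pmi(a, b):
    """PMI(a, b) = log2(P(a, b) / (P(a) * P(b)))."""
    p_ab = bigram_counts[(a, b)] / total
    p_a = first_counts[a] / total
    p_b = second_counts[b] / total
    return math.log2(p_ab / (p_a * p_b))


def ngram_pmi(ngram):
    """Average the PMI of each adjacent pair in the n-gram."""
    links = {pair: bigram_pmi(*pair) for pair in zip(ngram, ngram[1:])}
    return {"average_pmi": sum(links.values()) / len(links), "links": links}


pmi_ab = bigram_pmi("A", "B")         # log2(0.25 / (0.5 * 0.25)) = 1.0
pmi_abc = ngram_pmi(("A", "B", "C"))  # mean of PMI(A, B) and PMI(B, C)
```

A PMI of 1.0 means the pair occurs twice as often as it would if the two positions were independent; a negative value means it occurs less often than independence predicts.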

static pmi_overview(ngrams_pmi: dict[tuple[str, ...], float | dict], count_sequences: list[collections.defaultdict], customer_sequences: list[tuple[str, ...]], customer_outcomes: list[tuple[int, ...]]) → polars.DataFrame

Generate PMI analysis summary DataFrame.

Creates a summary of action sequences ranked by their significance in predicting positive outcomes.

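Parameters:
  • ngrams_pmi (dict[tuple[str, ...], float | dict]) – PMI scores, as returned by calculate_pmi().

  • count_sequences (list[collections.defaultdict]) – Sequence frequency counts, as returned by get_sequences().

  • customer_sequences (list[tuple[str, ...]]) – Action sequences per customer, as returned by get_sequences().

  • customer_outcomes (list[tuple[int, ...]]) – Binary outcomes per sequence position, as returned by get_sequences().
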
Returns:

Summary DataFrame with columns:

  • Sequence: Action sequence tuple

  • Length: Number of actions in sequence

  • Avg PMI: Average PMI value

  • Frequency: Total occurrence count

  • Unique freq: Unique customer count

  • Score: PMI × log(Frequency), sorted descending

Return type:

pl.DataFrame

See also

get_sequences

Extract sequences for analysis.

calculate_pmi

Compute PMI scores.

Examples

>>> seqs, outs, actions, counts = ih.get_sequences(
...     "Conversion", "pyName", "pyOutcome", "pxInteractionID"
... )
>>> pmi = IH.calculate_pmi(actions, counts)
>>> IH.pmi_overview(pmi, counts, seqs, outs)
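
The Score column described above (PMI × log(Frequency)) trades off how surprising a sequence is against how often it occurs. The sketch below ranks a few made-up rows to show the effect; it assumes a natural logarithm, which the description does not specify, and uses plain dictionaries rather than a polars DataFrame.

```python
import math

# Made-up summary rows, for illustration only.
rows = [
    {"Sequence": ("A", "B"), "Avg PMI": 1.0, "Frequency": 2},
    {"Sequence": ("B", "C"), "Avg PMI": 0.4, "Frequency": 400},
    {"Sequence": ("C", "D"), "Avg PMI": 2.0, "Frequency": 1},
]

for row in rows:
    # Assumption: natural log; the base only rescales scores, not the ranking.
    row["Score"] = row["Avg PMI"] * math.log(row["Frequency"])

ranked = sorted(rows, key=lambda row: row["Score"], reverse=True)
top_sequence = ranked[0]["Sequence"]
```

The frequent but modestly informative ("B", "C") ranks first, while ("C", "D"), seen only once, scores zero despite having the highest PMI, because log(1) = 0.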