pdstools.ih.IH
Interaction History analysis for Pega CDH.
Classes
IH | Analyze Interaction History data from Pega CDH.
Module Contents
- class IH(data: polars.LazyFrame)
Analyze Interaction History data from Pega CDH.
The IH class provides analysis and visualization capabilities for customer interaction data from Pega’s Customer Decision Hub. It supports engagement, conversion, and open rate metrics through customizable outcome label mappings.
- Parameters:
data (polars.LazyFrame)
- data
The underlying interaction history data.
- Type:
pl.LazyFrame
- aggregates
Aggregation methods accessor.
- Type:
See also
pdstools.adm.ADMDatamart: For ADM model analysis.
pdstools.impactanalyzer.ImpactAnalyzer: For Impact Analyzer experiments.
Examples
>>> from pdstools import IH
>>> ih = IH.from_ds_export("interaction_history.zip")
>>> ih.aggregates.summary_by_channel().collect()
>>> ih.plot.response_count_trend()
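The "customizable outcome label mappings" mentioned in the class description can be pictured with a small plain-Python sketch. It does not use pdstools. The label names, the dictionary layout, and the `success_rate` helper are all illustrative assumptions rather than the library's actual configuration or API.

```python
from collections import Counter

# Hypothetical mappings: metric -> outcome labels counted as positive / negative.
# These labels are illustrative assumptions, not the library's defaults.
positive_labels = {
    "Engagement": ["Clicked", "Accepted"],
    "Conversion": ["Conversion"],
    "OpenRate": ["Opened"],
}
negative_labels = {
    "Engagement": ["Impression", "Pending"],
    "Conversion": ["Impression", "Pending"],
    "OpenRate": ["Impression", "Pending"],
}


def success_rate(outcomes, metric):
    """Return positives / (positives + negatives) for one metric, or None if undefined."""
    counts = Counter(outcomes)
    pos = sum(counts[label] for label in positive_labels[metric])
    neg = sum(counts[label] for label in negative_labels[metric])
    total = pos + neg
    return pos / total if total else None


# Hypothetical outcome column values for a handful of interactions.
outcomes = ["Impression", "Clicked", "Impression", "Opened", "Pending", "Conversion"]
rates = {metric: success_rate(outcomes, metric) for metric in positive_labels}
print(rates)
```

Changing which labels count as positive or negative for a metric changes the resulting rate without touching the underlying interaction data, which is the point of keeping the mapping configurable.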
- data: polars.LazyFrame
- aggregates
- plot
- classmethod from_ds_export(ih_filename: os.PathLike | str, query: pdstools.utils.types.QUERY | None = None) -> IH
Create an IH instance from a Pega Dataset Export.
- Parameters:
ih_filename (Union[os.PathLike, str]) – Path to the dataset export file (parquet, csv, ndjson, or zip).
query (Optional[QUERY], optional) – Polars expression to filter the data. Default is None.
- Returns:
Initialized IH instance.
- Return type:
IH
Examples
>>> ih = IH.from_ds_export("Data-pxStrategyResult_pxInteractionHistory.zip")
>>> ih.data.collect_schema()
- classmethod from_s3() -> IH
- Abstractmethod:
Create an IH instance from S3 data.
Note
Not implemented yet. Please let us know if you would like this!
- Raises:
NotImplementedError – This method is not yet implemented.
- Return type:
IH
- classmethod from_mock_data(days: int = 90, n: int = 100000) -> IH
Create an IH instance with synthetic sample data.
Generates realistic interaction history data for testing and demonstration purposes. Includes inbound (Web) and outbound (Email) channels with configurable propensities and model noise.
- Parameters:
days (int)
n (int)
- Returns:
IH instance with synthetic data.
- Return type:
IH
Examples
>>> ih = IH.from_mock_data(days=30, n=10000)
>>> ih.data.select("pyChannel").collect().unique()
- get_sequences(positive_outcome_label: str, level: str, outcome_column: str, customerid_column: str) -> Tuple[List[Tuple[str, Ellipsis]], List[Tuple[int, Ellipsis]], List[collections.defaultdict], List[collections.defaultdict]]
Extract customer action sequences for PMI analysis.
Processes customer interaction data to produce action sequences, outcome labels, and frequency counts needed for Pointwise Mutual Information (PMI) calculations.
- Parameters:
positive_outcome_label (str)
level (str)
outcome_column (str)
customerid_column (str)
- Returns:
customer_sequences (List[Tuple[str, …]]) – Action sequences per customer.
customer_outcomes (List[Tuple[int, …]]) – Binary outcomes (1=positive, 0=other) per sequence position.
count_actions (List[defaultdict]) – Action frequency counts:
  - [0]: First element counts in bigrams
  - [1]: Second element counts in bigrams
count_sequences (List[defaultdict]) – Sequence frequency counts:
  - [0]: All bigrams
  - [1]: ≥3-grams ending with positive outcome
  - [2]: Bigrams ending with positive outcome
  - [3]: Unique n-grams per customer
- Return type:
Tuple[List[Tuple[str, Ellipsis]], List[Tuple[int, Ellipsis]], List[collections.defaultdict], List[collections.defaultdict]]
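To make the bigram-related counters described above more concrete, here is a minimal plain-Python sketch that derives first-element, second-element, and bigram counts from a list of action sequences. The action names are hypothetical, and the sketch deliberately ignores outcomes, higher-order n-grams, and per-customer uniqueness, so it should not be read as the library's actual counting logic.

```python
from collections import defaultdict

# Hypothetical per-customer action sequences (same shape as customer_sequences).
customer_sequences = [
    ("CardOffer", "LoanOffer", "LoanOffer"),
    ("LoanOffer", "CardOffer"),
]

first_counts = defaultdict(int)   # how often an action opens a bigram
second_counts = defaultdict(int)  # how often an action closes a bigram
bigram_counts = defaultdict(int)  # how often each adjacent pair occurs

for sequence in customer_sequences:
    # zip(sequence, sequence[1:]) yields every adjacent pair of actions.
    for first, second in zip(sequence, sequence[1:]):
        first_counts[first] += 1
        second_counts[second] += 1
        bigram_counts[(first, second)] += 1

# Mirrors the documented layout: [0] first-element counts, [1] second-element counts.
count_actions = [first_counts, second_counts]
print(dict(bigram_counts))
```

Using `defaultdict(int)` means unseen actions and pairs simply read as zero, which keeps later lookups free of key checks.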
See also
calculate_pmi: Compute PMI scores from sequence counts.
pmi_overview: Generate PMI analysis summary.
- static calculate_pmi(count_actions: List[collections.defaultdict], count_sequences: List[collections.defaultdict]) -> Dict[Tuple[str, Ellipsis], float | Dict[str, float | Dict]]
Compute PMI scores for action sequences.
Calculates Pointwise Mutual Information scores for bigrams and higher-order n-grams. Higher values indicate more informative or surprising action sequences.
- Parameters:
count_actions (List[defaultdict]) – Action frequency counts from get_sequences().
count_sequences (List[defaultdict]) – Sequence frequency counts from get_sequences().
- Returns:
PMI scores for sequences:
  - Bigrams: Direct PMI value (float)
  - N-grams (n≥3): Dict with ‘average_pmi’ and ‘links’ (constituent bigram PMIs)
- Return type:
Dict[Tuple[str, Ellipsis], float | Dict[str, float | Dict]]
See also
get_sequences: Extract sequences for PMI analysis.
pmi_overview: Generate PMI analysis summary.
Notes
Bigram PMI is calculated as:
\[\mathrm{PMI}(a, b) = \log_2 \frac{P(a, b)}{P(a) \cdot P(b)}\]
N-gram PMI is the average of constituent bigram PMIs.
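A small worked example may help when reading the formula. The sketch below estimates each probability as a count divided by the total number of bigrams, which is an assumption about normalization, since these notes do not say how the library estimates P(a), P(b), and P(a, b). The `bigram_pmi` helper and all counts are hypothetical.

```python
import math


def bigram_pmi(pair_count, first_count, second_count, total_bigrams):
    """PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), with probabilities estimated from counts."""
    p_ab = pair_count / total_bigrams
    p_a = first_count / total_bigrams
    p_b = second_count / total_bigrams
    return math.log2(p_ab / (p_a * p_b))


# Hypothetical counts: 8 bigrams in total, "A" opens 4 of them,
# "B" closes 2 of them, and the pair ("A", "B") occurs 2 times.
pmi_ab = bigram_pmi(pair_count=2, first_count=4, second_count=2, total_bigrams=8)
print(pmi_ab)  # 1.0

# N-gram PMI as the mean of its constituent bigram PMIs.
# 3.0 is a made-up PMI for a second bigram ("B", "C").
links = [pmi_ab, 3.0]
average_pmi = sum(links) / len(links)
print(average_pmi)  # 2.0
```

Here the pair occurs twice as often as it would if the two actions were independent (0.25 versus 0.5 × 0.25 = 0.125), so the PMI is log2(2) = 1.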
- static pmi_overview(ngrams_pmi: Dict[Tuple[str, Ellipsis], float | Dict], count_sequences: List[collections.defaultdict], customer_sequences: List[Tuple[str, Ellipsis]], customer_outcomes: List[Tuple[int, Ellipsis]]) -> polars.DataFrame
Generate PMI analysis summary DataFrame.
Creates a summary of action sequences ranked by their significance in predicting positive outcomes.
- Parameters:
ngrams_pmi (Dict[Tuple[str, ...], Union[float, Dict]]) – PMI scores from calculate_pmi().
count_sequences (List[defaultdict]) – Sequence frequency counts from get_sequences().
customer_sequences (List[Tuple[str, ...]]) – Customer action sequences from get_sequences().
customer_outcomes (List[Tuple[int, ...]]) – Customer outcome sequences from get_sequences().
- Returns:
Summary DataFrame with columns:
Sequence: Action sequence tuple
Length: Number of actions in sequence
Avg PMI: Average PMI value
Frequency: Total occurrence count
Unique freq: Unique customer count
Score: PMI × log(Frequency), sorted descending
- Return type:
pl.DataFrame
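The Score column described above (PMI × log(Frequency), sorted descending) can be sketched in plain Python as follows. The rows and the `score` helper are hypothetical, and the natural logarithm is an assumption because the log base is not stated here.

```python
import math

# Hypothetical (sequence, average PMI, frequency) rows.
rows = [
    (("A", "B"), 1.0, 100),
    (("B", "C"), 2.0, 20),
    (("A", "B", "C"), 1.5, 1),
]


def score(avg_pmi, frequency):
    # Natural log is an assumption; the documented formula only says "log".
    return avg_pmi * math.log(frequency)


# Sort by score, highest first, as in the documented summary.
ranked = sorted(rows, key=lambda row: score(row[1], row[2]), reverse=True)
for sequence, avg_pmi, frequency in ranked:
    print(sequence, round(score(avg_pmi, frequency), 3))
```

Under this reading, a sequence seen only once scores zero regardless of its PMI, because log(1) = 0, so the ranking favors sequences that are both informative and reasonably frequent.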
See also
get_sequences: Extract sequences for analysis.
calculate_pmi: Compute PMI scores.
Examples
>>> seqs, outs, actions, counts = ih.get_sequences(
...     "Conversion", "pyName", "pyOutcome", "pxInteractionID"
... )
>>> pmi = IH.calculate_pmi(actions, counts)
>>> IH.pmi_overview(pmi, counts, seqs, outs)