pdstools.ih.IH
==============

.. py:module:: pdstools.ih.IH

.. autoapi-nested-parse::

   Interaction History analysis for Pega CDH.


Classes
-------

.. autoapisummary::

   pdstools.ih.IH.IH


Module Contents
---------------

.. py:class:: IH(data: polars.LazyFrame)

   Analyze Interaction History data from Pega CDH.

   The IH class provides analysis and visualization capabilities for
   customer interaction data from Pega's Customer Decision Hub. It supports
   engagement, conversion, and open rate metrics through customizable
   outcome label mappings.

   .. attribute:: data

      The underlying interaction history data.

      :type: pl.LazyFrame

   .. attribute:: aggregates

      Aggregation methods accessor.

      :type: Aggregates

   .. attribute:: plot

      Plot accessor for visualization methods.

      :type: Plots

   .. attribute:: positive_outcome_labels

      Mapping of metric types to positive outcome labels.

      :type: dict

   .. attribute:: negative_outcome_labels

      Mapping of metric types to negative outcome labels.

      :type: dict

   .. seealso::

      :py:obj:`pdstools.adm.ADMDatamart`
          For ADM model analysis.

      :py:obj:`pdstools.impactanalyzer.ImpactAnalyzer`
          For Impact Analyzer experiments.

   .. rubric:: Examples

   >>> from pdstools import IH
   >>> ih = IH.from_ds_export("interaction_history.zip")
   >>> ih.aggregates.summary_by_channel().collect()
   >>> ih.plot.response_count_trend()

   .. py:attribute:: data
      :type: polars.LazyFrame

   .. py:attribute:: outcome_labels_used
      :type: dict | None

   .. py:attribute:: positive_outcome_labels
      :type: dict[str, list[str]]

      Mapping of metric types to positive outcome labels.

   .. py:attribute:: negative_outcome_labels
      :type: dict[str, list[str]]

      Mapping of metric types to negative outcome labels.

   .. py:attribute:: aggregates

   .. py:attribute:: plot

   .. py:method:: from_ds_export(ih_filename: os.PathLike | str, query: pdstools.utils.types.QUERY | None = None) -> IH
      :classmethod:

      Create an IH instance from a Pega Dataset Export.

      :param ih_filename: Path to the dataset export file (parquet, csv, ndjson, or zip).
      :type ih_filename: Union[os.PathLike, str]
      :param query: Polars expression to filter the data. Default is None.
      :type query: Optional[QUERY], optional

      :returns: Initialized IH instance.
      :rtype: IH

      .. rubric:: Examples

      >>> ih = IH.from_ds_export("Data-pxStrategyResult_pxInteractionHistory.zip")
      >>> ih.data.collect_schema()

   .. py:method:: from_s3() -> IH
      :classmethod:
      :abstractmethod:

      Create an IH instance from S3 data.

      .. note::

         Not implemented yet. Please let us know if you would like this!

      :raises NotImplementedError: This method is not yet implemented.

   .. py:method:: from_mock_data(days: int = 90, n: int = 100000) -> IH
      :classmethod:

      Create an IH instance with synthetic sample data.

      Generates realistic interaction history data for testing and
      demonstration purposes. Includes inbound (Web) and outbound (Email)
      channels with configurable propensities and model noise.

      :param days: Number of days of data to generate.
      :type days: int, default 90
      :param n: Number of interaction records to generate.
      :type n: int, default 100000

      :returns: IH instance with synthetic data.
      :rtype: IH

      .. rubric:: Examples

      >>> ih = IH.from_mock_data(days=30, n=10000)
      >>> ih.data.select("pyChannel").collect().unique()

   .. py:method:: get_sequences(positive_outcome_label: str, level: str, outcome_column: str, customerid_column: str) -> tuple[list[tuple[str, ...]], list[tuple[int, ...]], list[collections.defaultdict], list[collections.defaultdict]]

      Extract customer action sequences for PMI analysis.

      Processes customer interaction data to produce action sequences,
      outcome labels, and frequency counts needed for Pointwise Mutual
      Information (PMI) calculations.

      :param positive_outcome_label: Outcome label marking the target event (e.g., "Conversion").
      :type positive_outcome_label: str
      :param level: Column name containing the action/offer/treatment.
      :type level: str
      :param outcome_column: Column name containing the outcome label.
      :type outcome_column: str
      :param customerid_column: Column name identifying unique customers.
      :type customerid_column: str

      :returns: * **customer_sequences** (*list[tuple[str, ...]]*) -- Action sequences per customer.
                * **customer_outcomes** (*list[tuple[int, ...]]*) -- Binary outcomes
                  (1 = positive, 0 = other) per sequence position.
                * **count_actions** (*list[defaultdict]*) -- Action frequency counts:

                  - [0]: First element counts in bigrams
                  - [1]: Second element counts in bigrams

                * **count_sequences** (*list[defaultdict]*) -- Sequence frequency counts:

                  - [0]: All bigrams
                  - [1]: ≥3-grams ending with positive outcome
                  - [2]: Bigrams ending with positive outcome
                  - [3]: Unique n-grams per customer

      .. seealso::

         :py:obj:`calculate_pmi`
             Compute PMI scores from sequence counts.

         :py:obj:`pmi_overview`
             Generate PMI analysis summary.

   .. py:method:: calculate_pmi(count_actions: list[collections.defaultdict], count_sequences: list[collections.defaultdict]) -> dict[tuple[str, ...], float | dict[str, float | dict]]
      :staticmethod:

      Compute PMI scores for action sequences.

      Calculates Pointwise Mutual Information scores for bigrams and
      higher-order n-grams. A positive score means the actions occur
      together more often than they would if they were independent; higher
      values indicate more informative or surprising action sequences.

      :param count_actions: Action frequency counts from :meth:`get_sequences`.
      :type count_actions: list[defaultdict]
      :param count_sequences: Sequence frequency counts from :meth:`get_sequences`.
      :type count_sequences: list[defaultdict]

      :returns: PMI scores for sequences:

                - Bigrams: direct PMI value (float)
                - N-grams (n ≥ 3): dict with ``'average_pmi'`` and ``'links'``
                  (constituent bigram PMIs)

      :rtype: dict[tuple[str, ...], Union[float, dict]]

      .. seealso::

         :py:obj:`get_sequences`
             Extract sequences for PMI analysis.

         :py:obj:`pmi_overview`
             Generate PMI analysis summary.

      .. rubric:: Notes

      Bigram PMI is calculated as:

      .. math::

         PMI(a, b) = \log_2 \frac{P(a, b)}{P(a) \cdot P(b)}

      N-gram PMI is the average of constituent bigram PMIs.

   .. py:method:: pmi_overview(ngrams_pmi: dict[tuple[str, ...], float | dict], count_sequences: list[collections.defaultdict], customer_sequences: list[tuple[str, ...]], customer_outcomes: list[tuple[int, ...]]) -> polars.DataFrame
      :staticmethod:

      Generate PMI analysis summary DataFrame.

      Creates a summary of action sequences ranked by their significance in
      predicting positive outcomes.

      :param ngrams_pmi: PMI scores from :meth:`calculate_pmi`.
      :type ngrams_pmi: dict[tuple[str, ...], Union[float, dict]]
      :param count_sequences: Sequence frequency counts from :meth:`get_sequences`.
      :type count_sequences: list[defaultdict]
      :param customer_sequences: Customer action sequences from :meth:`get_sequences`.
      :type customer_sequences: list[tuple[str, ...]]
      :param customer_outcomes: Customer outcome sequences from :meth:`get_sequences`.
      :type customer_outcomes: list[tuple[int, ...]]

      :returns: Summary DataFrame with columns:

                - **Sequence**: Action sequence tuple
                - **Length**: Number of actions in sequence
                - **Avg PMI**: Average PMI value
                - **Frequency**: Total occurrence count
                - **Unique freq**: Unique customer count
                - **Score**: PMI × log(Frequency), sorted descending

      :rtype: pl.DataFrame

      .. seealso::

         :py:obj:`get_sequences`
             Extract sequences for analysis.

         :py:obj:`calculate_pmi`
             Compute PMI scores.

      .. rubric:: Examples

      >>> seqs, outs, actions, counts = ih.get_sequences(
      ...     "Conversion", "pyName", "pyOutcome", "pxInteractionID"
      ... )
      >>> pmi = IH.calculate_pmi(actions, counts)
      >>> IH.pmi_overview(pmi, counts, seqs, outs)
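
      The bigram PMI formula, the n-gram averaging, and the Score column
      described above can be illustrated on toy data. This is a minimal,
      standalone sketch under stated assumptions, not the pdstools
      implementation: the helpers ``bigram_pmi``, ``ngram_avg_pmi`` and
      ``count_ngram`` are hypothetical; P(a), P(b) and P(a, b) are assumed to
      be estimated from bigram counts (first-element, second-element and pair
      counts, in the spirit of what :meth:`get_sequences` returns); and the
      Score is assumed to use the natural logarithm.

      .. code-block:: python

         import math
         from collections import Counter


         def bigram_pmi(sequences):
             """Return {(a, b): PMI} for consecutive action pairs (hypothetical helper)."""
             # Count every consecutive pair (bigram) across all customer sequences.
             pairs = Counter()
             for seq in sequences:
                 pairs.update(zip(seq, seq[1:]))
             total = sum(pairs.values())

             # Marginals: how often an action is the first / second element of a bigram.
             first, second = Counter(), Counter()
             for (a, b), count in pairs.items():
                 first[a] += count
                 second[b] += count

             # PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), all estimated from bigram counts.
             return {
                 (a, b): math.log2((count / total) / ((first[a] / total) * (second[b] / total)))
                 for (a, b), count in pairs.items()
             }


         def ngram_avg_pmi(ngram, pmi):
             """Average PMI of the consecutive bigrams ("links") inside an n-gram."""
             links = list(zip(ngram, ngram[1:]))
             return sum(pmi[link] for link in links) / len(links)


         def count_ngram(ngram, sequences):
             """Count contiguous occurrences of ``ngram`` across all sequences."""
             n = len(ngram)
             return sum(
                 seq[i:i + n] == ngram
                 for seq in sequences
                 for i in range(len(seq) - n + 1)
             )


         # Toy customer journeys; each tuple is one customer's ordered actions.
         sequences = [("A", "C", "B"), ("A", "C", "B"), ("A", "B"), ("C", "A")]

         pmi = bigram_pmi(sequences)
         ngram = ("A", "C", "B")
         avg = ngram_avg_pmi(ngram, pmi)
         frequency = count_ngram(ngram, sequences)
         # Score = PMI x log(Frequency); the natural log is an assumption here.
         score = avg * math.log(frequency)

      With these toy journeys, the pair (A, C) makes up 2 of the 6 bigrams
      while A starts 3 of them and C ends 2, so its PMI is log2(2) = 1. The
      pair (A, B) occurs less often than independence would predict and gets
      a negative PMI. The 3-gram (A, C, B) occurs twice, so its score is its
      average link PMI multiplied by log(2).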