pdstools.impactanalyzer.statistics
==================================

.. py:module:: pdstools.impactanalyzer.statistics

.. autoapi-nested-parse::

   Statistical calculations for Impact Analyzer experiments.

   Confidence intervals, significance testing, and sample-size planning are
   integral to interpreting Impact Analyzer results; without them a reported
   lift cannot be distinguished from noise.

   Formulas follow Pega's server-side implementation and have been validated
   for PDC parity. Scenario Planner Actuals validation is pending.

   Key implementation details
   --------------------------

   * Pega stores **standard errors** (SE), not confidence intervals. The
     *z*-score (1.96) is applied only at the significance / display level.
   * The lift CI uses the **delta method** for the ratio estimator:
     ``SE(lift) = (1 / ctrl) · √(SE_t² + (test / ctrl)² · SE_c²)``.
   * For value metrics Pega computes variance as ``p(1-p) · AV²``
     (Bernoulli scaled by action value), **not** a Poisson approximation.


Attributes
----------

.. autoapisummary::

   pdstools.impactanalyzer.statistics.Z_95
   pdstools.impactanalyzer.statistics.FORMULAS


Classes
-------

.. autoapisummary::

   pdstools.impactanalyzer.statistics.Formula
   pdstools.impactanalyzer.statistics.LiftResult


Functions
---------

.. autoapisummary::

   pdstools.impactanalyzer.statistics.accept_rate
   pdstools.impactanalyzer.statistics.binomial_se
   pdstools.impactanalyzer.statistics.binomial_ci
   pdstools.impactanalyzer.statistics.value_variance
   pdstools.impactanalyzer.statistics.value_se
   pdstools.impactanalyzer.statistics.calculate_lift
   pdstools.impactanalyzer.statistics.lift_pl
   pdstools.impactanalyzer.statistics.lift_se
   pdstools.impactanalyzer.statistics.is_significant
   pdstools.impactanalyzer.statistics.calculate_engagement_lift
   pdstools.impactanalyzer.statistics.calculate_value_lift
   pdstools.impactanalyzer.statistics.required_sample_size


Module Contents
---------------

.. py:data:: Z_95
   :type: float
   :value: 1.96

   Two-sided 95 % *z*-critical value used by Pega.


.. py:class:: Formula

   Structured representation of a statistical formula.

   .. attribute:: name

      Short identifier, e.g. ``"accept_rate"``.

      :type: str

   .. attribute:: latex

      Raw LaTeX expression (no placeholder substitution: the UI renders
      the symbolic form alongside a separate substitution block showing the
      numeric values).

      :type: str

   .. attribute:: description

      One-line plain-English description.

      :type: str


   .. py:attribute:: name
      :type: str


   .. py:attribute:: latex
      :type: str


   .. py:attribute:: description
      :type: str


.. py:data:: FORMULAS
   :type: dict[str, Formula]


.. py:class:: LiftResult

   Result of a lift calculation with standard error.

   .. attribute:: lift

      Relative lift ``(test - control) / control``.

      :type: float

   .. attribute:: se

      Delta-method standard error for *lift*. This is the full-precision SE
      **without** any *z*-multiplier.

      :type: float

   .. attribute:: significant

      ``True`` when the CI does not cross zero. The check uses
      ``lift ± z * se`` where ``z = 1.96`` (95 % level) after rounding
      ``se`` to 4 decimal places, matching Pega's
      ``Math.round(error * 10000.0) / 10000.0``.

      :type: bool

   .. attribute:: test_rate

      Observed test-group rate (accept rate or value per impression).

      :type: float

   .. attribute:: control_rate

      Observed control-group rate.

      :type: float

   .. attribute:: test_se

      Standard error of the test rate.

      :type: float

   .. attribute:: control_se

      Standard error of the control rate.

      :type: float

   .. rubric:: Notes

   Pega stores **standard errors**, not confidence intervals. The ``se``
   field is the raw SE. Call :meth:`ci_95` to obtain the 95 % CI half-width
   (``Z_95 * se``).


   .. py:attribute:: lift
      :type: float


   .. py:attribute:: se
      :type: float


   .. py:attribute:: significant
      :type: bool


   .. py:attribute:: test_rate
      :type: float


   .. py:attribute:: control_rate
      :type: float


   .. py:attribute:: test_se
      :type: float


   .. py:attribute:: control_se
      :type: float


   .. py:method:: ci_95() -> float

      Return the 95 % confidence-interval half-width.

      :returns: ``Z_95 * self.se`` (i.e. ``1.96 * se``).
      :rtype: float


.. py:function:: accept_rate(accepts: int, impressions: int) -> float

   Accept / click-through rate.

   In Pega *Accept = Accepted + Clicked* (both count as positive outcomes).

   :param accepts: Number of positive outcomes.
   :type accepts: int
   :param impressions: Total number of impressions.
   :type impressions: int

   :returns: ``accepts / impressions``, or ``0.0`` when *impressions* ≤ 0.
   :rtype: float


.. py:function:: binomial_se(accepts: int, impressions: int) -> float

   Standard error of the accept rate: ``√(p(1-p) / n)``.

   This matches what Pega stores as *TestAcceptRateCI* /
   *ControlAcceptRateCI* in the ``ConfidenceIntervalCalculation`` sheet;
   note that despite the column name it is a SE, not a CI.

   :param accepts: Number of positive outcomes.
   :type accepts: int
   :param impressions: Total number of impressions.
   :type impressions: int

   :returns: ``√(p(1-p) / n)``, or ``0.0`` when *impressions* ≤ 0 or the
             rate is 0 or 1.
   :rtype: float

   .. rubric:: Notes

   Uses the Wald (normal-approximation) formula. For extreme *p* (close to
   0 or 1) or small *n* this can under-cover; Wilson or Clopper-Pearson
   intervals are more robust alternatives.


.. py:function:: binomial_ci(accepts: int, impressions: int, z: float = Z_95) -> float

   Binomial CI half-width: ``z · √(p(1-p) / n)``.

   Returns ``0.0`` when *impressions* ≤ 0 or the rate is 0 or 1.


.. py:function:: value_variance(accepts: int, impressions: int, action_value: float) -> float

   Per-observation Bernoulli variance of the value metric.

   Pega computes ``p(1-p) · AV²``. Each impression is worth either
   ``action_value`` (with probability *p*) or 0. This matches
   *TestVariance* / *ControlVariance* in the
   ``ConfidenceIntervalCalculation`` sheet.


.. py:function:: value_se(accepts: int, impressions: int, action_value: float) -> float

   SE of value per impression: ``√(Var / n)``.

   Matches Pega's *TestInterval* / *ControlInterval*.


.. py:function:: calculate_lift(test: float, control: float) -> float

   Relative lift: ``(test - control) / control``.

   Returns ``0.0`` when *control* ≤ 0.


.. py:function:: lift_pl(test_col: str, control_col: str) -> polars.Expr

   Polars expression for relative lift between two columns.

   Intended for ``pl.LazyFrame.with_columns()`` so the formula is defined
   once.

   :returns: ``(test - control) / control``.
   :rtype: pl.Expr


.. py:function:: lift_se(test: float, control: float, se_test: float, se_control: float) -> float

   Delta-method standard error for the lift ratio ``test / control - 1``.

   Formula::

       (1 / control) · √(se_test² + (test / control)² · se_control²)

   .. important::

      Pass **standard errors** (no *z*-multiplier). Passing *z*-multiplied
      CI values will inflate the result by *z*.

   :param test: Test-group rate (accept rate or VPI).
   :type test: float
   :param control: Control-group rate.
   :type control: float
   :param se_test: Standard error of *test*.
   :type se_test: float
   :param se_control: Standard error of *control*.
   :type se_control: float

   :returns: Full-precision SE of the lift (no rounding).
   :rtype: float


.. py:function:: is_significant(lift: float, se: float, z: float = Z_95) -> bool

   ``True`` when the CI does not cross zero.

   Tests whether ``[lift - z·se, lift + z·se]`` excludes zero, i.e. the
   lift is statistically significant at the given confidence level. With
   the default ``z = 1.96`` this is a **95 % two-sided** test.

   :param lift: Observed relative lift.
   :type lift: float
   :param se: Standard error of the lift (not z-multiplied).
   :type se: float
   :param z: Critical value. Default ``1.96`` (95 %).
   :type z: float, optional

   :returns: ``True`` if the interval excludes zero.
   :rtype: bool


.. py:function:: calculate_engagement_lift(accepts_test: int, impr_test: int, accepts_control: int, impr_control: int) -> LiftResult

   Engagement lift with delta-method SE.

   This is the primary metric in the Impact Analyzer UI.

   :param accepts_test: Positive outcomes in the test group.
   :type accepts_test: int
   :param impr_test: Total impressions in the test group.
   :type impr_test: int
   :param accepts_control: Positive outcomes in the control group.
   :type accepts_control: int
   :param impr_control: Total impressions in the control group.
   :type impr_control: int

   :returns: Lift, SE, and significance for the engagement metric.
   :rtype: LiftResult


.. py:function:: calculate_value_lift(accepts_test: int, impr_test: int, accepts_control: int, impr_control: int, action_value: float) -> LiftResult

   Value-per-impression lift with delta-method CI.

   Pega computes value as ``accept_rate × action_value`` with Bernoulli
   variance ``p(1-p) · AV²``.

   :param accepts_test: Positive outcomes in the test group.
   :type accepts_test: int
   :param impr_test: Total impressions in the test group.
   :type impr_test: int
   :param accepts_control: Positive outcomes in the control group.
   :type accepts_control: int
   :param impr_control: Total impressions in the control group.
   :type impr_control: int
   :param action_value: Monetary action value per accept.
   :type action_value: float

   :returns: Lift, SE, and significance for the value metric.
   :rtype: LiftResult


.. py:function:: required_sample_size(baseline_rate: float, mde: float = 0.05, alpha: float = 0.05, power: float = 0.8, control_ratio: float = 0.02) -> int

   Required total impressions for a two-proportion *z*-test.

   :param baseline_rate: Expected control-group accept rate.
   :type baseline_rate: float
   :param mde: Minimum detectable effect (relative lift).
   :type mde: float
   :param alpha: Significance level.
   :type alpha: float
   :param power: Statistical power.
   :type power: float
   :param control_ratio: Fraction of traffic allocated to control (default
                         2 %). This default matches Pega Impact Analyzer's
                         typical configuration. For general power analysis,
                         0.5 (equal allocation) is more common.
   :type control_ratio: float

   :returns: Ceiling of the required sample size.
   :rtype: int
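
The engagement-lift formulas documented on this page (Wald SE, delta-method SE for the lift, and the rounded 95 % significance check) can be traced by hand with a small standalone sketch. This is not the module's source code: the ``*_sketch`` helper names are hypothetical, and the handling of ``control <= 0`` in the SE helper and of a zero SE in the significance check are assumptions made only for this illustration.

```python
import math

Z_95 = 1.96  # two-sided 95 % critical value, as documented for the module


def binomial_se_sketch(accepts: int, impressions: int) -> float:
    """Wald standard error sqrt(p(1-p)/n); 0.0 for empty or degenerate input."""
    if impressions <= 0:
        return 0.0
    p = accepts / impressions
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return math.sqrt(p * (1.0 - p) / impressions)


def lift_se_sketch(test: float, control: float,
                   se_test: float, se_control: float) -> float:
    """Delta-method SE of test/control - 1, taking raw SEs (no z-multiplier)."""
    if control <= 0:
        return 0.0  # assumption: mirrors the documented zero-lift fallback
    return (1.0 / control) * math.sqrt(
        se_test ** 2 + (test / control) ** 2 * se_control ** 2
    )


def engagement_lift_sketch(accepts_test: int, impr_test: int,
                           accepts_control: int, impr_control: int) -> dict:
    """Combine the pieces: rates, lift, SE, and the rounded-SE significance test."""
    test = accepts_test / impr_test if impr_test > 0 else 0.0
    control = accepts_control / impr_control if impr_control > 0 else 0.0
    lift = (test - control) / control if control > 0 else 0.0
    se = lift_se_sketch(
        test, control,
        binomial_se_sketch(accepts_test, impr_test),
        binomial_se_sketch(accepts_control, impr_control),
    )
    # Round half up to 4 decimals, like Java's Math.round(x * 10000.0) / 10000.0.
    se_rounded = math.floor(se * 10000.0 + 0.5) / 10000.0
    lower, upper = lift - Z_95 * se_rounded, lift + Z_95 * se_rounded
    return {"lift": lift, "se": se, "significant": lower > 0 or upper < 0}


# A 2 % control group (400 of 20 400 impressions) gives a wide interval.
small_control = engagement_lift_sketch(1200, 20000, 20, 400)
# Equal allocation with a larger effect gives a tight one.
balanced = engagement_lift_sketch(3000, 20000, 2000, 20000)
print(small_control)
print(balanced)
```

In the first call the control standard error dominates the delta-method term, so a 20 % observed lift is not flagged as significant; in the second call both groups are large and the interval excludes zero. This illustrates why a small ``control_ratio`` drives up the total impressions needed.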