Example Prediction Studio Analysis

Pega

2025-07-04

This small notebook reports on and analyses Prediction Studio data on Predictions. The underlying data comes from the Data-DM-Snapshot table, which populates the Prediction Studio screen with Prediction Performance, Lift, CTR, etc.

The data can be exported from the pyGetSnapshot dataset in Pega Infinity, using Dev Studio or Prediction Studio.

Datamart tables are described here.

Raw data

First, we load the raw data. It is in a “long” format, with for example the test and control groups in separate rows.

[2]:
from pathlib import Path
from pdstools import Prediction

# Path to the dataset export,
# e.g. PR_DATA_DM_SNAPSHOTS.parquet
data_export = "<YOUR DATA HERE>"

# Use the export if it exists; otherwise fall back to 60 days of mock data
if Path(data_export).exists():
    prediction = Prediction.from_ds_export(data_export)
else:
    prediction = Prediction.from_mock_data(days=60)

Prediction Data

The actual prediction data is in a “wide” format, with separate fields for the Test and Control groups. It also contains only the “daily” snapshots, and the numeric and date fields are converted to regular Polars types.

[3]:
prediction.predictions.head().collect()
[3]:
shape: (5, 23)
pyModelId | SnapshotTime | Positives | Negatives | ResponseCount | Performance | Positives_Test | Negatives_Test | ResponseCount_Test | Positives_Control | Negatives_Control | ResponseCount_Control | Positives_NBA | Negatives_NBA | ResponseCount_NBA | Class | ModelName | CTR | CTR_Test | CTR_Control | CTR_NBA | CTR_Lift | isValidPrediction
str | date | f64 | i64 | f64 | f32 | f64 | i64 | f64 | f64 | i64 | f64 | f64 | i64 | f64 | str | str | f64 | f64 | f64 | f64 | f64 | bool
"DATA-DECISION-REQUEST-CUSTOMER… | 2025-09-06 | 150.0 | 6000 | 6150.0 | 70.0 | 250.0 | 6000 | 6250.0 | 120.0 | 6000 | 6120.0 | 150.0 | 6000 | 6150.0 | "DATA-DECISION-REQUEST-CUSTOMER" | "PREDICTMOBILEPROPENSITY" | 0.02439 | 0.04 | 0.019608 | 0.02439 | 1.04 | true
"DATA-DECISION-REQUEST-CUSTOMER… | 2025-09-06 | 250.0 | 6000 | 6250.0 | 70.0 | 250.0 | 6000 | 6250.0 | 120.0 | 6000 | 6120.0 | 150.0 | 6000 | 6150.0 | "DATA-DECISION-REQUEST-CUSTOMER" | "PREDICTMOBILEPROPENSITY" | 0.04 | 0.04 | 0.019608 | 0.02439 | 1.04 | true
"DATA-DECISION-REQUEST-CUSTOMER… | 2025-09-06 | 120.0 | 6000 | 6120.0 | 70.0 | 250.0 | 6000 | 6250.0 | 120.0 | 6000 | 6120.0 | 150.0 | 6000 | 6150.0 | "DATA-DECISION-REQUEST-CUSTOMER" | "PREDICTMOBILEPROPENSITY" | 0.019608 | 0.04 | 0.019608 | 0.02439 | 1.04 | true
"DATA-DECISION-REQUEST-CUSTOMER… | 2025-09-07 | 120.0 | 6000 | 6120.0 | 70.05085 | 250.847458 | 6000 | 6250.847458 | 120.0 | 6000 | 6120.0 | 150.0 | 6000 | 6150.0 | "DATA-DECISION-REQUEST-CUSTOMER" | "PREDICTMOBILEPROPENSITY" | 0.019608 | 0.04013 | 0.019608 | 0.02439 | 1.046638 | true
"DATA-DECISION-REQUEST-CUSTOMER… | 2025-09-07 | 250.847458 | 6000 | 6250.847458 | 70.05085 | 250.847458 | 6000 | 6250.847458 | 120.0 | 6000 | 6120.0 | 150.0 | 6000 | 6150.0 | "DATA-DECISION-REQUEST-CUSTOMER" | "PREDICTMOBILEPROPENSITY" | 0.04013 | 0.04013 | 0.019608 | 0.02439 | 1.046638 | true

Summary by Channel

Standard functionality exists to summarize the predictions per channel. Note that the data does not contain the prediction-to-channel mapping (this is an outstanding product issue), so the implicit naming conventions of NBAD are applied instead. For a specific customer, custom mappings can be passed into the summarization function.
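
To make the idea of a name-based mapping concrete, here is a minimal sketch. The lookup table is hypothetical and covers only the three predictions that appear in this notebook's summary output. It is not the mapping that pdstools ships, and `channel_for` is an illustrative helper, not a library function.

```python
# Hypothetical lookup: prediction name -> (channel, direction).
# Only the three names from this notebook's summary output are listed.
NAME_TO_CHANNEL = {
    "PREDICTMOBILEPROPENSITY": ("Mobile", "Inbound"),
    "PREDICTOUTBOUNDEMAILPROPENSITY": ("E-mail", "Outbound"),
    "PREDICTWEBPROPENSITY": ("Web", "Inbound"),
}


def channel_for(prediction_name, custom_mappings=None):
    """Resolve channel and direction, letting custom entries take precedence."""
    lookup = {**NAME_TO_CHANNEL, **(custom_mappings or {})}
    # Names are compared in upper case, as they appear in the output.
    return lookup.get(prediction_name.upper(), ("Unknown", "Unknown"))


print(channel_for("PredictWebPropensity"))
print(channel_for("PREDICTSMSPROPENSITY", {"PREDICTSMSPROPENSITY": ("SMS", "Outbound")}))
```

Merging the custom entries over the defaults mirrors the idea that a customer-specific mapping overrides the convention. The keys of `custom_mappings` are assumed to be upper case.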

[4]:
prediction.summary_by_channel().collect()
[4]:
shape: (3, 28)
Prediction | Channel | Direction | usesNBAD | isMultiChannel | DateRange Min | DateRange Max | Duration | Performance | Positives | Negatives | Responses | Positives_Test | Positives_Control | Positives_NBA | Negatives_Test | Negatives_Control | Negatives_NBA | usesImpactAnalyzer | ControlPercentage | TestPercentage | CTR | CTR_Test | CTR_Control | CTR_NBA | ChannelDirectionGroup | isValid | Lift
str | str | str | bool | bool | date | date | i64 | f64 | f64 | i64 | f64 | f64 | f64 | f64 | i64 | i64 | i64 | bool | f64 | f64 | f64 | f64 | f64 | f64 | str | bool | f64
"PREDICTMOBILEPROPENSITY" | "Mobile" | "Inbound" | true | false | 2025-09-06 | 2025-11-04 | 5097600 | 71.500697 | 32700.0 | 1080000 | 1.1127e6 | 49500.0 | 21600.0 | 27000.0 | 1080000 | 1080000 | 1080000 | true | 33.000809 | 33.836614 | 0.029388 | 0.043825 | 0.019608 | 0.02439 | "Mobile/Inbound" | true | 1.23506
"PREDICTOUTBOUNDEMAILPROPENSITY" | "E-mail" | "Outbound" | true | false | 2025-09-06 | 2025-11-04 | 5097600 | 62.500567 | 24000.0 | 1800000 | 1.824e6 | 32400.0 | 18000.0 | 21600.0 | 1800000 | 1800000 | 1800000 | true | 33.223684 | 33.486842 | 0.013158 | 0.017682 | 0.009901 | 0.011858 | "E-mail/Outbound" | true | 0.785855
"PREDICTWEBPROPENSITY" | "Web" | "Inbound" | true | false | 2025-09-06 | 2025-11-04 | 5097600 | 67.001637 | 379200.0 | 7200000 | 7.5792e6 | 612000.0 | 252000.0 | 273600.0 | 7200000 | 7200000 | 7200000 | true | 32.773908 | 34.357188 | 0.050032 | 0.078341 | 0.033816 | 0.036609 | "Web/Inbound" | true | 1.316656