Example Prediction Studio Analysis

Pega

2025-07-04

This is a small notebook to report on and analyse Prediction Studio data on predictions. The underlying data comes from the Data-DM-Snapshot table, which is used to populate the Prediction Studio screens with prediction performance, lift, CTR and related metrics.

The data can be exported from the pyGetSnapshot dataset in Pega Infinity, from either Dev Studio or Prediction Studio.

Datamart tables are described here.

Raw data

First, we load the raw data. The raw data is in a “long” format, with, for example, the test and control groups in separate rows.

[2]:

from pathlib import Path
from pdstools import Prediction

# Path to the dataset export, e.g. PR_DATA_DM_SNAPSHOTS.parquet
data_export = "<YOUR DATA HERE>"

if Path(data_export).exists():
    # Load the exported Data-DM-Snapshot data
    prediction = Prediction.from_ds_export(data_export)
else:
    # No export found: fall back to 60 days of generated mock data
    prediction = Prediction.from_mock_data(days=60)
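To illustrate what reshaping from the “long” layout (one row per group) into a “wide” layout (one row with suffixed group columns) involves, here is a minimal sketch in plain Python. The field names (`model`, `date`, `group`, `positives`, `negatives`) and the `pivot_wide` helper are hypothetical and chosen for illustration; they are not the actual Data-DM-Snapshot columns or the pdstools implementation. The counts are taken from the first rows of the prediction table further down.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (model, date, group).
# Field names are illustrative, not the exact Data-DM-Snapshot columns.
long_rows = [
    {"model": "PREDICTMOBILEPROPENSITY", "date": "2026-01-09", "group": "Test", "positives": 250, "negatives": 6000},
    {"model": "PREDICTMOBILEPROPENSITY", "date": "2026-01-09", "group": "Control", "positives": 120, "negatives": 6000},
    {"model": "PREDICTMOBILEPROPENSITY", "date": "2026-01-09", "group": "NBA", "positives": 150, "negatives": 6000},
]


def pivot_wide(rows):
    """Collapse long rows into one dict per (model, date) with group-suffixed columns."""
    wide = defaultdict(dict)
    for r in rows:
        key = (r["model"], r["date"])
        suffix = r["group"]
        wide[key][f"Positives_{suffix}"] = r["positives"]
        wide[key][f"Negatives_{suffix}"] = r["negatives"]
        # Response count is the sum of positive and negative responses
        wide[key][f"ResponseCount_{suffix}"] = r["positives"] + r["negatives"]
    return dict(wide)


wide = pivot_wide(long_rows)
print(wide[("PREDICTMOBILEPROPENSITY", "2026-01-09")])
```

In practice the same reshaping would typically be done with a DataFrame pivot; plain dictionaries are used here only to keep the sketch dependency-free.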

Prediction Data

The prepared prediction data is in a “wide” format, with separate fields for the Test, Control and NBA groups. It contains only the daily snapshots, and the numbers and dates are cast to regular Polars types.

[3]:

prediction.predictions.head().collect()

[3]:

shape: (5, 23)

pyModelId	SnapshotTime	Positives	Negatives	ResponseCount	Performance	Positives_Test	Negatives_Test	ResponseCount_Test	Positives_Control	Negatives_Control	ResponseCount_Control	Positives_NBA	Negatives_NBA	ResponseCount_NBA	Class	ModelName	CTR	CTR_Test	CTR_Control	CTR_NBA	CTR_Lift	isValidPrediction
str	date	f64	i64	f64	f32	f64	i64	f64	f64	i64	f64	f64	i64	f64	str	str	f64	f64	f64	f64	f64	bool
"DATA-DECISION-REQUEST-CUSTOMER…	2026-01-09	150.0	6000	6150.0	0.7	250.0	6000	6250.0	120.0	6000	6120.0	150.0	6000	6150.0	"DATA-DECISION-REQUEST-CUSTOMER"	"PREDICTMOBILEPROPENSITY"	0.02439	0.04	0.019608	0.02439	1.04	true
"DATA-DECISION-REQUEST-CUSTOMER…	2026-01-09	250.0	6000	6250.0	0.7	250.0	6000	6250.0	120.0	6000	6120.0	150.0	6000	6150.0	"DATA-DECISION-REQUEST-CUSTOMER"	"PREDICTMOBILEPROPENSITY"	0.04	0.04	0.019608	0.02439	1.04	true
"DATA-DECISION-REQUEST-CUSTOMER…	2026-01-09	120.0	6000	6120.0	0.7	250.0	6000	6250.0	120.0	6000	6120.0	150.0	6000	6150.0	"DATA-DECISION-REQUEST-CUSTOMER"	"PREDICTMOBILEPROPENSITY"	0.019608	0.04	0.019608	0.02439	1.04	true
"DATA-DECISION-REQUEST-CUSTOMER…	2026-01-10	120.0	6000	6120.0	0.700508	250.847458	6000	6250.847458	120.0	6000	6120.0	150.0	6000	6150.0	"DATA-DECISION-REQUEST-CUSTOMER"	"PREDICTMOBILEPROPENSITY"	0.019608	0.04013	0.019608	0.02439	1.046638	true
"DATA-DECISION-REQUEST-CUSTOMER…	2026-01-10	250.847458	6000	6250.847458	0.700508	250.847458	6000	6250.847458	120.0	6000	6120.0	150.0	6000	6150.0	"DATA-DECISION-REQUEST-CUSTOMER"	"PREDICTMOBILEPROPENSITY"	0.04013	0.04013	0.019608	0.02439	1.046638	true
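The CTR and CTR_Lift columns above are consistent with the following definitions, inferred from the numbers shown rather than from the pdstools source: CTR is positives divided by the response count (150 / 6150 ≈ 0.02439 in the first row), and CTR_Lift is the test CTR relative to the control CTR, minus one (0.04 / 0.019608 − 1 = 1.04). A minimal sketch, where `ctr` and `ctr_lift` are illustrative helper names:

```python
def ctr(positives: float, negatives: float) -> float:
    """Click-through rate: positives as a fraction of all responses."""
    responses = positives + negatives
    return positives / responses if responses else float("nan")


def ctr_lift(ctr_test: float, ctr_control: float) -> float:
    """Relative lift of the test group CTR over the control group CTR."""
    return ctr_test / ctr_control - 1.0


# Counts from the first row of the table above
ctr_test = ctr(250.0, 6000)     # 250 / 6250
ctr_control = ctr(120.0, 6000)  # 120 / 6120
print(round(ctr_lift(ctr_test, ctr_control), 6))
```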

Summary by Channel

Standard functionality is available to summarize the predictions per channel. Note that the data does not contain the mapping from prediction to channel (this is an outstanding product issue), so the summary applies the implicit naming conventions of NBAD (Next-Best-Action Designer). For a specific customer, custom mappings can be passed into the summarization function.
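To make the idea of deriving channel and direction from a naming convention concrete, here is a hypothetical sketch. It assumes names of the form `PREDICT<optional INBOUND/OUTBOUND><CHANNEL>PROPENSITY` and defaults to Inbound when no direction token is present, which happens to reproduce the three rows of the table below; the actual pdstools rules may differ. The `guess_channel` function, the `CHANNEL_NAMES` table and the `custom` override parameter are all illustrative, not the pdstools API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical display names for channel tokens (illustrative only)
CHANNEL_NAMES = {"MOBILE": "Mobile", "WEB": "Web", "EMAIL": "E-mail"}


def guess_channel(
    prediction_name: str,
    custom: Optional[Dict[str, Tuple[str, str]]] = None,
) -> Tuple[str, str]:
    """Guess (channel, direction) from an assumed NBAD-style prediction name."""
    # A customer-specific mapping, if given, takes precedence
    if custom and prediction_name in custom:
        return custom[prediction_name]
    name = prediction_name.upper()
    if not (name.startswith("PREDICT") and name.endswith("PROPENSITY")):
        return ("Unknown", "Unknown")
    core = name[len("PREDICT"):-len("PROPENSITY")]
    direction = "Inbound"  # assumed default when no direction token is present
    for token in ("INBOUND", "OUTBOUND"):
        if core.startswith(token):
            direction = token.capitalize()
            core = core[len(token):]
            break
    return (CHANNEL_NAMES.get(core, core.capitalize()), direction)


for n in ["PREDICTMOBILEPROPENSITY", "PREDICTOUTBOUNDEMAILPROPENSITY", "PREDICTWEBPROPENSITY"]:
    print(n, guess_channel(n))
```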

[4]:

prediction.summary_by_channel().collect()

[4]:

shape: (3, 28)

Prediction	Channel	Direction	usesNBAD	isMultiChannel	DateRange Min	DateRange Max	Duration	Performance	Positives	Negatives	Responses	Positives_Test	Positives_Control	Positives_NBA	Negatives_Test	Negatives_Control	Negatives_NBA	usesImpactAnalyzer	ControlPercentage	TestPercentage	CTR	CTR_Test	CTR_Control	CTR_NBA	ChannelDirectionGroup	isValid	Lift
str	str	str	bool	bool	date	date	i64	f64	f64	i64	f64	f64	f64	f64	i64	i64	i64	bool	f64	f64	f64	f64	f64	f64	str	bool	f64
"PREDICTMOBILEPROPENSITY"	"Mobile"	"Inbound"	true	false	2026-01-09	2026-03-09	5097600	0.715007	32700.0	1080000	1.1127e6	49500.0	21600.0	27000.0	1080000	1080000	1080000	true	33.000809	33.836614	0.029388	0.043825	0.019608	0.02439	"Mobile/Inbound"	true	1.23506
"PREDICTOUTBOUNDEMAILPROPENSITY"	"E-mail"	"Outbound"	true	false	2026-01-09	2026-03-09	5097600	0.625006	24000.0	1800000	1.824e6	32400.0	18000.0	21600.0	1800000	1800000	1800000	true	33.223684	33.486842	0.013158	0.017682	0.009901	0.011858	"E-mail/Outbound"	true	0.785855
"PREDICTWEBPROPENSITY"	"Web"	"Inbound"	true	false	2026-01-09	2026-03-09	5097600	0.670016	379200.0	7200000	7.5792e6	612000.0	252000.0	273600.0	7200000	7200000	7200000	true	32.773908	34.357188	0.050032	0.078341	0.033816	0.036609	"Web/Inbound"	true	1.316656
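Several summary columns can be reproduced from the counts in the Mobile/Inbound row, which suggests they are computed from summed counts rather than by averaging daily rates. The formulas below are inferred from the numbers shown, not taken from the pdstools source: CTRs and Lift come from the summed positives and negatives; ControlPercentage and TestPercentage are shares of the combined Test, Control and NBA responses; and Duration is the number of seconds between the first and last snapshot date (59 days, or 5097600 seconds).

```python
from datetime import date

# Summed counts for the Mobile/Inbound row of the table above
pos = {"Test": 49500.0, "Control": 21600.0, "NBA": 27000.0}
neg = {"Test": 1080000, "Control": 1080000, "NBA": 1080000}


def ctr(p: float, n: float) -> float:
    """CTR from summed positives and negatives."""
    return p / (p + n)


# Responses per group and across all three groups
responses = {g: pos[g] + neg[g] for g in pos}
total = sum(responses.values())

ctr_test = ctr(pos["Test"], neg["Test"])
ctr_control = ctr(pos["Control"], neg["Control"])
lift = ctr_test / ctr_control - 1
control_pct = 100 * responses["Control"] / total
test_pct = 100 * responses["Test"] / total
# Duration between first and last snapshot date, in seconds
duration_s = (date(2026, 3, 9) - date(2026, 1, 9)).total_seconds()

print(round(ctr_test, 6), round(ctr_control, 6), round(lift, 5))
print(round(control_pct, 6), round(test_pct, 6), int(duration_s))
```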

Prediction Trends

By default, the summary covers the full time range. You can pass in an argument to summarize by day, week or any other period supported by the Polars time offset string language (for example "1d", "1w" or "1mo").

This trend data can then easily be visualized; the plots below aggregate per week ("1w").
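Summarizing per period amounts to assigning each daily snapshot to a period bucket and then computing the metrics from the summed counts in each bucket. The sketch below does this for weekly CTR in plain Python. It assumes Monday-based weeks, which may not match how pdstools aligns periods; `week_start`, `weekly_ctr` and the toy daily values are illustrative only.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Truncate a date to the Monday that starts its week."""
    return d - timedelta(days=d.weekday())


def weekly_ctr(rows):
    """Sum positives and responses per week, then compute CTR per week."""
    pos = defaultdict(float)
    resp = defaultdict(float)
    for d, positives, responses in rows:
        wk = week_start(d)
        pos[wk] += positives
        resp[wk] += responses
    # Ratio of sums per week, ordered by week start
    return {wk: pos[wk] / resp[wk] for wk in sorted(resp)}


# Toy daily input: (snapshot date, positives, response count)
daily = [
    (date(2026, 1, 9), 150.0, 6150.0),
    (date(2026, 1, 10), 120.0, 6120.0),
    (date(2026, 1, 12), 250.0, 6250.0),
]
for wk, value in weekly_ctr(daily).items():
    print(wk, round(value, 6))
```

Computing the ratio of sums, rather than averaging the daily CTRs, weights each day by its response volume.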

[5]:

fig = prediction.plot.performance_trend("1w")
fig.show()

[6]:

fig = prediction.plot.lift_trend("1w")
fig.show()

[7]:

fig = prediction.plot.ctr_trend("1w", facetting=False)
fig.show()

[8]:

fig = prediction.plot.responsecount_trend("1w", facetting=False)
fig.show()