pdstools.prediction.Prediction
==============================

.. py:module:: pdstools.prediction.Prediction


Attributes
----------

.. autoapisummary::

   pdstools.prediction.Prediction.logger
   pdstools.prediction.Prediction.COLORSCALE_TYPES
   pdstools.prediction.Prediction.Figure


Classes
-------

.. autoapisummary::

   pdstools.prediction.Prediction.PredictionPlots
   pdstools.prediction.Prediction.Prediction


Module Contents
---------------

.. py:data:: logger

.. py:data:: COLORSCALE_TYPES

.. py:data:: Figure

.. py:class:: PredictionPlots(prediction)

   Bases: :py:obj:`pdstools.utils.namespaces.LazyNamespace`

   Plots for visualizing Prediction Studio data.

   This class provides various plotting methods to visualize prediction
   performance, lift, CTR, and response counts over time.

   .. py:attribute:: dependencies
      :value: ['plotly']

   .. py:attribute:: prediction

   .. py:method:: _prediction_trend(period: str, query: Optional[pdstools.utils.types.QUERY], metric: str, title: str, **kwargs)

      Internal method to create trend plots for various metrics.

      :param period: Time period for aggregation (e.g., "1d", "1w", "1mo")
      :type period: str
      :param query: Optional query to filter the data
      :type query: Optional[QUERY]
      :param metric: The metric to plot (e.g., "Performance", "Lift", "CTR")
      :type metric: str
      :param title: Plot title
      :type title: str
      :param \*\*kwargs: Additional keyword arguments passed directly to plotly.express.line.
          See the plotly.express.line documentation for all available options.
      :returns: (plotly figure, dataframe with plot data)
      :rtype: tuple

   .. py:method:: performance_trend(period: str = '1d', *, query: Optional[pdstools.utils.types.QUERY] = None, return_df: bool = False, **kwargs)

      Create a performance trend plot showing AUC over time.

      Displays a line chart showing how prediction performance (AUC) changes
      over time, with configurable time period aggregation and filtering
      capabilities.

      :param period: Time period for aggregation (e.g., "1d", "1w", "1mo"), by default "1d".
          Uses Polars truncate syntax for time period grouping.
      :type period: str, optional
      :param query: Query to filter the prediction data.
          See pdstools.utils.cdh_utils._apply_query for details.
      :type query: Optional[QUERY], optional
      :param return_df: If True, returns the underlying data instead of the plot, by default False.
      :type return_df: bool, optional
      :param \*\*kwargs: Additional keyword arguments passed directly to plotly.express.line.
          See the plotly.express.line documentation for all available options.
      :returns: A Plotly figure object or the underlying data if return_df is True.
      :rtype: Union[plotly.graph_objects.Figure, polars.LazyFrame]

      .. rubric:: Examples

      >>> # Basic performance trend plot
      >>> pred.plot.performance_trend()
      >>> # Weekly aggregated performance trend
      >>> pred.plot.performance_trend(period="1w")
      >>> # Performance trend with faceting by prediction
      >>> pred.plot.performance_trend(facet_row="Prediction")
      >>> # Get underlying data for custom analysis
      >>> data = pred.plot.performance_trend(return_df=True)

   .. py:method:: lift_trend(period: str = '1d', *, query: Optional[pdstools.utils.types.QUERY] = None, return_df: bool = False, **kwargs)

      Create a lift trend plot showing engagement lift over time.

      Displays a line chart showing how prediction engagement lift changes
      over time, comparing test group performance against the control group
      baseline.

      :param period: Time period for aggregation (e.g., "1d", "1w", "1mo"), by default "1d".
          Uses Polars truncate syntax for time period grouping.
      :type period: str, optional
      :param query: Query to filter the prediction data.
          See pdstools.utils.cdh_utils._apply_query for details.
      :type query: Optional[QUERY], optional
      :param return_df: If True, returns the underlying data instead of the plot, by default False.
      :type return_df: bool, optional
      :param \*\*kwargs: Additional keyword arguments passed directly to plotly.express.line.
          See the plotly.express.line documentation for all available options.
      :returns: A Plotly figure object or the underlying data if return_df is True.
      :rtype: Union[plotly.graph_objects.Figure, polars.LazyFrame]

      .. rubric:: Examples

      >>> # Basic lift trend plot
      >>> pred.plot.lift_trend()
      >>> # Monthly aggregated lift trend
      >>> pred.plot.lift_trend(period="1mo")
      >>> # Lift trend with custom query
      >>> pred.plot.lift_trend(query=pl.col("Channel") == "Email")
      >>> # Get underlying data
      >>> data = pred.plot.lift_trend(return_df=True)

   .. py:method:: ctr_trend(period: str = '1d', facetting=False, *, query: Optional[pdstools.utils.types.QUERY] = None, return_df: bool = False, **kwargs)

      Create a CTR (Click-Through Rate) trend plot over time.

      Displays a line chart showing how prediction click-through rates change
      over time, with optional faceting capabilities for comparing multiple
      predictions.

      :param period: Time period for aggregation (e.g., "1d", "1w", "1mo"), by default "1d".
          Uses Polars truncate syntax for time period grouping.
      :type period: str, optional
      :param facetting: Whether to create facets by prediction for side-by-side comparison,
          by default False.
      :type facetting: bool, optional
      :param query: Query to filter the prediction data.
          See pdstools.utils.cdh_utils._apply_query for details.
      :type query: Optional[QUERY], optional
      :param return_df: If True, returns the underlying data instead of the plot, by default False.
      :type return_df: bool, optional
      :param \*\*kwargs: Additional keyword arguments passed directly to plotly.express.line.
          See the plotly.express.line documentation for all available options.
      :returns: A Plotly figure object or the underlying data if return_df is True.
      :rtype: Union[plotly.graph_objects.Figure, polars.LazyFrame]

      .. rubric:: Examples

      >>> # Basic CTR trend plot
      >>> pred.plot.ctr_trend()
      >>> # Weekly CTR trend with faceting
      >>> pred.plot.ctr_trend(period="1w", facetting=True)
      >>> # CTR trend with custom query
      >>> pred.plot.ctr_trend(query=pl.col("Prediction").str.contains("Email"))
      >>> # Get underlying data
      >>> data = pred.plot.ctr_trend(return_df=True)

   .. py:method:: responsecount_trend(period: str = '1d', facetting=False, *, query: Optional[pdstools.utils.types.QUERY] = None, return_df: bool = False, **kwargs)

      Create a response count trend plot showing total responses over time.

      Displays a line chart showing how total response volumes change over
      time, useful for monitoring prediction usage and data volume trends.

      :param period: Time period for aggregation (e.g., "1d", "1w", "1mo"), by default "1d".
          Uses Polars truncate syntax for time period grouping.
      :type period: str, optional
      :param facetting: Whether to create facets by prediction for side-by-side comparison,
          by default False.
      :type facetting: bool, optional
      :param query: Query to filter the prediction data.
          See pdstools.utils.cdh_utils._apply_query for details.
      :type query: Optional[QUERY], optional
      :param return_df: If True, returns the underlying data instead of the plot, by default False.
      :type return_df: bool, optional
      :param \*\*kwargs: Additional keyword arguments passed directly to plotly.express.line.
          See the plotly.express.line documentation for all available options.
      :returns: A Plotly figure object or the underlying data if return_df is True.
      :rtype: Union[plotly.graph_objects.Figure, polars.LazyFrame]

      .. rubric:: Examples

      >>> # Basic response count trend plot
      >>> pred.plot.responsecount_trend()
      >>> # Monthly response count trend with faceting
      >>> pred.plot.responsecount_trend(period="1mo", facetting=True)
      >>> # Response count trend for specific predictions
      >>> pred.plot.responsecount_trend(query=pl.col("Channel") == "Web")
      >>> # Get underlying data for analysis
      >>> data = pred.plot.responsecount_trend(return_df=True)

.. py:class:: Prediction(df: polars.LazyFrame, *, query: Optional[pdstools.utils.types.QUERY] = None)

   Monitor and analyze Pega Prediction Studio Predictions.

   To initialize this class, either:

   1. Initialize directly with the df polars LazyFrame
   2. Use one of the class methods

   This class will read in the data from different sources, properly
   structure it for further analysis, and apply correct typing and useful
   renaming.

   There is also a "namespace" that you can call from this class:

   - `.plot` contains ready-made plots to analyze the prediction data with

   :param df: The Polars LazyFrame representation of the prediction data.
   :type df: pl.LazyFrame
   :param query: An optional query to apply to the input data.
       For details, see :meth:`pdstools.utils.cdh_utils._apply_query`.
   :type query: QUERY, optional

   .. rubric:: Examples

   >>> pred = Prediction.from_ds_export('/my_export_folder/predictions.zip')
   >>> pred = Prediction.from_mock_data(days=70)
   >>> from pdstools import Prediction
   >>> import polars as pl
   >>> pred = Prediction(
   ...     df=pl.scan_parquet('predictions.parquet'),
   ...     query={"Class": ["DATA-DECISION-REQUEST-CUSTOMER-CDH"]},
   ... )

   .. seealso::

      :obj:`pdstools.prediction.PredictionPlots`
          The out-of-the-box plots on the Prediction data
      :obj:`pdstools.utils.cdh_utils._apply_query`
          How to query the Prediction class and methods

   .. py:attribute:: predictions
      :type: polars.LazyFrame

   .. py:attribute:: plot
      :type: PredictionPlots

   .. py:attribute:: prediction_validity_expr

   .. py:method:: from_ds_export(predictions_filename: Union[os.PathLike, str], base_path: Union[os.PathLike, str] = '.', *, query: Optional[pdstools.utils.types.QUERY] = None)
      :classmethod:

      Import from a Pega Dataset Export of the PR_DATA_DM_SNAPSHOTS table.

      :param predictions_filename: The full path or name (if base_path is given) to the prediction snapshot files
      :type predictions_filename: Union[os.PathLike, str]
      :param base_path: A base path to provide if predictions_filename is not given as a full path, by default "."
      :type base_path: Union[os.PathLike, str], optional
      :param query: An optional argument to filter out selected data, by default None
      :type query: Optional[QUERY], optional
      :returns: The properly initialized Prediction class
      :rtype: Prediction

      .. rubric:: Examples

      >>> from pdstools import Prediction
      >>> pred = Prediction.from_ds_export('predictions.zip', '/my_export_folder')

      .. note::

         By default, the dataset export in Infinity returns a zip file per
         table. You do not need to open up this zip file! You can simply
         point to the zip, and this method will be able to read in the
         underlying data.

      .. seealso::

         :obj:`pdstools.pega_io.File.read_ds_export`
             More information on file compatibility
         :obj:`pdstools.utils.cdh_utils._apply_query`
             How to query the Prediction class and methods

   .. py:method:: from_s3()
      :classmethod:

      Not implemented yet. Please let us know if you would like this functionality!

      :returns: The properly initialized Prediction class
      :rtype: Prediction

   .. py:method:: from_dataflow_export()
      :classmethod:

      Import from a data flow, such as the Prediction Studio export.

      Not implemented yet. Please let us know if you would like this functionality!

      :returns: The properly initialized Prediction class
      :rtype: Prediction

   .. py:method:: from_pdc(df: polars.LazyFrame, *, return_df=False, query: Optional[pdstools.utils.types.QUERY] = None)
      :classmethod:

      Import from (Pega-internal) PDC data, which is a combination of the
      PR_DATA_DM_SNAPSHOTS and PR_DATA_DM_ADMMART_MDL_FACT tables.

      :param df: The Polars LazyFrame containing the PDC data
      :type df: pl.LazyFrame
      :param return_df: If True, returns the processed DataFrame instead of initializing the class, by default False
      :type return_df: bool, optional
      :param query: An optional query to apply to the input data, by default None
      :type query: Optional[QUERY], optional
      :returns: Either the initialized Prediction class or the processed DataFrame if return_df is True
      :rtype: Union[Prediction, pl.LazyFrame]

      .. seealso::

         :obj:`pdstools.utils.cdh_utils._read_pdc`
             More information on PDC data processing
         :obj:`pdstools.utils.cdh_utils._apply_query`
             How to query the Prediction class and methods

   .. py:method:: save_data(path: Union[os.PathLike, str] = '.') -> Optional[os.PathLike]

      Cache predictions to a file.

      :param path: Where to place the file
      :type path: Union[os.PathLike, str]
      :returns: The path to the cached prediction data file, or None if no data is available
      :rtype: Optional[os.PathLike]

   .. py:method:: from_processed_data(df: polars.LazyFrame)
      :classmethod:

      Load a Prediction from already-processed data (e.g., from cache).

      This bypasses the normal data transformation pipeline and directly
      assigns the data to self.predictions. Use this when loading data that
      has already been processed by the Prediction class constructor, such as
      data saved via save_data().

      :param df: A LazyFrame containing already-processed prediction data with columns
          like 'Positives', 'CTR', 'Performance', etc. rather than the raw
          'pyPositives', 'pyModelType', etc.
      :type df: pl.LazyFrame
      :returns: A Prediction instance with the processed data loaded
      :rtype: Prediction

      .. rubric:: Examples

      >>> # Load from a cached file
      >>> cached_data = pl.scan_parquet('cached_predictions.parquet')
      >>> pred = Prediction.from_processed_data(cached_data)

   .. py:method:: from_mock_data(days=70)
      :classmethod:

      Create a Prediction instance with mock data for testing and demonstration purposes.

      :param days: Number of days of mock data to generate, by default 70
      :type days: int, optional
      :returns: The initialized Prediction class with mock data
      :rtype: Prediction

      .. rubric:: Examples

      >>> from pdstools import Prediction
      >>> pred = Prediction.from_mock_data(days=30)
      >>> pred.plot.performance_trend()

   .. py:property:: is_available
      :type: bool

      Check if prediction data is available.

      :returns: True if prediction data is available, False otherwise
      :rtype: bool

   .. py:property:: is_valid
      :type: bool

      Check if prediction data is valid.

      A valid prediction meets the criteria defined in
      prediction_validity_expr, which requires positive and negative
      responses in both test and control groups.

      :returns: True if prediction data is valid, False otherwise
      :rtype: bool

   .. py:method:: summary_by_channel(custom_predictions: Optional[List[List]] = None, *, start_date: Optional[datetime.datetime] = None, end_date: Optional[datetime.datetime] = None, window: Optional[Union[int, datetime.timedelta]] = None, every: Optional[str] = None, debug: bool = False) -> polars.LazyFrame

      Summarize predictions per channel.

      :param custom_predictions: Optional list with custom prediction name to channel mappings.
          Each item should be [PredictionName, Channel, Direction, isMultiChannel].
          Defaults to None.
      :type custom_predictions: Optional[List[List]], optional
      :param start_date: Start date of the summary period. If None (default) uses the end date
          minus the window, or if both are absent, the earliest date in the data
      :type start_date: datetime.datetime, optional
      :param end_date: End date of the summary period.
          If None (default) uses the start date plus the window, or if both
          are absent, the latest date in the data
      :type end_date: datetime.datetime, optional
      :param window: Number of days to use for the summary period, or an explicit timedelta.
          If None (default) uses the whole period. Can't be given if start
          and end date are also given.
      :type window: int or datetime.timedelta, optional
      :param every: Optional additional grouping by time period. Format string as in
          polars.Expr.dt.truncate
          (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html),
          for example "1mo", "1w", "1d" for calendar month, week and day.
          Defaults to None.
      :type every: str, optional
      :param debug: If True, enables debug mode for additional logging or outputs.
          Defaults to False.
      :type debug: bool, optional
      :returns: Summary across all Predictions as a dataframe with the following fields:

          Time and Configuration Fields:

          - DateRange Min: The minimum date in the summary time range
          - DateRange Max: The maximum date in the summary time range
          - Duration: The duration in seconds between the minimum and maximum snapshot times
          - Prediction: The prediction name
          - Channel: The channel name
          - Direction: The direction (e.g., Inbound, Outbound)
          - ChannelDirectionGroup: Combined Channel/Direction identifier
          - isValid: Boolean indicating if the prediction data is valid
          - usesNBAD: Boolean indicating if this is a standard NBAD prediction
          - isMultiChannel: Boolean indicating if this is a multichannel prediction
          - ControlPercentage: Percentage of responses in the control group
          - TestPercentage: Percentage of responses in the test group

          Performance Metrics:

          - Performance: Weighted model performance (AUC) in range 0.5-1.0
          - Positives: Sum of positive responses
          - Negatives: Sum of negative responses
          - Responses: Sum of all responses
          - Positives_Test: Sum of positive responses in the test group
          - Positives_Control: Sum of positive responses in the control group
          - Positives_NBA: Sum of positive responses in the NBA group
          - Negatives_Test: Sum of negative responses in the test group
          - Negatives_Control: Sum of negative responses in the control group
          - Negatives_NBA: Sum of negative responses in the NBA group
          - CTR: Clickthrough rate (Positives over Positives + Negatives)
          - CTR_Test: Clickthrough rate for the test group (model propensities)
          - CTR_Control: Clickthrough rate for the control group (random propensities)
          - CTR_NBA: Clickthrough rate for the NBA group (available only when Impact Analyzer is used)
          - Lift: Lift in engagement when testing prioritization with just Adaptive Models
            vs just Random Propensity

          Technology Usage Indicators:

          - usesImpactAnalyzer: Boolean indicating if Impact Analyzer is used
      :rtype: pl.LazyFrame

   .. py:method:: overall_summary(custom_predictions: Optional[List[List]] = None, *, start_date: Optional[datetime.datetime] = None, end_date: Optional[datetime.datetime] = None, window: Optional[Union[int, datetime.timedelta]] = None, every: Optional[str] = None, debug: bool = False) -> polars.LazyFrame

      Overall prediction summary. Only valid prediction data is included.

      :param custom_predictions: Optional list with custom prediction name to channel mappings.
          Each item should be [PredictionName, Channel, Direction, isMultiChannel].
          Defaults to None.
      :type custom_predictions: Optional[List[List]], optional
      :param start_date: Start date of the summary period. If None (default) uses the end date
          minus the window, or if both are absent, the earliest date in the data
      :type start_date: datetime.datetime, optional
      :param end_date: End date of the summary period. If None (default) uses the start date
          plus the window, or if both are absent, the latest date in the data
      :type end_date: datetime.datetime, optional
      :param window: Number of days to use for the summary period, or an explicit timedelta.
          If None (default) uses the whole period. Can't be given if start
          and end date are also given.
      :type window: int or datetime.timedelta, optional
      :param every: Optional additional grouping by time period.
          Format string as in polars.Expr.dt.truncate
          (https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.truncate.html),
          for example "1mo", "1w", "1d" for calendar month, week and day.
          Defaults to None.
      :type every: str, optional
      :param debug: If True, enables debug mode for additional logging or outputs.
          Defaults to False.
      :type debug: bool, optional
      :returns: Summary across all Predictions as a dataframe with the following fields:

          Time and Configuration Fields:

          - DateRange Min: The minimum date in the summary time range
          - DateRange Max: The maximum date in the summary time range
          - Duration: The duration in seconds between the minimum and maximum snapshot times
          - ControlPercentage: Weighted average percentage of control group responses
          - TestPercentage: Weighted average percentage of test group responses
          - usesNBAD: Boolean indicating if any of the predictions is a standard NBAD prediction

          Performance Metrics:

          - Performance: Weighted average performance (AUC) across all valid channels,
            in range 0.5-1.0
          - Positives Inbound: Sum of positive responses across all valid inbound channels
          - Positives Outbound: Sum of positive responses across all valid outbound channels
          - Responses Inbound: Sum of all responses across all valid inbound channels
          - Responses Outbound: Sum of all responses across all valid outbound channels
          - Overall Lift: Weighted average lift across all valid channels
          - Minimum Negative Lift: The lowest negative lift value found

          Channel Statistics:

          - Number of Valid Channels: Count of unique valid channel/direction combinations
          - Channel with Minimum Negative Lift: Channel with the lowest negative lift value

          Technology Usage Indicators:

          - usesImpactAnalyzer: Boolean indicating if any channel uses Impact Analyzer
      :rtype: pl.LazyFrame
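As a rough illustration of how the CTR and Lift fields in these summaries relate, the sketch below computes both from raw response counts. It assumes CTR is Positives over (Positives + Negatives) as documented above, and assumes Lift is the relative CTR difference of the test (model propensity) group over the control (random propensity) group; the exact formula pdstools uses may differ, and the counts are hypothetical.

```python
# Hedged sketch, not part of pdstools: interpreting CTR and Lift
# from test/control response counts, under the assumptions above.

def ctr(positives: int, negatives: int) -> float:
    """Clickthrough rate: Positives over (Positives + Negatives)."""
    return positives / (positives + negatives)

def engagement_lift(ctr_test: float, ctr_control: float) -> float:
    """Assumed lift definition: relative CTR gain of test over control."""
    return (ctr_test - ctr_control) / ctr_control

# Hypothetical counts for one channel
ctr_test = ctr(positives=300, negatives=9700)     # test group (model propensities)
ctr_control = ctr(positives=200, negatives=9800)  # control group (random propensities)
lift = engagement_lift(ctr_test, ctr_control)     # ~0.5, i.e. ~50% engagement lift
```

A negative value of ``lift`` would mean the control group outperforms the models, which is what the "Minimum Negative Lift" field in ``overall_summary`` surfaces.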