{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# These lines are only for rendering in the docs, and are hidden through Jupyter tags\n", "# Do not run if you're running the notebook seperately\n", "# Hidden from doc by virtue of cell tags - in VSCode right-click on the bar to the left of this cell, edit cell tags, see metadata\n", "\n", "import plotly.io as pio\n", "\n", "pio.renderers.default = \"notebook_connected\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pdstools import IH\n", "\n", "# from pdstools.utils import cdh_utils\n", "\n", "import polars as pl\n", "import plotly as plotly\n", "\n", "\n", "# plotly.offline.init_notebook_mode()\n", "# pio.renderers.default = \"vscode\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example IH Analysis\n", "\n", "Interaction History (IH) is a rich source of data at the level of individual interactions from Pega DSM applications like CDH. It contains the time of the interaction, the channel, the actions/treatments, the customer ID and is used to track different types of outcomes (decisions, sends, opens, clicks, etc). It does **not** contain details of individual customers - only their ID's.\n", "\n", "Interaction History is typically used to analyze customer behavior and optimize decision strategies. The following sections provide various example analyses that can be performed on IH data, including distribution analysis, response analysis, success rates, model performance, propensity distribution, and response time analysis.\n", "\n", "Like most of PDSTools, it uses [plotly](https://plotly.com/python/) for visualization and [polars](https://docs.pola.rs/) (dataframe) but the purpose of this Notebook is more to serve example analyses than re-usable code, although of course we do try to provide some generic, re-usable functions. All of the analyses should be able to be replicated easily in other analytical BI environments - except perhaps the analysis of model performance / AUC.\n", "\n", "This notebook uses sample data shipped with PDStools. Replace it with your own actual IH data and modify the analyses as appropriate." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ih = IH.from_ds_export(\n", "# \"../../data/Data-pxStrategyResult_pxInteractionHistory_20210101T010000_GMT.zip\"\n", "# )\n", "ih = IH.from_mock_data(n=1e5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Preview of the raw IH data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ih.data.head().collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same interaction can occur multiple times: once when the first decision is made, then later when responses are captured (accepted, sent, clicked, etc.). For some of the analyses we need to group by interaction. This is how that data looks like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ih.aggregates._summary_interactions(by=[\"Channel\"]).head().collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distribution Analysis\n", "\n", "A distribution of the offers (actions/treatments) is often the most obvious type of analysis. You can do an action distribution for specific outcomes (what is offered, what is accepted), view it conditionally (what got offered last month vs this month) - possibly with a delta view, or over time." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ih.plot.response_count_tree_map()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = ih.plot.action_distribution(\n", " query=pl.col.Outcome.is_in([\"Clicked\", \"Accepted\"]), \n", " title=\"Distribution of Actions\",\n", " color=\"Outcome\",\n", ")\n", "# fig.update_layout(yaxis=dict(tickmode=\"linear\")) # to show all names\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Response Analysis\n", "\n", "A simple view of the responses over time shows how many responses are received per day (or any other period)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ih.plot.response_count(every=\"1d\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which could be viewed per channel as well:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ih.plot.response_count(\n", " facet=\"Channel\",\n", " query=pl.col.Channel != \"\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Success Rates\n", "\n", "Success rates (accept rate, open rate, conversion rate) are interesting to track over time. In addition you may want to split by e.g. Channel, or contrast the rates for different experimental setups in an A-B testing set-up." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ih.plot.success_rate(\n", " facet=\"Channel\", query=pl.col.Channel.is_not_null() & (pl.col.Channel != \"\")\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Performance\n", "\n", "Similar to Success Rates: typically viewed over time, likely split by channel, conditioned on variations, e.g. NB vs AGB models." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ih.plot.model_performance_trend(by=\"Channel\", every=\"1w\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AGB vs NB analysis\n", "\n", "There are different types of ADM models you can use in CDH. This analysis shows the model performance of the (classic) Naive Bayes models vs the new Gradient Boosting models. We split by channel as this often matters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = ih.plot.model_performance_trend(\n", " by=\"ModelTechnique\",\n", " facet=\"Channel\",\n", " every=\"1w\",\n", " title=\"Model Performance of Naive Bayes vs Gradient Boosting Models\",\n", ")\n", "fig.update_layout(legend_title_text=\"Technique\")\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Propensity Distribution\n", "\n", "IH also contains information about the factors that determine the prioritization of the offers: lever values, propensities etc.\n", "\n", "Here we show the distribution of the propensities of the offers made. It's also a first example of a custom analysis not currently supported directly by the PDSTools library. You can see how we access the underlying IH data (**ih.data**), then aggregate and display it.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly.figure_factory as ff\n", "\n", "channels = [\n", " c\n", " for c in ih.data.select(pl.col.Channel.unique().sort())\n", " .collect()[\"Channel\"]\n", " .to_list()\n", " if c is not None and c != \"\"\n", "]\n", "\n", "plot_data = [\n", " ih.data.filter(pl.col.Channel == c)\n", " .select([\"Propensity\"])\n", " .collect()[\"Propensity\"]\n", " .sample(fraction=0.1)\n", " .to_list()\n", " for c in channels\n", "]\n", "fig = ff.create_distplot(plot_data, group_labels=channels, show_hist=False)\n", "fig.update_layout(\n", " title=\"Propensity Distribution\",\n", " yaxis=dict(showticklabels=False),\n", " xaxis=dict(title=\"Propensity\", tickformat=\".0%\"),\n", " legend_title_text=\"Channel\",\n", " template=\"pega\",\n", ")\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Response Time Analysis\n", "\n", "Time is one of the dimensions in IH. Here we take a look at how subsequent responses relate to the original decision. It shows, for example, how much time there typically is between the moment of decision and the click.\n", "\n", "This type of analysis is usually part of attribution analysis when considering conversion modeling.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly.express as px\n", "\n", "outcomes = [\n", " objective\n", " for objective in ih.data.select(pl.col.Outcome.unique().sort())\n", " .collect()[\"Outcome\"]\n", " .to_list()\n", " if objective is not None and objective != \"\"\n", "]\n", "plot_data = (\n", " ih.data.filter(pl.col.OutcomeTime.is_not_null())\n", " .group_by(\"InteractionID\")\n", " .agg(\n", " [pl.col.OutcomeTime.min().alias(\"Decision_Time\")]\n", " + [\n", " pl.col.OutcomeTime.filter(pl.col.Outcome == o).max().alias(o)\n", " for o in outcomes\n", " ],\n", " )\n", " .collect()\n", " .unpivot(\n", " index=[\"InteractionID\", \"Decision_Time\"],\n", " variable_name=\"Outcome\",\n", " value_name=\"Time\",\n", " )\n", " .with_columns(Duration=(pl.col.Time - pl.col.Decision_Time).dt.total_seconds())\n", " .filter(pl.col.Duration > 0)\n", ")\n", "\n", "ordered_outcomes = (\n", " plot_data.group_by(\"Outcome\")\n", " .agg(Duration=pl.col(\"Duration\").median())\n", " .sort(\"Duration\")[\"Outcome\"]\n", " .to_list()\n", ")\n", "\n", "fig = px.box(\n", " plot_data,\n", " x=\"Duration\",\n", " y=\"Outcome\",\n", " color=\"Outcome\",\n", " template=\"pega\",\n", " category_orders={\"Outcome\": ordered_outcomes},\n", " points=False,\n", " title=\"Duration of Responses\",\n", " log_x=True,\n", ")\n", "fig.update_layout(\n", " xaxis_title=\"Duration (seconds) with logarithmic scale\", yaxis_title=\"\"\n", ")\n", "fig" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }