pdstools.decision_analyzer.utils¶
Attributes¶
Classes¶
ColumnResolver | Resolves column mappings between raw data and a standardized schema.
Functions¶
apply_filter | Apply a global set of filters.
get_first_level_stats | Returns first-level stats of a dataframe for the filter summary.
resolve_aliases | Rename alias columns to their canonical raw key names before validation.
determine_extract_type | Detect whether the data is a Decision Analyzer (v2) or Explainability Extract (v1).
rename_and_cast_types | Rename columns and cast data types based on table definition.
_cast_columns | Cast columns to their target types.
create_hierarchical_selectors | Create hierarchical filter options and calculate indices for selectbox widgets.
get_scope_config | Generate scope configuration for lever application and plotting based on user selections.
_get_interaction_id_candidates | Build the set of possible interaction ID column names from the schema.
_find_interaction_id_column | Return the first matching interaction ID column name from the data.
sample_interactions | Sample interactions from a LazyFrame before ingestion.
sample_and_save | Sample interactions and persist the result as a parquet file.
Parse the |
Module Contents¶
- class ColumnResolver¶
Resolves column mappings between raw data and a standardized schema.
Raw decision data can come from multiple sources with different schemas:
- Explainability Extract vs Decision Analyzer exports
- Inbound vs Outbound channel data
For example, channel information may appear as:
- ‘Channel’ (already using the display name)
- ‘pyChannel’ (an alias for the display name)
- ‘Primary_ContainerPayload_Channel’ (raw name needing rename)
- Both raw key and display_name present (conflict requiring resolution)
This class normalizes these variations by:
- Mapping raw column names to standardized display names
- Resolving conflicts when both raw and display_name columns exist
- Building the final schema with consistent column names
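The normalization steps can be illustrated with a minimal pure-Python sketch that works on column names only. It is not the ColumnResolver implementation: `plan_column_resolution` is a hypothetical helper, the table-definition layout (raw key mapped to a dict with a `display_name`) and the display name "Interaction ID" are assumptions, and the conflict policy shown here (keep the display-name column, drop the raw one) is assumed because the documentation does not state which column wins.

```python
def plan_column_resolution(columns, table_definition):
    """Return (rename_map, drop_list) for a list of column names.

    Hypothetical helper; assumes each entry maps a raw key to a dict
    that contains a "display_name".
    """
    present = set(columns)
    rename_map, drop = {}, []
    for raw_key, spec in table_definition.items():
        display = spec["display_name"]
        if raw_key == display or raw_key not in present:
            continue  # nothing to rename for this entry
        if display in present:
            # Conflict: both names exist. Assumed policy: keep the display-name column.
            drop.append(raw_key)
        else:
            rename_map[raw_key] = display
    return rename_map, drop


# Illustrative definition; the "Interaction ID" display name is an assumption.
definition = {
    "Primary_ContainerPayload_Channel": {"display_name": "Channel"},
    "pxInteractionID": {"display_name": "Interaction ID"},
}
rename_map, drop = plan_column_resolution(
    ["Primary_ContainerPayload_Channel", "Channel", "pxInteractionID"], definition
)
print(rename_map)  # {'pxInteractionID': 'Interaction ID'}
print(drop)        # ['Primary_ContainerPayload_Channel']
```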
- __post_init__()¶
- resolve() → ColumnResolver¶
Resolve all column mappings and conflicts.
- Returns:
Self, for method chaining
- Return type:
ColumnResolver
- SCOPE_HIERARCHY = ['Issue', 'Group', 'Action']¶
- PRIO_FACTORS = ['Propensity', 'Value', 'Context Weight', 'Levers']¶
- PRIO_COMPONENTS = ['Propensity', 'Value', 'Context Weight', 'Levers', 'Priority']¶
- apply_filter(df: polars.LazyFrame, filters: polars.Expr | list[polars.Expr] | None = None)¶
Apply a global set of filters. This is kept outside the DecisionData class because it is a general utility function that does not depend on that class.
- Parameters:
df (polars.LazyFrame)
filters (polars.Expr | list[polars.Expr] | None)
- get_first_level_stats(interaction_data: polars.LazyFrame, filters: list[polars.Expr] | None = None)¶
Returns first-level stats of a dataframe for the filter summary.
Shows unique actions (Issue/Group/Action combinations), unique interactions (decisions), and total rows so users understand the impact of their filters.
- Parameters:
interaction_data (polars.LazyFrame)
filters (list[polars.Expr] | None)
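As a rough illustration of the three counts described above, the following pure-Python sketch computes them over a list of row dicts. It does not use Polars and is not the library code: `first_level_stats`, the "Interaction ID" column name, and the keys of the returned dict are all assumptions made for the example.

```python
def first_level_stats(rows):
    """Count unique actions, unique interactions, and total rows (hypothetical helper)."""
    return {
        # An action is identified by its Issue/Group/Action combination.
        "Actions": len({(r["Issue"], r["Group"], r["Action"]) for r in rows}),
        # "Interaction ID" is an assumed column name for the decision identifier.
        "Interactions": len({r["Interaction ID"] for r in rows}),
        "Rows": len(rows),
    }


rows = [
    {"Interaction ID": "I-1", "Issue": "Sales", "Group": "Cards", "Action": "Gold"},
    {"Interaction ID": "I-1", "Issue": "Sales", "Group": "Cards", "Action": "Silver"},
    {"Interaction ID": "I-2", "Issue": "Sales", "Group": "Cards", "Action": "Gold"},
]
stats = first_level_stats(rows)
print(stats)  # {'Actions': 2, 'Interactions': 2, 'Rows': 3}
```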
- resolve_aliases(df: polars.LazyFrame, *table_definitions: dict) → polars.LazyFrame¶
Rename alias columns to their canonical raw key names before validation.
Scans all table definitions for aliases entries. If an alias is found in the data but neither the raw key nor the display_name is present, the column is renamed to the raw key so downstream processing can find it.
- Parameters:
df (pl.LazyFrame) – Raw data that may use alternative column names.
*table_definitions (dict) – One or more table definition dicts (DecisionAnalyzer, ExplainabilityExtract).
- Returns:
Data with alias columns renamed to canonical raw key names.
- Return type:
pl.LazyFrame
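The alias rule can be sketched in pure Python as a function that only plans the renames from a list of column names. This is an illustration under assumptions, not the library code: `plan_alias_renames` is hypothetical, and the table-definition layout (raw key mapped to a dict with `display_name` and an optional `aliases` list) is assumed.

```python
def plan_alias_renames(columns, *table_definitions):
    """Map alias column names to raw keys when no canonical name is present."""
    present = set(columns)
    renames = {}
    for definition in table_definitions:
        for raw_key, spec in definition.items():
            if raw_key in present or spec.get("display_name") in present:
                continue  # a canonical column already exists, leave aliases alone
            for alias in spec.get("aliases", []):
                if alias in present and alias not in renames:
                    renames[alias] = raw_key
                    break  # one alias per raw key is enough
    return renames


# Assumed layout of a table definition entry.
definition = {
    "Primary_ContainerPayload_Channel": {
        "display_name": "Channel",
        "aliases": ["pyChannel"],
    }
}
print(plan_alias_renames(["pyChannel", "Rank"], definition))
# {'pyChannel': 'Primary_ContainerPayload_Channel'}
print(plan_alias_renames(["pyChannel", "Channel"], definition))
# {}
```

With a Polars LazyFrame, the resulting mapping could then be passed to `rename`.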
- determine_extract_type(raw_data)¶
Detect whether the data is a Decision Analyzer (v2) or Explainability Extract (v1).
The heuristic is: if any column name matches the raw key, display name, or aliases for the pxStrategyName entry in the DecisionAnalyzer table definition, the data is v2.
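A minimal sketch of this heuristic, assuming the table definition is keyed by raw column name and that each entry carries `display_name` and `aliases`. The helper name, the boolean return value, and the example display name and alias are hypothetical; the documentation does not show what determine_extract_type actually returns.

```python
def looks_like_decision_analyzer(columns, decision_analyzer_definition):
    """Return True when any column matches the pxStrategyName entry (v2 data)."""
    spec = decision_analyzer_definition["pxStrategyName"]
    candidates = {"pxStrategyName", spec["display_name"], *spec.get("aliases", [])}
    return any(col in candidates for col in columns)


# Display name and alias below are illustrative assumptions.
definition = {
    "pxStrategyName": {"display_name": "Strategy Name", "aliases": ["StrategyName"]}
}
print(looks_like_decision_analyzer(["pxInteractionID", "pxStrategyName"], definition))  # True
print(looks_like_decision_analyzer(["pxInteractionID", "Channel"], definition))         # False
```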
- rename_and_cast_types(df: polars.LazyFrame, table_definition: dict) → polars.LazyFrame¶
Rename columns and cast data types based on table definition.
Performs a single-pass rename from raw column keys to display names, then casts types for default columns.
- Parameters:
df (pl.LazyFrame) – The input dataframe to process
table_definition (dict) – Dictionary containing column definitions with ‘display_name’, ‘default’, and ‘type’ keys
- Returns:
Processed dataframe with renamed columns and cast types
- Return type:
pl.LazyFrame
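The two steps (rename, then cast default columns) can be sketched as a planning function over column names. This is a pure-Python illustration, not the library code: `plan_rename_and_cast` is hypothetical, the raw key `pyPropensity` and the display names are assumptions, and types are shown as plain strings rather than Polars data types.

```python
def plan_rename_and_cast(columns, table_definition):
    """Return (rename_map, cast_map) derived from a table definition."""
    present = set(columns)
    rename_map, cast_map = {}, {}
    for raw_key, spec in table_definition.items():
        display = spec["display_name"]
        if raw_key in present and raw_key != display:
            rename_map[raw_key] = display  # raw key -> display name
        available = raw_key in present or display in present
        if available and spec.get("default"):
            cast_map[display] = spec["type"]  # cast only default columns
    return rename_map, cast_map


# Illustrative definition; names and types are assumptions.
definition = {
    "pxInteractionID": {"display_name": "Interaction ID", "default": True, "type": "String"},
    "Primary_ContainerPayload_Channel": {"display_name": "Channel", "default": True, "type": "String"},
    "pyPropensity": {"display_name": "Propensity", "default": False, "type": "Float64"},
}
rename_map, cast_map = plan_rename_and_cast(
    ["pxInteractionID", "Channel", "pyPropensity"], definition
)
print(rename_map)  # {'pxInteractionID': 'Interaction ID', 'pyPropensity': 'Propensity'}
print(cast_map)    # {'Interaction ID': 'String', 'Channel': 'String'}
```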
- _cast_columns(df: polars.LazyFrame, type_mapping: dict[str, type[polars.DataType]]) → polars.LazyFrame¶
Cast columns to their target types.
- create_hierarchical_selectors(data: polars.LazyFrame, selected_issue: str | None = None, selected_group: str | None = None, selected_action: str | None = None) → dict[str, dict[str, list[str] | int]]¶
Create hierarchical filter options and calculate indices for selectbox widgets.
- Parameters:
data (polars.LazyFrame) – LazyFrame with hierarchical data (should be pre-filtered to the desired stage)
selected_issue (str | None) – Currently selected issue (optional)
selected_group (str | None) – Currently selected group (optional)
selected_action (str | None) – Currently selected action (optional)
- Returns:
dict with structure:
{
"issues": {"options": […], "index": 0},
"groups": {"options": ["All", …], "index": 0},
"actions": {"options": ["All", …], "index": 0}
}
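One way such a structure could be built is sketched below in pure Python over (issue, group, action) tuples. This is an assumption-laden illustration, not the library code: `build_selectors` is hypothetical, and the narrowing rules (groups limited to the selected issue, actions limited to the selected group, index 0 when a selection is missing) are assumed rather than documented.

```python
def build_selectors(rows, selected_issue=None, selected_group=None, selected_action=None):
    """Build selectbox options and indices from (issue, group, action) tuples."""

    def index_of(options, value):
        # Fall back to the first option when the selection is missing or unknown.
        return options.index(value) if value in options else 0

    issues = sorted({issue for issue, _, _ in rows})
    issue = selected_issue if selected_issue in issues else (issues[0] if issues else None)
    groups = ["All"] + sorted({g for i, g, _ in rows if i == issue})
    group = selected_group if selected_group in groups else "All"
    actions = ["All"] + sorted(
        {a for i, g, a in rows if i == issue and (group == "All" or g == group)}
    )
    return {
        "issues": {"options": issues, "index": index_of(issues, selected_issue)},
        "groups": {"options": groups, "index": index_of(groups, selected_group)},
        "actions": {"options": actions, "index": index_of(actions, selected_action)},
    }


rows = [
    ("Sales", "Cards", "Gold"),
    ("Sales", "Cards", "Silver"),
    ("Sales", "Loans", "Auto"),
    ("Retention", "Offers", "Discount"),
]
selectors = build_selectors(rows, selected_issue="Sales", selected_group="Cards")
print(selectors["groups"])   # {'options': ['All', 'Cards', 'Loans'], 'index': 1}
print(selectors["actions"])  # {'options': ['All', 'Gold', 'Silver'], 'index': 0}
```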
- get_scope_config(selected_issue: str, selected_group: str, selected_action: str) → dict[str, str | polars.Expr | list[str]]¶
Generate scope configuration for lever application and plotting based on user selections.
- Parameters:
selected_issue (str)
selected_group (str)
selected_action (str)
- Returns:
Configuration dictionary containing:
- level: “Action”, “Group”, or “Issue” indicating scope level
- lever_condition: Polars expression for filtering selected actions
- group_cols: List of column names for grouping operations
- x_col: Column name to use for x-axis in plots
- selected_value: The actual selected value for highlighting
- plot_title_prefix: Prefix for plot titles
- Return type:
dict[str, str | polars.Expr | list[str]]
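How the level and selected_value entries might be derived is sketched below. This covers only those two keys and rests on an assumption: that the "All" option from the selectbox widgets means no narrowing at that level. `scope_level` is a hypothetical helper, not part of the module.

```python
def scope_level(selected_issue, selected_group, selected_action):
    """Pick the most specific level that has a concrete (non-"All") selection."""
    if selected_action != "All":
        return "Action", selected_action
    if selected_group != "All":
        return "Group", selected_group
    return "Issue", selected_issue


print(scope_level("Sales", "Cards", "Gold"))  # ('Action', 'Gold')
print(scope_level("Sales", "Cards", "All"))   # ('Group', 'Cards')
print(scope_level("Sales", "All", "All"))     # ('Issue', 'Sales')
```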
- logger¶
- _INTERACTION_ID_RAW_KEY = 'pxInteractionID'¶
- _get_interaction_id_candidates() → list[str]¶
Build the set of possible interaction ID column names from the schema.
Collects the raw key, display name, and aliases from both table definitions so this stays in sync with column_schema.py.
- _find_interaction_id_column(columns: set[str]) → str¶
Return the first matching interaction ID column name from the data.
- sample_interactions(df: polars.LazyFrame, n: int | None = None, fraction: float | None = None, id_column: str | None = None) → polars.LazyFrame¶
Sample interactions from a LazyFrame before ingestion.
Uses deterministic hash-based filtering so the same data and limit always produce the same sample. All rows belonging to a selected interaction are kept (stratified on interaction ID).
Exactly one of n or fraction must be provided.
- Parameters:
df (polars.LazyFrame)
n (int | None)
fraction (float | None)
id_column (str | None)
- Returns:
Filtered LazyFrame containing only the sampled interactions.
- Return type:
pl.LazyFrame
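The idea of deterministic hash-based sampling that keeps whole interactions can be sketched in pure Python with the standard library. This is not the library's Polars implementation: the helper names, the SHA-256 hash, and the bucket count are assumptions chosen for the illustration, and only the fraction case is shown.

```python
import hashlib


def keep_interaction(interaction_id, fraction, buckets=10_000):
    """Decide from the ID alone whether an interaction is in the sample."""
    digest = hashlib.sha256(str(interaction_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return bucket < fraction * buckets


def sample_rows(rows, fraction, id_key="pxInteractionID"):
    """Keep every row whose interaction ID hashes into the sampled range."""
    return [row for row in rows if keep_interaction(row[id_key], fraction)]


# 100 interactions with 3 rows each.
rows = [{"pxInteractionID": f"I-{i // 3}", "Rank": i % 3} for i in range(300)]
sample = sample_rows(rows, fraction=0.25)

# The decision depends only on the ID, so reruns give the same sample
# and a selected interaction always keeps all of its rows.
sampled_ids = {row["pxInteractionID"] for row in sample}
print(len(sample) == 3 * len(sampled_ids))  # True
```

Because the hash is computed per interaction ID rather than per row, the achieved fraction is approximate, while the selection itself is reproducible.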
- sample_and_save(df: polars.LazyFrame, n: int | None = None, fraction: float | None = None, output_dir: str | None = None) → polars.LazyFrame¶
Sample interactions and persist the result as a parquet file.
Writes decision_analyzer_sample.parquet into output_dir (defaults to the current working directory). Returns a LazyFrame scanning the written file so downstream code benefits from a fast parquet scan.
If the data is smaller than the requested sample, sampling is skipped and the original LazyFrame is returned unchanged (no file is written).
- Parameters:
df (polars.LazyFrame)
n (int | None)
fraction (float | None)
output_dir (str | None)
- Returns:
Either a scan of the written parquet file, or the original LazyFrame when sampling was skipped.
- Return type:
pl.LazyFrame