pdstools.explanations.Schema¶
Polars schemas for explanations input parquet files and aggregate outputs.
Mirrors the pattern used by pdstools.adm.Schema: each class is a
collection of class-level attributes naming the expected columns and
their polars dtypes. Apply with cdh_utils._apply_schema_types.
The raw explanation parquet schema is the public contract between Pega
and the Explanations module. Validating against it up front (in
Preprocess.generate) turns malformed inputs into a clear
ValueError instead of a cryptic DuckDB error mid-processing.
Attributes¶
REQUIRED_RAW_COLUMNS: Columns that must be present in every raw explanation parquet file.
Classes¶
RawExplanationData: Schema for a single explanation parquet file produced by Pega.
ContextualAggregate: Schema for the per-context aggregate parquet (*_BATCH_*.parquet).
OverallAggregate: Schema for the per-model aggregate parquet (*_OVERALL.parquet).
Module Contents¶
- class RawExplanationData¶
Schema for a single explanation parquet file produced by Pega.
Each row is one (sample, predictor) SHAP-coefficient observation. Context columns (pyChannel, pyDirection, pyIssue, pyGroup, pyName, pyTreatment) are user-configurable and not all of them are required to be present, so they are not part of the strict required-columns check. The partition column (a JSON-encoded context dict) is required because every downstream SQL aggregation groups by it.
- pySubjectID¶
- pyInteractionID¶
- predictor_name¶
- predictor_type¶
- symbolic_value¶
- numeric_value¶
- shap_coeff¶
- score¶
- partition¶
- REQUIRED_RAW_COLUMNS: tuple[str, ...] = ('pyInteractionID', 'predictor_name', 'predictor_type', 'shap_coeff', 'partition')¶
Columns that must be present in every raw explanation parquet file.
symbolic_value and numeric_value are technically optional per row (one is null depending on predictor_type), but at least one must exist as a column or the SQL queries fail. We check this separately in _validate_raw_data.
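A minimal sketch of the kind of check described: validate an input file's column list against the required set before any processing starts. `validate_raw_columns` is a hypothetical helper for illustration, not the actual `_validate_raw_data` implementation:

```python
# Mirrors REQUIRED_RAW_COLUMNS above.
REQUIRED_RAW_COLUMNS = (
    "pyInteractionID",
    "predictor_name",
    "predictor_type",
    "shap_coeff",
    "partition",
)

def validate_raw_columns(columns: list[str]) -> None:
    """Raise a clear ValueError for malformed input up front, instead of
    letting a SQL engine fail mid-processing. Hypothetical stand-in for
    _validate_raw_data."""
    missing = [c for c in REQUIRED_RAW_COLUMNS if c not in columns]
    if missing:
        raise ValueError(f"Raw explanation data is missing required columns: {missing}")
    # At least one value column must exist or the SQL queries fail.
    if "symbolic_value" not in columns and "numeric_value" not in columns:
        raise ValueError("Expected at least one of 'symbolic_value' or 'numeric_value'")
```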
- class ContextualAggregate¶
Schema for the per-context aggregate parquet (*_BATCH_*.parquet).
Produced by Preprocess._parquet_in_batches from resources/queries/numeric.sql or symbolic.sql.
- partition¶
- predictor_name¶
- predictor_type¶
- bin_contents¶
- bin_order¶
- contribution_abs¶
- contribution¶
- contribution_min¶
- contribution_max¶
- frequency¶
- class OverallAggregate¶
Schema for the per-model aggregate parquet (*_OVERALL.parquet).
Same shape as ContextualAggregate but partition is always the literal string 'whole_model'.
- partition¶
- predictor_name¶
- predictor_type¶
- bin_contents¶
- bin_order¶
- contribution_abs¶
- contribution¶
- contribution_min¶
- contribution_max¶
- frequency¶