pdstools.utils.cdh_utils._namespacing

Pega-style field-name normalisation and predictor categorisation.

Functions

_capitalize(fields[, extra_endwords]) → list[str]

Applies automatic capitalization, aligned with the R counterpart.

default_predictor_categorization([x]) → polars.Expr

Function to determine the 'category' of a predictor.

Module Contents

_capitalize(fields: str | collections.abc.Iterable[str], extra_endwords: collections.abc.Iterable[str] | None = None) → list[str]

Applies automatic capitalization, aligned with the R counterpart.

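Parameters:

fields (str | Iterable[str]) – A single field name, or an iterable of field names, to capitalize
extra_endwords (Iterable[str] | None, default = None) – Additional word parts to capitalize alongside the built-in capitalize_endwords list

Returns:

The input field names as a list, each value properly capitalized

Return type:

list[str]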

Notes

The capitalize_endwords list contains atomic word parts that are commonly found in Pega field names. Compound words (like “ResponseCount”) don’t need to be listed separately because the algorithm processes words by length, allowing shorter components (“Response”, “Count”) to handle them.
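The following is a minimal sketch of the length-ordered approach described in the note above, not the pdstools implementation. The function name capitalize_sketch and the short HYPOTHETICAL_ENDWORDS list are illustrative assumptions. So is the ordering: here shorter parts are processed first, so that a longer part applied later has the final say over any shorter substring it contains.

```python
from __future__ import annotations

import re
from collections.abc import Iterable

# Hypothetical word list for illustration only; the real
# capitalize_endwords list in pdstools is longer and may differ.
HYPOTHETICAL_ENDWORDS = ["ID", "Name", "Response", "Count", "Time", "Update"]


def capitalize_sketch(
    fields: str | Iterable[str],
    extra_endwords: Iterable[str] | None = None,
) -> list[str]:
    """Capitalize Pega-style field names using known atomic word parts."""
    if isinstance(fields, str):
        fields = [fields]
    words = list(HYPOTHETICAL_ENDWORDS) + list(extra_endwords or [])
    # Assumption: shortest parts first, so longer parts applied later win.
    words.sort(key=len)
    result = []
    for field in fields:
        out = field.lower()
        for word in words:
            # Replace every case-insensitive occurrence with the canonical spelling.
            out = re.sub(re.escape(word), lambda _m: word, out, flags=re.IGNORECASE)
        # Always upper-case the first character.
        result.append(out[:1].upper() + out[1:])
    return result


print(capitalize_sketch("responsecount"))  # ['ResponseCount']
print(capitalize_sketch("modelchannelname", extra_endwords=["Channel"]))  # ['ModelChannelName']
```

Because matching in this sketch is a plain case-insensitive substring replacement, a short part such as "ID" would also match inside unrelated words (for example "valid" becomes "ValID"), which is why the word list has to be curated.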

default_predictor_categorization(x: str | polars.Expr = pl.col('PredictorName')) → polars.Expr

Function to determine the ‘category’ of a predictor.

You can also supply a custom function instead. It may accept an optional column as input and must return a Polars expression. The most straightforward way to implement one is with pl.when().then().otherwise(), whose when/then pairs can be chained.

By default, this function returns "Primary" when the name contains no '.', and otherwise returns the part of the name before the first period.

Parameters:

x (str | pl.Expr, default = pl.col('PredictorName')) – The column to parse

Return type:

polars.Expr