pdstools.utils.cdh_utils._namespacing

Pega-style field-name normalisation and predictor categorisation.

Functions

`_capitalize`(→ list[str])	Applies automatic capitalization, aligned with the R counterpart.
`default_predictor_categorization`(→ polars.Expr)	Function to determine the 'category' of a predictor.

Module Contents

_capitalize(fields: str | collections.abc.Iterable[str], extra_endwords: collections.abc.Iterable[str] | None = None) → list[str]

Applies automatic capitalization, aligned with the R counterpart.

Parameters:

fields (str | collections.abc.Iterable[str]) – A single field name or an iterable of field names
extra_endwords (collections.abc.Iterable[str] | None)

Returns:

fields – The input list, but each value properly capitalized

Return type:

list

Notes

The capitalize_endwords list contains atomic word parts that are commonly found in Pega field names. Compound words (like “ResponseCount”) don’t need to be listed separately because the algorithm processes words by length, allowing shorter components (“Response”, “Count”) to handle them.
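The idea in this note can be illustrated with a minimal sketch. It is not the library's implementation: the function name `capitalize_sketch`, the three-entry word list, and the shortest-first processing order are all assumptions made for illustration.

```python
import re

# Hypothetical, heavily abbreviated stand-in for the real capitalize_endwords list.
WORD_PARTS = ["Response", "Count", "Name"]


def capitalize_sketch(field: str, word_parts=WORD_PARTS) -> str:
    """Re-capitalize known word parts inside a field name (illustrative only)."""
    result = field.lower()
    # Assumed order: process parts from shortest to longest.
    for part in sorted(word_parts, key=len):
        # Replace every case-insensitive occurrence with the canonical spelling.
        result = re.sub(re.escape(part), part, result, flags=re.IGNORECASE)
    # Make sure the first character is upper case.
    return result[:1].upper() + result[1:]


print(capitalize_sketch("responsecount"))
print(capitalize_sketch("MODELNAME"))
```

Because “Response” and “Count” are each matched on their own, the compound “responsecount” becomes “ResponseCount” without being listed separately.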

default_predictor_categorization(x: str | polars.Expr = pl.col('PredictorName')) → polars.Expr

Function to determine the ‘category’ of a predictor.

It is possible to supply a custom function instead. Such a function can accept an optional column as input and should return a Polars expression. The most straightforward way to implement this is with pl.when().then().otherwise(), which can be chained.

By default, this function returns “Primary” when the name contains no ‘.’; otherwise it returns the part of the name before the first period.

Parameters:

x (str | pl.Expr, default = pl.col('PredictorName')) – The column to parse

Return type:

polars.Expr