pdstools.utils.cdh_utils._namespacing
=====================================

.. py:module:: pdstools.utils.cdh_utils._namespacing

.. autoapi-nested-parse::

   Pega-style field-name normalisation and predictor categorisation.


Functions
---------

.. autoapisummary::

   pdstools.utils.cdh_utils._namespacing._capitalize
   pdstools.utils.cdh_utils._namespacing.default_predictor_categorization


Module Contents
---------------

.. py:function:: _capitalize(fields: str | collections.abc.Iterable[str], extra_endwords: collections.abc.Iterable[str] | None = None) -> list[str]

   Applies automatic capitalization, aligned with the R counterpart.

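   :param fields: A single field name or an iterable of field names
   :type fields: str | Iterable[str]
   :param extra_endwords: Optional additional word parts to capitalize, on top of the built-in capitalize_endwords list
   :type extra_endwords: Iterable[str] | None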

   :returns: **fields** -- The input names, each properly capitalized
   :rtype: list[str]

   .. rubric:: Notes

   The capitalize_endwords list contains atomic word parts that are commonly
   found in Pega field names. Compound words (like "ResponseCount") don't need
   to be listed separately because the algorithm processes words by length,
   allowing shorter components ("Response", "Count") to handle them.
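
The note on ``_capitalize`` above can be illustrated with a minimal pure-Python sketch. It is not the pdstools implementation: the ``WORD_PARTS`` list and the ``capitalize_sketch`` helper are hypothetical names invented for this example, and applying the parts in order of increasing length is an assumption about what "processes words by length" means.

```python
import re

# Hypothetical word parts, for illustration only; the real list lives in pdstools.
WORD_PARTS = ["Response", "Count", "Name", "Model"]


def capitalize_sketch(field: str, word_parts=WORD_PARTS) -> str:
    """Capitalize known word parts inside a field name (illustrative sketch)."""
    result = field.lower()
    # Assumption: parts are applied in order of increasing length.
    for part in sorted(word_parts, key=len):
        # Replace every case-insensitive match with the canonical spelling.
        result = re.sub(re.escape(part), part, result, flags=re.IGNORECASE)
    # Always capitalize the first character of the name.
    return result[:1].upper() + result[1:]


compound = capitalize_sketch("responsecount")
other = capitalize_sketch("MODELNAME")
```

Because "Response" and "Count" are both in the list, the compound "responsecount" is handled without a dedicated "ResponseCount" entry.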


.. py:function:: default_predictor_categorization(x: str | polars.Expr = pl.col('PredictorName')) -> polars.Expr

   Determine the 'category' of a predictor.

   A custom function can be supplied in place of this one. Such a function
   may accept an optional column as input and should return a Polars
   expression. The most straightforward way to implement it is with
   pl.when().then().otherwise(), which can be chained.

   By default, this function returns "Primary" when the name contains
   no '.', and otherwise returns the part of the name before the first period.

   :param x: The column to parse
   :type x: str | pl.Expr, default = pl.col('PredictorName')