pdstools.pega_io.action_analysis
================================

.. py:module:: pdstools.pega_io.action_analysis

.. autoapi-nested-parse::

   Readers for Pega Action Analysis export format.

   Pega's Action Analysis feature exports decision data as a ZIP archive
   containing multiple inner files with ``.zip`` extensions. Despite the
   extension, these inner files are **gzipped NDJSON** (not ZIP archives).

   The functions in this module handle that format both in memory
   (``BytesIO`` from Streamlit uploads) and on disk (extracted export
   directories).

   The path-taking readers in this module are the single entry point for
   Action Analysis disk reads; the CodeQL ``py/path-injection`` check is
   excluded for this module at the configuration level.


Attributes
----------

.. autoapisummary::

   pdstools.pega_io.action_analysis.logger


Functions
---------

.. autoapisummary::

   pdstools.pega_io.action_analysis.read_nested_zip_files
   pdstools.pega_io.action_analysis.read_gzipped_data
   pdstools.pega_io.action_analysis.read_gzipped_ndjson_directory


Module Contents
---------------

.. py:data:: logger

.. py:function:: read_nested_zip_files(file_buffer) -> polars.LazyFrame

   Read Pega Action Analysis export format (nested archive with gzipped NDJSON).

   Pega's Action Analysis feature exports decision data as a ZIP archive
   containing multiple inner files with ``.zip`` extensions. Despite the
   extension, these inner files are **gzipped NDJSON** (not ZIP archives).

   This function handles the format by:

   1. Opening the outer ZIP archive.
   2. Treating each inner ``.zip`` file as gzipped NDJSON.
   3. Decompressing and concatenating all data into a single LazyFrame.

   This format is used for high-volume decision event exports where data is
   partitioned across multiple compressed files.

   :param file_buffer: ZIP archive buffer (e.g., from a Streamlit file upload)
      containing inner gzipped NDJSON files with misleading ``.zip``
      extensions.
   :type file_buffer: UploadedFile or BytesIO

   :returns: Concatenated LazyFrame from all inner files, with consistent
      column ordering. Call ``.collect()`` to materialize.
   :rtype: pl.LazyFrame

   .. rubric:: Notes

   This is specific to Pega Action Analysis exports. Modern exports may use
   hive-partitioned Parquet directories instead, which can be read with
   ``read_data()``.


.. py:function:: read_gzipped_data(data: io.BytesIO) -> polars.LazyFrame | None

   Read a single gzipped NDJSON chunk from a Pega Action Analysis export.

   Helper function for ``read_nested_zip_files()``. Reads one inner file from
   the Action Analysis export format, decompresses the gzipped content, and
   parses the NDJSON data.

   :param data: Gzipped NDJSON data (from an inner file in an Action Analysis
      export).
   :type data: BytesIO

   :returns: Polars LazyFrame, or None if decompression or parsing fails.
   :rtype: pl.LazyFrame | None

   .. rubric:: Notes

   Returns None on errors so that the remaining files can still be processed
   even if some are corrupted.


.. py:function:: read_gzipped_ndjson_directory(path: str) -> polars.LazyFrame

   Read a directory of Pega Action Analysis gzipped NDJSON files.

   For extracted Action Analysis exports, this function recursively finds all
   files with a ``.zip`` extension (which are actually gzipped NDJSON, not ZIP
   archives) and concatenates them into a single LazyFrame. Use it when the
   outer archive has already been extracted to disk.

   :param path: Path to a directory containing gzipped NDJSON files with a
      ``.zip`` extension (from an extracted Action Analysis export).
   :type path: str

   :returns: Concatenated LazyFrame from all files, with consistent column
      ordering. Call ``.collect()`` to materialize.
   :rtype: pl.LazyFrame

   .. rubric:: Notes

   This is specific to Pega Action Analysis exports. For general data reading
   (including hive-partitioned directories), use ``read_data()`` from
   ``pega_io`` instead.
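
The three readers above share one decoding rule: an inner ``.zip`` file is
gzip-compressed NDJSON, not a ZIP archive. The following is a minimal,
stdlib-only sketch of that rule, assuming UTF-8 NDJSON with one JSON object
per line. ``decode_chunk``, ``read_nested_zip``, ``read_directory`` and
``_gzip_ndjson`` are hypothetical helpers, not part of pdstools, and they
return lists of dicts rather than a ``polars.LazyFrame``. Ignoring outer
members whose names do not end in ``.zip`` and sorting file names for a
deterministic order are also assumptions of the sketch.

.. code-block:: python

   from __future__ import annotations

   import gzip
   import io
   import json
   import tempfile
   import zipfile
   import zlib
   from pathlib import Path


   def decode_chunk(raw: bytes) -> list[dict] | None:
       """Gunzip one inner '.zip' payload and parse it as NDJSON.

       Returns None when the payload is not valid gzip, not UTF-8, or not
       valid JSON lines, so callers can skip corrupted chunks.
       """
       try:
           text = gzip.decompress(raw).decode("utf-8")
           # One JSON object per non-empty line (NDJSON).
           return [json.loads(line) for line in text.splitlines() if line.strip()]
       except (OSError, EOFError, zlib.error, ValueError):
           # gzip.BadGzipFile is an OSError; a truncated stream raises EOFError;
           # corrupt deflate data raises zlib.error; UnicodeDecodeError and
           # json.JSONDecodeError are both ValueErrors.
           return None


   def read_nested_zip(buffer: io.BytesIO) -> list[dict]:
       """Outer file is a real ZIP; its '.zip' members are gzipped NDJSON."""
       records: list[dict] = []
       with zipfile.ZipFile(buffer) as outer:
           for name in sorted(outer.namelist()):
               if not name.endswith(".zip"):  # assumption: ignore other members
                   continue
               chunk = decode_chunk(outer.read(name))
               if chunk is not None:  # skip corrupted chunks, keep the rest
                   records.extend(chunk)
       return records


   def read_directory(path: str) -> list[dict]:
       """Recursively decode every '.zip' file (gzipped NDJSON) under path."""
       records: list[dict] = []
       for file in sorted(Path(path).rglob("*.zip")):
           chunk = decode_chunk(file.read_bytes())
           if chunk is not None:
               records.extend(chunk)
       return records


   def _gzip_ndjson(rows: list[dict]) -> bytes:
       """Build a sample payload: NDJSON text, gzip-compressed."""
       return gzip.compress("\n".join(json.dumps(r) for r in rows).encode("utf-8"))


   if __name__ == "__main__":
       # In-memory case: an outer ZIP whose members are gzipped NDJSON.
       buf = io.BytesIO()
       with zipfile.ZipFile(buf, "w") as outer:
           outer.writestr("part-0.zip", _gzip_ndjson([{"id": 1}, {"id": 2}]))
           outer.writestr("part-1.zip", _gzip_ndjson([{"id": 3}]))
           outer.writestr("part-2.zip", b"not gzip")  # corrupted: skipped
       buf.seek(0)
       print(read_nested_zip(buf))  # [{'id': 1}, {'id': 2}, {'id': 3}]

       # On-disk case: an extracted export with nested subdirectories.
       with tempfile.TemporaryDirectory() as tmp:
           nested = Path(tmp, "export", "batch-1")
           nested.mkdir(parents=True)
           (nested / "chunk.zip").write_bytes(_gzip_ndjson([{"id": 4}]))
           print(read_directory(tmp))  # [{'id': 4}]

Returning ``None`` from ``decode_chunk`` instead of raising follows the
documented behaviour of ``read_gzipped_data()``: one corrupted chunk is
skipped rather than aborting the whole read.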