pdstools.pega_io.action_analysis
Readers for Pega Action Analysis export format.
Pega’s Action Analysis feature exports decision data as a ZIP archive containing
multiple inner files with .zip extensions. Despite the extension, these inner
files are gzipped NDJSON (not ZIP archives). The functions in this module
handle that format both in memory (BytesIO from Streamlit uploads) and on
disk (extracted export directories).
Path-taking readers in this module are the single funnel for Action Analysis
disk reads; CodeQL py/path-injection is scoped out here at the config level.
Attributes
| logger | |
Functions
| read_nested_zip_files(file_buffer) | Read Pega Action Analysis export format (nested archive with gzipped NDJSON). |
| read_gzipped_data(data) | Read a single gzipped NDJSON chunk from Pega Action Analysis export. |
| read_gzipped_ndjson_directory(path) | Read directory of Pega Action Analysis gzipped NDJSON files. |
Module Contents
- logger
- read_nested_zip_files(file_buffer) → polars.LazyFrame
Read Pega Action Analysis export format (nested archive with gzipped NDJSON).
Pega’s Action Analysis feature exports decision data as a ZIP archive containing multiple inner files with .zip extensions. Despite the extension, these inner files are gzipped NDJSON (not ZIP archives). This function handles this format by:
1. Opening the outer ZIP archive
2. Treating each inner .zip file as gzipped NDJSON
3. Decompressing and concatenating all data into a single LazyFrame
This format is used for high-volume decision event exports where data is partitioned across multiple compressed files.
- Parameters:
file_buffer (UploadedFile or BytesIO) – ZIP archive buffer (e.g., from Streamlit file upload) containing inner gzipped NDJSON files with misleading .zip extensions.
- Returns:
Concatenated LazyFrame from all inner files, with consistent column ordering. Call .collect() to materialize.
- Return type:
pl.LazyFrame
Notes
This function is specific to Pega Action Analysis exports. Modern exports may use hive-partitioned Parquet directories instead, which can be read with read_data().
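The nested layout described for read_nested_zip_files() can be illustrated with the standard library alone. The sketch below is not the pdstools implementation: it builds a tiny in-memory archive shaped like the export (an outer ZIP whose inner entries are named .zip but contain gzipped NDJSON) and reads it back into plain Python dicts rather than a LazyFrame. The helper names make_export and read_rows, the entry names, and the action/outcome fields are hypothetical and chosen only for illustration.

```python
import gzip
import io
import json
import zipfile


def make_export(chunks):
    """Build an in-memory archive shaped like the export described above."""
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as archive:
        for name, rows in chunks.items():
            ndjson = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
            # Inner entries carry a .zip name but hold gzip-compressed NDJSON.
            archive.writestr(name, gzip.compress(ndjson))
    outer.seek(0)
    return outer


def read_rows(file_buffer):
    """Open the outer ZIP, gunzip each inner .zip entry, parse the NDJSON lines."""
    rows = []
    with zipfile.ZipFile(file_buffer) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".zip"):
                continue  # skip anything that is not an inner data file
            text = gzip.decompress(archive.read(name)).decode("utf-8")
            rows.extend(json.loads(line) for line in text.splitlines() if line.strip())
    return rows


# Hypothetical sample data; real exports have their own schema.
export = make_export({
    "part-0.zip": [{"action": "OfferA", "outcome": "Clicked"}],
    "part-1.zip": [{"action": "OfferB", "outcome": "Impression"}],
})
rows = read_rows(export)
```

Opening an inner entry with zipfile instead of gzip would fail, which is why the misleading extension matters in practice.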
- read_gzipped_data(data: io.BytesIO) → polars.LazyFrame | None
Read a single gzipped NDJSON chunk from Pega Action Analysis export.
Helper function for read_nested_zip_files(). Reads one inner file from the Action Analysis export format, decompresses the gzipped content, and parses the NDJSON data.
- Parameters:
data (BytesIO) – Gzipped NDJSON data (from an inner file in Action Analysis export).
- Returns:
Polars LazyFrame, or None if decompression/parsing fails.
- Return type:
pl.LazyFrame | None
Notes
Returns None on errors to allow processing remaining files even if some are corrupted.
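The return-None-on-failure contract can be sketched with the standard library, assuming a caller that filters out the failed chunks. This is an illustration of the pattern, not the pdstools code: try_read_chunk is a hypothetical helper that returns a list of dicts where the real function returns a LazyFrame, and the set of caught exceptions is an assumption.

```python
import gzip
import io
import json
import zlib


def try_read_chunk(data):
    """Return parsed rows, or None when the chunk cannot be decompressed or parsed."""
    try:
        text = gzip.decompress(data.getvalue()).decode("utf-8")
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    except (OSError, EOFError, ValueError, zlib.error):
        # OSError covers a bad gzip header, EOFError a truncated stream,
        # ValueError covers invalid UTF-8 and invalid JSON.
        return None


good = io.BytesIO(gzip.compress(b'{"id": 1}\n{"id": 2}\n'))
bad = io.BytesIO(b"not gzip data")

results = [try_read_chunk(chunk) for chunk in (good, bad)]
# A corrupted chunk yields None, so the remaining chunks can still be used.
usable = [rows for rows in results if rows is not None]
```

Because a failed chunk is indistinguishable from a skipped one in the final result, a caller following this pattern may want to log or count the None values.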
- read_gzipped_ndjson_directory(path: str) → polars.LazyFrame
Read directory of Pega Action Analysis gzipped NDJSON files.
For extracted Action Analysis exports, this function recursively finds all files with .zip extension (which are actually gzipped NDJSON, not ZIP archives) and concatenates them into a single LazyFrame. Useful when the outer archive has been extracted to disk.
- Parameters:
path (str) – Path to directory containing gzipped NDJSON files with .zip extension (from extracted Action Analysis export).
- Returns:
Concatenated LazyFrame from all files with consistent column ordering. Call .collect() to materialize.
- Return type:
pl.LazyFrame
Notes
This is specific to Pega Action Analysis exports. For normal data reading (including hive-partitioned directories), use read_data() from pega_io instead.
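The recursive directory scan can be sketched as follows, assuming an extracted export in which every *.zip file is gzipped NDJSON. This is a standard-library illustration rather than the pdstools implementation: read_directory_rows is a hypothetical helper that returns a list of dicts where the real function returns a LazyFrame, and the directory and file names are invented for the example.

```python
import gzip
import json
import tempfile
from pathlib import Path


def read_directory_rows(path):
    """Recursively read every *.zip file under path as gzipped NDJSON."""
    rows = []
    # rglob walks subdirectories; sorting makes the read order deterministic.
    for file in sorted(Path(path).rglob("*.zip")):
        with gzip.open(file, "rt", encoding="utf-8") as handle:
            rows.extend(json.loads(line) for line in handle if line.strip())
    return rows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "batch-a").mkdir()
    (root / "batch-a" / "part-0.zip").write_bytes(gzip.compress(b'{"id": 1}\n'))
    (root / "part-1.zip").write_bytes(gzip.compress(b'{"id": 2}\n{"id": 3}\n'))
    (root / "notes.txt").write_text("ignored")  # non-.zip files are skipped
    rows = read_directory_rows(root)
```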