pdstools.decision_analyzer.data_read_utils

Attributes

_SUPPORTED_EXTENSIONS

Functions

_is_artifact(→ bool)

Return True for OS-generated junk entries (macOS, Windows, etc.).

read_nested_zip_files(→ polars.DataFrame)

Reads a zip file buffer (uploaded from Streamlit) that contains .zip files, which are in fact gzipped ndjson files.

read_gzipped_data(→ polars.DataFrame | None)

Reads gzipped ndjson data from a BytesIO object and returns a Polars DataFrame.

read_gzips_with_zip_extension(→ polars.DataFrame)

Recursively finds all files with a .zip extension under the given directory and reads them as gzipped ndjson.

read_data(path)

validate_columns(→ tuple[bool, str | None])

Validate that default columns from table definition exist in the dataframe.

Module Contents

_SUPPORTED_EXTENSIONS: set[str]
_is_artifact(name: str) → bool

Return True for OS-generated junk entries (macOS, Windows, etc.).

Parameters:

name (str)

Return type:

bool
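
A minimal sketch of such a filter follows. The exact patterns the library matches are not documented here, so the names below (`.DS_Store`, `__MACOSX`, `Thumbs.db`, `desktop.ini`, `._*` resource forks) are assumptions based on common OS junk files, not the actual implementation:

```python
import posixpath

def is_artifact(name: str) -> bool:
    """Return True for OS-generated junk entries (sketch; patterns are assumptions)."""
    base = posixpath.basename(name.rstrip("/"))
    # macOS zips carry a __MACOSX/ directory with resource-fork copies.
    if "__MACOSX" in name.split("/"):
        return True
    # Finder metadata, AppleDouble files, and Windows thumbnail/config files.
    return base in {".DS_Store", "Thumbs.db", "desktop.ini"} or base.startswith("._")

print(is_artifact("__MACOSX/data.json.gz"))   # True
print(is_artifact("data/interactions.zip"))   # False
```

Filtering on the basename rather than the full path lets the check work for entries at any depth inside an archive.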

read_nested_zip_files(file_buffer) → polars.DataFrame

Reads a zip file buffer (uploaded from Streamlit) that contains .zip files, which are in fact gzipped ndjson files. Extracts, reads, and concatenates them into a single Polars DataFrame.

Parameters:

file_buffer (UploadedFile) – The uploaded zip file buffer from Streamlit.

Returns:

A concatenated Polars DataFrame containing the data from all gzipped ndjson files.

Return type:

pl.DataFrame

read_gzipped_data(data: io.BytesIO) → polars.DataFrame | None

Reads gzipped ndjson data from a BytesIO object and returns a Polars DataFrame.

Parameters:

data (BytesIO) – The gzipped ndjson data.

Returns:

The Polars DataFrame containing the data, or None if reading fails.

Return type:

pl.DataFrame | None

read_gzips_with_zip_extension(path: str) → polars.DataFrame

Recursively finds all files with a .zip extension under the given directory, treats them as gzipped ndjson files, reads, and concatenates them into a single Polars DataFrame. Supports arbitrary directory depth.

Parameters:

path (str) – The path to the directory containing the .zip files.

Returns:

A concatenated Polars DataFrame containing the data from all gzipped ndjson files.

Return type:

pl.DataFrame

read_data(path)
validate_columns(df: polars.LazyFrame, extract_type: dict[str, pdstools.decision_analyzer.column_schema.TableConfig]) → tuple[bool, str | None]

Validate that default columns from table definition exist in the dataframe.

This function checks if required columns exist in the data, accounting for the fact that columns may be present under either their source name or their target label name.

Parameters:

df (polars.LazyFrame) – The dataframe to validate
extract_type (dict[str, pdstools.decision_analyzer.column_schema.TableConfig]) – Table configuration mapping column names to their properties

Returns:

tuple containing validation success (bool) and error message (str or None)

Return type:

tuple[bool, str | None]