pdstools.decision_analyzer.data_read_utils

Functions

read_nested_zip_files(→ polars.DataFrame)

Reads a zip file buffer (uploaded from Streamlit) whose .zip members are in fact gzipped ndjson files, and concatenates them into a single Polars DataFrame.

read_gzipped_data(→ Optional[polars.DataFrame])

Reads gzipped ndjson data from a BytesIO object and returns a Polars DataFrame.

read_gzips_with_zip_extension(→ polars.DataFrame)

Iterates over all files with a .zip extension in the given directory, treats them as gzipped ndjson files, and concatenates them into a single Polars DataFrame.

read_data(path)

get_da_data_path()

validate_columns(df, extract_type)

Module Contents

read_nested_zip_files(file_buffer) → polars.DataFrame

Reads a zip file buffer (uploaded from Streamlit) that contains .zip files, which are in fact gzipped ndjson files. Extracts, reads, and concatenates them into a single Polars DataFrame.

Parameters:

file_buffer (UploadedFile) – The uploaded zip file buffer from Streamlit.

Returns:

A concatenated Polars DataFrame containing the data from all gzipped ndjson files.

Return type:

pl.DataFrame

read_gzipped_data(data: io.BytesIO) → polars.DataFrame | None

Reads gzipped ndjson data from a BytesIO object and returns a Polars DataFrame.

Parameters:

data (BytesIO) – The gzipped ndjson data.

Returns:

The Polars DataFrame containing the data, or None if reading fails.

Return type:

Optional[pl.DataFrame]

read_gzips_with_zip_extension(path: str) → polars.DataFrame

Iterates over all files with a .zip extension in the given directory, treats them as gzipped ndjson files, reads, and concatenates them into a single Polars DataFrame.

Parameters:

path (str) – The path to the directory containing the .zip files.

Returns:

A concatenated Polars DataFrame containing the data from all gzipped ndjson files.

Return type:

pl.DataFrame

read_data(path)

get_da_data_path()

validate_columns(df: polars.LazyFrame, extract_type: Dict[str, pdstools.decision_analyzer.table_definition.TableConfig])

Parameters: