pdstools.pega_io.S3
Async S3 helper for downloading Pega dataset exports.
Attributes
Classes
S3Data | Asynchronous helper for downloading Pega datasets from S3.
Module Contents
- logger
- class S3Data(bucket_name: str, temp_dir: str = './s3_download')
Asynchronous helper for downloading Pega datasets from S3.
Use this when Prediction Studio is configured to export monitoring tables to an S3 bucket: it downloads the partitioned .json.gz files into a local directory and (optionally) hands them off to pdstools.adm.ADMDatamart.
- Parameters:
- bucket_name
- temp_dir = './s3_download'
- async get_files(prefix: str, *, use_meta_files: bool = False, verbose: bool = True) -> list[str]
Download files from the bucket whose key starts with prefix.
Pega data exports are split into many small files. This method fetches them concurrently into temp_dir, skipping any file that already exists locally.
When use_meta_files is True, each real export file X is accompanied by a .X.meta sentinel file that signals the export has finished. We list keys under the dotted prefix (path/to/.files), keep entries ending in .meta, and map them back to the underlying file (path/to/files_001.json). .meta files themselves are never copied locally.
When use_meta_files is False, every key under prefix is downloaded.
- Parameters:
- Returns:
Local paths of all files that match prefix (newly downloaded and already cached).
- Return type:
list[str]
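The sentinel-file mapping and the skip-if-cached download described for get_files can be illustrated with a minimal sketch. This is not the library's implementation: the helper names (dotted_prefix, data_key_from_meta, completed_keys, fetch_missing) and the injected download callable are hypothetical, and the sketch assumes that prefix does not end in a slash and that each local file is named after the last component of its key.

```python
import asyncio
import posixpath
from pathlib import Path
from typing import Awaitable, Callable


def dotted_prefix(prefix: str) -> str:
    """'path/to/files' -> 'path/to/.files' (where the sentinels are listed)."""
    folder, name = posixpath.split(prefix)
    return posixpath.join(folder, "." + name)


def data_key_from_meta(meta_key: str) -> str:
    """'path/to/.files_001.json.meta' -> 'path/to/files_001.json'."""
    folder, name = posixpath.split(meta_key)
    if not (name.startswith(".") and name.endswith(".meta")):
        raise ValueError(f"not a sentinel key: {meta_key!r}")
    # Drop the leading dot and the trailing '.meta' suffix.
    return posixpath.join(folder, name[1 : -len(".meta")])


def completed_keys(listed_keys: list[str]) -> list[str]:
    """Keep entries ending in .meta and map each back to its data file."""
    return [data_key_from_meta(k) for k in listed_keys if k.endswith(".meta")]


async def fetch_missing(
    keys: list[str],
    temp_dir: str,
    download: Callable[[str, Path], Awaitable[None]],
) -> list[str]:
    """Download keys concurrently into temp_dir, skipping cached files."""
    root = Path(temp_dir)
    root.mkdir(parents=True, exist_ok=True)
    # Assumption: the local name is the last path component of the key.
    targets = {key: root / posixpath.basename(key) for key in keys}
    missing = [(key, path) for key, path in targets.items() if not path.exists()]
    await asyncio.gather(*(download(key, path) for key, path in missing))
    # Return both newly downloaded and already cached paths.
    return [str(path) for path in targets.values()]
```

Passing the download coroutine in as an argument keeps the sketch independent of any particular S3 client; a real caller would supply a function that copies one key to one local path.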
- async get_datamart_data(table: str, *, datamart_folder: str = 'datamart', verbose: bool = True) -> list[str]
Download a single datamart table from S3.
- Parameters:
table (str) – Datamart table name. One of the keys in DATAMART_TABLE_PREFIXES: "modelSnapshot", "predictorSnapshot", "binaryDistribution", "contingencyTable", "histogram", "snapshot", "notification".
datamart_folder (str, keyword-only, default="datamart") – Top-level folder inside the bucket that contains the datamart export.
verbose (bool, keyword-only, default=True) – Show download progress.
- Returns:
Local paths of the downloaded files.
- Return type:
list[str]
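As a hedged illustration of calling get_datamart_data for several tables, the sketch below validates table names against the list documented above and then downloads each table in turn. The helper download_tables and the KNOWN_TABLES tuple are hypothetical and not part of pdstools; the s3 argument is assumed to be an S3Data instance (or any object with a compatible get_datamart_data coroutine).

```python
import asyncio

# Table names copied from the parameter description above.
KNOWN_TABLES = (
    "modelSnapshot",
    "predictorSnapshot",
    "binaryDistribution",
    "contingencyTable",
    "histogram",
    "snapshot",
    "notification",
)


async def download_tables(s3, tables, datamart_folder="datamart"):
    """Return {table name: list of local paths} for the requested tables."""
    unknown = [t for t in tables if t not in KNOWN_TABLES]
    if unknown:
        raise ValueError(f"unknown datamart table(s): {unknown}")
    paths = {}
    # Tables are fetched one after another; each call awaits its own downloads.
    for table in tables:
        paths[table] = await s3.get_datamart_data(
            table, datamart_folder=datamart_folder, verbose=False
        )
    return paths
```

Failing early on a misspelled table name avoids starting any download before the whole request is known to be valid.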
- async get_adm_datamart(*, datamart_folder: str = 'datamart', verbose: bool = True) -> pdstools.adm.ADMDatamart.ADMDatamart
Construct an ADMDatamart directly from S3.
Convenience wrapper that downloads the model and predictor snapshot exports and feeds them into ADMDatamart. Because this is an async function, it must be awaited.
- Parameters:
- Returns:
A datamart populated with the freshly downloaded files.
- Return type:
pdstools.adm.ADMDatamart.ADMDatamart
Examples
>>> dm = await S3Data(bucket_name="testbucket").get_adm_datamart()
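The example above uses a bare await, which only works where an event loop is already running (for instance a notebook or the asyncio REPL). In a plain script the coroutine can be driven with asyncio.run, as in the following sketch. The wrapper load_datamart is a hypothetical name, and the sketch assumes that pdstools is installed and that S3 access is already configured in the environment.

```python
import asyncio


async def load_datamart(bucket_name: str, datamart_folder: str = "datamart"):
    """Await get_adm_datamart and return the resulting ADMDatamart."""
    # Imported lazily so that defining this function does not require pdstools.
    from pdstools.pega_io.S3 import S3Data

    s3 = S3Data(bucket_name=bucket_name)
    return await s3.get_adm_datamart(datamart_folder=datamart_folder)
```

From synchronous code, this would be called as dm = asyncio.run(load_datamart("testbucket")).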