pdstools.pega_io.S3

Async S3 helper for downloading Pega dataset exports.

Attributes

logger
DATAMART_TABLE_PREFIXES

Classes

S3Data

Asynchronous helper for downloading Pega datasets from S3.

Module Contents

logger
DATAMART_TABLE_PREFIXES: dict[str, str]
class S3Data(bucket_name: str, temp_dir: str = './s3_download')

Asynchronous helper for downloading Pega datasets from S3.

Use this when Prediction Studio is configured to export monitoring tables to an S3 bucket: it downloads the partitioned .json.gz files into a local directory and (optionally) hands them off to pdstools.adm.ADMDatamart.

Parameters:
  • bucket_name (str) – Name of the S3 bucket containing the dataset folder.

  • temp_dir (str, default="./s3_download") – Directory where downloaded files are cached; use a folder you don’t mind being filled with export files.

bucket_name
temp_dir = './s3_download'
async get_files(prefix: str, *, use_meta_files: bool = False, verbose: bool = True) → list[str]

Download files from the bucket whose key starts with prefix.

Pega data exports are split into many small files. This method fetches them concurrently into temp_dir, skipping any file that already exists locally.

When use_meta_files is True, each export file X is accompanied by a hidden sentinel file .X.meta that signals the export of that file has finished. The method lists keys under the dotted prefix (path/to/.files), keeps only entries ending in .meta, and maps each back to its underlying data file (path/to/files_001.json). The .meta files themselves are never copied locally.

When use_meta_files is False, every key under prefix is downloaded.

Parameters:
  • prefix (str) – S3 key prefix (see boto3 Bucket.objects.filter(Prefix=...)).

  • use_meta_files (bool, keyword-only, default=False) – Whether to use companion .meta files to gate downloads.

  • verbose (bool, keyword-only, default=True) – Show a tqdm progress bar (if installed) and print a summary.

Returns:

Local paths of all files that match prefix (newly downloaded and already cached).

Return type:

list[str]
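The meta-to-data key mapping described above can be sketched in plain Python. This is a hypothetical helper for illustration only; the library's internal naming logic may differ:

```python
import posixpath  # S3 keys always use forward slashes

def meta_key_to_data_key(meta_key: str) -> str:
    """Map a sentinel key like 'path/to/.files_001.json.meta' back to
    the data file it guards, 'path/to/files_001.json' (sketch)."""
    folder, name = posixpath.split(meta_key)
    # Strip the leading dot and the trailing '.meta' suffix.
    data_name = name.removeprefix(".").removesuffix(".meta")
    return posixpath.join(folder, data_name)

print(meta_key_to_data_key("path/to/.files_001.json.meta"))
# path/to/files_001.json
```

Only the keys that survive this mapping are fetched, which is why partially written exports are skipped when use_meta_files is True.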

async get_datamart_data(table: str, *, datamart_folder: str = 'datamart', verbose: bool = True) → list[str]

Download a single datamart table from S3.

Parameters:
  • table (str) – Datamart table name. One of the keys in DATAMART_TABLE_PREFIXES: "modelSnapshot", "predictorSnapshot", "binaryDistribution", "contingencyTable", "histogram", "snapshot", "notification".

  • datamart_folder (str, keyword-only, default="datamart") – Top-level folder inside the bucket that contains the datamart export.

  • verbose (bool, keyword-only, default=True) – Show download progress.

Returns:

Local paths of the downloaded files.

Return type:

list[str]
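How a table name turns into an S3 listing prefix can be sketched as follows. The table keys come from the documentation above, but the mapped prefix values and the helper itself are invented placeholders, not the library's real DATAMART_TABLE_PREFIXES contents:

```python
import posixpath

# Placeholder subset of the table-name -> prefix mapping; values are
# illustrative only, not the actual pdstools prefixes.
TABLE_PREFIXES = {
    "modelSnapshot": "model_snapshots",
    "predictorSnapshot": "predictor_snapshots",
}

def datamart_prefix(table: str, datamart_folder: str = "datamart") -> str:
    """Build the S3 key prefix for one datamart table (sketch)."""
    try:
        prefix = TABLE_PREFIXES[table]
    except KeyError:
        raise ValueError(
            f"Unknown table {table!r}; expected one of {sorted(TABLE_PREFIXES)}"
        )
    return posixpath.join(datamart_folder, prefix)

print(datamart_prefix("modelSnapshot"))
# datamart/model_snapshots
```

The resulting prefix is what get_files would then list and download.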

async get_adm_datamart(*, datamart_folder: str = 'datamart', verbose: bool = True) → pdstools.adm.ADMDatamart.ADMDatamart

Construct an ADMDatamart directly from S3.

Convenience wrapper that downloads the model and predictor snapshot exports and feeds them into ADMDatamart. Because this is an async function, it must be awaited.

Parameters:
  • datamart_folder (str, keyword-only, default="datamart") – Top-level folder inside the bucket that contains the datamart export.

  • verbose (bool, keyword-only, default=True) – Show download progress.

Returns:

A datamart populated with the freshly downloaded files.

Return type:

ADMDatamart

Examples

>>> dm = await S3Data(bucket_name="testbucket").get_adm_datamart()
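The top-level await above only works in an async-aware shell such as IPython or Jupyter. In a plain script you drive the coroutine with asyncio.run; the pattern is sketched below with a stub coroutine standing in for the real S3-backed call, which needs AWS credentials and an existing bucket:

```python
import asyncio

async def get_adm_datamart_stub():
    # Stand-in for S3Data(bucket_name="testbucket").get_adm_datamart();
    # the awaiting pattern is identical for the real coroutine.
    return "datamart"

# In a script, wrap the awaited call in asyncio.run:
dm = asyncio.run(get_adm_datamart_stub())
print(dm)
# datamart
```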