File Formats & Delivery

CenterCheck data is delivered as Parquet files written directly to your AWS S3 bucket. Files are organized by delivery date and dataset type and are sized to stay under 1GB each; larger datasets are automatically split across multiple files.


Access & Authentication

You provide CenterCheck with access to your existing S3 bucket. Upon onboarding, share the following with your CenterCheck account representative:

Credential               Description
AWS_ACCESS_KEY_ID        An AWS access key with write permissions to your bucket
AWS_SECRET_ACCESS_KEY    The corresponding AWS secret key
AWS_REGION               The region where your bucket is hosted
S3_BUCKET_NAME           Your bucket name

We recommend creating a dedicated IAM user with scoped permissions limited to the target bucket. Never share root account credentials.
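A scoped policy for that dedicated IAM user might look like the following sketch. The bucket name is a placeholder, and the exact action list (for example, whether CenterCheck also needs s3:GetObject to verify uploads) should be confirmed with your account representative:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListTargetBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Sid": "WriteDeliveries",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN itself, while object-level actions such as s3:PutObject apply to the `/*` resource.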


Folder Structure

Files are organized by delivery date and dataset type:

s3://your-bucket-name/
└── 2025-01-31/
    ├── business_locations/
    │   ├── data_0.parquet
    │   └── data_1.parquet
    ├── brand/
    │   └── data_0.parquet
    ├── transactions_and_sales_metrics/
    │   └── data_0.parquet
    ├── cardholder_demographics/
    │   └── data_0.parquet
    ├── spend_journey/
    │   └── data_0.parquet
    └── zip_code_capture/
        └── data_0.parquet

Each delivery creates a new top-level folder named with the delivery date in YYYY-MM-DD format. All datasets for that delivery are nested within their respective subfolders.

The reporting period covered by each dataset is determined by the start_at and end_at fields within the files themselves, not the delivery date folder.
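The folder layout above is predictable, so the key prefix for any dataset in any delivery can be constructed programmatically. A minimal sketch (the helper name `dataset_prefix` is illustrative, not part of any CenterCheck tooling):

```python
from datetime import date

def dataset_prefix(delivery_date: date, dataset: str) -> str:
    """Build the S3 key prefix for one dataset within a delivery.

    The top-level folder is the delivery date in YYYY-MM-DD format;
    each dataset lives in its own subfolder beneath it.
    """
    return f"{delivery_date:%Y-%m-%d}/{dataset}/"

# Example: the Spend Journey dataset delivered on 2025-01-31
print(dataset_prefix(date(2025, 1, 31), "spend_journey"))
# -> 2025-01-31/spend_journey/
```

You would combine this prefix with your bucket name when listing or downloading objects, e.g. via an S3 list operation scoped to the prefix.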


File Format

Property               Value
Format                 Apache Parquet
Max file size          ~1GB
Multi-file datasets    Split across data_0.parquet, data_1.parquet, etc.
Delivery cadence       Monthly or weekly, depending on your subscription
Compression            Snappy (default Parquet compression)

When a dataset exceeds 1GB for a given delivery, it is automatically split into multiple sequentially numbered files. All files within a dataset folder must be read together to obtain the complete dataset.


Notes

Always read all files in a dataset folder together. A single dataset may be split across multiple Parquet files. Reading only data_0.parquet will result in an incomplete dataset.
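One way to guard against this is to collect every sequentially numbered part before reading. The sketch below (the helper name `ordered_part_files` is illustrative) takes a listing of object keys under a dataset prefix, such as the result of an S3 list operation, and returns the parts in sequence order:

```python
import re

def ordered_part_files(keys: list[str]) -> list[str]:
    """Return the data_N.parquet parts of a dataset folder, in order.

    `keys` is a listing of object keys under one dataset prefix.
    Reading only data_0.parquet would silently drop rows whenever a
    delivery was split across multiple files.
    """
    pattern = re.compile(r"data_(\d+)\.parquet$")
    parts = [(int(m.group(1)), k) for k in keys if (m := pattern.search(k))]
    return [k for _, k in sorted(parts)]

keys = [
    "2025-01-31/business_locations/data_1.parquet",
    "2025-01-31/business_locations/data_0.parquet",
]
print(ordered_part_files(keys))
```

You could then read each file (for example with a Parquet reader such as pandas or pyarrow) and concatenate the results into one dataset.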

Delivery date ≠ reporting period. The top-level folder reflects when the data was delivered, not the period it covers. Always use the start_at and end_at fields within each file to determine the reporting period.

Nested fields require exploding. The previous_locations, next_locations, and secondary_next_locations fields in Spend Journey are stored as nested JSON arrays within the Parquet files. These must be exploded into individual rows before analysis. Refer to the individual schema pages for the nested object structure.
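The explode step can be sketched in plain Python as follows; the function name and the sample field values are illustrative, and in practice a tool like pandas' DataFrame.explode performs the same operation once the nested JSON is parsed:

```python
def explode(rows, field):
    """Yield one output row per element of a nested array field.

    Each element of the nested array (e.g. previous_locations in the
    Spend Journey dataset) becomes its own row, with the remaining
    columns repeated. Rows where the field is empty or null are skipped.
    """
    for row in rows:
        for item in row.get(field) or []:
            out = {k: v for k, v in row.items() if k != field}
            out[field] = item
            yield out

# Hypothetical row with two nested entries -> two flat rows
rows = [{"brand": "acme", "previous_locations": [{"zip": "10001"}, {"zip": "94107"}]}]
print(list(explode(rows, "previous_locations")))
```

After exploding, each row carries exactly one nested object, which makes joins and aggregations straightforward.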
