# File Formats & Delivery

CenterCheck data is delivered as **Parquet or CSV files** written directly to your AWS S3 bucket. Files are organized by delivery date and dataset type. Each Parquet file is kept under roughly 1 GB; larger datasets are automatically split across multiple files. CSV datasets are always delivered as a single file per dataset.

***

## Access & Authentication

CenterCheck writes to an S3 bucket that you already own, so you must grant it access. During onboarding, share the following with your CenterCheck account representative:

| Credential              | Description                                             |
| ----------------------- | ------------------------------------------------------- |
| `AWS_ACCESS_KEY_ID`     | An AWS access key with write permissions to your bucket |
| `AWS_SECRET_ACCESS_KEY` | The corresponding AWS secret key                        |
| `AWS_REGION`            | The region where your bucket is hosted                  |
| `S3_BUCKET_NAME`        | Your bucket name                                        |

***

## Folder Structure

Within the bucket, files are grouped first by delivery date and then by dataset type:

```
s3://your-bucket-name/
└── 2025-01-31/
    ├── business_locations/
    │   ├── data_0.parquet
    │   └── data_1.parquet
    ├── brand/
    │   └── data_0.parquet
    ├── transactions_and_sales_metrics/
    │   └── data_0.parquet
    ├── shopper_demographics/
    │   └── data_0.parquet
    ├── spend_journey/
    │   └── data_0.parquet
    ├── zip_code_capture/
    │   └── data_0.parquet
    ├── time_of_sales_weekly/
    │   └── data_0.parquet
    └── time_of_sales_daily/
        └── data_0.parquet
```

Each delivery creates a new top-level folder named with the delivery date in `YYYY-MM-DD` format. All datasets for that delivery are nested within their respective subfolders.

The reporting period covered by each dataset is determined by the `start_at` and `end_at` fields within the files themselves, not the delivery date folder.
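As a rough illustration of the layout described above, the following sketch splits an object key into its delivery date, dataset name, and file name. The `DeliveryKey` type and `parse_delivery_key` function are hypothetical helpers, not part of any CenterCheck tooling, and the sketch assumes keys sit directly under the bucket root with no extra prefix.

```python
from datetime import date
from typing import NamedTuple


class DeliveryKey(NamedTuple):
    delivery_date: date  # top-level folder, YYYY-MM-DD
    dataset: str         # e.g. "business_locations"
    filename: str        # e.g. "data_0.parquet"


def parse_delivery_key(key: str) -> DeliveryKey:
    """Split '<YYYY-MM-DD>/<dataset>/<file>' into its three parts."""
    parts = key.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected key layout: {key!r}")
    folder, dataset, filename = parts
    # date.fromisoformat raises ValueError if the folder is not a valid date.
    return DeliveryKey(date.fromisoformat(folder), dataset, filename)
```

The parsed `delivery_date` only tells you when the files arrived; the period they cover still has to be read from `start_at` and `end_at`.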

***

## File Format

| Property            | Parquet                                               | CSV                                               |
| ------------------- | ----------------------------------------------------- | ------------------------------------------------- |
| Format              | Apache Parquet                                        | CSV                                               |
| Max file size       | \~1 GB                                                | No limit                                          |
| Multi-file datasets | Split across `data_0.parquet`, `data_1.parquet`, etc. | Single file per dataset                           |
| Compression         | Snappy                                                | None                                              |
| Delivery cadence    | Monthly or weekly, depending on your subscription     | Monthly or weekly, depending on your subscription |

***

## Notes

**Always read all Parquet files in a dataset folder together.** A single dataset may be split across multiple Parquet files. Reading only `data_0.parquet` will result in an incomplete dataset. CSV datasets are always a single file and do not have this concern.
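One way to avoid missing a part is to select every `data_<n>.parquet` key under the dataset folder rather than hard-coding `data_0.parquet`. The sketch below is a minimal example that works on a plain list of object keys (for instance, the result of a bucket listing); `dataset_parts` is a hypothetical helper name, and the `data_<n>.parquet` pattern is taken from the folder structure shown on this page.

```python
import re

# Matches the part naming shown in the folder structure: data_0.parquet, data_1.parquet, ...
_PART = re.compile(r"data_(\d+)\.parquet")


def dataset_parts(keys, delivery_date, dataset):
    """Return every Parquet part for one dataset, ordered by part index."""
    prefix = f"{delivery_date}/{dataset}/"
    parts = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        match = _PART.fullmatch(key[len(prefix):])
        if match:
            parts.append((int(match.group(1)), key))
    # Sort numerically so data_10 comes after data_2 rather than after data_1.
    return [key for _, key in sorted(parts)]
```

The returned keys can then be read one by one with the Parquet reader of your choice and concatenated into a single table.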

**Delivery date ≠ reporting period.** The top-level folder reflects when the data was delivered, not the period it covers. Always use the `start_at` and `end_at` fields within each file to determine the reporting period.
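A minimal sketch of deriving the reporting period from the data itself is shown below. It assumes rows have already been loaded as dictionaries and that `start_at` and `end_at` are ISO 8601 strings such as `2024-12-01`; the actual field types are defined on the schema pages, so treat the parsing step as an assumption. `reporting_period` is a hypothetical helper name.

```python
from datetime import datetime


def reporting_period(rows):
    """Return (earliest start_at, latest end_at) across all rows."""
    # Assumption: both fields are ISO 8601 strings; adjust if the schema differs.
    starts = [datetime.fromisoformat(row["start_at"]) for row in rows]
    ends = [datetime.fromisoformat(row["end_at"]) for row in rows]
    if not starts:
        raise ValueError("no rows to inspect")
    return min(starts), max(ends)
```

Taking the minimum and maximum across rows also covers the case where a single delivery contains more than one reporting period.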

**Nested fields require exploding.** The `previous_locations`, `next_locations`, and `secondary_next_locations` fields in Spend Journey are stored as nested JSON arrays within the Parquet files. These must be exploded into individual rows before analysis. Refer to the individual schema pages for the nested object structure.
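The exploding step can be sketched as follows, assuming rows are available as dictionaries and the nested field holds either a JSON-encoded string or an already-decoded list. `explode` is a hypothetical helper, and the sketch makes no assumption about the keys inside each nested object; see the schema pages for those.

```python
import json


def explode(rows, field):
    """Yield one output row per element of the nested array in `field`."""
    for row in rows:
        value = row.get(field)
        # The array may arrive as a JSON string or as an already-decoded list.
        items = json.loads(value) if isinstance(value, str) else value
        base = {k: v for k, v in row.items() if k != field}
        # Rows whose array is empty or missing produce no output rows.
        for item in items or []:
            yield {**base, field: item}
```

For example, `list(explode(rows, "next_locations"))` repeats each parent row once per entry in its `next_locations` array, replacing the array with the single entry.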


