S3 Parquet Connect

S3 Parquet data store setup in minutes

Set up the Matatika platform to deliver and process your data in S3 Parquet in minutes.

Automate data from S3 Parquet with no code

S3 Parquet stores data as Parquet files in Amazon S3 for large-scale, distributed processing.

Parquet is a columnar storage format that compresses and encodes data efficiently, which makes it well suited to storing and processing large datasets in a distributed computing environment. It works with Amazon S3 and with big data processing tools such as Apache Spark and Hadoop. Because queries can read only the columns they need and the files are compressed, Parquet on S3 can speed up data processing and analysis and reduce storage costs, which makes it a popular choice for big data applications.
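To illustrate why a columnar layout tends to compress well, here is a toy sketch in plain Python. It pivots row records into columns and run-length encodes each column. The sample rows and the helper names `to_columns` and `run_length_encode` are made up for illustration; this is not Parquet's actual on-disk format, which combines several encodings.

```python
from itertools import groupby

# Hypothetical sample rows, invented for this illustration.
rows = [
    {"country": "UK", "status": "active", "amount": 10},
    {"country": "UK", "status": "active", "amount": 12},
    {"country": "UK", "status": "paused", "amount": 7},
    {"country": "FR", "status": "paused", "amount": 7},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded = {name: run_length_encode(values) for name, values in columns.items()}

print(encoded["country"])  # [('UK', 3), ('FR', 1)]
```

Values of the same column sit next to each other, so repeated values collapse into short runs. A row-oriented layout would interleave unrelated fields and hide that repetition.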

Settings

S3 Path

The path to the S3 bucket and object where the Parquet data is stored.

AWS Access Key Id

The access key ID for the AWS account that has permission to access the S3 bucket.

AWS Secret Access Key

The secret access key for the AWS account that has permission to access the S3 bucket.

Athena Database

The name of the Athena database where the Parquet data will be queried.

Add Record Metadata

Whether or not to add metadata to each record in the Parquet data.

Stringify Schema

Whether or not to write every column in the Parquet data as a string type, regardless of its type in the source schema.

Stream Maps

Inline stream map definitions that transform, filter, or rename streams and their properties before the data is written.

Stream Map Config

User-defined configuration values that can be referenced from within the stream maps.

Flattening Enabled

Whether or not to flatten nested structures in the Parquet data.

Flattening Max Depth

The maximum depth to which nested structures will be flattened.
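As a rough sketch of how the flattening settings interact, the Python below applies a depth-limited flatten to a nested record. The setting keys are assumed to be snake_case forms of the labels above, the path and database values are placeholders, and the `flatten` helper and its `__` separator are hypothetical; the connector's own implementation may differ.

```python
from typing import Any

# Hypothetical settings; key names are assumed snake_case forms of the labels above.
# Credentials are omitted here and would normally come from a secure store.
config = {
    "s3_path": "s3://example-bucket/example/prefix",  # placeholder path
    "athena_database": "example_db",  # placeholder name
    "add_record_metadata": False,
    "stringify_schema": False,
    "flattening_enabled": True,
    "flattening_max_depth": 1,
}


def flatten(record: dict, max_depth: int, sep: str = "__", _depth: int = 0) -> dict:
    """Expand nested dicts into prefixed top-level keys, up to max_depth levels."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict) and _depth < max_depth:
            # Recurse one level deeper and prefix the child keys with the parent key.
            nested = flatten(value, max_depth, sep, _depth + 1)
            for sub_key, sub_value in nested.items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            # Scalars, lists, and dicts beyond max_depth are kept as they are.
            flat[key] = value
    return flat


record = {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}}

if config["flattening_enabled"]:
    flattened = flatten(record, config["flattening_max_depth"])
else:
    flattened = record

print(flattened)
```

With a maximum depth of 1, `user` is expanded into `user__name` and `user__address`, while the inner `address` object stays nested. Raising the depth to 2 would also expand it into `user__address__city`.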

