Glossary

Glossary of Matatika terms.

You can find the technical glossary here.

Channel
Configuration Repository
Dataset
Data Import
Data Source
Data Store
Pipeline
Transforms
Workspace

Channel

Channels group datasets together under a single source. A channel is created automatically when you push a dataset whose source doesn’t already exist in your workspace.
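
The automatic channel creation described above can be pictured with a small Python sketch. This is only a model of the behaviour, not the Matatika API: the `push_dataset` function and the `title` and `source` keys are hypothetical names chosen for illustration.

```python
def push_dataset(channels, dataset):
    """Add a dataset to the channel named by its source.

    `channels` maps a source name to the list of dataset titles in it.
    Both the structure and the field names are illustrative assumptions.
    """
    # setdefault creates an empty channel the first time a source is seen
    channels.setdefault(dataset["source"], []).append(dataset["title"])
    return channels


# A workspace that already has one channel
channels = {"Sales": ["Monthly revenue"]}

# "Support" does not exist yet, so a new channel is created for it
push_dataset(channels, {"title": "Open tickets", "source": "Support"})

# "Sales" already exists, so the dataset joins the existing channel
push_dataset(channels, {"title": "Top customers", "source": "Sales"})

print(channels)
```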

Configuration Repository

Each workspace is backed by a Git configuration repository.

All data processing and analytical configuration is managed through code, with the Matatika Platform securely managing credentials to Data Sources and Data Stores.

Dataset

Datasets are the insights within the Matatika platform. They can be charts, tables, or CSV (Excel) download links. The easiest way to define a new dataset is to use the dataset file format.

Custom datasets can be published with the Matatika API or CLI. Try it for yourself with our Getting Started guide to publishing a Dataset.

Matatika Data Sources with Instant Insights will automatically publish datasets during the data import configuration.

Data Import

A data import is another name for a Pipeline. This scheduled set of tasks extracts data from a Data Source, loads the data into a Data Store, and performs the configured Transforms. You need administrator access to manage the Workspace data imports.
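
The three stages of a data import (extract, load, transform) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how Matatika runs imports: an in-memory SQLite database stands in for the Data Store, and the table names, sample rows, and function names are all hypothetical.

```python
import sqlite3


def extract():
    # Stand-in for a Data Source: hypothetical order rows (date, quantity)
    return [("2024-01-01", 3), ("2024-01-01", None), ("2024-01-02", 4)]


def load(conn, rows):
    # Load the raw rows into the Data Store unchanged
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (order_date TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)


def transform(conn):
    # Prepare the data for analysis: drop rows with no quantity
    conn.execute(
        "CREATE TABLE orders AS "
        "SELECT order_date, quantity FROM raw_orders WHERE quantity IS NOT NULL"
    )


def run_data_import(conn):
    # The stages run in order: extract, then load, then transform
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT order_date, quantity FROM orders ORDER BY order_date"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in Data Store
result = run_data_import(conn)
print(result)
```

Loading the raw rows before transforming them keeps the original data available in the Data Store, so a transform can be changed and re-run without extracting again.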

Data Source

A data source is a file, cloud app or database that you import data from. When you choose a data source to use in a data import, you will usually be required to supply settings that control how Matatika connects to your data for import. At Matatika, we are building a catalog of data sources that come with related transforms and datasets.

A Matatika workspace can import data from any of the 300+ Meltano extractors and Singer Taps.

Data Store

A data store is responsible for storing and preparing your data for query. By default, a workspace is provisioned with an isolated PostgreSQL data store.

A Matatika workspace can use any number of JDBC-compliant databases or serverless data warehouses such as Google BigQuery or AWS Athena.

Pipeline

A Pipeline is a scheduled set of Data Component tasks. You need administrator access to manage the Workspace pipelines.

Transforms

Transforms are used to prepare your data for analysis. For example, aggregating measurements can make big data manageable or more performant, dealing with changing names can avoid misleading inconsistencies, and naming your measures and dimensions can align analysis with your business vocabulary.
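
The kinds of preparation listed above can be illustrated with a short Python sketch. It is a language-neutral illustration only and is not written in dbt; the column names, the alias table, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Assumption: known spellings of the same customer, mapped to one canonical name
ALIASES = {"Acme Ltd": "Acme", "ACME Limited": "Acme"}


def transform(rows):
    """Resolve changing names, aggregate, and rename to business vocabulary."""
    totals = defaultdict(float)
    for row in rows:
        # Deal with changing names so one customer is not counted as several
        customer = ALIASES.get(row["cust_nm"], row["cust_nm"])
        # Aggregate many measurements into one figure per customer
        totals[customer] += row["amt"]
    # Name the dimension and measure in terms the business uses
    return [
        {"customer": name, "total_revenue": total}
        for name, total in sorted(totals.items())
    ]


rows = [
    {"cust_nm": "Acme Ltd", "amt": 100.0},
    {"cust_nm": "ACME Limited", "amt": 50.0},
    {"cust_nm": "Globex", "amt": 75.0},
]
summary = transform(rows)
print(summary)
```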

A Matatika workspace uses dbt for robust and reusable transforms.

Workspace

A workspace is an invite-only data collaboration space. You can be an owner, administrator or member of many workspaces, with each workspace connected to isolated Data Stores that are capable of storing billions of rows. Rather than offering simple dashboards, the workspace uses member collaboration to improve the relevance of the datasets shown in your feed.

All Matatika workspace configuration for datasets, channels and scheduled data imports can be managed through the API, CLI, or Configuration Repository.