Glossary of Matatika terms.
You can find the technical glossary here.
- Configuration Repository
- Data Import
- Data Source
- Data Store
Datasets are the insights within the Matatika platform. They can be charts, tables, or CSV (Excel) download links. The easiest way to define a new dataset is to use the dataset file format.
Matatika Data Sources will automatically publish datasets during the data import configuration.
A data import is another name for a Pipeline: a scheduled set of tasks that extracts data from a Data Source, loads the data into a Data Store, and performs the configured Transforms. You need administrator access to manage the Workspace data imports.
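The extract, load, and transform order described above can be sketched in plain Python. This is only an illustration of the sequence, not how Matatika implements data imports; `run_data_import`, `source_rows`, `data_store`, and `store_total` are hypothetical names, and the data source and data store are stand-in in-memory objects.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def run_data_import(
    extract: Callable[[], List[Row]],
    load: Callable[[List[Row]], None],
    transforms: List[Callable[[], None]],
) -> None:
    """Run one data import: extract, then load, then each transform in order."""
    rows = extract()           # read rows from the data source
    load(rows)                 # write the raw rows into the data store
    for transform in transforms:
        transform()            # prepare the loaded data for analysis


# Hypothetical in-memory stand-ins for a data source and a data store.
source_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]
data_store: Dict[str, object] = {}


def store_total() -> None:
    """Example transform: derive a total from the raw rows already loaded."""
    data_store["total_amount"] = sum(r["amount"] for r in data_store["raw"])


run_data_import(
    extract=lambda: list(source_rows),
    load=lambda rows: data_store.update(raw=rows),
    transforms=[store_total],
)
```

Passing the three stages in as functions keeps the ordering explicit: the transform only runs once the raw rows are in the store. Scheduling is left out of the sketch.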
A data source is a file, cloud app or database that you import data from. When you choose a data source to use in a data import, you will usually be required to supply settings that control how Matatika connects to your data for import. At Matatika, we are building a catalog of data sources that come with related transforms and datasets.
A Matatika workspace can import data from any of the 300+ Meltano extractors and Singer taps.
A data store is responsible for storing and preparing your data for query. By default, a workspace is provisioned with an isolated PostgreSQL data store.
A Matatika workspace can use any number of JDBC-compliant databases or serverless data warehouses such as Google BigQuery or AWS Athena.
A Pipeline is a scheduled set of Data Component tasks. You need administrator access to manage the Workspace pipelines.
Transforms are used to prepare your data for analysis. For example, aggregating measurements can make big data manageable or more performant, dealing with changing names can avoid misleading inconsistencies, and naming your measures and dimensions can align analysis with your business vocabulary.
A Matatika workspace uses dbt for robust and reusable transforms.
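As a rough illustration of two of the ideas above, aggregating measurements and giving fields business-friendly names, here is a minimal plain-Python sketch. It is not a dbt model and not Matatika code; the `readings` data, the field names, and `daily_average` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw measurements with terse source-system field names.
readings = [
    {"sensor_nm": "a", "day": "2024-01-01", "val": 2.0},
    {"sensor_nm": "a", "day": "2024-01-01", "val": 4.0},
    {"sensor_nm": "b", "day": "2024-01-01", "val": 1.0},
]


def daily_average(rows):
    """Collapse raw readings into one row per sensor per day."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        key = (row["sensor_nm"], row["day"])
        totals[key][0] += row["val"]
        totals[key][1] += 1
    # Rename the dimensions and the measure to match business vocabulary.
    return [
        {"sensor": sensor, "day": day, "average_reading": total / count}
        for (sensor, day), (total, count) in sorted(totals.items())
    ]


summary = daily_average(readings)
```

Here three raw rows become two summary rows, which is the sense in which aggregation makes large data more manageable. In a workspace the same logic would normally be expressed as a dbt transform rather than Python.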
A workspace is an invite-only data collaboration space. You can be an owner, administrator or member of many workspaces, with each workspace connected to isolated Data Stores that are capable of storing billions of rows. Rather than simple dashboards, member collaboration in the workspace is used to improve the relevance of the datasets shown in your feed.