Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Transform Data

Time required: 15 minutes


Prerequisites


Introduction

A data transform is a layer of logic that transforms one set of data into another - this is typically the ‘T’ in ELT. An example of a simple transform might be to calculate the total amount each customer has spent across all orders they have placed. Matatika uses dbt as a transformation tool, which operates on the concept of transforms-as-code, in the form of templated SQL files.

Create a source

A dbt source is a definition of the “source” database schema. To define a source, create a YAML file (e.g. my_source.yml) under the transforms/models directory (refer to the dbt docs for more information):

version: 2

sources:
  - name: my_source
    schema: "{{ env_var('DBT_SOURCE_SCHEMA') }}"
    tables:
      - name: customers
      - name: orders

Create a model

A dbt model translates to a database table that is created and updated by dbt. To create a model for a workspace, first create a new SQL file (e.g. my_model.sql) under the transforms/models directory. Start writing SQL in this file to define transform logic, using dbt Jinja functions to inject dynamic behaviour:

{{ config(materialized='table') }}

with customers as (
    select * from {{ source('my_source', 'customers') }}
),
orders as (
    select * from {{ source('my_source', 'orders') }}
),
"final" as (
  select
    c.id
    , c."name"
    , SUM(o.product_price * o.quantity) as total_spend
  from customers c
  join orders o on o.customer_id = c.id
  group by c.id, c."name"
  order by total_spend desc
)
select * from "final"

Run your models

dbt models require the source tables to exist before they are run. If the data isn’t already present, you need to run your data import locally.

# ensure dbt is installed
meltano install transformer dbt

# run dbt models
meltano invoke dbt run

See dbt run docs for more information

When a pipeline with dbt is run in the Matatika platform, models in the project will be automatically run.