The Comprehensive Guide to Data Build Tool (dbt)
From Core Concepts to Advanced Implementation and Daily Operations
Introduction to the Modern Data Stack and ELT
The landscape of data engineering has experienced a fundamental paradigm shift, transitioning away from traditional Extract, Transform, Load (ETL) pipelines in favor of the Extract, Load, Transform (ELT) architecture. This evolution was precipitated by the exponential increase in the processing power and storage capabilities of cloud-native data platforms like Snowflake, Databricks, Google BigQuery, and Amazon Redshift.
In the modern ELT paradigm, raw data is extracted from source systems and loaded directly into the data warehouse using automated ingestion tools (like Fivetran or Airbyte). The transformation of this raw data into structured, analytics-ready formats is subsequently executed directly within the warehouse, leveraging its native compute resources.
Within this architecture, Data Build Tool (dbt) is the industry standard for managing in-warehouse data transformations. Operating on the philosophy of "transformations as code," dbt enables data analysts and analytics engineers to write modular, reusable SQL (and Python) queries that dbt dynamically compiles and executes against the target database. By bridging the gap between traditional data analytics and software engineering, dbt introduces rigorous software development best practices—version control, automated testing, continuous integration, and modular deployment—into the data lifecycle.
dbt Core vs. dbt Cloud: It's important to distinguish between the two. dbt Core is the open-source Python framework driven via the Command Line Interface (CLI) that actually compiles and runs your code. dbt Cloud is a managed service provided by dbt Labs that sits on top of Core, offering a web-based IDE, job scheduling, alerting, and out-of-the-box CI/CD integration.
Core Concepts and the dbt Mental Model
Mastering dbt requires the adoption of new mental models. The framework relies on specific terminology to dictate how data flows from raw ingestion to the final delivery of business-ready dashboards.
Models and the Directed Acyclic Graph (DAG)
The fundamental unit of a dbt project is the "model". Traditionally, a model is a single text file with a .sql extension containing a single SELECT statement. Rather than referencing physical database tables via hardcoded strings (which makes code brittle across dev/prod environments), models reference one another dynamically using the Jinja function {{ ref() }}.
When you invoke {{ ref('model_name') }}, dbt evaluates the current environment and resolves the reference into the correct, fully qualified database schema and table name. This also allows dbt to automatically infer relationships and construct a Directed Acyclic Graph (DAG). The DAG is the execution map, ensuring upstream dependencies are always built and validated before dbt attempts to execute downstream models.
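The dependency resolution that {{ ref() }} enables can be sketched in plain Python: scan each model's SQL for ref() calls, build a parent map, and topologically sort it. This is an illustrative simulation with made-up model names, not dbt's actual implementation (dbt parses Jinja properly rather than regex-scanning raw SQL):

```python
import re
from graphlib import TopologicalSorter

# Hypothetical mini-project: each model's raw SQL, with {{ ref() }} calls.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_payments": "select * from raw.payments",
    "fct_orders": (
        "select * from {{ ref('stg_orders') }} "
        "left join {{ ref('stg_payments') }} using (order_id)"
    ),
}

REF = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")

def build_dag(models):
    # Map each model to the set of models it ref()s (its parents).
    return {name: set(REF.findall(sql)) for name, sql in models.items()}

dag = build_dag(models)
# static_order() guarantees every parent appears before its children.
order = list(TopologicalSorter(dag).static_order())
```

Running this yields an execution order in which both staging models precede fct_orders, which is exactly the guarantee the DAG gives dbt at run time.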
Sources and Seeds
- Sources: Represent raw data tables loaded by third-party extraction tools. Defined in YAML configuration files, they create a boundary between raw upstream data and transformed data. You query them using {{ source('source_name', 'table_name') }}. This formal declaration enables documentation, lineage visualization, and "freshness" checks that monitor whether ingestion pipelines are delayed.
- Seeds: Static CSV files stored in the seeds/ directory. When executed via dbt seed, dbt creates and populates tables in the warehouse. Seeds are strictly for small, static lookup tables (e.g., zip codes to states, country codes) and should never be used as a substitute for proper ingestion tools on large datasets.
Setup & Configuration
Properly isolating the local dbt environment, establishing secure connections, and initializing the project scaffolding are the critical first steps.
Installation and Virtual Environments
Always install dbt-core within an isolated Python virtual environment to prevent dependency conflicts.
# Create and activate the virtual environment
python3 -m venv dbt-env
source dbt-env/bin/activate
# Install the core engine and specific adapter (e.g., Snowflake, BigQuery, Postgres)
python -m pip install dbt-core dbt-snowflake
Project Initialization
With dbt installed, initialize your project using the interactive prompt:
dbt init my_analytics_project
The CLI will guide you through connecting to your database, automatically generating your profiles.yml (stored securely in ~/.dbt/) and your project scaffolding (dbt_project.yml, models, tests, macros directories).
Connection Debugging
The first command any developer should run after setup or after changing credentials is the debug command. It verifies your Python environment, dbt version, and profiles.yml routing, and pings the database:
dbt debug
Expected Terminal Output:
...
Configuration:
profiles.yml file [OK found and valid]
dbt_project.yml file [OK found and valid]
Connection:
host: my-snowflake-account.snowflakecomputing.com
database: ANALYTICS_PROD
schema: dbt_john_doe
Connection test: [OK connection ok]
All checks passed!
Architectural Best Practices: Layering & Governance
Poorly structured dbt projects rapidly degrade into disorganized codebases ("spaghetti DAGs"). The community standard is a three-tiered architecture: Staging, Intermediate, and Marts.
1. Staging: Atomic Building Blocks
The sole location where the {{ source() }} macro should be invoked. Extracts raw data and standardizes it.
- Rule: One-to-one mapping (one raw table = one staging model). No JOINs or heavy aggregations.
- Allowed: Renaming columns, casting data types, standardizing timestamps, boolean flags.
- Naming/Materialization: stg_[source]__[entity].sql. Materialized as view.
2. Intermediate: Purpose-Built Transformations
Absorbs and manages complex logical operations. Stacks specific, purpose-built transformations before the final presentation layer.
- Rule: Join staging models together. Break complex logic into readable, modular steps.
- Naming/Materialization: int_[entity]_[verb].sql (e.g., int_orders_pivoted.sql). Materialized as ephemeral or view.
3. Marts: Business-Defined Entities
The culmination of the pipeline, exposed directly to BI platforms (Tableau, Looker, PowerBI).
- Rule: Heavily denormalized ("wide" tables). Pull in descriptive dimensions and pre-calculated metrics.
- Naming/Materialization: Plain English, prefixed by entity type: fct_orders.sql (Fact) or dim_customers.sql (Dimension). Materialized as table or incremental.
Modern Governance: Groups and Access
As of dbt 1.5+, you can define groups and access levels to govern cross-domain models, preventing developers from arbitrarily referencing internal models from other teams.
models:
- name: int_finance_calculations
group: finance
access: private # Only models in the 'finance' group can ref() this model
- name: fct_revenue
group: finance
access: public # Available to the entire organization
Explore Layer Examples
See how a single data entity flows through the three architectural layers:
1. Staging: stg_stripe__payments.sql
Lightweight cleaning. Renaming, casting, and standardizing. No joins.
with source as (
select * from {{ source('stripe', 'payment') }}
),
renamed as (
select
id as payment_id,
orderid as order_id,
paymentmethod as payment_method,
status,
-- Convert cents to dollars
amount / 100 as amount,
created as created_at
from source
)
select * from renamed
2. Intermediate: int_orders_pivoted.sql
Business logic applied. Aggregating data to prepare it for the final mart.
with payments as (
select * from {{ ref('stg_stripe__payments') }}
),
pivoted as (
select
order_id,
sum(case when status = 'success' then amount else 0 end) as total_success_amount,
sum(case when status = 'failed' then amount else 0 end) as total_failed_amount
from payments
group by 1
)
select * from pivoted
3. Mart: fct_orders.sql
Joining the prepared intermediate tables with dimensions to create a wide, BI-ready table.
with orders as (
select * from {{ ref('stg_jaffle_shop__orders') }}
),
order_payments as (
select * from {{ ref('int_orders_pivoted') }}
),
final as (
select
orders.order_id,
orders.customer_id,
orders.order_date,
coalesce(order_payments.total_success_amount, 0) as amount
from orders
left join order_payments using (order_id)
)
select * from final
Materialization Strategies
Materializations dictate the exact DDL strategies dbt utilizes to persist models. Selecting the appropriate one is an ongoing optimization exercise balancing compute costs and query performance.
| Type | Optimal Use Case | Pros / Cons |
|---|---|---|
| view | Staging models, lightweight intermediate logic. | + Zero build time/storage. - Slow queries if logic is complex. |
| table | Final marts queried heavily by end users. | + Extremely fast to query. - Slower to build; drops/recreates on every run. |
| incremental | Massive fact tables (billions of rows), event streams. | + Saves immense compute/time. - Complex to configure and maintain. |
| ephemeral | Very lightweight reusable logic (compiled as CTEs). | + Keeps the schema clean (no physical object). - Can hit database compiler limits if overused. |
Deep Dive: Designing Incremental Models
Incremental models build the table piecewise: the first run builds the full table, and each subsequent run isolates and processes only new or changed rows.
{{
config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='merge', -- 'delete+insert' or 'append' also available
on_schema_change='sync_all_columns'
)
}}
with source_data as (
select * from {{ ref('stg_app_events') }}
)
select *
from source_data
{% if is_incremental() %}
-- This block is ONLY evaluated during subsequent incremental runs
where event_timestamp >= (select coalesce(max(event_timestamp), '1900-01-01') from {{ this }})
{% endif %}
Compiled SQL Output (During an Incremental Run):
/* dbt automatically wraps this in a MERGE statement under the hood */
with source_data as (
select * from my_prod_db.dbt_schema.stg_app_events
)
select *
from source_data
-- Note how the Jinja block evaluates and injects the dynamic where clause:
where event_timestamp >= (select coalesce(max(event_timestamp), '1900-01-01') from my_prod_db.dbt_schema.fct_app_events)
Choosing the Right Incremental Strategy
The incremental_strategy determines the exact SQL commands dbt runs to weave your new data into the existing table. Choosing incorrectly can lead to duplicates or massive database bills. Explore the strategies below:
Strategy: merge (Default & Safest)
When to use: When your source records can be updated over time (e.g., an order changing from 'pending' to 'shipped' to 'delivered'). It requires defining a unique_key.
How it works: dbt translates your model into a standard SQL MERGE statement. It matches incoming rows against existing rows using the unique_key. If a match is found, it updates the existing row. If no match is found, it inserts the new row.
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge'
)
}}
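In plain-Python terms, the merge strategy behaves like a keyed upsert. The sketch below simulates that behavior with in-memory dicts (hypothetical rows; dbt actually generates a SQL MERGE, not Python):

```python
def merge(existing_rows, new_rows, unique_key):
    """Upsert: rows whose key matches are updated, the rest are inserted."""
    table = {row[unique_key]: row for row in existing_rows}
    for row in new_rows:
        table[row[unique_key]] = row  # match -> update; no match -> insert
    return list(table.values())

existing = [{"order_id": 1, "status": "pending"}]
incoming = [
    {"order_id": 1, "status": "shipped"},  # matches -> row 1 is updated
    {"order_id": 2, "status": "pending"},  # no match -> inserted as new
]
result = merge(existing, incoming, "order_id")
# Order 1 is now 'shipped' and order 2 exists; no duplicates were created.
```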
Strategy: delete+insert
When to use: When your target data warehouse struggles with the compute cost of complex MERGE statements, or when you are processing data in large, discrete batches (like daily partitions) and you prefer to entirely replace a partition if it runs again.
How it works: It is a two-step process. First, dbt executes a DELETE statement targeting the specific partitions or unique_keys found in the new run's batch. Immediately following, it executes an INSERT to add the new batch. It essentially "clears the space" before placing the new data.
{{
config(
materialized='incremental',
unique_key='date_partition',
incremental_strategy='delete+insert'
)
}}
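The two-step nature of delete+insert can be simulated the same way (hypothetical daily-partition rows; dbt issues real DELETE and INSERT statements rather than Python):

```python
def delete_insert(existing_rows, new_rows, unique_key):
    """Step 1: delete existing rows whose key appears in the incoming batch.
    Step 2: insert the whole incoming batch."""
    batch_keys = {row[unique_key] for row in new_rows}
    kept = [r for r in existing_rows if r[unique_key] not in batch_keys]
    return kept + new_rows

existing = [
    {"date_partition": "2024-01-01", "clicks": 10},
    {"date_partition": "2024-01-02", "clicks": 7},
]
# Re-running the 2024-01-02 partition replaces it wholesale:
rerun = [{"date_partition": "2024-01-02", "clicks": 9}]
result = delete_insert(existing, rerun, "date_partition")
# The stale row for 2024-01-02 is gone; the reprocessed row took its place.
```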
Strategy: append (Fastest Performance)
When to use: For purely immutable event streams (e.g., website clickstreams, server logs, IoT sensor data) where a record is never updated or deleted once it is created.
How it works: dbt blindly issues a simple INSERT statement for whatever data passes your is_incremental() filter. It ignores the unique_key. Because it doesn't need to scan the table for existing matches to update, it is lightning fast. Warning: Running the exact same time window twice will result in duplicate rows!
{{
config(
materialized='incremental',
incremental_strategy='append'
-- Notice we don't need a unique_key for append!
)
}}
Advanced Logic: Jinja, Macros, and Packages
Jinja Fundamentals: Loops and Variables
Standard SQL lacks control structures such as loops and variables, which makes DRY (Don't Repeat Yourself) code difficult to write. dbt resolves this by wrapping SQL files in Jinja, a templating language.
A primary use case for Jinja is pivoting data dynamically. Instead of copying and pasting the same CASE WHEN statement dozens of times, you can use a for loop:
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card", "paypal"] %}
select
order_id,
{% for payment_method in payment_methods %}
sum(case when payment_method = '{{ payment_method }}' then amount else 0 end) as {{ payment_method }}_amount
{%- if not loop.last -%},{%- endif -%}
{% endfor %},
sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by 1
Compiled SQL Output:
select
order_id,
sum(case when payment_method = 'bank_transfer' then amount else 0 end) as bank_transfer_amount,
sum(case when payment_method = 'credit_card' then amount else 0 end) as credit_card_amount,
sum(case when payment_method = 'gift_card' then amount else 0 end) as gift_card_amount,
sum(case when payment_method = 'paypal' then amount else 0 end) as paypal_amount,
sum(amount) as total_amount
from my_prod_db.staging.stg_payments
group by 1
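If it helps to see the templating idea outside of Jinja, the same pivot can be generated by an ordinary Python loop; the string it builds has the same shape as the compiled SQL above (the unqualified table name stg_payments is a placeholder):

```python
payment_methods = ["bank_transfer", "credit_card", "gift_card", "paypal"]

# One pivot column per payment method, mirroring the Jinja for-loop.
columns = [
    f"sum(case when payment_method = '{m}' then amount else 0 end) as {m}_amount"
    for m in payment_methods
]
sql = (
    "select\n    order_id,\n    "
    + ",\n    ".join(columns)
    + ",\n    sum(amount) as total_amount\nfrom stg_payments\ngroup by 1"
)
print(sql)
```

Adding a fifth payment method now means appending one list element, not copy-pasting another CASE WHEN block, which is exactly the maintenance win Jinja gives you inside dbt.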
Abstracting Logic with Macros
When logic becomes complex or repetitive across multiple models, you should abstract it into a macro (which acts like a Python function).
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
ROUND( CAST({{ column_name }} AS NUMERIC) / 100.0, {{ scale }} )
{% endmacro %}
Usage in a model: select {{ cents_to_dollars('transaction_amount') }} as amount_usd from ...
Compiled SQL Output (When Invoked):
select
ROUND( CAST(transaction_amount AS NUMERIC) / 100.0, 2 ) as amount_usd
from ...
Package Management (dependencies.yml)
You do not have to write every macro from scratch. The dbt community provides packages. As of dbt 1.6, package declarations can live in dependencies.yml, which supersedes the older packages.yml for most projects (packages.yml remains supported, and is still required if your package declarations use Jinja).
# dependencies.yml
packages:
- package: dbt-labs/dbt_utils
version: 1.1.1
- package: calogica/dbt_expectations
version: 0.9.0
Run dbt deps to install them. Macros like dbt_utils.generate_surrogate_key or dbt_utils.pivot save hundreds of hours of coding.
Explore Essential Macros
Here are some of the most widely used macros that solve everyday data engineering problems:
Pattern: Limit Data in Development
When querying massive tables during local development, it's a best practice to limit the data scanned to save time and warehouse costs. This macro automatically applies a date filter only if you are in the dev environment.
-- macros/limit_data_in_dev.sql
{% macro limit_data_in_dev(column_name, dev_days_of_data=3) %}
{% if target.name == 'dev' %}
where {{ column_name }} >= dateadd('day', -{{ dev_days_of_data }}, current_timestamp)
{% endif %}
{% endmacro %}
Usage in a model:
select * from {{ ref('stg_massive_event_table') }}
{{ limit_data_in_dev('event_timestamp') }}
Pattern: Override Default Schema Naming
By default, dbt concatenates your target schema and custom schemas (e.g., dbt_johndoe_marts). This macro overrides that behavior so production environments build into clean schemas like marts without prefixes.
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if target.name == 'prod' and custom_schema_name is not none -%}
-- In prod, ignore the default schema and strictly use the custom schema
{{ custom_schema_name | trim }}
{%- else -%}
-- In dev, append the custom schema to the user's default schema (e.g., dbt_alice_marts)
{{ default_schema }}_{{ custom_schema_name | trim }}
{%- endif -%}
{%- endmacro %}
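The macro's branching is easy to verify in isolation. This Python mirror of the same logic (with a None guard added for models that declare no custom schema) shows the schema names each environment ends up with:

```python
def generate_schema_name(custom_schema_name, target_name, default_schema):
    """Python mirror of the macro: clean schemas in prod, prefixed in dev."""
    if custom_schema_name is None:
        return default_schema                      # no custom schema declared
    if target_name == "prod":
        return custom_schema_name.strip()          # prod: e.g. 'marts'
    return f"{default_schema}_{custom_schema_name.strip()}"  # dev: 'dbt_alice_marts'

prod_schema = generate_schema_name("marts", "prod", "analytics")
dev_schema = generate_schema_name("marts", "dev", "dbt_alice")
```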
Python Models (Data Science in the DAG)
Since dbt 1.3, developers can create models using Python (specifically for platforms that support it, like Snowflake Snowpark, Databricks, and BigQuery). Python models are ideal for tasks SQL struggles with: complex string manipulation, API parsing, or applying machine learning models.
# models/marts/fct_customer_churn_prediction.py
import pandas as pd
def model(dbt, session):
dbt.config(
materialized="table",
packages=["scikit-learn", "pandas"]
)
# DataFrame representing an upstream dbt model
df = dbt.ref("fct_customer_features").to_pandas()
# Complex Python logic / ML Inference here
df['churn_probability'] = apply_churn_model(df)
return df
Resulting Action Output:
dbt translates this Python script into a stored procedure (e.g., in Snowflake Snowpark) behind the scenes, executes it, and outputs a permanent physical table in your warehouse containing the resulting Pandas DataFrame, completely ready for BI consumption.
Quality: Testing and Model Contracts
Generic and Singular Tests
dbt shifts data quality validation "to the left". Generic tests (unique, not_null, accepted_values, relationships) are applied in YAML.
Singular tests are custom SQL queries saved in the tests/ directory. A singular test must select failing rows. If it returns 0 rows, the test passes.
-- tests/assert_total_payment_amount_is_positive.sql
select
order_id,
sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by 1
having sum(amount) < 0
Expected Output (If the Test FAILS):
| order_id | total_amount |
|----------|--------------|
| 8923 | -15.50 |
| 1024 | -5.00 |
Because this query returned records (2 rows), dbt marks the test as failed and, under dbt build, skips the downstream models that depend on it. If it had returned 0 rows, dbt would mark the test as a success.
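To make the pass/fail semantics concrete, here is the same check implemented over in-memory rows: the function returns the "failing rows", and an empty result means the test passes. The data is hypothetical, chosen to mirror the failure table above:

```python
from collections import defaultdict

def assert_total_payment_amount_is_positive(payments):
    """Return failing rows: orders whose summed payment amount is negative."""
    totals = defaultdict(float)
    for p in payments:
        totals[p["order_id"]] += p["amount"]
    return [
        {"order_id": oid, "total_amount": amt}
        for oid, amt in totals.items()
        if amt < 0  # the HAVING clause of the singular test
    ]

payments = [
    {"order_id": 8923, "amount": -15.50},
    {"order_id": 1024, "amount": 10.00},
    {"order_id": 1024, "amount": -15.00},  # nets order 1024 to -5.00
]
failing = assert_total_payment_amount_is_positive(payments)
# Two failing rows -> the test FAILS; an empty list would mean PASS.
```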
Model Contracts (dbt 1.5+)
Tests run after the model is built. Model Contracts enforce constraints during the build process at the database level. If a contract is broken (e.g., a column data type changes or a null is introduced), the model fails to build, preventing bad data from entering the warehouse.
models:
- name: dim_users
config:
contract:
enforced: true
columns:
- name: user_id
data_type: int
constraints:
- type: not_null
- type: primary_key
- name: email
data_type: varchar
Advanced Testing Patterns
Go beyond simple null checks by combining packages and advanced SQL techniques:
Pattern: Testing Business Logic (dbt_expectations)
Using the popular dbt_expectations package, you can write powerful, expressive YAML tests without writing custom SQL. E.g., ensuring string formats or bounding mathematical values.
models:
- name: stg_users
columns:
- name: email
tests:
- dbt_expectations.expect_column_values_to_match_regex:
regex: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
- name: age
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 18
max_value: 120
Pattern: Cross-Model Integrity (Singular Test)
A powerful use for singular tests is ensuring data balances perfectly across two different models. For example, validating that total payments match the recorded order totals.
-- tests/assert_order_totals_match_payments.sql
with orders as (
select order_id, total_amount from {{ ref('fct_orders') }}
),
payments as (
select order_id, sum(amount) as total_paid from {{ ref('fct_payments') }} group by 1
)
select
orders.order_id,
orders.total_amount,
payments.total_paid
from orders
join payments using (order_id)
-- The test FAILS if any records are returned where amounts don't match
where orders.total_amount != payments.total_paid
Tracking Historical Data: dbt Snapshots
Snapshots monitor mutable source tables, detect alterations, and record changes sequentially by appending new rows with validity timestamps, acting as automated Type-2 Slowly Changing Dimensions (SCD Type 2).
{% snapshot orders_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='order_id',
strategy='timestamp',
updated_at='last_modified_at'
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
Resulting Table Output (SCD Type 2 structure):
| order_id | status | last_modified_at | dbt_valid_from | dbt_valid_to |
|----------|---------|---------------------|---------------------|---------------------|
| 101 | pending | 2023-10-01 08:00:00 | 2023-10-01 08:00:00 | 2023-10-02 14:30:00 |
| 101 | shipped | 2023-10-02 14:30:00 | 2023-10-02 14:30:00 | null |
Notice how dbt automatically generated the dbt_valid_from and dbt_valid_to columns to track the history. The active, current record is easily identified by having a null end date.
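The bookkeeping dbt performs under the timestamp strategy can be sketched as a single pass over the source rows (an in-memory simulation of the SCD Type 2 mechanics, not dbt's actual SQL):

```python
def snapshot_timestamp(history, source_rows, unique_key, updated_at):
    """One snapshot pass: when a source row's updated_at is newer than the
    current history row, close the old version and append a new one."""
    for row in source_rows:
        current = next(
            (h for h in history
             if h[unique_key] == row[unique_key] and h["dbt_valid_to"] is None),
            None,
        )
        if current is None:
            # Brand-new key: open its first history row.
            history.append({**row, "dbt_valid_from": row[updated_at],
                            "dbt_valid_to": None})
        elif row[updated_at] > current[updated_at]:
            current["dbt_valid_to"] = row[updated_at]  # close old version
            history.append({**row, "dbt_valid_from": row[updated_at],
                            "dbt_valid_to": None})     # open new version
    return history

history = [{"order_id": 101, "status": "pending",
            "last_modified_at": "2023-10-01 08:00:00",
            "dbt_valid_from": "2023-10-01 08:00:00", "dbt_valid_to": None}]
source = [{"order_id": 101, "status": "shipped",
           "last_modified_at": "2023-10-02 14:30:00"}]
history = snapshot_timestamp(history, source, "order_id", "last_modified_at")
# history now matches the two-row table above: 'pending' closed, 'shipped' open.
```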
Snapshot Strategies Explained
How does dbt know when a record actually changed? It relies on the strategy you pick in the configuration.
Strategy: timestamp (Recommended)
This strategy compares a reliable updated_at column on your source data against the latest snapshot. If the timestamp is newer, dbt records a change. This is highly performant.
config(
strategy='timestamp',
updated_at='updated_at_column'
)
Strategy: check
Use this when your source table does not have a reliable updated_at column. dbt will literally compare a list of columns between the source and the snapshot. If any value differs, it triggers an update. Use check_cols='all' to check every column, or pass a list of specific columns to monitor.
config(
strategy='check',
check_cols=['status', 'priority', 'assignee_id']
)
The Developer's Everyday Commands Handbook
Interacting with dbt requires CLI fluency. Here are the commands you will use daily, moving away from older paradigms and adopting modern dbt features.
Execution & Building
dbt build
Replaces dbt run followed by dbt test. It intelligently builds and tests models node by node. If model A fails its tests, dbt skips model B (which depends on A) rather than building it on top of bad data, saving compute and preventing cascading errors.
dbt build --select my_model+
Node Selection: Builds my_model and everything downstream of it. Use +my_model for upstream parents, or +my_model+ for the whole lineage slice. Use @my_model to build the model, its children, and the parents of those children (ideal for testing local changes).
dbt retry
The Lifesaver: Did a 3-hour run fail on the very last model due to a typo? Fix the typo and run dbt retry. It reads the previous run state and strictly executes only the models that failed or were skipped.
Targeted Execution & Variables
dbt run --select {{ model }} --vars '{{ vars_dict }}'
dbt test --select {{ model }} --vars '{{ vars_dict }}'
dbt build --select {{ model }} --vars '{{ vars_dict }}'
Runtime Variables: While build is the modern standard, you can still granularly execute a run (transformations only) or test (validations only). Passing --vars allows you to inject JSON dictionaries to override variables at runtime, which is incredibly useful for backfilling specific dates or environments.
Source Management
dbt source freshness --select source:{{ source }}
dbt test --select source:{{ source }}
Validation: Use freshness to verify if your upstream raw data is arriving on time based on your YAML definitions. Use the targeted test command to validate exclusively your raw data ingestion before kicking off your downstream pipeline.
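A freshness check reduces to comparing the age of the newest source row against the warn_after/error_after thresholds declared in YAML. A minimal sketch of that rule (function name and argument shapes are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

def check_freshness(max_loaded_at, warn_after, error_after, now=None):
    """Return 'pass', 'warn', or 'error' based on how stale the newest row is."""
    now = now or datetime.now(timezone.utc)
    lag = now - max_loaded_at
    if lag > error_after:
        return "error"
    if lag > warn_after:
        return "warn"
    return "pass"

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
status = check_freshness(
    max_loaded_at=datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc),  # 10h old
    warn_after=timedelta(hours=6),
    error_after=timedelta(hours=24),
    now=now,
)
# 10h of lag exceeds the 6h warn threshold but not the 24h error threshold.
```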
Project Parsing & Compilation
dbt parse
dbt show --select {{ model }} --limit 5
dbt compile --select {{ model }} --vars '{{ vars_dict }}'
Compilation: dbt parse quickly generates the manifest.json without executing any SQL—great for catching Jinja or YAML syntax errors. dbt show compiles Jinja and returns a live data preview directly in your terminal. Finally, dbt compile translates your Jinja into standard SQL and saves it in the target/ folder.
Advanced State & Cloning
dbt run --select {{ model }} --favor-state --defer --state $PROJECT_HOME/artifacts/prd/
dbt clone --state $PROJECT_HOME/artifacts/prd --select {{ model }}
State Management: By passing your production artifacts to --state and using --favor-state --defer, you can build models locally while seamlessly reading from production's upstream dependencies instead of building them all from scratch. The dbt clone command allows you to create lightning-fast, zero-copy clones of tables (on supported platforms like Snowflake or BigQuery) directly into your development environment.
Maintenance & Operations
dbt run-operation list_models_upstreams --args '{{ args_dict }}'
dbt deps
dbt clean && dbt deps
Operations: run-operation executes a specific macro directly without building a model—perfect for listing dependencies or automating database permission grants. Run deps to install upstream packages, and use the classic dbt clean && dbt deps to wipe your target/ cache and re-download everything when facing bizarre caching errors.
Static Data Loading
dbt seed
The Mapping Loader: This command is essential for loading static CSV files located in your seeds/ directory into your data warehouse as tables. It is typically used for mapping tables, country codes, or static reference data that your models need to join against. You can also run dbt build --select my_seed+ to rebuild downstream models when a seed changes.
Historical Tracking
dbt snapshot
SCD Type 2 Automation: If you are tracking historical changes in your mutable source tables (Slowly Changing Dimensions Type 2), this is the command to use. It runs the snapshot definitions to capture the state of your records at a given point in time, allowing you to easily reconstruct historical views of your data.
Documentation & Lineage
dbt docs generate
Project Compilation: This parses your project and generates the manifest.json and catalog.json files, which contain all the metadata about your models, descriptions, tests, and database schema information.
dbt docs serve
The Visualizer: Spins up a local web server to host the documentation generated in the previous step. This is an everyday necessity for visualizing your lineage graph (the DAG) and verifying that your dependencies and model descriptions are rendering correctly before merging code.
Troubleshooting & Discovery
dbt debug
The Connection Checker: Before you go down a rabbit hole trying to figure out why your models are failing to compile, dbt debug checks your profiles.yml, tests your database connection, and verifies your dbt project configurations. It is the first thing you should run when setting up a new environment or facing connection timeouts.
dbt ls
(or dbt list)
The Selector Validator: Instead of running models to see what your --select syntax actually grabs, use dbt ls. For example, dbt ls --select my_model+ will simply output a list of all nodes that match the criteria. It is invaluable for debugging complex node selection syntax without executing any queries against your warehouse.
Automation, Hooks, and Slim CI
Hooks
Hooks are SQL snippets that execute at specific points in the run lifecycle (pre-hook, post-hook, on-run-start, on-run-end). They automate governance tasks like granting permissions.
models:
my_project:
+post-hook:
- "GRANT SELECT ON {{ this }} TO ROLE bi_reporter_role"
State-Aware Orchestration (Slim CI & Deferral)
The most critical advancement in dbt deployment architecture is Slim CI. In enterprise projects, running the entire DAG on every Pull Request is too slow and expensive. By providing the manifest.json from the production environment, you can instruct dbt to strictly build what has changed.
dbt build --select state:modified+ --state path/to/prod/artifacts --defer
The --defer flag is magical: it tells dbt, "If an upstream parent model wasn't modified in this PR, don't build it in my dev schema. Instead, just read the production version of that parent model." This reduces CI/CD runtimes from hours to minutes.
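Conceptually, state:modified is a diff of content checksums between two manifests. A simplified sketch under stated assumptions (the real manifest.json tracks far more than checksums, e.g. config and macro changes, and dbt also expands the selection to downstream nodes via the trailing +):

```python
def modified_nodes(prod_manifest, dev_manifest):
    """Nodes whose content checksum changed, or that are new in this branch."""
    prod = prod_manifest["nodes"]
    return [
        name for name, node in dev_manifest["nodes"].items()
        if name not in prod or node["checksum"] != prod[name]["checksum"]
    ]

prod_manifest = {"nodes": {
    "model.stg_orders": {"checksum": "aaa"},
    "model.fct_orders": {"checksum": "bbb"},
}}
dev_manifest = {"nodes": {
    "model.stg_orders": {"checksum": "aaa"},  # unchanged -> deferred to prod
    "model.fct_orders": {"checksum": "ccc"},  # edited in this PR
    "model.fct_refunds": {"checksum": "ddd"}, # brand-new model
}}
changed = modified_nodes(prod_manifest, dev_manifest)
# Only the edited and new models are selected; stg_orders is read from prod.
```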
Data Build Tool has fundamentally redefined the operational standards of data engineering. By bridging software engineering principles with data transformation, a properly architected dbt environment provides a single, governed, transparent, and highly resilient source of truth.