
Developer Blog | dbt Developer Hub

Find tutorials, product updates, and developer insights in the dbt Developer Blog.


· 5 min read
Callum McCann

Hello, my dear data people.

If you haven’t read Nick & Roxi’s blog post about what’s coming in the future of the dbt Semantic Layer, I highly recommend you read through that, as it gives helpful context around what the future holds.

With that said, it has come time for us to bid adieu to our beloved dbt_metrics package. Upon the release of dbt-core v1.6 in late July, we will be deprecating support for the dbt_metrics package.

· 12 min read
Arthur Marcon
Lucas Bergo Dias
Christian van Bellen

Alteryx is a visual data transformation platform with a user-friendly interface and drag-and-drop tools. Nonetheless, Alteryx can struggle to cope as the complexity of an organization’s data pipelines increases, and it can become a suboptimal tool when companies start dealing with large and complex data transformations. In such cases, moving to dbt can be a natural step, since dbt is designed to manage complex data transformation pipelines in a scalable, efficient, and more explicit manner. In the project we describe here, the transition also involved migrating from an on-premises SQL Server to Snowflake. In this article, we describe the differences between Alteryx and dbt, and how we reduced a client's 6-hour runtime in Alteryx to 9 minutes with dbt and Snowflake at Indicium Tech.

· 20 min read
Jonathan Neo
Dimensional modeling is one of many data modeling techniques that are used by data practitioners to organize and present data for analytics. Other data modeling techniques include Data Vault (DV), Third Normal Form (3NF), and One Big Table (OBT) to name a few.
Data modeling techniques on a normalization vs denormalization scale

While the relevance of dimensional modeling has been debated by data practitioners, it is still one of the most widely adopted data modeling techniques for analytics.

Despite its popularity, resources on how to create dimensional models using dbt remain scarce and lack detail. This tutorial aims to solve this by providing the definitive guide to dimensional modeling with dbt.

By the end of this tutorial, you will:

  • Understand dimensional modeling concepts
  • Set up a mock dbt project and database
  • Identify the business process to model
  • Identify the fact and dimension tables
  • Create the dimension tables
  • Create the fact table
  • Document the dimensional model relationships
  • Consume the dimensional model
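To give a flavor of the dimension-table step in practice, here is a minimal sketch of a dimension model in dbt. The model and column names are illustrative, and it assumes the dbt_utils package is installed for surrogate key generation:

```sql
-- models/marts/dim_customers.sql (illustrative names; assumes dbt_utils is installed)
with customers as (

    select * from {{ ref('stg_customers') }}

)

select
    -- surrogate key that fact tables will join to
    {{ dbt_utils.generate_surrogate_key(['customer_id']) }} as customer_key,
    customer_id,
    customer_name,
    customer_segment
from customers
```

A fact table follows the same pattern, joining to each dimension on its surrogate key.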

· 12 min read
João Antunes
Yannick Misteli
Sean McIntyre

Teams thrive when each team member is provided with the tools that best complement and enhance their skills. You wouldn’t hand Cristiano Ronaldo a tennis racket and expect a perfect serve! At Roche, getting the right tools in the hands of our teammates was critical to our ability to grow our data team from 10 core engineers to over 100 contributors in just two years. We embraced both dbt Core and dbt Cloud at Roche (a dbt-squared solution, if you will!) to quickly scale our data platform.

· 7 min read
Benoit Perigaud

Editor's note—this post assumes intermediate knowledge of Jinja and macro development in dbt. For an introduction to Jinja in dbt, check out the documentation and the free self-serve course on Jinja, Macros, Packages.

Jinja brings a lot of power to dbt, allowing us to use ref(), source(), conditional code, and macros. But while Jinja brings flexibility, it also brings complexity, and as is often the case with code, things can run in unexpected ways.

The debug() macro in dbt is a great tool to have in the toolkit for someone writing a lot of Jinja code, but it might be difficult to understand how to use it and what benefits it brings.

Let’s dive into the last time I used debug() and how it helped me solve bugs in my code.
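As a quick taste before the full post: debug() is called from inside your Jinja, and dbt only activates it when the DBT_MACRO_DEBUGGING environment variable is set, dropping you into an interactive debugger at that point in the macro. A minimal sketch (the macro name and logic are illustrative):

```sql
-- macros/get_important_columns.sql (illustrative macro; run with DBT_MACRO_DEBUGGING=1 to activate debug())
{% macro get_important_columns(model_name) %}
    {% set columns = adapter.get_columns_in_relation(ref(model_name)) %}
    {% if execute %}
        {% do debug() %}  {# pauses here so you can inspect `columns` interactively #}
    {% endif %}
    {{ return(columns) }}
{% endmacro %}
```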

· 15 min read
Arthur Marcon
Lucas Bergo Dias
Christian van Bellen

Auditing tables is a major part of analytics engineers’ daily tasks, especially when refactoring tables that were built using SQL Stored Procedures or Alteryx Workflows. In this article, we present how the audit_helper package can (as the name suggests) help with the table auditing process, making sure a refactored model provides (pretty much) the same output as the original one, based on our experience using this package to support our clients at Indicium Tech®.
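For readers who haven't used the package, one of its main entry points is the compare_relations macro, which compares a legacy table against its refactored dbt model row by row. A minimal sketch, with illustrative relation and column names:

```sql
-- analyses/compare_fct_orders.sql (illustrative names; assumes the audit_helper package is installed)
{% set old_relation = adapter.get_relation(
      database=target.database,
      schema='legacy',
      identifier='fct_orders'
) %}

{% set new_relation = ref('fct_orders') %}

{{ audit_helper.compare_relations(
    a_relation=old_relation,
    b_relation=new_relation,
    primary_key='order_id'
) }}
```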

· 9 min read
Callie White
Jade Milaney

The new dbt Certification Program has been created by dbt Labs to codify the data development best practices that enable safe, confident, and impactful use of dbt. Taking the Certification allows dbt users to get recognized for the skills they’ve honed, and stand out to organizations seeking dbt expertise.

Over the last few months, Montreal Analytics, a full-stack data consultancy servicing organizations across North America, has had over 25 dbt Analytics Engineers become certified, earning them the 2022 dbt Platinum Certification award.

In this article, two Montreal Analytics consultants, Jade and Callie, discuss their experience in taking, and passing, the dbt Certification exam to help guide others looking to study for, and pass the exam.

· 8 min read
Noah Kennedy

Testing the quality of data in your warehouse is an important aspect of any mature data pipeline. One of the biggest blockers to developing a successful data quality pipeline is aggregating test failures and successes in an informative and actionable way, and ensuring actionability can be particularly challenging. If ignored, test failures can clog up a pipeline and create unactionable noise, rendering your testing infrastructure ineffective.
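One building block dbt itself provides for this is the store_failures config, which writes the rows that fail each test to a table in your warehouse so they can be queried and aggregated rather than lost in the logs. A minimal sketch (the schema name is illustrative):

```yaml
# dbt_project.yml (a minimal sketch; schema name is illustrative)
tests:
  +store_failures: true     # persist the failing rows of each test to the warehouse
  +schema: test_failures    # land them in a dedicated schema for downstream reporting
```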

· 7 min read
Emily Riederer

Imagine you were responsible for monitoring the safety of a subway system. Where would you begin? Most likely, you'd start by thinking about the key risks like collision or derailment, contemplate what causal factors like scheduling software and track conditions might contribute to bad outcomes, and institute processes and metrics to detect if those situations arose. What you wouldn't do is blindly apply irrelevant industry standards like testing for problems with the landing gear (great for planes, irrelevant for trains) or obsessively worry about low probability events like accidental teleportation before you'd locked down the fundamentals. 

When thinking about real-world scenarios, we're naturally inclined to think about key risks and mechanistic causes. However, in the more abstract world of data, many of our data tests often gravitate towards one of two extremes: applying rote out-of-the-box tests (nulls, PK-FK relationships, etc.) from the world of traditional database management or playing with exciting new toys that promise to catch our wildest errors with anomaly detection and artificial intelligence. 

Between these two extremes lies a gap where human intelligence belongs. Analytics engineers can create more effective tests by embedding their understanding of how the data was created, and especially how this data can go awry (a topic I've written about previously). While such expressive tests will be unique to our domain, modest tweaks to our mindset can help us implement them with our standard tools. This post demonstrates how the simple act of conducting tests by group can expand the universe of possible tests, boost the sensitivity of the existing suite, and help keep our data "on track". This feature is now available in dbt-utils.
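The dbt-utils feature referenced here is the group_by_columns argument, which several of the package's generic tests accept so a check can be applied within each group rather than across the whole table. A minimal sketch, assuming a recent dbt-utils version and illustrative model and column names:

```yaml
# models/schema.yml (illustrative names; assumes the dbt_utils package is installed)
models:
  - name: orders
    tests:
      - dbt_utils.recency:
          datepart: day
          field: loaded_at
          interval: 1
          group_by_columns: ['source_system']  # check freshness per source system, not just overall
```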

· 9 min read
Samuel Harting

In seventh grade, I decided it was time to pick a realistic career to work toward, and since I had an accountant in my life who I really looked up to, that is what I chose. Around ten years later, I finished my accounting degree with a minor in business information systems (a fancy way of saying I coded in C# for four or five classes). I passed my CPA exams quickly and became a CPA as soon as I hit the two-year experience requirement. I spent my first few years at a small firm completing tax returns but I didn't feel like I was learning enough, so I went to a larger firm right before the pandemic started. The factors that brought me to the point of changing industries are numerous, but I’ll try to keep it concise: the tax industry relies on underpaying its workers to maintain margins and prevent itself from being top-heavy, my future work as a manager was unappealing to me, and my work was headed in a direction I wasn’t excited about.

· 7 min read
Grace Goheen

Why we built this: A brief history of the dbt Labs Professional Services team

If you attended Coalesce 2022, you’ll know that the secret is out — the dbt Labs Professional Services team is not just a group of experienced data consultants; we’re also an intergalactic group of aliens traveling the Milky Way on a mission to enable analytics engineers to successfully adopt and manage dbt throughout the galaxy.

· 11 min read
Joel Labes

Once your data warehouse is built out, the vast majority of your data will have come from other SaaS tools, internal databases, or customer data platforms (CDPs). But there’s another unsung hero of the analytics engineering toolkit: the humble spreadsheet.

Spreadsheets are the Swiss army knife of data processing. They can add extra context to otherwise inscrutable application identifiers, be the only source of truth for bespoke processes from other divisions of the business, or act as the translation layer between two otherwise incompatible tools.

Because of spreadsheets’ importance as the glue between many business processes, there are different tools to load them into your data warehouse and each one has its own pros and cons, depending on your specific use case.

· 6 min read
Wasila Quader

Data is an industry of sidesteppers. Most folks in the field stumble into it, look around, and if they like what they see, they’ll build a career here. This is particularly true in the analytics engineering space. Every AE I’ve talked to had envisioned themselves doing something different before finding this work in a moment of serendipity. This raises the question, how can someone become an analytics engineer intentionally? This is the question dbt Labs’ Foundry Program aims to address.

· 13 min read
Charlie Summers

Let’s discuss how to convert events from an event-driven microservice architecture into relational tables in a warehouse like Snowflake. Here are a few things we’ll address:

  • Why you may want to use an architecture like this
  • How to structure your event messages
  • How to use dbt macros to make it easy to ingest new event streams
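The post's own macros aren't reproduced in this excerpt, but purely to illustrate the idea behind the last bullet, a hypothetical macro for flattening a JSON event stream into columns might look like the sketch below (it assumes raw events land as JSON in a Snowflake variant column; all names are made up):

```sql
-- macros/flatten_event_stream.sql (hypothetical; assumes events land as JSON in a `payload` variant column)
{% macro flatten_event_stream(source_relation, event_name, payload_fields) %}

select
    event_id,
    event_timestamp,
    {% for field in payload_fields %}
    payload:{{ field }}::string as {{ field }}{{ "," if not loop.last }}
    {% endfor %}
from {{ source_relation }}
where event_type = '{{ event_name }}'

{% endmacro %}
```

Each new event stream then becomes a short staging model that calls the macro with its event name and payload fields.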

· 13 min read
Doug Beatty

For years working in data and analytics engineering roles, I treasured the daily camaraderie sharing a small office space with talented folks using a range of tools - from analysts using SQL and Excel to data scientists working in Python. I always sensed that there was so much we could work on in collaboration with each other - but siloed data and tooling made this much more difficult. The diversity of our tools and languages made the potential for collaboration all the more interesting, since we could have folks with different areas of expertise each bringing their unique spin to the project. But logistically, it just couldn’t be done in a scalable way.

So I couldn’t be more excited about dbt’s polyglot capabilities arriving in dbt Core 1.3. This release brings support for the Python dataframe libraries that are crucial to data scientists and enables general-purpose Python, while still using a shared database for reading and writing data sets. Analytics engineers and data scientists are stronger together, and I can’t wait to work side-by-side in the same repo with all my data scientist friends.

Going polyglot is a major next step in the journey of dbt Core. While it expands possibilities, we also recognize the potential for confusion. When combined in an intentional manner, SQL, dataframes, and Python are also stronger together. Polyglot dbt allows informed practitioners to choose the language that best fits their use case.

In this post, we’ll give you hands-on experience and seed your imagination with potential applications. We’ll walk you through a demo that showcases string parsing - one simple way that Python can be folded into a dbt project.

We’ll also give you the intellectual resources to compare/contrast:

  • different dataframe implementations within different data platforms
  • dataframes vs. SQL

Finally, we’ll share “gotchas” and best practices we’ve learned so far and invite you to participate in discovering the answers to outstanding questions we are still curious about ourselves.

Based on our early experiences, we recommend that you:

Do: Use Python when it is better suited for the job – model training, using predictive models, matrix operations, exploratory data analysis (EDA), Python packages that can assist with complex transformations, and select other cases where Python is a more natural fit for the problem you are trying to solve.

Don’t: Use Python where the solution in SQL is just as direct. Although a pure Python dbt project is possible, we’d expect the most impactful projects to be a mixture of SQL and Python.
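To make the demo concrete before you read on, here is a minimal sketch of what a Python model can look like, assuming dbt Core 1.3+ on a Snowpark-enabled Snowflake warehouse (the model, column, and field names are illustrative):

```python
# models/parsed_campaigns.py (a sketch; assumes dbt Core 1.3+ on Snowflake/Snowpark, illustrative names)
def model(dbt, session):
    # Configure the model like any other dbt model; `packages` makes pandas available in the warehouse
    dbt.config(materialized="table", packages=["pandas"])

    # dbt.ref() returns the platform-native dataframe (a Snowpark DataFrame on Snowflake)
    campaigns = dbt.ref("stg_campaigns").to_pandas()

    # String parsing that is clumsy in SQL: split "source|medium|campaign" into three columns
    parts = campaigns["TRACKING_CODE"].str.split("|", expand=True)
    campaigns[["UTM_SOURCE", "UTM_MEDIUM", "UTM_CAMPAIGN"]] = parts

    # Return a dataframe; dbt materializes it as the model's table in the warehouse
    return campaigns
```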

· 11 min read
Brittany Krauth

When you were in grade school, did you ever play the “Telephone Game”? The first person would whisper a word to the second person, who would then whisper a word to the third person, and so on and so on. At the end of the line, the final person would loudly announce the word that they heard, and alas! It would have morphed into a new word bearing no resemblance to the original. That’s how life feels without an analytics engineer on your team.

So let’s say that you have a business question, you have the raw data in your data warehouse, and you’ve got dbt up and running. You’re in the perfect position to get this curated dataset completed quickly! Or are you?

· 11 min read
Grace Goheen
You can now use a Staging environment!

This blog post was written before Staging environments. You can now use dbt Cloud to support the patterns discussed here. Read more about Staging environments.

Why do people cherry pick into upper branches?

The simplest branching strategy for making code changes to your dbt project repository is to have a single main branch with your production-level code. To update the main branch, a developer will:

  1. Create a new feature branch directly from the main branch
  2. Make changes on said feature branch
  3. Test locally
  4. When ready, open a pull request to merge their changes back into the main branch

Basic git workflow

If you are just getting started in dbt and deciding which branching strategy to use, this approach–often referred to as “continuous deployment” or “direct promotion”–is the way to go. It provides many benefits including:

  • Fast promotion process to get new changes into production
  • Simple branching strategy to manage

The main risk, however, is that your main branch can become susceptible to bugs that slip through the pull request approval process. In order to have more intensive testing and QA before merging code changes into production, some organizations may decide to create one or more branches between the feature branches and main.

· 10 min read
Lauren Benezra

If you’ve ever heard of Marie Kondo, you’ll know she has an incredibly soothing and meditative method to tidying up physical spaces. Her KonMari Method is about categorizing, discarding unnecessary items, and building a sustainable system for keeping stuff.

As an analytics engineer at your company, doesn’t that last sentence describe your job perfectly?! I like to think of the practice of analytics engineering as applying the KonMari Method to data modeling. Our goal as analytics engineers is not only to organize and clean up data, but to design a sustainable and scalable transformation project that is easy to navigate and grow, and easy for downstream customers to consume.

Let’s talk about how to apply the KonMari Method to a new migration project. Perhaps you’ve been tasked with unpacking the kitchen in your new house; AKA, you’re the engineer hired to move your legacy SQL queries into dbt and get everything working smoothly. That might mean you’re grabbing a query that is 1500 lines of SQL and reworking it into modular pieces. When you’re finished, you have a performant, scalable, easy-to-navigate data flow.

· 14 min read
Joe Markiewicz

Analyzing financial data is rarely ever “fun.” In particular, generating and analyzing financial statement data can be extremely difficult and leaves little room for error. If you've ever had the misfortune of having to generate financial reports for multiple systems, then you will understand how incredibly frustrating it is to reinvent the wheel each time.

This process can include a number of variations, but usually involves spending hours, days, or weeks working with Finance to:

  • Understand what needs to go into the reports
  • Model said reports
  • Validate said reports
  • Make adjustments within your model
  • Question your existence
  • Validate said reports again

You can imagine how extremely time consuming this process can be. Thankfully, you can leverage core accounting principles and other tools to more easily and effectively generate actionable financial reports. This way, you can spend more time diving into deeper financial analyses.