metabase · audit · data-quality

7 Metabase audits later, the same problems keep showing up

After auditing seven Metabase setups for SaaS companies, the same patterns keep showing up: dead dashboards, duplicate metrics, joins that quietly inflate revenue. Here's what I look for first.

10 min read · Nick Valiotti

The first thing I do when a client hands me access to their Metabase is sort the collections by "last viewed". I do not look at the home page, I do not click the most popular dashboards. I just sort, and then I scroll.

By the time I hit the bottom of the list, I usually know about 70% of what I need to write in the audit report. The rest is just naming the patterns and putting numbers next to them.

I have done this seven times now for SaaS companies between Series A and Series C, anywhere from 80 to 600 dashboards each. Different industries, different teams, different stacks underneath. And almost the same Metabase. Below are the seven things I keep finding, in roughly the order I find them.

1. Most dashboards are dead, and nobody wants to be the one who deletes them

Every single instance I have audited had between 60% and 80% of dashboards that nobody had opened in the last 90 days. The record so far is one B2B SaaS where 84% of dashboards had not been viewed in three months, and on the rest, the most recent viewer was overwhelmingly the person who built it.

The interesting part is not that they exist. Dead dashboards are normal output of any company that ships analytics over years. The interesting part is the conversation when I show this number to the team.

Someone always says "yeah but we look at that one during the QBR". Someone else says "the CEO might still open it". Then somebody mentions a dashboard nobody has opened in 14 months and there is a long pause where it becomes obvious that not even the person who asked for it remembers what is on it.

The fix is boring. Archive everything not viewed in 90 days, write a one-line policy that anything in archive longer than 30 days gets deleted, and put a quarterly review on someone's calendar. (Metabase's audit logs tell you exactly what was opened and when, if you are on Pro.) The hard part is getting agreement that "nobody opened this for a year" is enough evidence to act, not "but somebody might".
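If you are not on Pro but self-host, a rough version of that "last viewed" list is one query away in the application database. A sketch, assuming the standard report_dashboard and view_log tables and Postgres date syntax:

-- Dashboards with no view in the last 90 days (or no view at all)
SELECT d.id, d.name, MAX(v.timestamp) AS last_viewed
FROM report_dashboard d
LEFT JOIN view_log v
  ON v.model = 'dashboard' AND v.model_id = d.id
WHERE d.archived = false
GROUP BY d.id, d.name
HAVING MAX(v.timestamp) < CURRENT_DATE - INTERVAL '90 days'
    OR MAX(v.timestamp) IS NULL
ORDER BY last_viewed NULLS FIRST;

Dump the result into a spreadsheet, circulate it once, and the archival conversation gets a lot shorter.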

If you only do one thing after reading this post, do this. It is the single highest-signal cleanup you can run. And it makes everything else easier, because you stop tripping over zombies while looking for the actually broken stuff.

2. The same metric exists in five different places, with five different numbers

Pick any SaaS metric you can think of: MRR, active users, churn, signups by source. In every audit I have done, that metric existed in at least three saved questions and at least two dashboards, and at least two of them disagreed with each other.

The biggest gap I have measured was on MRR. The finance dashboard showed one number, the growth dashboard showed a number 11% higher, and the "executive overview" pulled a third one that was about halfway between. All three were technically "correct" given how each query was written. None of them matched the Stripe dashboard, which was the actual source of truth.

The reason this happens is straightforward. Someone needs the metric, the existing version is in a collection they cannot see or do not know about, so they write their own. Three months later, somebody else does the same thing. There is no audit trail, no naming convention that surfaces "hey, MRR already exists", no review step. Metabase's Models feature is the official answer to this; in practice it works only if a single person owns the model layer and the rest of the team agrees to query through it.
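You can also measure the duplication instead of arguing about it. A sketch against the application database's report_card table, assuming Postgres and that your duplicates follow the usual "v2 / new / final / copy" naming noise:

-- Saved questions whose names collapse to the same base once the suffix noise is stripped
SELECT LOWER(REGEXP_REPLACE(name, '\s*(v\d+|new|final|copy.*)$', '', 'i')) AS base_name,
       COUNT(*) AS copies,
       STRING_AGG(name, ' | ') AS variants
FROM report_card
WHERE archived = false
GROUP BY 1
HAVING COUNT(*) > 1
ORDER BY copies DESC;

It will not catch a metric rebuilt under a completely different name, but it surfaces the obvious clusters fast.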

3. "Active user" means six different things and nobody talks about it

Adjacent to the duplication problem, but worse. Same word, different definitions, all in production.

In one recent audit I found:

  • "Active user" = logged in in the last 30 days (product team)
  • "Active user" = had a session in the last 7 days (growth team)
  • "Active user" = performed any of three core actions in the last 30 days (product team again, different dashboard)
  • "Active user" = paid customer with at least one seat used in the billing period (finance)
  • "Active user" = email opens in the last 30 days (lifecycle)
  • "Active user" = anyone in the users table where deleted_at IS NULL (an old dashboard nobody had touched in two years, but it was still feeding a Slack alert)

All six were correct in their original context. The problem is they all show up in conversation as just "active users", and people quote each other's numbers without checking which definition they came from. The CFO sees one number on Monday and a different one on Wednesday and starts to lose trust.

The fix is a tiny glossary, anywhere, that gives each version a unique name. "Product MAU", "Billed actives", "Email engaged". You do not need a metrics layer for this. You need three sentences and the discipline to rename existing cards.
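If you want the names to stick, bake them into the cards themselves. A minimal sketch with hypothetical sessions and seat_usage tables; the point is the card titles, not the queries:

-- Card: "Product MAU" -- distinct users with a session in the last 30 days
SELECT COUNT(DISTINCT user_id) AS product_mau
FROM sessions
WHERE started_at >= CURRENT_DATE - INTERVAL '30 days';

-- Card: "Billed actives" -- paying accounts with at least one seat used this month
SELECT COUNT(DISTINCT account_id) AS billed_actives
FROM seat_usage
WHERE used_at >= DATE_TRUNC('month', CURRENT_DATE);

Two different numbers, two different names, and nobody quotes one as the other by accident.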

4. The wrong join, every single time

This is the one that costs money. Every Metabase setup I have audited has had at least one card with a join that quietly inflates the answer.

The classic version looks like this:

SELECT u.id, COUNT(o.id) AS orders, SUM(o.amount) AS revenue
FROM users u
LEFT JOIN orders o ON o.user_email = u.email
GROUP BY u.id;

Looks fine. Reads fine. Runs fast. But users.email is rarely unique in practice (deleted users with reused emails, sales-assist signups, fake test accounts), and orders.user_email was populated long before the email-uniqueness constraint was added. So this query duplicates orders for any user whose email was reused, and the revenue number is wrong.

In the worst case I have seen, this exact pattern was inflating a "revenue by acquisition channel" dashboard by about 6%. The growth team was using it to argue for more spend on a particular paid channel. Once we joined on user_id instead, the numbers looked completely different, and the channel that appeared to be "winning" was actually the one losing.
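The corrected version is almost the same query; the only change is the join key. A sketch, assuming orders carries a user_id foreign key that is populated reliably:

SELECT u.id, COUNT(o.id) AS orders, SUM(o.amount) AS revenue
FROM users u
LEFT JOIN orders o ON o.user_id = u.id  -- join on the immutable key, not the reused email
GROUP BY u.id;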

Other variations I keep seeing:

  • A join that is meant to pull only the latest event per user but is written as a plain LEFT JOIN, so it pulls every event row instead of the latest and inflates everything downstream (see the sketch after this list)
  • JOIN on a nullable foreign key, silently dropping rows
  • JOIN to a slowly-changing dimension table without a date filter, so historical rows get current dimension values
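For the latest-event case, the usual repair is to collapse the joined table to one row per user before joining. A sketch, assuming a hypothetical events table and a warehouse with window functions:

SELECT u.id, e.event_type, e.occurred_at
FROM users u
LEFT JOIN (
  SELECT user_id, event_type, occurred_at,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY occurred_at DESC) AS rn
  FROM events
) e ON e.user_id = u.id AND e.rn = 1;  -- exactly one (latest) event per user, not all of them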

None of these will throw an error. None of these will look obviously wrong on the dashboard. The query just returns a number, and the number gets used in a board deck.

When I do an audit, I pick the 20 most-viewed cards and read every single SQL statement. About half the time, at least one of them has a join issue that materially changes the answer.

5. Hardcoded dates in production SQL

Every audit, without exception, has had at least one dashboard that filters by something like:

WHERE created_at >= '2024-01-01'

Sometimes it is in a question that was built for a one-off analysis and got promoted into a "real" dashboard without anyone updating the filter. Sometimes it is in a question that was specifically about Q1 2024 and got copied into a new dashboard called "monthly performance" without anyone changing the date. Sometimes the date is hardcoded inside a CTE, three levels deep, where the dashboard filter does not even reach.

The symptom is always the same: the dashboard "stops working" at some point, by which the team means "the numbers are still updating, but they look weird and nobody knows why".

The deeper problem is that Metabase makes it easy to accept a date filter parameter at the dashboard level, but inside SQL questions you have to remember to use template variables ({{date}}) and [[ ... ]] optional clauses. People skip this because the question runs fine without it during development. It is only when somebody puts it on a dashboard with a date picker that the mismatch shows up, and even then, often nobody notices for months.
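The shape that keeps a SQL question wired to the dashboard's date picker looks roughly like this; start_date and end_date are date template variables you define in the question's variables sidebar, and the [[ ... ]] brackets make each clause optional so the question still runs with nothing selected:

SELECT DATE_TRUNC('week', created_at) AS week, COUNT(*) AS signups
FROM users
WHERE true
  [[AND created_at >= {{start_date}}]]
  [[AND created_at <  {{end_date}}]]
GROUP BY 1
ORDER BY 1;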

6. The one slow card that takes the whole instance down

Every Metabase I have audited has had at least one query that scans tens of millions of rows on every refresh. Usually it is a card on a dashboard that auto-refreshes every 5 minutes. Usually it is doing a cross join, or a window function over an unindexed column, or a text search across a giant log table.

In one audit, I found a single card that was running a LIKE '%error%' against a 400M-row events table, on a 5-minute refresh, on a dashboard that nobody had opened in 60 days. It was costing about $1,200 a month in Snowflake credits. The card was originally built for an incident in 2023 and just never got turned off.

Cards like this hurt twice. They blow up your warehouse bill, and they make the whole Metabase feel slow because heavy queries hold connections and queue up behind everyone else's clicks.

The audit pattern here is to pull query duration stats from the Metabase application database (or, on Cloud, from the admin tools panel), sort by P95 duration descending, and look at the top 20. About a third of those will be on dashboards that are dead anyway. Another third will be salvageable with an index or a date filter. The last third are usually doing something genuinely expensive that needs a different approach (a materialized view, a dbt model, a sampled version).
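On a self-hosted instance, the raw material for that list is in the application database's query_execution table (running_time is in milliseconds). A sketch, assuming Postgres:

-- Top 20 cards by p95 runtime over the last 30 days
SELECT card_id,
       COUNT(*) AS runs,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY running_time) AS p95_ms
FROM query_execution
WHERE started_at >= CURRENT_DATE - INTERVAL '30 days'
  AND card_id IS NOT NULL
GROUP BY card_id
ORDER BY p95_ms DESC
LIMIT 20;

Join the result back to report_card and view_log and you can see immediately which of the slow ones are also dead.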

7. "Final report v3 NEW (use this one)"

This is the smallest pattern but the one that tells you the most about the team. When I see a collection with names like:

  • "Marketing Dashboard"
  • "Marketing Dashboard v2"
  • "Marketing Dashboard - new"
  • "Marketing Dashboard (final)"
  • "Marketing Dashboard FY25"
  • "Mkt Dash USE THIS ONE"

I already know what the rest of the audit is going to look like. There is no naming convention, no archival policy, no ownership, no review cadence. Whoever built each one is gone or moved teams. The "use this one" version is two months old, and somewhere in the next quarter someone will build a v6 because they cannot find v5.

The fix is mostly social. A naming convention (we use [Team] [Domain] [Frequency], like "Growth - Acquisition - Weekly"), an owner field on every dashboard that has more than 5 viewers, and an archive collection that is the only place stale stuff lives. None of this is technical. All of it requires somebody to care enough to enforce it for a quarter.

What I do about all of this

The first version of every audit report I delivered was a 30-page PDF with screenshots and SQL snippets and a prioritized cleanup plan. The clients liked it, paid for it, and then mostly did not act on it, because the gap between "here is the problem" and "here is the cleaned-up state" is months of someone's time.

That is most of why MetaLens exists. The patterns above are real, repetitive, and detectable from the Metabase API. You do not actually need a fractional CDO to find dead dashboards, duplicate metrics, hardcoded dates, and slow cards. You need someone (or something) to read the API consistently, flag the issues, and put them in front of the team in a format that makes action obvious.

If you are running Metabase and any of the seven patterns above made you wince, you have at least three of them right now. Probably five. Connect MetaLens to your instance, give it 5 minutes, and you will get a numbered list back. Then it is just a matter of which ones you want to fix first.

Most teams pick the dead-dashboard cleanup. It is the easiest win, and it makes everything else easier to see.

Connect your Metabase to MetaLens

Free plan, no card.

Start free
