Challenge

Duplicates: The final boss of your IT systems.

Every company has them. Most know it. Nobody tackles it. Duplicates are not the problem; they are the symptom.

The reality

Duplicates don't come from laziness. They come from pragmatism.

At some point someone decided: creating a new record is faster than implementing logic. Thirty seconds instead of a sprint. Problem solved, at least for today.

Meanwhile the interface between shop system and ERP keeps running quietly. Every day. Without a duplicate check. Because the check was not planned when it was set up, because nobody had the budget, and because it "works".

And so customer "Miller GmbH" appears three times. Then ten times. Then nobody knows anymore which one is correct.

1 in 3 records in an average company database is a duplicate or outdated.

30 sec. is how long it takes to create a new record. That is why it happens.

€0 budget was planned for the duplicate check. That is why it stays.

Recognise this?

Four ways duplicates get into your systems.

We see these situations in almost every company that comes to us.

The interface without logic

Shop system and ERP exchange data, but nobody implemented a duplicate check when setting it up. Every order from an existing customer creates a new record. Every day. Automatically.

The result: Thousands of duplicates are created without any human involvement, and nobody notices.
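What a duplicate check at the interface can look like, as a minimal sketch: look the customer up before creating a record. Everything here is an assumption for illustration. `CustomerStore` is a hypothetical in-memory stand-in for the ERP customer table, and the normalised email address is only one possible match key.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerStore:
    """Hypothetical in-memory stand-in for the ERP customer table."""
    by_email: dict[str, int] = field(default_factory=dict)
    next_id: int = 1

    def find_or_create(self, email: str) -> tuple[int, bool]:
        """Return (customer_id, created). Looks up before inserting."""
        key = email.strip().lower()  # assumed match key: normalised email
        if key in self.by_email:
            return self.by_email[key], False  # existing customer, no new record
        customer_id = self.next_id
        self.by_email[key] = customer_id
        self.next_id += 1
        return customer_id, True


store = CustomerStore()
first = store.find_or_create("orders@miller-gmbh.example")
second = store.find_or_create(" Orders@Miller-GmbH.example ")
```

The second order reuses the existing customer instead of creating another one. In a real interface the lookup would run against the ERP itself, and the match key would have to be agreed with the people who own the data.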

The dummy customer

Debtor number 12345. Or "Walk-in customer". Thousands of transactions sitting under one single catch-all record because it was faster than proper customer assignment. Fine for accounting back then. For every integration today: a nightmare.

The result: No clean customer assignment, no working reporting, no personalisation.

The data migration nobody checked

System migration five years ago. Old data was imported without cleansing, without deduplication, because there was no time. What was called "we'll clean that up later" is now the foundation.

The result: Every new import layers on top of the old chaos. The "later" never comes.

Multiple systems, no single source of truth

CRM, ERP, shop system, accounting: each maintains its own customer list. The same company is called "Miller GmbH" in the CRM, "Miller" in the ERP and "millerGmbH" in the shop.

The result: No interface finds the match. Every integration fails on the same edge cases.
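One way to let the three spellings meet is a shared comparison key, sketched below under stated assumptions. The function `match_key` and the `LEGAL_FORMS` list are hypothetical. The key handles ASCII names only, and it is meant for finding candidates, not for merging automatically: a company whose name happens to end in "ag" would be trimmed too.

```python
import re

# Assumed list of legal-form suffixes to ignore when matching; extend as needed.
LEGAL_FORMS = ("gmbh", "ag", "kg", "ug", "ltd", "inc")


def match_key(name: str) -> str:
    """Reduce a company name to a comparison key (ASCII letters and digits only)."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    for form in LEGAL_FORMS:
        # Strip one trailing legal form, but never reduce the key to nothing.
        if key.endswith(form) and len(key) > len(form):
            key = key[: -len(form)]
            break
    return key


names = ["Miller GmbH", "Miller", "millerGmbH"]
keys = {match_key(n) for n in names}
```

All three names from the example collapse to the same key, so an interface comparing keys instead of raw names would find the match.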

The typical mistake

Clean once. Then it happens again.

Many companies eventually react. They commission a cleansing project, invest weeks, remove duplicates. And six months later the next pile has grown.

Because the root cause was never fixed.

The interface without a duplicate check keeps running. The process of quickly creating a new record is still the default. Nobody has taken responsibility for data quality. There is no monitoring that flags new duplicates.

A one-time cleanse is like pulling weeds without removing the roots. Looks tidy for a moment. Then it grows back.
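Monitoring does not have to be a big system. A minimal sketch of a check that could run daily and flag new duplicates follows. It assumes records arrive as dictionaries with hypothetical `id` and `name` fields, and it uses a deliberately simple key (lowercase, whitespace removed).

```python
from collections import defaultdict


def find_duplicate_groups(records: list[dict]) -> dict[str, list[int]]:
    """Group record ids by a normalised key and keep groups with more than one id."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        key = "".join(record["name"].lower().split())  # assumed key: lowercase, no spaces
        groups[key].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


sample = [
    {"id": 1, "name": "Miller GmbH"},
    {"id": 2, "name": "miller gmbh"},
    {"id": 3, "name": "Schmidt AG"},
    {"id": 4, "name": "Miller  GmbH"},
]
flagged = find_duplicate_groups(sample)
```

Here records 1, 2 and 4 are flagged as one group and record 3 is left alone. What happens with a flagged group, and who looks at it, is the part that needs an owner.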

The MacNorris approach

Cleaning duplicates is step two. Step one is understanding why they appear.

We do not start with the cleanse. We start by understanding where duplicates are created in your system and why.

Which interfaces run without a duplicate check? Which manual processes systematically produce new entries? Are there dummy records that are actually symptoms of a process problem? Who is responsible for data quality, and if nobody is: why not?

Only when we know the causes do we cleanse. And at the same time we close the channels through which new duplicates are created every day. Not as a multi-year data governance project, but as a concrete intervention that has immediate impact.

The goal is not a database that is clean once. The goal is a database that stays clean.

From practice

18,000 duplicates. Four weeks. Never again.

An e-commerce company wants to introduce AI in customer service. First look at the customer database: 43,000 records, 18,000 of them duplicates. Created over four years through a shop-system-to-ERP interface without a duplicate check. Plus a dummy customer account with 11,000 transactions attached.

The AI project becomes a data project. But this time done properly.

Root cause analysis in week 1: three duplicate sources identified
Interface retrofitted with duplicate logic
Cleanse: 18,000 duplicates removed in four weeks
Monitoring set up: new duplicates flagged daily
Dummy account dissolved, transactions correctly assigned
AI automation live after six weeks

“We dragged the data along for years. Now we wonder why we didn't start earlier.”

Head of Operations, E-Commerce company

Frequently asked questions

What you usually ask us about duplicates.

Deduplication tools exist. But without prior analysis they merge the wrong records just as often as the right ones. Automation without logic makes the problem worse; it does not solve it.
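To illustrate why blind merging is risky, here is a small sketch using Python's standard `difflib`. The function `classify` and both thresholds are assumptions chosen for this example, not values from any particular tool. The idea: only near-identical names are merged automatically, and everything in the grey zone goes to a person.

```python
from difflib import SequenceMatcher

# Assumed thresholds for illustration; real values depend on the data.
MERGE_AT = 0.95
REVIEW_AT = 0.80


def classify(a: str, b: str) -> str:
    """Return 'merge', 'review' or 'distinct' for two customer names."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if score >= MERGE_AT:
        return "merge"
    if score >= REVIEW_AT:
        return "review"  # similar, but possibly two different companies
    return "distinct"


same = classify("Miller GmbH", "miller gmbh")
unclear = classify("Miller GmbH", "Mueller GmbH")
```

"Miller GmbH" and "Mueller GmbH" look alike to the algorithm, yet they may well be two separate customers. With a single low threshold a tool would merge them; with a review band, someone decides.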

Not necessarily. Often it is enough to fix the interfaces that produce duplicates. The ERP itself stays untouched.

First visible results in two to four weeks. More important than speed is that the root causes are fixed at the same time, so it does not start all over again after five months.

In most companies nobody is. That is the root of the problem. We help not just with the cleanse but with establishing clear responsibilities and simple processes that keep data quality intact long term.

YOUR DATA IS GROWING. YOUR DATA QUALITY IS NOT.

Tell us briefly what it looks like for you, and we will tell you in one conversation where the duplicate sources are.
