How I Fixed a Two-Year Data Bug That Was Costing Thousands Every Month

Saturday, Apr 4, 2026 | 4 minute read | Updated at Saturday, Apr 4, 2026


Some bugs sit in the backlog for months. This one had been sitting there for two years.

Discovering the Problem

When I started digging into our data pipelines, one thing kept coming up in conversations with the operations team. Clients on NHS programmes were failing to enrol properly, and advisors were spending 15 to 30 minutes manually fixing each one. With over 60 clients affected a month, that is somewhere between 15 and 30 hours of advisor time spent on manual fixes. The maths on that is grim.

The setup was this: Microsoft Bookings handled appointment scheduling. Session data got passed from Bookings into Azure Cosmos DB via an Azure Service Bus. That Cosmos data was what everything downstream used to track a client’s programme journey.

The problem was that the Service Bus had been configured without adequate capacity. Under load it was dropping messages. Not delaying them. Dropping them. So Cosmos DB would end up with incomplete referral records. A client might have nine sessions written when they needed all thirteen to complete their NHS programme pathway. Incomplete data meant failed enrolment. Failed enrolment meant the organisation was not getting paid for that referral.

Two years. Thousands of pounds a month. Ongoing leadership escalations. And the only workaround was 15 to 30 minutes of manual advisor time per client, every single time.

Thinking Through the Options

The obvious fix would have been the root cause: the Service Bus capacity. But that sat within a third-party integration layer outside my remit. I could not touch it.

So the question became: given that messages are going to keep being dropped, how do I detect when it happens and correct the data automatically?

My first instinct was to reconstruct the missing sessions from scratch. Hardcode the expected session types and values, fill the gaps. I ruled that out quickly. Session structures vary between services and programmes. Hardcoding would have been brittle and needed constant maintenance as things evolved.

Then I thought about it differently. If a referral is incomplete, there will almost certainly be another referral on the same service that is complete. Why not use that as a source of truth?

That was the insight that made everything work.

The Solution

I built two Power Automate flows packaged as a managed Dataverse solution.

Detection Flow runs on a daily schedule. It queries Cosmos for referrals in FirstSession status with fewer than 14 booking records. All NHS programme referrals should have exactly 13 sessions plus an initial assessment, so anything below that threshold gets flagged. When it finds affected records it triggers the fix flow for each one.
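The real detection logic lives in a Power Automate flow, but the check itself is simple enough to sketch in Python. This is only an illustration: the property names `status` and `bookings`, the query text, and the helper name are assumptions, not the actual schema or flow.

```python
from typing import Any

# Threshold from the post: 13 programme sessions plus one initial assessment.
EXPECTED_BOOKINGS = 14

# Hypothetical Cosmos DB SQL for the same check. The property names
# (status, bookings) are assumptions, not the real schema.
DETECTION_QUERY = (
    "SELECT * FROM c "
    "WHERE c.status = 'FirstSession' "
    f"AND ARRAY_LENGTH(c.bookings) < {EXPECTED_BOOKINGS}"
)


def find_incomplete_referrals(referrals: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return referrals in FirstSession status with too few booking records."""
    return [
        r
        for r in referrals
        if r.get("status") == "FirstSession"
        and len(r.get("bookings", [])) < EXPECTED_BOOKINGS
    ]
```

Each referral this returns is what would be handed to the fix flow, one record at a time.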

Fix Flow is where it gets interesting. Rather than reconstructing sessions from scratch, the flow queries Cosmos for a reference client on the same service with a full set of 13 sessions. It diffs the two bookings arrays, identifies exactly which session types are missing, borrows the structural data from the reference client, and constructs the missing session objects from that.
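As a rough Python sketch of that diff-and-borrow step (the flow itself is built from Power Automate actions, not code): the `sessionType` key comes from the flow, while `bookingId`, `appointmentTime`, and the decision to blank them on the copied sessions are assumptions made purely for illustration.

```python
import copy
from typing import Any

# Hypothetical client-specific fields that should not be copied across.
CLIENT_SPECIFIC_FIELDS = ("bookingId", "appointmentTime")


def build_missing_sessions(
    incomplete: list[dict[str, Any]],
    reference: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Build session objects for every sessionType that the reference
    referral has and the incomplete referral lacks."""
    present = {b["sessionType"] for b in incomplete}
    missing = []
    for booking in reference:
        if booking["sessionType"] in present:
            continue
        # Borrow the structure from the reference client...
        new_booking = copy.deepcopy(booking)
        # ...but blank anything that belongs to the reference client only.
        for field in CLIENT_SPECIFIC_FIELDS:
            if field in new_booking:
                new_booking[field] = None
        missing.append(new_booking)
    return missing
```

Because the structure is copied from a live reference record rather than hardcoded, a change to a programme's session layout is picked up without touching the logic.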

The tricky part was ordering. The bookings array in Cosmos was not guaranteed to be in programme sequence, but the downstream flows expected sessions in that order. Just merging the arrays would append the missing sessions at the end and break everything downstream.

I built a SessionOrderMap as a Compose action: a static key-value object mapping each sessionType to its correct position in the sequence. The flow tags each booking with a sort index, runs the sort function on the tagged array, then strips the temporary index before writing back to Cosmos. The array lands clean, correctly ordered, ready for whatever comes next.
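The tag, sort, strip pattern looks roughly like this in Python. It is a sketch only: the session type names, the `_sortIndex` key, and the choice to push unknown types to the end are assumptions, and the real map covers the full programme sequence.

```python
from typing import Any

# Illustrative subset of the map; the real one covers every session type.
SESSION_ORDER_MAP = {
    "InitialAssessment": 0,
    "Session1": 1,
    "Session2": 2,
    "Session3": 3,
}

SORT_KEY = "_sortIndex"  # temporary field, never written back


def order_bookings(bookings: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the bookings in programme sequence without mutating the input."""
    fallback = len(SESSION_ORDER_MAP)  # unknown session types sort last
    # 1. Tag each booking with its position in the sequence.
    tagged = [
        {**b, SORT_KEY: SESSION_ORDER_MAP.get(b["sessionType"], fallback)}
        for b in bookings
    ]
    # 2. Sort on the temporary index (Python's sort is stable).
    tagged.sort(key=lambda b: b[SORT_KEY])
    # 3. Strip the temporary index before the array is written back.
    return [{k: v for k, v in b.items() if k != SORT_KEY} for b in tagged]
```

Keeping the order in one static map means the sequence is defined in a single place, so a new session type only needs one new entry.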

On top of that the flow writes an audit record to Dataverse on every fix, a permanent log of what was corrected and when. A Teams notification goes to the operations team after each run. Dataverse session lookup fields get updated to keep the CRM in sync.

The Result

185 backlogged referrals cleared on the first run.

The manual 15 to 30 minutes per client is gone. Daily detection means future affected records get caught and fixed automatically. Leadership escalations stopped. The operations team gets a Teams notification every morning so they have full visibility without ever needing to open Cosmos.

The Service Bus root cause is still there. The right long-term fix is increasing throughput tiers or implementing dead-letter queue handling so messages are retained on failure rather than dropped. But the data is clean, the revenue is protected, and the advisors got their time back.

© 2026 Ibrahim Stephenson

This Site

This is where I share blog posts, projects, and things I’m thinking about. Mostly around Power Platform, but occasionally beyond it.

About Me

I’m Ibrahim, a Power Platform Developer based in London.

This is my corner of the internet. Somewhere to write about the things I’m building, the problems I’m solving, and whatever else is on my mind.

Currently drowning in backlog.

Sole internal Power Platform developer at a UK health and wellbeing charity, building and maintaining the Dynamics 365 systems that run NHS health programmes. Always down to talk about whatever, whenever.

If you are looking for something to read and want to hear my opinion (that no one asked for) on the books I've read, then take a look at my library.

Unfortunately my real-life library is much smaller, since I keep giving away books after I've read them, but maybe that's my love language.

Powercademy

I contribute to and edit Power Platform content for Powercademy across YouTube and LinkedIn.

Being part of the team has genuinely broadened my perspective: new features, implementations, and ways of using Power Apps I would never have discovered on my own. If you work in the Power Platform space, it is worth a subscribe.

Library

Take a look at my ratings and reviews for all the books I've read, the anime I've watched and the manga I've read.
