bakemawaytoys 8 hours ago

I have asked AI on multiple occasions to take items from some input and output a table or a JSON structure, and every time it has simply skipped or ignored several items from the input for no reason.

This sounds like a terrible idea, and nearly impossible to debug when it inevitably drops data.

  • rashidae 2 hours ago

    Yeah, we’ve seen that too. Raw AI output isn’t reliable enough for high-stakes data work.

    That’s exactly why we don’t let AI run migrations. We use it to speed up the boring parts, like mapping table structures. But humans are always in control.

SolubleSnake 3 hours ago

As others have mentioned this is an extremely odd thing to expect to work....

I'll give an example. I worked for a FTSE 100 company using a very old Product Lifecycle Management system (Model Manager, actually based on pre-DOS technology)... we had to upgrade it to a new fancy one.

Therefore we had to migrate all data relating to the company and the group companies' engineering designs... everything to do with 2D drawings, 3D designs, any important connections etc... all electrical designs... Excel sheets related to these containing lists of PCBs and their component parts in Bills of Materials etc... There is absolutely no way in hell I would trust AI with almost any of that to get it right... or even to attempt a load without almost immediately erroring.

  • rashidae 2 hours ago

    Totally agree. We wouldn’t trust AI to run that kind of migration either... And we don’t.

    But here’s what we do use AI for:

    • Mapping legacy schemas
    • Spotting patterns
    • Generating boilerplate ETL code fast

    Then humans step in:

    • Validate every mapping
    • Write custom logic for edge cases
    • Test everything... every field, every BOM, every relationship
    • Migrate with deterministic, human-reviewed code
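For illustration, the hand-off described in that reply might look something like this; a minimal sketch in Python, with invented field names and an invented mapping, not their actual tooling:

```python
# Hypothetical sketch of the AI-suggests / human-validates hand-off.
# All field names and mappings here are made up for illustration.

ai_suggested_mapping = {
    "PART_NO": "part_number",
    "DESC": "description",
    "QTY": "quantity",
}

# Human review step: corrections are committed alongside the suggestion,
# never trusted blindly.
reviewed_mapping = {
    **ai_suggested_mapping,
    "DESC": "part_description",  # reviewer caught a wrong target column
}

def migrate_row(legacy_row: dict) -> dict:
    """Deterministic transform: every legacy field must be mapped,
    otherwise fail loudly instead of silently dropping data."""
    unmapped = set(legacy_row) - set(reviewed_mapping)
    if unmapped:
        raise ValueError(f"unmapped legacy fields: {sorted(unmapped)}")
    return {reviewed_mapping[k]: v for k, v in legacy_row.items()}

row = {"PART_NO": "PCB-1042", "DESC": "Main board", "QTY": 3}
print(migrate_row(row))
```

The point of the hard failure on unmapped fields is that the "AI skipped some items" failure mode from upthread becomes an error at test time rather than silent data loss.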

rooftopzen 16 hours ago

You naively replaced a deterministic process with a probabilistic one, following a trend that is uneducated.

I am taking screenshots of blogposts like this for a museum exhibit opening next year - lmk if you’re willing.

  • rashidae 2 hours ago

    We're not replacing deterministic processes with probabilistic ones; that would be insane for production data.

    Here's what actually happens:

    1. MCP exposes system schemas in a standardized way
    2. AI analyzes the schemas and suggests mappings
    3. Engineers review and validate every mapping
    4. AI generates deterministic integration code (think: writing the SQL, not running it)
    5. We test with real data before any production deployment
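Step 4 above is code generation, not execution; a minimal sketch of the distinction, with invented table and column names:

```python
# Hypothetical sketch of step 4: the model's role ends at emitting SQL
# text, which engineers review and run through the normal pipeline.
# Table and column names are invented for illustration.

def generate_insert_select(source: str, dest: str, mapping: dict[str, str]) -> str:
    """Render a deterministic INSERT ... SELECT from a human-reviewed mapping."""
    dest_cols = ", ".join(mapping.values())
    src_cols = ", ".join(mapping.keys())
    return f"INSERT INTO {dest} ({dest_cols}) SELECT {src_cols} FROM {source};"

mapping = {"CUST_ID": "customer_id", "CUST_NM": "customer_name"}
sql = generate_insert_select("legacy.customers", "warehouse.customers", mapping)
print(sql)
# INSERT INTO warehouse.customers (customer_id, customer_name)
#   SELECT CUST_ID, CUST_NM FROM legacy.customers;
```

Because the output is just a string, it can be diffed, code-reviewed, and tested like any other checked-in SQL before anything touches production data.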

nickphx 15 hours ago

That's a bold move. Hopefully there are no stray cats.

rashidae a day ago

We used to spend 40–80 hours writing and maintaining brittle ETL code for every integration. Now we spend 4–8 hours deploying MCP (Model Context Protocol) interfaces and letting AI handle the rest. No hardcoded pipelines.

  • criticalfault 17 hours ago

    Can you give some more info on the results?

    Meaning, correctness, completeness, etc...

    Would you use it for e.g. tax information? Because if wrong, you could get fined.

    • rashidae 2 hours ago

      We're using AI to write the boring integration code that moves data from System A to System B. The actual data processing is deterministic code that's tested like any critical system.

      Correctness: 100% schema mapping accuracy after human validation. We've never had a data type mismatch or field misalignment make it to production. The AI suggests mappings at ~85% accuracy; humans catch and correct the remaining 15%.

      Completeness: zero data loss incidents. We run reconciliation reports comparing source record counts to destination counts; any discrepancy fails the deployment. The most common issue is the AI initially missing compound key relationships, which we catch in testing.
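A reconciliation gate of the kind described could be sketched as follows; the table names and counts are made up, and this is an assumption about shape, not their actual report:

```python
# Hypothetical sketch of the reconciliation gate: compare source and
# destination record counts per table; any mismatch blocks deployment.
# Table names and counts are invented for illustration.

def reconcile(source_counts: dict[str, int], dest_counts: dict[str, int]) -> list[str]:
    """Return a list of discrepancies; an empty list means the gate passes."""
    problems = []
    for table, expected in source_counts.items():
        actual = dest_counts.get(table, 0)
        if actual != expected:
            problems.append(f"{table}: source={expected} dest={actual}")
    return problems

src = {"invoices": 12407, "payments": 9981}
dst = {"invoices": 12407, "payments": 9981}
assert reconcile(src, dst) == []  # counts match, gate passes
```

In practice you would extend this beyond raw counts (checksums per column, distinct-key counts) to also catch the compound-key issue mentioned above, since two tables can have equal row counts yet disagree on relationships.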

      Tax/Financial Data: Yes, we handle financial data for several clients, including:

      • QuickBooks to data warehouse pipelines (invoice/payment data)
      • Payroll system integrations
      • Revenue reconciliation between CRM and accounting

      Our approach for sensitive data:

      • AI generates the integration logic, never sees actual records
      • Test with synthetic data matching production schemas
      • Run parallel processing for 1–2 cycles to verify accuracy
      • Maintain full audit logs of all transformations
      • Human sign-off required before production cutover
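The parallel-processing verification step might be sketched like this; both "pipelines" and the synthetic rows are stand-ins, not the real systems:

```python
# Hypothetical sketch of the parallel-run check: run the legacy and the
# new pipeline over the same synthetic batch and diff the outputs before
# cutover. Rows are hashed so large batches can be compared cheaply.

import hashlib
import json

def row_digest(row: dict) -> str:
    """Stable hash of a row (canonical JSON, sorted keys)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def diff_batches(legacy_rows: list[dict], new_rows: list[dict]) -> list[int]:
    """Return the indices where the two pipelines disagree."""
    return [
        i for i, (a, b) in enumerate(zip(legacy_rows, new_rows))
        if row_digest(a) != row_digest(b)
    ]

legacy = [{"id": 1, "amount": "10.00"}, {"id": 2, "amount": "12.50"}]
new = [{"id": 1, "amount": "10.00"}, {"id": 2, "amount": "12.50"}]
assert diff_batches(legacy, new) == []  # no disagreements this cycle
```

A cycle with an empty diff is evidence for the human sign-off; any non-empty diff gives the exact rows to investigate before cutover.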