Cleaning Data: Why AI Isn't a Magic Fix

Many organizations bought AI this year to fix their data problems. What they got instead was a faster, wider view of how bad those problems already were. In this PIMvendors.com webinar, co-founder Stephan Spijkers sits down with Susan Walsh, known across the data world as the Classification Guru, to break down where AI is earning its keep in product and business data work, where it is quietly making the mess bigger, and why the fix still runs through people, not prompts.

Speakers
Stephan Spijkers – Co-Founder, PIMvendors.com

Susan Walsh – Founder, The Classification Guru Ltd

Nine years of hands-on data classification and cleansing work for more than 100 clients worldwide; author of two books on fixing dirty data.

AI amplifies the mess you already have
Susan’s client workload has grown since AI adoption picked up, and she traces it directly to the technology: AI does not distinguish clean data from dirty data; it simply processes whatever it is given faster and spreads the result further. AI does not fix a company’s inconsistent product naming or mismatched units. It multiplies that inconsistency across every downstream system that touches the data.

No single model wins every job
Susan pushes back on the idea of picking one AI tool and using it for everything. GPT, Claude, and Google’s models each suit different tasks, and the right choice depends on the data and the job at hand. She uses AI to refine ideas and draft reports, but will not let it run unsupervised as an agent, citing Ford’s decision to rehire roughly a thousand staff after AI-driven automation fell short.

Pattern matching is not context
The clearest example from the conversation: Susan’s team built an AI module to normalize supplier names, and it merged three unrelated businesses (a hairdresser, a cleaning company, and a law firm, all coincidentally named Walsh) into a single group. The model saw a repeated word and assumed a match. At volume, across a spreadsheet of a million rows, an error like that is nearly impossible to catch by eye.
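The failure mode described above can be illustrated with a minimal sketch. This is not the module Susan’s team built; the supplier names, the suffix list, and both grouping functions are hypothetical, chosen only to show how matching on a shared word over-merges while a stricter full-name comparison keeps unrelated businesses apart.

```python
import re
from collections import defaultdict

# Hypothetical supplier records; only the shared surname comes from the webinar.
SUPPLIERS = [
    "Walsh Hair Studio",
    "Walsh Cleaning Services Ltd",
    "Walsh & Co Solicitors",
    "Walsh Cleaning Services Limited",
]

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "llc", "inc", "plc"}


def tokens(name):
    """Lowercase a name and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", name.lower())


def group_by_first_token(names):
    """Naive grouping: any names sharing their first word are merged."""
    groups = defaultdict(list)
    for name in names:
        groups[tokens(name)[0]].append(name)
    return dict(groups)


def group_by_full_name(names):
    """Stricter grouping: merge only when the whole name matches
    after dropping legal suffixes."""
    groups = defaultdict(list)
    for name in names:
        key = " ".join(t for t in tokens(name) if t not in LEGAL_SUFFIXES)
        groups[key].append(name)
    return dict(groups)


naive = group_by_first_token(SUPPLIERS)
strict = group_by_full_name(SUPPLIERS)
print(len(naive), "group(s) from the naive rule")
print(len(strict), "group(s) from the stricter rule")
```

The stricter rule still merges the two spellings of the cleaning company, which is the intended kind of match. Anything it cannot settle, such as two genuinely different firms with identical names, would still need a person who knows the suppliers.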

The fix is documentation, not a one-time cleanup
Asked where she would spend limited AI investment, Susan chose documentation over a single cleaning pass. Technology cycles through every few years, but clear rules for how data gets entered, formatted, and labeled hold up regardless of which tool a company is using. Clean data without documentation drifts back to dirty within months.

Where AI actually earns its place
Both speakers agree AI performs well on checkable, well-defined tasks: spreadsheet formulas, lookups, pattern detection across large datasets. It performs worse on subjective, contextual judgment calls, the kind of classification work that still depends on a person who understands the business behind the data.

Start with a pivot table, not a platform
Susan’s practical starting point for any team beginning this work: pull product data into a pivot table, sort by description against product code, and look for inconsistencies that are easy to miss in a raw dataset but jump out once the data is organized. Prioritize by business impact, not by what is easiest to fix first.
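For teams that would rather script the same check than build it in a spreadsheet, the idea can be sketched roughly as follows. The product codes, descriptions, and function names are hypothetical, and the sketch assumes the data is already available as (product code, description) pairs. It counts descriptions per product code, as a pivot table would, and flags any code that carries more than one distinct description.

```python
from collections import defaultdict

# Hypothetical product rows as (product_code, description) pairs.
ROWS = [
    ("P-1001", "Steel bolt M8 x 40mm"),
    ("P-1001", "Steel Bolt M8x40 mm"),
    ("P-1001", "Steel bolt M8 x 40mm"),
    ("P-1002", "Nylon washer 8mm"),
    ("P-1003", "Hex nut M8"),
    ("P-1003", "Hex nut M8, zinc"),
]


def pivot_descriptions(rows):
    """Count how often each description appears under each product code."""
    pivot = defaultdict(lambda: defaultdict(int))
    for code, description in rows:
        pivot[code][description] += 1
    return pivot


def inconsistent_codes(pivot):
    """Return codes that carry more than one distinct description."""
    return {code: dict(descs) for code, descs in pivot.items() if len(descs) > 1}


flagged = inconsistent_codes(pivot_descriptions(ROWS))
for code in sorted(flagged):
    print(code)
    for description, count in sorted(flagged[code].items()):
        print(f"  {count} x {description}")
```

The script only surfaces candidates. Deciding whether two descriptions are a formatting slip or two different products, and which flagged codes matter most to the business, remains a human call.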

Susan Walsh will bring more of this thinking to Product Content Europe on October 6 in Utrecht, the product content event PIMvendors.com runs in partnership with EMM Consultancy. Early bird tickets are open at productcontenteurope.com through July 15.

👉 See how PIMvendors.com helps teams build the data foundation AI actually needs: pimvendors.com
