The Data Foundation AI Actually Needs: How Covalo is Building an Industry-Wide PIM for Personal Care

The conversation about AI in product-intensive industries has matured past the question of which model to use. The question now is whether the underlying data is structured, enriched, and interoperable enough for AI to act on reliably. In the personal care and chemical ingredients sector, the answer, for most companies, is still no. In this PIMvendors.com webinar, Stephan Spijkers sits down with Yann Chilvers, co-founder of Covalo, to examine what genuine data readiness looks like at an industry level, and why solving it requires a shared backbone rather than a better internal system.

Covalo started as a search engine for cosmetic ingredients. Over seven years, it has grown into the data infrastructure layer for the personal care industry, connecting more than 6,000 brands, 1,500 ingredient manufacturers, and 35,000 users across 145 countries. The journey from search tool to industry-wide PIM was not planned. It was pulled by a single, persistent customer problem: fragmented, incompatible, and chronically outdated product data that blocked innovation, slowed product launches, and made AI applications unreliable before they had even started.

Speakers:

Stephan Spijkers – Co-Founder, PIMvendors.com

Yann Chilvers – Co-Founder, Covalo

Key Takeaways:

Having data internally is not the same as having data that flows across an industry. Large personal care companies generally have substantial internal data assets. Many have invested significantly in ERP, PLM, and internal PIM systems. Yet even the largest brands in the sector still rely on email exchanges with suppliers to collect ingredient data for every new formulation project. Each brand sends the same requests to the same suppliers independently, generating massive duplication and keeping data perpetually out of sync. The problem is not volume. It is interoperability. When data cannot move reliably between organizations in a standardized format, every company effectively starts from zero on every project, regardless of how organized their internal systems are.

AI readiness requires data enrichment that no single company can produce alone. Clean internal data is a necessary condition for AI reliability, but it is not sufficient. Useful AI agents operating in the personal care space need ingredient data enriched with regulatory context, toxicology assessments, sustainability classifications, and market intelligence. No individual company holds all of that. And AI models trained or prompted on shallow, commercially oriented data follow the rule Yann states directly: trash in, trash out. The BCG research cited in the session places the chemical and personal care industry at the bottom of AI maturity rankings, with a strong correlation between that ranking and the state of shared data infrastructure. The implication is that AI performance in this sector is constrained not by model capability but by the data layer beneath it.

Data governance has become a boardroom priority, driven by cost as much as quality. Organizations across the PIM ecosystem are now discussing data lineage, version control, and structured cleansing in conversations where those terms rarely appeared 18 months ago. The forcing function is not philosophical. Companies that ran large models across poorly governed, version-cluttered data sets burned through their full-year AI budgets in the opening months of 2026 with little to show for it. Running an advanced model over a file named “PIM vendor workshop webinar version 5.6” that no one will reference again is expensive, environmentally costly, and produces no value. The discipline of knowing what data exists, what is current, and what should be retired is no longer an IT concern. It is a cost center that reports to the CFO.
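The discipline of knowing what is current and what should be retired can be sketched in a few lines. The example below is a minimal illustration, not a description of any vendor's tooling: the `DocumentVersion` record, the `split_current_and_stale` helper, the one-year idle threshold, and the sample files are all assumptions made for this sketch. It keeps only the latest version of each document that has been referenced recently and flags everything else for retirement, so that only the current set would be passed to a model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class DocumentVersion:
    doc_id: str                # hypothetical identifier shared by all versions of a file
    version: Tuple[int, int]   # (major, minor), e.g. (5, 6) for "version 5.6"
    last_referenced: date      # last time anyone opened or linked the file


def split_current_and_stale(
    versions: List[DocumentVersion],
    today: date,
    max_idle_days: int = 365,  # assumed threshold, tune per organization
) -> Tuple[List[DocumentVersion], List[DocumentVersion]]:
    """Return (current, retire): only the latest, recently used version is current."""
    latest = {}
    for v in versions:
        best = latest.get(v.doc_id)
        if best is None or v.version > best.version:
            latest[v.doc_id] = v

    current, retire = [], []
    for v in versions:
        is_latest = latest[v.doc_id] is v
        idle_days = (today - v.last_referenced).days
        if is_latest and idle_days <= max_idle_days:
            current.append(v)
        else:
            retire.append(v)
    return current, retire


# Illustrative data only.
versions = [
    DocumentVersion("workshop-deck", (5, 5), date(2025, 1, 10)),
    DocumentVersion("workshop-deck", (5, 6), date(2025, 2, 1)),
    DocumentVersion("ingredient-spec", (2, 0), date(2023, 1, 1)),
]
current, retire = split_current_and_stale(versions, today=date(2025, 6, 1))
print([(v.doc_id, v.version) for v in current])
print([(v.doc_id, v.version) for v in retire])
```

In this sketch the superseded deck and the long-untouched spec both land in the retire list. A real governance process would add lineage and ownership checks before anything is archived.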

Vertical AI solutions built on deep domain data consistently outperform horizontal general-purpose alternatives. Covalo deploys more than 20 times faster than a conventional PIM implementation. The reason is not superior engineering. It is focus. A vertically specialized solution does not need to accommodate every possible data model or industry configuration. It knows the ontology, the regulatory environment, the supplier relationships, and the formulation workflows of its specific domain. That specificity also means that every new customer strengthens the network rather than simply adding an account. General-purpose platforms optimize for modularity and breadth. Vertical platforms optimize for depth, speed, and the network effects that make each participant’s data more valuable than it would be in isolation. For AI specifically, narrow models trained on high-quality domain data outperform large general models on domain-specific tasks by a consistent and widening margin.

Speed to market in personal care is an existential variable, and the data infrastructure determines it. Bringing a personal care product from concept to shelf currently takes between one and five years. Over half of launches fail. The primary causes are consistent: failure to adapt fast enough to new regulation, failure to respond to shifting consumer demand, and failure to pivot when supply chains are disrupted. Incoming EU regulation, including the Green Deal and a range of chemical compliance frameworks, means that an estimated 80% of existing products will require reformulation before 2030. Companies that do not have the data infrastructure to identify affected formulations, qualify alternative ingredients, and run regulatory checks rapidly will not meet that timeline. The organizations that started building that infrastructure in 2024 and 2025 have a compounding advantage. Those that wait until the deadline is visible will not have time to close the gap.
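The first step described above, identifying affected formulations, reduces to a set intersection once ingredient data is structured. The sketch below is a hedged illustration under simple assumptions: the `find_affected_formulations` helper, the product names, and the placeholder ingredient codes are hypothetical, and a real check would work from standardized ingredient identifiers and actual regulatory lists.

```python
from typing import Dict, Set


def find_affected_formulations(
    formulations: Dict[str, Set[str]],
    restricted: Set[str],
) -> Dict[str, Set[str]]:
    """Map each affected product to the restricted ingredients it contains."""
    affected = {}
    for product, ingredients in formulations.items():
        hits = ingredients & restricted  # ingredients that appear on the restricted list
        if hits:
            affected[product] = hits
    return affected


# Placeholder data only: these are not real ingredients or real restrictions.
formulations = {
    "shampoo": {"INGREDIENT_A", "INGREDIENT_B"},
    "lotion": {"INGREDIENT_C"},
    "serum": {"INGREDIENT_B", "INGREDIENT_D"},
}
restricted = {"INGREDIENT_B", "INGREDIENT_D"}

affected = find_affected_formulations(formulations, restricted)
for product, hits in sorted(affected.items()):
    print(product, sorted(hits))
```

The lookup itself is trivial. The hard part, and the point of the paragraph above, is having formulation and ingredient data clean and standardized enough for it to run at all.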

Network effects in data platforms compound when the layer is shared rather than siloed. Covalo’s marketplace is free to list on and free to use, a deliberate structural decision rather than a growth tactic. Suppliers who share ingredient data see more qualified inbound requests from brands. That commercial signal incentivizes richer, more accurate data contributions. Richer data attracts more brands and formulators. More traffic and analytics flow back to suppliers, reinforcing the cycle. The closed, proprietary alternative produces the opposite dynamic: every company maintains its own version of shared data, every update requires bilateral communication, and the collective cost of that duplication is carried invisibly by every participant in the industry. Covalo’s 3x growth over the past two years is a function of the network effect reaching a self-reinforcing threshold, not a marketing push.

The shift from search tool to industry data backbone was driven by customer pain, not product vision. Covalo did not begin with a plan to become an industry-wide PIM layer. Customers raised data fragmentation as the primary blocker to every other initiative, including AI pilots, sustainability assessments, and supply chain resilience projects. A grant-funded recommendation engine for sustainable ingredient alternatives failed not because the technology was inadequate but because the standardized, deep data required to run the assessments reliably did not exist. The product roadmap followed the problem. That sequence matters for any organization evaluating vertical data infrastructure: the signal that a shared data layer is necessary is not a vendor pitch. It is the moment when every internal initiative stalls at the same point.

The architecture question is no longer internal versus external. It is whether your data can move at all. Stephan closes the session with a framing that applies beyond personal care. Organizations building AI strategies today are working across multiple layers simultaneously: the product data itself, the relationships and context around it, the APIs and MCP connections that make it accessible to agents, and the IT architecture that determines whether any of that is viable at scale. Most companies are still operating at layer one, improving descriptions and filling attributes. The organizations that will be structurally advantaged in three to five years are the ones investing now in the layers that determine whether their data can reach the agents, systems, and partners that will need it.

👉 Wondering whether your product data architecture is ready for what comes next? Compare PIM solutions and book a call with our team at pimvendors.com