Data Quality

The NAICS Problem Is a Symptom — Your Business Data Has the Same Disease

A utilities client's system had wrong business classification codes on thousands of accounts. The data was entered once and never validated. This is a fixable problem — and the same pattern appears in every industry.

Jesse Myers

24 Jun 2026 • 2 min read

A utilities company asked us a pointed question recently: how do we know that the business classification codes in our system are actually correct?

The honest answer is that they don't. And in most large organizations, nobody else does either.

NAICS codes, the six-digit industry classification codes used across government and industry, are largely self-reported. Businesses declare an industry code when registering for federal contracts, filing with the SEC (which records the older SIC code rather than NAICS), applying for healthcare licenses, or registering with state agencies. Almost nobody audits them. Almost nobody reconciles them. And when a business changes (a new tenant in a building, a new primary service line, an acquisition), the code often doesn't get updated.

For this client, the stakes were concrete. They use NAICS codes to prioritize power restoration after outages. The wrong code on a building means the wrong priority. A restaurant classified as an urgent care gets power before a fire station that should have gone first. It's a data quality problem with operational consequences.

The pattern is everywhere

We see this same problem across every industry we work in. Healthcare — patient records with outdated provider classifications affecting billing and routing. Financial services — customer industry codes used for risk modeling that were set at account opening and never revisited. Manufacturing — supplier classifications that determine procurement policies, maintained by whoever entered them five years ago.

The data exists. It's just stale, inconsistent, and nobody owns the ongoing reconciliation.

What we built

We built a classification tool that cross-references seven public data sources simultaneously — SEC EDGAR, SAM.gov, NPPES (the CMS healthcare provider registry), USASpending federal contracts, OpenStreetMap, CMS Care Compare, and a web scrape fallback — and uses Claude to synthesize a confidence-rated NAICS code recommendation.
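To make the "simultaneously" part concrete, here is a minimal sketch of fanning one business name out to several sources at once. It is an illustration, not the tool's actual code: SourceResult, query_all, make_stub and unavailable are hypothetical names, and the stub lookups stand in for real calls to SAM.gov, NPPES and the other sources. The Claude synthesis step is not shown.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass(frozen=True)
class SourceResult:
    source: str             # name of the data source, e.g. "SAM.gov"
    naics: Optional[str]    # None when the source had no match
    citation: str = ""      # where the code came from, shown to the user


# Hypothetical shape of one per-source lookup: business name in, result out.
Lookup = Callable[[str], Awaitable[SourceResult]]


async def query_all(business: str, lookups: List[Lookup],
                    timeout: float = 10.0) -> List[SourceResult]:
    """Query every source concurrently and keep only usable hits."""

    async def guarded(lookup: Lookup) -> Optional[SourceResult]:
        # One slow or failing source should not sink the whole lookup.
        try:
            return await asyncio.wait_for(lookup(business), timeout)
        except Exception:
            return None

    results = await asyncio.gather(*(guarded(lookup) for lookup in lookups))
    return [r for r in results if r is not None and r.naics]


def make_stub(source: str, naics: Optional[str]) -> Lookup:
    """Build a fake lookup for demonstration; a real one would call an API."""
    async def lookup(business: str) -> SourceResult:
        return SourceResult(source, naics, f"{source}: stub record for {business}")
    return lookup


async def unavailable(business: str) -> SourceResult:
    """Simulates a source that is down."""
    raise ConnectionError("source unavailable")


if __name__ == "__main__":
    lookups = [make_stub("SAM.gov", "621493"), make_stub("NPPES", None), unavailable]
    for hit in asyncio.run(query_all("Example Urgent Care", lookups)):
        print(hit.source, hit.naics, hit.citation)
```

Running every lookup concurrently means the total wait is roughly that of the slowest source rather than the sum of all seven, and the per-source guard lets the remaining sources still produce an answer when one is down.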

When multiple sources agree, confidence is High. When only a web scrape confirms, it's Medium. When nothing is found, it's Low and the user knows to verify manually. Every result includes a source citation so the customer service representative (CSR) understands exactly why the tool returned what it returned.
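The confidence rules above can be sketched as a small function. This is a rule-based stand-in for illustration only: the tool described here uses Claude for the synthesis step, and Finding, Recommendation and rate are hypothetical names. Two cases are not covered by the rules as stated, so the sketch makes labeled assumptions: a single non-scrape source is rated Medium, and when sources disagree the most common code wins.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

WEB_SCRAPE = "web scrape"  # label used here for the fallback source


@dataclass(frozen=True)
class Finding:
    source: str     # which source reported the code
    naics: str      # the six-digit code it reported
    citation: str   # reference shown to the user


@dataclass(frozen=True)
class Recommendation:
    naics: Optional[str]   # None when nothing was found
    confidence: str        # "High", "Medium", or "Low"
    citations: List[str]   # evidence behind the recommended code


def rate(findings: List[Finding]) -> Recommendation:
    """Turn per-source findings into one confidence-rated recommendation."""
    if not findings:
        # Nothing found: Low, so the user knows to verify manually.
        return Recommendation(None, "Low", [])

    # Assumption: on disagreement, the most frequently reported code wins.
    code, _ = Counter(f.naics for f in findings).most_common(1)[0]
    supporting = [f for f in findings if f.naics == code]
    distinct_sources = {f.source for f in supporting}

    # Two or more distinct sources agreeing: High.
    # A lone web scrape: Medium, as described in the text.
    # Assumption: a lone registry hit is also rated Medium.
    confidence = "High" if len(distinct_sources) >= 2 else "Medium"
    return Recommendation(code, confidence, [f.citation for f in supporting])


if __name__ == "__main__":
    agreed = rate([
        Finding("SAM.gov", "621493", "SAM.gov entity record"),
        Finding("NPPES", "621493", "NPPES provider record"),
    ])
    print(agreed.naics, agreed.confidence, agreed.citations)
```

Carrying the citations through to the result is what lets the person reading the answer see why a code was recommended, rather than just the code itself.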

The strategic lesson

The tool itself took an afternoon. What it represents took longer to understand: classification and credentialing data problems are almost always solvable at low cost by triangulating across public sources that already exist. The data is out there. Most organizations just aren't systematically using it to validate what's in their own systems.

If your organization uses any kind of business classification data to drive decisions — customer segmentation, risk models, operational routing, compliance workflows — it's worth asking when that data was last validated and against what source. The answer is usually "when it was entered" and "nothing."

That's fixable. The public registries are there. The AI synthesis layer is cheap and accurate. The gap is usually just connecting them.

Safire Business Services provides technology consulting and vCIO advisory services for businesses that need enterprise-grade thinking without enterprise-grade overhead. Based in Edmond, Oklahoma.

For the technical architecture: Building an AI NAICS Lookup Tool: Seven Sources, One Answer by Jesse Myers.

For the business case: We Built an AI-Powered NAICS Code Lookup Tool in an Afternoon on the 2057 Holdings blog.

Noevant on how tools like this fit into an AI operational stack: One Afternoon, Seven APIs, One Answer