Opinion & Analysis

How to Assess Data Quality for AI: A 3-Step Framework for Unstructured Data


Written by: Raj Debnath | Director, Data & AI Governance and Quality at Manulife, Sylvia Lu | Manager of Data Governance and Quality at Manulife

Updated 2:00 PM UTC, April 7, 2026


As AI advances at breakneck speed, AI-ready data has become mission-critical. AI leaders like Manulife rely on clean, well-prepared, and well-governed data to deliver reliable, trustworthy outcomes from their AI initiatives. Poor data quality can undermine trust in AI outputs, introduce bias into solutions, and jeopardize regulatory compliance.

What exactly does it mean for data to be AI-ready, and how can we measure its quality? Traditional data quality management frameworks work with structured data stored in tables with rows and columns.

However, vast and growing volumes of unstructured data (images, text, audio, JSON/XML/HTML files, and program code, representing scanned business documents, call recordings, transcripts, and log files) now constitute the majority of the data organizations use, and AI solutions draw on this kind of data at an increasing scale.

This expansive data landscape requires a new strategy and framework for data quality management.

The traditional structured approach to data quality testing is of limited use on unstructured data. Assessing the data against the criteria below achieves similar objectives and helps determine whether the data is fit for purpose and AI-ready:

  1. Reliability or trustworthiness of the data sources: Trusted and verified sources with a consistent track record, as well as secure and controlled channels through which data is captured, can help improve the reliability factor.
  2. Metadata completeness: Searchable, clearly labeled files with technical metadata (e.g., file name, author, created date) and business metadata (e.g., the policy number associated with a scanned document) can improve explainability and increase confidence in AI results.
  3. Usability of the data files refers to how easily the data can be read, understood, and utilized for its intended purpose. For example, high-resolution images with minimal markups, and scanned documents that adhere to expected structures and fields such as standard business forms, demonstrate good usability.
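To make these dimensions concrete, here is a minimal Python sketch of scoring a single unstructured file against the three criteria. The class name, metadata fields, and the 0.8 threshold are illustrative assumptions for this article, not part of any specific Manulife tooling.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: score one unstructured file against the three
# assessment dimensions above. Field names and the 0.8 threshold are
# assumptions made for this example.

@dataclass
class UnstructuredFileAssessment:
    source_is_verified: bool                       # 1. reliability of the source
    metadata: dict = field(default_factory=dict)   # 2. technical/business metadata
    is_readable: bool = True                       # 3a. usability: legible content
    matches_expected_structure: bool = True        # 3b. usability: expected form/fields

    REQUIRED_METADATA = ("file_name", "author", "created_date")

    def dimension_scores(self) -> dict:
        """Score each dimension on a 0..1 scale."""
        present = sum(1 for key in self.REQUIRED_METADATA if key in self.metadata)
        return {
            "reliability": 1.0 if self.source_is_verified else 0.0,
            "metadata": present / len(self.REQUIRED_METADATA),
            "usability": (self.is_readable + self.matches_expected_structure) / 2,
        }

    def is_ai_ready(self, threshold: float = 0.8) -> bool:
        # Fit-for-purpose only if every dimension clears the threshold.
        return all(score >= threshold for score in self.dimension_scores().values())
```

In this sketch, a scanned form from a verified intake channel, fully labeled and legible, passes all three checks, while a file from an unverified source or one missing its business metadata does not.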

These three assessment dimensions are the building blocks of our Data Quality Framework for Unstructured Data. A key practical application of this framework is in how Manulife adheres to its Responsible AI Principles and performs AI/ML model governance and validation.

Across industries, data quality management for AI model inputs is evolving, particularly for unstructured data. By introducing this new framework alongside existing structured data quality measures, Manulife can provide a more comprehensive approach to identifying and mitigating data-related risks in AI/ML models. Here is how it breaks down into three simple steps:

  • Step 1: Based on the AI use case, we evaluate how data quality can affect its performance and error potential. Understanding the model’s sensitivity to data quality issues can help determine the necessary level of scrutiny. For example, models analyzing customer call durations may tolerate minor quality issues, while AI solutions used to detect potential fraud are highly sensitive to data integrity.
  • Step 2: Informed by Step 1, an appropriate level of data quality assessment is performed by the AI development teams, and the results are documented for each data source.
  • Step 3: Teams assign a data quality risk rating to each data source, which contributes to the overall model error potential rating and feeds into the model validation and ongoing monitoring requirements.
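The three steps above can be sketched as a small scoring routine: a use case's sensitivity to data quality (Step 1), multiplied by the quality gap found in assessment (Step 2), yields a per-source risk rating (Step 3). The sensitivity weights, exposure formula, and rating cut-offs below are illustrative assumptions, not the actual Manulife methodology.

```python
# Illustrative sketch of Steps 1-3. Sensitivity weights, the exposure
# formula, and the rating cut-offs are assumptions for this example.

SENSITIVITY = {"low": 1, "medium": 2, "high": 3}  # Step 1: model sensitivity to quality issues

def data_quality_risk_rating(sensitivity: str, assessment_score: float) -> str:
    """Step 3: rate one data source from its Step 2 score in [0, 1]."""
    exposure = SENSITIVITY[sensitivity] * (1.0 - assessment_score)
    if exposure < 0.75:
        return "low"
    if exposure < 1.5:
        return "medium"
    return "high"

# A fraud-detection model (highly sensitive) and a call-duration model
# (tolerant of minor issues) rate the same mid-quality source differently:
print(data_quality_risk_rating("high", 0.5))  # high
print(data_quality_risk_rating("low", 0.5))   # low
```

Each source's rating would then roll up into the overall model error potential rating and the validation and monitoring requirements described in Step 3.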

Applying these steps to assess both structured and unstructured data used in AI development supports alignment with regulatory requirements and internal risk management frameworks.

Manulife was ranked first in the life insurance sector for AI maturity and among the top five insurers overall in the Evident AI Index for Insurance. We demonstrate our commitment to innovation and customer-centric solutions, underpinned by our Responsible AI Principles, which ensure solutions are ethical, transparent, and accountable.

By applying this framework in AI/ML model risk management, we aim to identify targeted actions and controls that strengthen AI/ML model governance and improve the reliability of AI models by laying the groundwork with AI-ready data.

About the Authors:

A recognized leader in Data & AI Governance, Raj Debnath creates and implements enterprise-wide governance strategy across the data and AI ecosystem. For over two decades, he has worked globally and across industries to help build analytics- and AI-based insights, improve operational efficiency, and achieve regulatory compliance. As Manulife's Global Data Quality Lead, he currently helps data offices around the world develop AI-ready data foundations and provides thought leadership through the development of frameworks and POVs.

Sylvia Lu is a data governance and quality leader focused on advancing enterprise data and AI governance. She specializes in translating governance and regulatory expectations into practical operating models that enable measurable business value and responsible AI adoption. In her current role as Manager of Data Governance and Quality at Manulife, Lu works across technology and governance teams to strengthen data excellence and support confident, well-governed decision-making.
