Data Governance Remediation for a Licensed Insurer
Regulatory notices concentrate the mind. When a licensed insurance company received a formal notice following a routine audit — three material findings, six months to remediate, with a re-examination scheduled — they needed to move fast and get it right. Nematix was engaged to design and implement a data governance framework that would close all three findings and establish a sustainable governance operating model.
The outcome: the re-examination passed with no material findings. Data lineage traceable end-to-end across all fourteen systems. 2.3 million records that exceeded the retention policy deleted. A data governance committee established with Board-level reporting.
The Situation
The insurer had grown significantly over the previous decade, partly through organic growth and partly through two acquisitions that brought additional product lines and, with them, additional legacy systems. The result was a data estate of fourteen systems — policy administration, claims management, actuarial modelling, CRM, financial reporting, and several acquired platforms that had never been fully integrated.
Each system had been implemented with its own data model, its own access controls, and its own approach to data retention. There was no unified data catalogue, no consistent taxonomy for sensitive data classification, and no technical enforcement of retention policies — only a document on SharePoint that described what should happen.
The regulatory notice identified three specific findings:
- Consent basis not demonstrable — the insurer was using customer personal data in underwriting models but could not produce evidence that customers had consented to this specific use at the time of collection
- Personal data in non-production environments — a penetration test had found live customer names, IC numbers, and policy details in a development database used for testing
- Retention policies not technically enforced — the insurer’s stated data retention policy was seven years, but analysis showed records significantly older than that in active systems, with no automated mechanism for deletion
The six-month timeline to remediate was the regulator’s deadline, not a negotiating position.
The Challenge
The technical challenges were significant, but the organisational ones were equally important.
Data discovery at scale. The fourteen systems contained an estimated 40–60 million records. Identifying which records contained personal data, classifying them by sensitivity, and understanding their provenance required automated tooling — manual discovery at this scale was not feasible within the timeline.
Consent archaeology. The insurer had been collecting customer data since the early 2000s. Consent mechanisms had changed multiple times over that period — paper forms, web checkboxes, call centre verbal consent — and records of what had been consented to were inconsistent and sometimes absent. Determining the legal basis for current data use required working backwards through the consent history for each data category.
The legacy policy admin system. The most sensitive data — policy details, medical disclosures, beneficiary information — lived in a vendor-managed policy administration system that the insurer accessed but did not operate. Getting the vendor to implement masking in non-production environments required negotiation, formal change requests, and testing that consumed significant calendar time.
No governance owner. There was no Chief Data Officer, no data governance function, and no single person who felt responsible for the data estate as a whole. Every remediation action required cross-functional coordination that had no existing mechanism.
Our Approach
The engagement was structured in four phases timed against the six-month deadline.
Weeks 1–4: Data discovery and classification
We deployed automated data discovery tooling across all fourteen systems, using pattern matching to identify personal data fields (IC numbers, passport numbers, phone numbers, email addresses, financial account references) and classify them by sensitivity tier. The output was a data inventory: every system, every table, every column containing personal data, with a sensitivity classification and a record count.
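To make the classification step concrete, here is a minimal sketch of pattern-based PII detection. The patterns, field names, and majority-match threshold are illustrative assumptions, not the actual tooling used in the engagement; production discovery tools use far richer validator libraries and checksum validation.

```python
import re

# Illustrative detection patterns (assumptions, not the engagement's tooling).
# The IC number format follows the Malaysian NRIC layout, inferred from the
# source's mention of "IC numbers".
PATTERNS = {
    "ic_number": re.compile(r"\b\d{6}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\+?\d[\d\s-]{7,13}\d\b"),
}

# Hypothetical sensitivity tiers keyed to each PII type.
SENSITIVITY = {"ic_number": "restricted", "email": "internal", "phone": "internal"}

def classify_column(sample_values):
    """Return (pii_type, sensitivity) for a column sample, or (None, None).

    A column is flagged when a majority of sampled values match a pattern,
    which tolerates nulls and the occasional free-text entry.
    """
    for pii_type, pattern in PATTERNS.items():
        hits = sum(1 for v in sample_values if pattern.search(str(v)))
        if hits / max(len(sample_values), 1) > 0.5:
            return pii_type, SENSITIVITY[pii_type]
    return None, None

# Example: a sampled column that is mostly IC numbers, with one null-like value.
col = ["880101-14-5523", "900505-10-1234", "n/a"]
result = classify_column(col)
```

Run per column sample across each system's schema, this produces exactly the inventory rows described above: system, table, column, sensitivity tier.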
The discovery surfaced several findings the insurer was unaware of: a decommissioned claims system that was still running and accessible (containing 1.8 million historical records), and personal data in eleven internal Confluence pages used for process documentation.
Weeks 5–8: Consent mapping
Working with the insurer’s legal team and actuarial division, we mapped every data element used in underwriting models to its consent basis. For data collected under the current consent framework, consent records were retrievable and auditable. For older data collected under previous frameworks, we assessed whether continued use was permissible under the applicable data protection legislation’s legitimate interests provisions.
The output was a consent lineage document: for each underwriting model input variable, a documented consent basis and the evidence supporting it.
Weeks 9–16: Technical remediation
Three parallel workstreams:
Non-production masking: All fourteen systems’ non-production environments were remediated. For internally managed systems, we implemented data masking pipelines that anonymised personal data before it was copied to development or test environments. For the vendor-managed policy administration system, we worked with the vendor to implement their native masking tool and verified the output before sign-off.
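The shape of a masking step can be sketched as follows. Field names and the salting scheme are hypothetical; the point is the design choice of deterministic pseudonymisation, which keeps cross-table joins intact in test environments because the same input always maps to the same token.

```python
import hashlib

# Hypothetical field list; a real pipeline would drive this from the
# data inventory produced during discovery.
MASK_FIELDS = {"name", "ic_number", "email"}

def mask_value(value: str, salt: str = "per-environment-secret") -> str:
    """Deterministic pseudonym: identical inputs yield identical tokens,
    so referential integrity survives masking."""
    digest = hashlib.sha256((salt + value).encode()).hexdigest()[:12]
    return f"MASKED-{digest}"

def mask_row(row: dict) -> dict:
    """Mask sensitive fields; pass non-sensitive fields through unchanged."""
    return {k: mask_value(v) if k in MASK_FIELDS and v else v
            for k, v in row.items()}

row = {"policy_id": "P-1001", "name": "Jane Tan", "ic_number": "880101-14-5523"}
masked = mask_row(row)
```

A per-environment salt prevents trivially reversing tokens by hashing candidate values against a public rainbow table built for a known salt.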
Retention enforcement: We implemented automated data retention jobs for each system — scheduled processes that identified records exceeding the retention threshold and either archived them to cold storage (for records with ongoing regulatory hold requirements) or deleted them. The first run of the deletion jobs removed 2.3 million records that had exceeded the seven-year retention policy.
Consent audit trail: A centralised consent management service was built and integrated with all customer-facing systems. New consent interactions were logged with a timestamp, channel, specific consent granted, and the consent text version presented to the customer. Historical consent records were migrated and normalised into the same schema.
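A minimal sketch of the consent event schema follows. The field names are illustrative, not the insurer's actual service contract; the essential properties are the ones named above (timestamp, channel, specific consent, text version) plus an append-only log so history is never overwritten.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class ConsentEvent:
    """One consent interaction; hypothetical schema for illustration."""
    customer_id: str
    channel: str              # e.g. "web", "call_centre", "paper"
    consent_granted: str      # the specific purpose, e.g. "underwriting_models"
    consent_text_version: str # exact wording shown to the customer
    recorded_at: str          # ISO-8601 UTC timestamp

def record_consent(customer_id, channel, purpose, text_version, store):
    """Append a consent event to an append-only store and return it."""
    event = ConsentEvent(
        customer_id=customer_id,
        channel=channel,
        consent_granted=purpose,
        consent_text_version=text_version,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    store.append(asdict(event))  # events are only ever appended, never updated
    return event

log = []
record_consent("C-1001", "web", "underwriting_models", "v3.2", log)
```

Migrating historical consent records into this same schema is what makes the consent basis demonstrable for old and new data alike: the regulator sees one auditable trail, whatever the original channel.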
Weeks 17–24: Governance operating model
Technology alone doesn’t maintain governance — the organisation needed to. We designed and stood up a governance operating model: a Data Stewardship structure with a named Data Steward per business unit, a Data Governance Committee meeting quarterly with Board-level representation, and a governance monitoring dashboard tracking data quality, consent coverage, and retention compliance metrics.
Staff training was delivered to 340 employees across policy, claims, IT, and actuarial teams, covering their specific data handling responsibilities under the new framework.
Outcome
| Metric | Before | After |
|---|---|---|
| Regulatory findings (re-examination) | 3 material | 0 material |
| Data lineage for underwriting models | Not demonstrable | 100% traceable |
| PII in non-production environments | Present in all 14 systems | Masked in all 14 systems |
| Records exceeding retention policy | Unknown | 0 (2.3M deleted) |
| Consent coverage (active customers) | Unknown | 94% documented |
| Data Stewards appointed | 0 | 8 (one per business unit) |
The re-examination took place on schedule. The regulator accepted the evidence package, found no material findings, and closed all three original issues. The governance operating model established during the remediation has continued to run with the internal team.
Key Takeaways
Automated discovery before manual remediation. The scale of the problem — 40–60 million records across fourteen systems — made manual discovery impractical. Automated tooling found issues (the decommissioned claims system, the Confluence pages) that a manual approach would likely have missed, and those missed findings could have jeopardised the remediation outcome.
Consent is a historical problem as much as a current one. The insurer’s current consent framework was adequate. The challenge was the historical data collected under earlier, less specific frameworks. Building a consent lineage — working backwards through the evidence for each data category — was the most time-intensive part of the engagement and the most critical for regulatory credibility.
Governance without ownership doesn’t last. Technical remediation that isn’t backed by an organisational structure and accountable roles reverts. The Data Stewardship model and governance committee were not box-ticking exercises — they were the mechanism by which the insurer would remain compliant after Nematix left.
This engagement draws on our Data Intelligence & Analytics services. If your organisation is navigating a data governance or regulatory challenge, let’s talk.