Generative AI
Building Responsible AI Policies for Your Organisation
Dec 01, 2025

Every organisation deploying GenAI needs a policy. Here are the six decisions it must make, with language that keeps the policy enforceable, not decorative.


Most organisations developing an AI policy produce a document that is aspirational rather than operational. It says “we will use AI responsibly” without specifying what that means. It says “we will protect personal data” without identifying which data, which systems, and what controls. It says “human oversight will be maintained” without defining who does that oversight, under what procedure, or what they are authorised to do when they disagree with the system’s output.

A policy written this way is not useless — it signals intent and provides a basis for conversation. But it is not enforceable. When something goes wrong, an aspirational policy cannot tell you who was accountable, what control failed, or what procedure should have been followed. A policy is only as useful as its least specific clause.

The organisations that have built effective AI governance are the ones that treated policy development as a decision-making process, not a writing process. Before the first word is written, six decisions need to be made. The policy then documents those decisions in a way that is specific enough to be tested and enforced.

Decision 1 — Acceptable Use

The acceptable use decision answers three questions: which business processes may use GenAI without further approval, which may not use it under any circumstances, and which may use it subject to a defined additional approval process.

The common failure mode in acceptable use policies is excessive generality. “GenAI may be used for productivity enhancement” is a sentence that permits almost anything, and therefore governs nothing. The specification must be at the level of use case category.

A working example of specific acceptable use language:

  • GenAI may be used for drafting internal documents, email composition, and meeting summarisation without further approval.
  • GenAI may be used for drafting external customer communications subject to review and approval by a named senior officer before distribution.
  • GenAI may not be used to generate final credit decisions, risk ratings, or regulatory reports without human review and sign-off by an authorised individual.
  • GenAI use in any customer-facing application — chatbots, virtual assistants, recommendation engines — requires a deployment approval by the Chief Risk Officer before production launch.
  • GenAI may not be used to process confidential counterparty information or material non-public information (MNPI) in any external system.

Notice that each statement names a use case category, states what is permitted or prohibited, and where further approval is required, names who provides it. This is the level of specificity that makes a policy operational.
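Acceptable use rules at this level of specificity can be encoded as data rather than prose, so a gateway or intake process can enforce them mechanically. A minimal sketch — the category names and approval levels below are illustrative, mirroring the example language above, not a standard taxonomy:

```python
from enum import Enum

class Approval(Enum):
    NONE = "no further approval required"
    SENIOR_OFFICER = "named senior officer sign-off before distribution"
    CRO_DEPLOYMENT = "CRO deployment approval before production launch"
    PROHIBITED = "prohibited"

# Hypothetical mapping of use case categories to approval requirements.
ACCEPTABLE_USE = {
    "internal_drafting": Approval.NONE,
    "email_composition": Approval.NONE,
    "meeting_summarisation": Approval.NONE,
    "external_customer_comms": Approval.SENIOR_OFFICER,
    "customer_facing_app": Approval.CRO_DEPLOYMENT,
    "final_credit_decision": Approval.PROHIBITED,
    "mnpi_processing": Approval.PROHIBITED,
}

def required_approval(use_case: str) -> Approval:
    # Fail closed: a category the policy has not classified is treated
    # as prohibited until it has been assessed.
    return ACCEPTABLE_USE.get(use_case, Approval.PROHIBITED)
```

The fail-closed default is the important design choice: an unclassified use case is a gap in the policy, and the safe behaviour is to block it until the classification decision has been made.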

The acceptable use section should also address shadow AI: employees using personal accounts on consumer LLM platforms (ChatGPT.com, Claude.ai, Gemini) for work purposes. This is widespread and will remain so regardless of what the policy says. The practical approach is to acknowledge it, establish clear rules about what data may not be entered into any consumer AI tool, and provide approved alternatives that meet the security and compliance requirements your people actually need.

Decision 2 — Data Handling

The data handling decision answers: what data may be sent to external LLM APIs, and what may not?

The prohibition list should be defined by data category, not by level of sensitivity in the abstract. Common categories to explicitly prohibit from external LLM API calls without specific additional controls:

  • Personal data of customers, employees, or counterparties without a documented lawful basis, executed data processing agreement with the API provider, and appropriate consent or exemption.
  • Confidential client information — terms of mandates, transaction details, commercial proposals — unless the API provider has executed a binding confidentiality agreement and a data processing agreement prohibiting use of the data for model training.
  • Material non-public information (MNPI): any information that is not public and that could influence investment decisions. The consequences of MNPI being processed through a third-party system are not limited to data protection — they extend to securities law obligations.
  • Classified or government-designated sensitive information, where the organisation handles it.

The permitted data list should be equally specific: publicly available information, internal knowledge base content that has been classified as non-confidential, anonymised or synthetic data that has been validated through a re-identification assessment.
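A prohibition list defined by data category lends itself to a pre-flight check run before any external LLM API call. The sketch below assumes inputs are tagged with data categories upstream; the category and control labels are hypothetical:

```python
# Controls that must be evidenced before each category may leave the
# organisation. None means the category is never permitted externally.
REQUIRED_CONTROLS = {
    "personal_data": {"lawful_basis", "dpa_signed"},
    "client_confidential": {"dpa_signed", "no_training_clause"},
    "mnpi": None,
    "government_sensitive": None,
}

def outbound_allowed(categories: set, controls_in_place: set):
    """Return (allowed, blockers) for a proposed external API call."""
    blockers = []
    for cat in categories:
        if cat not in REQUIRED_CONTROLS:
            continue  # category not restricted by this policy sketch
        required = REQUIRED_CONTROLS[cat]
        if required is None:
            blockers.append(f"{cat}: prohibited in any external system")
        elif not required <= controls_in_place:
            missing = sorted(required - controls_in_place)
            blockers.append(f"{cat}: missing controls {missing}")
    return (not blockers, blockers)
```

Returning the list of blockers, not just a boolean, matters in practice: the refusal message is what tells the requesting team which control to put in place.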

The data handling policy should also address the question of data residency. Most major LLM API providers offer data residency options — processing and storage within a defined region — that may be relevant to regulatory requirements (BNM’s RMiT has data localisation requirements for certain categories of data held by financial institutions) or to organisational risk appetite.

Decision 3 — Human Review Requirements

The human review decision answers: for which categories of AI output is human review mandatory before action is taken?

The default should be broad: any GenAI output that will trigger a financial transaction, be sent to a customer or counterparty, be submitted to a regulator, or be used as the basis for an employment decision requires human review before that action is taken. This is not a performance question — it is a governance question. The human review requirement exists to maintain human accountability for consequential decisions, not because the AI’s output is necessarily wrong.

The operational challenge is maintaining this requirement under productivity pressure. The most common failure mode is not that human review is absent from the policy — it is that operational teams, facing volume pressure, treat the review requirement as a box to tick rather than a substantive check. The policy should specify not just that review is required but what the reviewer is expected to evaluate.

A worked example for credit-adjacent GenAI applications: the human reviewer must confirm that the output (a) does not contradict the structured model’s decision, (b) does not contain personal data handling that was not covered by the applicable data processing agreement, and (c) does not include content that would violate the organisation’s fair treatment obligations. The reviewer signs off in the audit log with their identity and the timestamp. This is a substantive review, not a rubber stamp.
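The three checks and the signed audit entry can be implemented as a single sign-off function, which makes it impossible to record an approval without recording what was actually evaluated. A minimal sketch with hypothetical field names:

```python
from datetime import datetime, timezone

# The three substantive checks from the worked example above.
REVIEW_CHECKS = ("consistent_with_model", "data_handling_ok", "fair_treatment_ok")

def record_review(audit_log: list, reviewer_id: str, checks: dict) -> bool:
    """Approve only if every check passed; always write the sign-off
    (identity, timestamp, per-check results) to the audit log."""
    approved = all(checks.get(name, False) for name in REVIEW_CHECKS)
    audit_log.append({
        "reviewer": reviewer_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": dict(checks),
        "approved": approved,
    })
    return approved
```

Because a missing check defaults to False, a reviewer cannot approve an output by simply leaving a check blank — the structure enforces the substantive review the policy describes.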

For use cases with very high volume — customer communications generated by an LLM at scale — individual review of every output is not practical. In these cases, the policy should require review at the workflow level: approval of the prompt template and the system configuration, sampling-based review of outputs against defined quality criteria, and automatic routing of flagged outputs to human review before delivery.
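The workflow-level routing rule — every flagged output reviewed, plus a random sample of the rest — is simple enough to state in code. A sketch, assuming a flagging step runs upstream and the sampling rate is set by the approval record:

```python
import random

def route_for_review(flagged: bool, sample_rate: float = 0.05,
                     rng=random) -> bool:
    """Return True if this output must go to human review before delivery.
    Flagged outputs are always routed; unflagged outputs are sampled
    at sample_rate (e.g. 0.05 = 5% of outputs)."""
    return flagged or rng.random() < sample_rate
```

The sampling rate itself should be part of the approved workflow configuration, so changing it is a governed change rather than an operational tweak.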

Decision 4 — Vendor Assessment

The vendor assessment decision answers: what must a GenAI vendor demonstrate before the organisation uses their platform?

This applies to LLM API providers (OpenAI, Anthropic, Google, Azure OpenAI), to AI application vendors building on those APIs, and to any vendor providing AI-assisted services to the organisation.

Minimum assessment criteria that should be non-negotiable:

  • Security certification: SOC 2 Type II report (preferred) or ISO 27001 certification, current within the last 12 months. These certifications do not cover AI-specific risks, but they establish a baseline of information security governance.
  • Data processing agreement: a binding contractual agreement covering what the vendor does with data processed through their platform, explicit prohibition on using the organisation’s data to train or fine-tune their models, sub-processor disclosure and controls, breach notification timelines aligned with PDPA requirements (72 hours to the Commissioner, prompt notification to the organisation), and data deletion obligations on contract termination.
  • Model update notification policy: the vendor must commit to providing advance notice before making material changes to their models that could affect the performance or behaviour of the organisation’s AI applications. Unannounced model updates in production systems create operational and compliance risk.
  • Data residency options: where regulatory requirements impose data localisation obligations, the vendor must be able to demonstrate that processing and storage occur within the required jurisdiction.
  • Contractual prohibition on training use: explicit contractual language prohibiting the use of the organisation’s data for model training, supplemented by technical controls where available.

The assessment should be documented. A completed vendor assessment record — not just a signed DPA — should be on file for every AI vendor in active use. This record is the evidence that the organisation performed due diligence if a vendor-related incident occurs.
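A vendor assessment record reduces to a per-criterion evidence checklist in which every non-negotiable item must be affirmatively evidenced. A sketch — the criterion names are shorthand for the bullets above, not a standard schema:

```python
# Non-negotiable criteria from the minimum assessment list above.
VENDOR_CRITERIA = (
    "security_cert_current",        # SOC 2 Type II or ISO 27001, < 12 months
    "dpa_executed",                 # binding data processing agreement
    "no_training_clause",           # contractual prohibition on training use
    "model_update_notice",          # advance notice of material model changes
    "data_residency_available",     # where localisation obligations apply
    "breach_notification_aligned",  # timelines aligned with PDPA
)

def assessment_complete(record: dict) -> bool:
    """A vendor passes only if every criterion is explicitly evidenced.
    A missing entry is treated as not evidenced, never assumed."""
    return all(record.get(criterion, False) for criterion in VENDOR_CRITERIA)
```

As with acceptable use, the default matters: a criterion absent from the record counts as a failure, which is exactly the due-diligence posture the documented record is meant to evidence.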

Decision 5 — Audit Logging

The audit logging decision answers: what must be recorded for every GenAI operation, and for how long?

Minimum logging requirements for every GenAI application in production:

  • Timestamp of the operation
  • Application or system identifier
  • User identifier (hashed or pseudonymised for privacy; the organisation must retain the ability to de-hash if required for a regulatory inquiry or subject access request)
  • Model used and version
  • Input token count and output token count
  • Any tool calls or retrieval operations performed

For regulated use cases — credit decisions, customer communications, regulatory reporting, employment decisions — the log record should additionally include the full prompt as delivered to the model, the full output as generated by the model, any human review action taken (reviewer identity, timestamp, decision), and the retrieval context if RAG was used.

Retention periods should be defined by use case category and aligned with regulatory requirements. BNM’s RMiT requires audit trails for technology operations to be retained for a minimum of three years. Audit logs for AI-assisted credit decisions should be retained for at least as long as the credit relationship plus the limitation period for potential claims — typically six years in a Malaysian context. Do not default to indefinite retention: it creates data minimisation issues under PDPA.

Decision 6 — Incident Response

The incident response decision answers: what happens when a GenAI system produces a harmful, incorrect, or biased output at scale?

“At scale” is the key qualifier. Individual erroneous outputs from an LLM are routine and are managed through the human review process. An incident response procedure is triggered when: a systematic error affects a material number of transactions or customer interactions, a bias or fairness failure is identified in outputs affecting a protected group, a security incident results in the compromise of data processed by a GenAI system, or a regulatory concern is raised in relation to a GenAI application.

The incident response procedure should define:

  • The escalation path from identification to senior management notification.
  • The communication protocol for internal and external communication, including regulatory notification where required.
  • The rollback procedure — how the application is suspended or reverted to a safe state while the incident is investigated.
  • The post-incident review process and its mandatory outputs.

The rollback procedure deserves specific attention. For AI applications that are tightly integrated into operational processes, suspension may have significant operational impact. The procedure should specify not just how to suspend the application but what the manual alternative process is while it is suspended. Designing this before deployment, not after the incident, is the practical requirement.
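One common implementation of this pairing is a fail-closed kill switch with an explicit manual fallback: the application checks a suspension flag on every request, and routes to the documented manual process when suspended. A minimal sketch; the flag store and handler names are hypothetical:

```python
def application_enabled(app_id: str, flags: dict) -> bool:
    """Fail closed: an application with no flag entry is treated as
    suspended, so removing the flag is itself a rollback action."""
    return flags.get(app_id, False)

def serve(app_id: str, flags: dict, automated, manual):
    """Route each request to the automated path, or to the documented
    manual alternative process while the application is suspended."""
    handler = automated if application_enabled(app_id, flags) else manual
    return handler()
```

The point of the sketch is that `manual` must exist before deployment: if the manual handler is undefined, the rollback procedure cannot be executed without operational disruption.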

Making Policy Enforceable

The six decisions above produce a policy that is specific enough to be tested. But specificity is necessary, not sufficient. Policies without enforcement mechanisms are aspirational documents regardless of their specificity.

The enforcement mechanisms that make an AI policy operational are:

Deployment approval gates: no GenAI application may be deployed to production without a completed approval record documenting the acceptable use assessment, data handling review, vendor assessment, logging configuration, and incident response plan. The approval authority should be senior enough to create accountability.
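The approval gate is a completeness check over the approval record: every required artifact present, and a named approver on file. A sketch, with the artifact names taken from the list above and the record structure assumed:

```python
# The five artifacts the deployment approval record must document.
REQUIRED_ARTIFACTS = {
    "acceptable_use_assessment",
    "data_handling_review",
    "vendor_assessment",
    "logging_configuration",
    "incident_response_plan",
}

def may_deploy(approval_record: dict) -> bool:
    """Deployment proceeds only when every required artifact is present
    and a named approval authority is recorded."""
    artifacts = set(approval_record.get("artifacts", []))
    return REQUIRED_ARTIFACTS <= artifacts and bool(approval_record.get("approver"))
```

Wiring a check like this into the deployment pipeline turns the gate from a document requirement into a technical one: a release without the completed record does not ship.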

Quarterly review of active deployments: every production GenAI application should be reviewed on a quarterly basis against its original approval record, checking for drift in use case, data handling changes, model version changes, and performance against any defined quality metrics.

Named AI Risk Owner: each production GenAI application has a named individual — typically in the business function deploying the application — who is accountable for its ongoing compliance with this policy. This individual’s name appears in the deployment record and is the first escalation point for any concern about the application’s behaviour.

Consequences for policy violations: the policy should state that deploying a GenAI application without completing the required approval process, or using GenAI in a prohibited way, is a policy violation subject to the organisation’s standard disciplinary process. Without this, the policy is a request, not a requirement.

The Policy Review Cadence

GenAI capabilities and the regulatory environment surrounding them are changing at a pace that makes an annual review a genuine minimum, not a comfortable schedule.

A policy written in mid-2024 may be materially inadequate by mid-2026 — because new model capabilities have expanded what is technically possible (and what therefore needs to be governed), because regulatory guidance has been issued that changes the compliance requirements, or because an organisational incident has revealed a gap in the policy’s coverage.

Build the review process into the calendar before you publish the policy. Schedule an annual comprehensive review, and establish trigger-based reviews — reviews initiated by specific events — for three categories of trigger: a major regulatory change affecting AI governance (new BNM guidance, PDPA enforcement action, EU AI Act implementation milestone), a major model change by a primary vendor, or a significant incident involving any of the organisation’s GenAI applications.

The review process should be documented. A policy that is reviewed without a record of what was considered, what was changed, and why is functionally unreviewable. The review record is also evidence of ongoing governance that a regulator or auditor will look for.

A Policy Is a Decision Document

A responsible AI policy, done well, is not a statement of values. Your organisation’s values are relevant to how you approach AI governance, but they are not the policy. The policy is the record of the decisions your organisation has made about how AI will be governed — who is accountable, what is permitted and prohibited, what must be logged, what must be reviewed, and what happens when things go wrong.

Make the decisions specific. Assign the accountability by name or role. Build the review process into the operational calendar before you publish the document. A policy of this kind is an organisational commitment, not a communications exercise — and it is the kind of commitment that a regulator, a client, or a counterparty can evaluate when they need to.


Find out how Nematix’s Strategy & Transformation practice can align your technology investments to business outcomes.