Where I let AI in, and where I don't

Tuesday I gave Claude a 215-line Zod schema for a license-category form and asked it to wire up the form, the field-array hooks, and the validation. Twelve minutes later I had six section components and a working create-and-edit screen. I'd have spent an afternoon typing it.

This morning I'm reviewing a pull request that changes one line in our audit-scoring service — the weighted-average step that turns per-question scores into a section contribution. My monitor is split between the diff and the migration file the change depends on. I haven't touched Claude. I won't, on this one.

Both feel correct.

I build a CBDC platform deployed at 6+ central banks across Africa and the Caribbean. Regulators issue rules on one side, and financial service providers comply with them on the other. The platform handles licensing, examinations, and audit scoring. It involves real users and real money. Audit obligations can be checked years later by people who weren't involved when the code was created.

When a feature is wrong in normal software, you can fix it quickly. But when a compliance scoring rule is wrong, a bank could lose its license, a regulator faces criticism, and your entire weekend turns into a postmortem session.

Over the past year, I have integrated AI assistance, mainly Claude and sometimes Codex, into much of my work. I want to outline where I rely on it significantly and where I don't, along with my reasons for those choices.

Where I lean on it

This is the easy half. When the answer is clear from the spec, AI is faster than typing. Much faster. That kind of work takes up more of my week than I expected before I started keeping track.

Scaffolding from a typed schema. I write the validation; AI takes over and writes the form, the field-array hooks, the test stubs, the typed React Hook Form integration, and everything else. Last week's license-category schema was 215 lines; the form components, the hook, and the bindings around it were closer to a thousand. The 215 were the only lines I spent real time writing.

Propagating an API shape change through the frontend. When a backend DTO changes, like a nested status field being flattened or an endpoint splitting in two, twenty components and forty tests need to adjust to the new shape. AI does the renaming. I make sure the assertions still mean what they used to mean.

Fixture data with the right shape and the wrong values. Our seed scaffolding needs realistic license categories, examination submissions, and document uploads without including anything that resembles a real bank. AI excels at this, and does it better and faster than I do, so I've stopped writing seed fixtures by hand.

First drafts of PR descriptions and design docs. I write the main points: the decision, the trade-offs, the alternatives I rejected. AI develops the prose. I cut about a third of what it writes, because AI tends to be wordy.

// The schema is the spec. AI handles the rest.

import { z } from 'zod';

// documentRequirementSchema is defined elsewhere (omitted here).
const licenseCategorySchema = z.object({
  categoryCode: z.string().regex(/^[A-Z0-9_-]+$/),
  requirements: z.array(z.object({
    text: z.string().min(1).max(1000),
    mandatory: z.boolean().default(true),
  })).min(1),
  documentRequirements: z.array(documentRequirementSchema).default([]),
  renewalRequired: z.boolean().default(true),
  currencyCode: z.string().optional(),
  annualFee: z.number().nonnegative(),
  validityPeriodMonths: z.number().int().positive(),
  // ...
}).superRefine((value, ctx) => {
  // cross-field: fees imply currency; renewal implies annualFee + validity
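  // annualFee is already required by the base schema, so this rejects a zero fee.
  if (value.renewalRequired && !value.annualFee) {
    ctx.addIssue({
      code: 'custom', // Zod needs an issue code on every addIssue call
      path: ['annualFee'],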
      message: 'Annual renewal fee is required when renewal is enabled',
    });
  }
});
 
// I get back: the form, the section components, the field-array hooks,
// the validation messages, the typed RHF integration, a passing test stub.
//
// I spend my time on the things the schema can't say:
//   - what "renewal required" actually means under THIS regulator's rules
//   - the audit-trail entry written when a category changes
//   - the error message a compliance officer reads at 11pm

That's most of my week.

Where I don't, and why

The other list is shorter. It is also the half I am paid for, and it is the half where AI performs the worst.

Compliance scoring. This is the engine I started with. It has to be deterministic: the same answers, the same form schema, and the same scoring model must produce the same result today as they will three years from now, when an auditor wants to recompute it. AI can introduce variation that I can't defend in a meeting with my bosses. I write it by hand and treat it like the most important code in the system because, for the people who use it, it really is.
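One way to keep a weighted-average step reproducible is to stay in integer arithmetic, so the result can't shift with floating-point rounding or with the order the questions arrive in. This is a minimal sketch, not the production engine: the `QuestionScore` shape, the basis-point convention, and the half-up rounding rule are all assumptions made for illustration.

```typescript
// Illustrative only: the type, the basis-point scale, and the rounding rule
// are assumptions, not the platform's actual scoring model.
type QuestionScore = {
  questionId: string;
  scoreBps: number; // integer 0..10000 (basis points of the maximum score)
  weight: number;   // positive integer weight
};

// Weighted average in basis points, rounded half up, using only integer
// arithmetic so the result does not depend on summation order.
function sectionContribution(scores: readonly QuestionScore[]): number {
  if (scores.length === 0) {
    throw new Error('section has no scored questions');
  }
  let weighted = 0;
  let totalWeight = 0;
  for (const s of scores) {
    if (!Number.isInteger(s.scoreBps) || s.scoreBps < 0 || s.scoreBps > 10000) {
      throw new Error('invalid score for ' + s.questionId);
    }
    if (!Number.isInteger(s.weight) || s.weight <= 0) {
      throw new Error('invalid weight for ' + s.questionId);
    }
    weighted += s.scoreBps * s.weight;
    totalWeight += s.weight;
  }
  // Half-up rounding of weighted / totalWeight without leaving the integers:
  // floor((2 * weighted + totalWeight) / (2 * totalWeight)).
  const numerator = 2 * weighted + totalWeight;
  const denominator = 2 * totalWeight;
  return (numerator - (numerator % denominator)) / denominator;
}
```

The same inputs in any order give the same integer, which is the property a later recomputation depends on.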

Audit-trail writers. If you cannot reconstruct what happened on a specific date for a specific FSP, the legal review fails. This involves three properties that auditors care about: append-only (no edits; corrections are new entries), content-addressed (a hash acts as the receipt for the payload), and chained (each entry references the previous one for that actor, so you cannot quietly drop one from the middle). AI sometimes suggests "cleaner" versions that eliminate the hash because it seems like unnecessary overhead. I keep rejecting that idea.

Security-boundary code. Maker-checker, role and clearance checks, and multi-tenant isolation between regulators and FSPs. Failures in this area are silent: they do not fail tests, and by the time anyone notices, you may have shown one regulator's queue to another or auto-approved an operation that required a second pair of eyes. Being thorough is the only way to be sure.
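To make that failure mode concrete, here is a sketch of a maker-checker guard that denies by default and always says why. The `Operation` and `Approver` shapes, the `'checker'` role name, and `canApprove` are hypothetical, chosen for illustration rather than taken from the platform.

```typescript
// Hypothetical shapes for illustration only.
type Operation = {
  id: string;
  tenantId: string;    // the regulator this operation belongs to
  makerUserId: string; // the user who submitted it
};

type Approver = {
  userId: string;
  tenantId: string;
  roles: readonly string[];
};

type Decision = { allowed: true } | { allowed: false; reason: string };

// Every rule is an explicit early return with a reason. Approval is only
// reached when no rule objected, so a forgotten rule is visible in review.
function canApprove(op: Operation, approver: Approver): Decision {
  if (approver.tenantId !== op.tenantId) {
    return { allowed: false, reason: 'cross-tenant access' };
  }
  if (approver.userId === op.makerUserId) {
    return { allowed: false, reason: 'maker cannot check their own operation' };
  }
  if (!approver.roles.some((role) => role === 'checker')) {
    return { allowed: false, reason: 'missing checker role' };
  }
  return { allowed: true };
}
```

Returning a reason rather than a bare boolean means the denial can be written to the audit trail, and each rule can get its own test that asserts on the reason instead of only on the outcome.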

Error messages. The compliance officer reading a "Something went wrong" error toast decides whether your platform is the kind of software she trusts with her license. The voice for that moment is specific, and I don't think AI has it yet. I write those messages by hand, and so do the people on my team. Our product team works hard on the copy that shows up on the platform, and we can't trust AI with that. At least, not today.

// The shape I push for in design reviews. AI proposes cleaner
// versions that drop one of the properties. I refuse them.
 
type AuditEntry = {
  id: string;                  // ULID: lexicographically sortable by creation time, stable
  parentId: string | null;     // chains back to the prior entry for this actor
  actor:
    | { kind: 'regulator'; userId: string; clearance: 'L1'|'L2'|'L3'|'L4' }
    | { kind: 'fsp'; userId: string; institutionId: string }
    | { kind: 'system'; component: string };
  action: string;              // taxonomy-controlled — never free-form
  payload: unknown;            // canonical JSON
  payloadHash: string;         // sha256 of canonical-JSON payload
  recordedAt: string;          // ISO 8601, UTC, second precision — no ms drift
};
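As a sketch of how the three properties fit together, the snippet below appends entries to a single actor's log and re-verifies it. It uses a trimmed-down `Entry` rather than the full `AuditEntry`, and `canonicalJson`, `hashPayload`, `appendEntry`, and `verifyChain` are illustrative names. The canonicalization is a simplification (recursively sorted keys, plain JSON data assumed), not a full RFC 8785 implementation.

```typescript
import { createHash } from 'node:crypto';

// Trimmed-down entry for illustration; the real shape carries more fields.
type Entry = {
  id: string;
  parentId: string | null;
  payload: unknown;
  payloadHash: string;
};

// Simplified canonical JSON: object keys sorted recursively, undefined
// properties skipped. Assumes the payload is plain JSON data.
function canonicalJson(value: unknown): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return '[' + value.map((v) => canonicalJson(v)).join(',') + ']';
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
  return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalJson(obj[k])).join(',') + '}';
}

function hashPayload(payload: unknown): string {
  return createHash('sha256').update(canonicalJson(payload), 'utf8').digest('hex');
}

// Append-only: returns a new array and never touches existing entries.
function appendEntry(log: readonly Entry[], id: string, payload: unknown): Entry[] {
  const parent = log[log.length - 1];
  const entry: Entry = {
    id,
    parentId: parent ? parent.id : null,
    payload,
    payloadHash: hashPayload(payload),
  };
  return [...log, entry];
}

type ChainCheck = { ok: true } | { ok: false; at: number; reason: string };

// Walks the log in order, checking both the parent link and the payload hash.
function verifyChain(log: readonly Entry[]): ChainCheck {
  let previousId: string | null = null;
  let index = 0;
  for (const entry of log) {
    if (entry.parentId !== previousId) {
      return { ok: false, at: index, reason: 'broken chain' };
    }
    if (entry.payloadHash !== hashPayload(entry.payload)) {
      return { ok: false, at: index, reason: 'payload hash mismatch' };
    }
    previousId = entry.id;
    index += 1;
  }
  return { ok: true };
}
```

Drop the hash and an edited payload passes verification; drop the parent link and a missing entry does. This sketch catches those two cases only. It does not defend against someone who rewrites links and hashes together, which needs controls outside the log itself.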

The common factor among everything on this list is that being quietly wrong is worse than being slow. This also defines regulated software.

The shift that mattered most

The key point isn’t either list. It’s how separating them changed my work.

Writing a clear prompt and creating a focused design document are really the same skill. AI is most helpful when the problem is clearly defined before I start the conversation. If I provide a clean specification, I can get something ready to ship in twelve minutes, sometimes three. If I give it a general idea, I can end up wasting forty minutes before realizing it’s not right.

The bottleneck shifted from typing to thinking. That’s what no one warned me about. I now define problems more carefully, not because I’ve become a better engineer, but because I want to give the AI a clear problem. That discipline matters more than the speed boost. Sometimes I take longer than I used to thinking about a feature and defining the problem cleanly, but it usually pays off. In my experience, there are far fewer mistakes when you think the problem through yourself and document it before letting AI assist with the implementation.

This isn’t just true for regulated systems. The same problem appears wherever silent failure is the main cost. The answer is likely the same. AI is a force multiplier when the input is solid and a confidence trap when the input is unclear. That’s just how the tool works.

My job itself hasn’t changed much. Senior engineering still values good judgment, and good judgment is mainly about knowing where you can move quickly. The tools have changed, but the skill hasn’t.

I will let an AI help create a form. I will not let it design my audit trail. I think that's about right.