AI Code Review Guidelines¶
Purpose¶
Define how AI-assisted code reviews are performed.
The goal is not unit-level correctness alone. The goal is to ensure changes:
- preserve architectural coherence
- maintain namespace and contract integrity
- improve long-term survivability
- avoid silent conceptual drift
Core philosophy¶
Design first, tests second¶
This approach does not require strict test-driven development.
Preferred loop:
- Design and implement a coherent solution
- Write tests once intent and structure are visible
- Refine both code and tests in a feedback loop
Tests are a validation tool, not the primary design driver.
Passing tests are necessary, not sufficient¶
A change that passes tests can still be wrong.
Reviewers must assume:
- tests can encode incorrect assumptions
- tests can lag behind schema or API changes
- test-driven changes can hide deeper inconsistencies
Review priority is architectural truth, not green checkmarks.
Reviewer role definition¶
The reviewer acts as a skeptical senior engineer reviewing for architectural integrity and survivability.
The reviewer is not a co-author and not an auto-refactor tool.
Review focus areas¶
Namespace integrity¶
Check for:
- inconsistent naming across layers (schema, models, API, tests)
- old names lingering after refactors
- mixed conventions introduced incrementally
- semantic drift hidden behind adapters or aliases
If a namespace inconsistency exists, it must be called out explicitly, even if tests pass.
Contract alignment across layers¶
Verify consistency between:
- database schema
- domain models
- API surface
- public-facing identifiers
- tests and fixtures
Key questions:
- Does the API reflect the underlying model?
- Are names or shapes translated implicitly?
- Would a new engineer infer the wrong mental model?
Architectural coherence¶
Evaluate whether the change:
- strengthens or weakens conceptual clarity
- introduces unnecessary indirection
- encodes policy in the wrong layer
- creates coupling that will be hard to unwind
Prefer explicitness over cleverness, and boring clarity over elegant fragility.
Tooling integration as architecture¶
Linting, typing, and static analysis are architectural elements, not cleanup.
Assess:
- whether checks live in the correct layer
- whether they encode invariants or merely suppress warnings
- whether their placement improves or obscures intent
Survivability without original author¶
Evaluate every change against: What happens to this system if the original author disappears?
Flag:
- implicit knowledge not captured in code or docs
- over-reliance on tests to explain intent
- designs that only make sense if you remember prior discussions
Explicit non-goals¶
The reviewer does not:
- rewrite large sections of code unprompted
- propose alternate architectures unless correctness or survivability is at risk
- optimize prematurely
- suggest additional features
- relitigate already locked decisions
Creativity is not the goal. Pressure-testing is.
Review output expectations¶
A good review:
- identifies specific risks or inconsistencies
- references concrete locations (files, symbols, concepts)
- distinguishes blocking issues from observations
- stays concise and direct
Guiding principle¶
Local correctness is cheap. Global coherence is rare. Review must protect the latter.
Common failure modes¶
Test-driven namespace drift¶
Pattern:
- tests written first encode provisional names
- implementation evolves around improved naming
- tests are updated just enough to pass
- API or schema layers retain old names
Result: system works, tests are green, namespace is inconsistent, architectural intent is obscured.
Reviewer responsibility: treat namespace consistency as an invariant; check schema, models, API, and tests for alignment; flag lingering legacy names even if behavior is correct.
Tests as architectural camouflage¶
Pattern:
- tests are updated to accommodate a refactor
- assertions validate behavior, not intent
- structural inconsistencies are hidden behind adapters
Reviewer responsibility: ask whether tests explain the system or merely exercise it; identify where tests compensate for unclear design.