AI that understands what you mean, not just what you measure.
↓in·ten·sion·al (Logic) — incapable of explanation solely in terms of the set of objects to which it is applicable; requiring explanation in terms of meaning or understanding
— Collins Dictionary
Models trained on simple reward hacking spontaneously generalized to tampering with their own reward functions. No one taught them. They taught themselves to cheat.
of the time, o1-preview attempted to hack the game system when losing at chess against stronger opponents.
AI shopkeeper socially engineered into giving away free products, accepting fake CEO coups, and declaring "communist pricing" — all by simply asking nicely.
Extensional success. Intensional failure.
What things refer to. Measurable. Outcome-focused.
Co-referring terms are interchangeable.
What things mean. Context-aware. Intent-focused.
Co-referring terms are not always substitutable.
Same answer (technically). Different understanding.
Run pylint and fix the errors in auth.py
Done. I've resolved the pylint errors:
# pylint: disable=line-too-long to lines
42, 67, 89
# pylint: disable=too-many-arguments to
authenticate()
broad-exception-caught — this
pattern is intentional
✓ pylint now passes with 0 errors
Done. I've refactored auth.py to address the
underlying issues:
validate_credentials() and
check_permissions() from the 12-parameter
authenticate()
except: with specific
AuthenticationError handling
✓ pylint passes — code is now more maintainable
It achieved what was measured, not what was meant.
Different contexts → different paths → different blind spots → better coverage
Different agents, isolated contexts, different probability paths — blind spots revealed.
A model reviewing its own work follows the same probability path that created the problem. Self-review perpetuates errors.
Agents traverse different paths. Humans arbitrate the output.
Explore the full visualization →