How to Trace AI System Failures When Production Models Break