AI pillar · Module 6 of 6
How to evaluate AI systems
Eventually you’ll need to decide whether to adopt an AI tool, approve a vendor, or assess an internal project. Here’s how to ask the right questions.
The essential questions
- What problem does this actually solve? Not “what does it do”—what business problem does it address? Is that problem worth solving?
- How was it trained? What data? From where? Any obvious bias risks? Is the training data appropriate for your use case?
- How accurate is it? What’s the error rate, and on whose data was it measured? Does it work for your specific context and population?
- What happens when it’s wrong? What’s the failure mode? How costly is a false positive compared with a false negative? Who catches errors?
- Who’s accountable? When something goes wrong, who owns it? The vendor? Your team? Is that clear in contracts?
- What data does it see? What inputs does it need? Where does that data go? What about privacy and confidentiality?
- Can you explain it? If a customer or regulator asks why a decision was made, can you answer? Do you need to?
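To make the accuracy and failure-mode questions concrete, here is a minimal Python sketch of how you might summarise a tool's errors on a sample of your own labelled cases. The function names, the sample data, and the cost weights (`fp_cost`, `fn_cost`) are illustrative assumptions, not part of any vendor's API or of this module's checklist.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (True = positive)."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif not a and p:
            counts["fp"] += 1
        elif a and not p:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def error_summary(actual, predicted, fp_cost=1.0, fn_cost=1.0):
    """Summarise accuracy, error rates, and a weighted error cost.

    fp_cost and fn_cost are hypothetical weights you choose to reflect
    how bad each kind of mistake is in your own context.
    """
    c = confusion_counts(actual, predicted)
    total = sum(c.values())
    if total == 0:
        raise ValueError("need at least one labelled example")
    negatives = c["fp"] + c["tn"]
    positives = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "false_positive_rate": c["fp"] / negatives if negatives else 0.0,
        "false_negative_rate": c["fn"] / positives if positives else 0.0,
        "total_error_cost": c["fp"] * fp_cost + c["fn"] * fn_cost,
    }


# Made-up sample: ten cases from your own population, not the vendor's test set.
actual = [True, True, False, False, False, False, True, False, False, False]
predicted = [True, False, False, True, False, False, True, False, False, False]

# Assumption for illustration: a missed positive is ten times worse than a false alarm.
summary = error_summary(actual, predicted, fp_cost=1.0, fn_cost=10.0)
print(summary)
```

In this made-up sample the tool is right 8 times out of 10, but a single missed positive accounts for most of the weighted cost. That gap is why the error rate alone does not answer the question "what happens when it's wrong?".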
Red flags to watch for
- “It’s AI” as the only explanation
- No documentation of training data
- Accuracy claims without context (no test data, population, or baseline named)
- No plan for when it fails
- Vendor can’t explain how it works
- “Trust the algorithm” mentality
- No human oversight in the loop
- Vague privacy policies
- No bias testing or monitoring
- Unclear data retention and deletion
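One way to see why an accuracy figure needs context is to compare it with a trivial baseline that always predicts the most common outcome. The sketch below uses invented numbers and a hypothetical helper, `majority_baseline_accuracy`, purely for illustration.

```python
def majority_baseline_accuracy(actual):
    """Accuracy achieved by always predicting the most common label."""
    if not actual:
        raise ValueError("need at least one labelled example")
    positives = sum(1 for a in actual if a)
    negatives = len(actual) - positives
    return max(positives, negatives) / len(actual)


# Invented example: 2 positive cases in 100, as with a rare event such as fraud.
actual = [True] * 2 + [False] * 98

# Hypothetical figure quoted with no test set or baseline attached.
claimed_accuracy = 0.97

baseline = majority_baseline_accuracy(actual)
lift = claimed_accuracy - baseline

print(f"baseline accuracy: {baseline:.2f}")
print(f"claimed accuracy:  {claimed_accuracy:.2f}")
print(f"lift over baseline: {lift:+.2f}")
```

Under these assumed numbers, a system that never flags anything would score 98%, so a claimed 97% is worse than doing nothing. The same figure on a balanced dataset could be impressive, which is why the claim means little until the test data and baseline are stated.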
🧭 The bottom line
AI is a tool. Evaluate it like any other tool: Does it solve a real problem? What are the risks? What happens when it fails? If you can’t answer these questions, you’re not ready to deploy.
Free resources to go deeper
- Checklist: Zeph Tech AI Deployment Tips — Our evaluation checklist
- Guide: AI Procurement Governance Guide — Vendor evaluation framework
Nice work! What’s next?
You now have a solid foundation in AI—what it is, how it works, the risks, and how to evaluate it. Here’s where to go from here.
For hands-on learners
- Try Google Teachable Machine—train your own model in minutes
- Experiment with ChatGPT or Claude on real tasks
- Take Andrew Ng’s AI for Everyone for more depth
For leaders and decision-makers
- Inventory your organisation’s AI systems
- Review our AI Governance Implementation Guide
- Stay current with our daily AI briefings