@mohitjoshi14
by mohitjoshi14
Automated LLM quality gate using evalsense. Creates and runs a statistical eval for a recently built LLM feature, then returns a ship/no-ship decision by checking accuracy, precision, recall, and F1 against thresholds.
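As a rough illustration of the gating step, the sketch below computes the four metrics for a binary classification eval and compares each one against a minimum threshold. It does not use evalsense's API. The names `binary_metrics`, `quality_gate`, and `GateResult`, and the threshold values in the usage note, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    """Outcome of a quality gate: computed metrics plus any failed checks."""
    metrics: dict[str, float]
    failures: list[str]

    @property
    def ship(self) -> bool:
        # Ship only when no metric fell below its threshold.
        return not self.failures


def _ratio(numerator: int, denominator: int) -> float:
    # Treat an undefined ratio (zero denominator) as 0.0 so the gate fails safe.
    return numerator / denominator if denominator else 0.0


def binary_metrics(expected: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and equal length")

    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e and p)
    fp = sum(1 for e, p in pairs if not e and p)
    fn = sum(1 for e, p in pairs if e and not p)
    tn = sum(1 for e, p in pairs if not e and not p)

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def quality_gate(
    expected: list[bool],
    predicted: list[bool],
    thresholds: dict[str, float],
) -> GateResult:
    """Return a ship/no-ship result; thresholds are assumed minimums per metric."""
    metrics = binary_metrics(expected, predicted)
    failures = [
        f"{name} {metrics[name]:.2f} < {minimum:.2f}"
        for name, minimum in thresholds.items()
        if metrics[name] < minimum
    ]
    return GateResult(metrics=metrics, failures=failures)
```

A call such as `quality_gate(expected, predicted, {"accuracy": 0.8, "precision": 0.8, "recall": 0.8, "f1": 0.8})` returns a `GateResult` whose `ship` property is `False` whenever at least one metric is below its minimum, with the failing checks listed in `failures`. Reporting each failed metric, not just a single boolean, is a design choice made here so that a no-ship decision says what to fix.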