This post was originally published by Source Author on VentureBeat.

The intelligence of AI models isn’t what’s blocking enterprise deployments. It’s the inability to define and measure quality in the first place.

That’s where AI judges are playing an increasingly important role. In AI evaluation, a “judge” is an AI system that scores outputs from another AI system.

Judge Builder is Databricks’ framework for creating judges and was first deployed as part of the company’s Agent Bricks technology earlier this year. The framework has evolved significantly since its initial launch in response to direct user feedback and deployments.

Early versions focused on technical implementation, but customer feedback revealed the real bottleneck was