Why most enterprise AI agents never reach production and how Databricks plans to fix it

Many enterprise AI agent development efforts never make it to production, and it’s not because the technology isn’t ready. The problem, according to Databricks, is that companies are still relying on manual evaluations, a process that is slow, inconsistent and difficult to scale.

Today at the Data + AI Summit, Databricks launched Mosaic Agent Bricks as a solution to that challenge. The technology builds on and extends the Mosaic AI Agent Framework the company announced in 2024. Simply put, being able to build AI agents is no longer enough; to have real-world impact, enterprises also need to evaluate and optimize them.

The Mosaic Agent Bricks platform automates agent optimization using a series of research-backed innovations. Among the key innovations is the integration of TAO (Test-time Adaptive Optimization), which provides a novel approach to AI tuning without the need for labeled data. Mosaic Agent Bricks also generates domain-specific synthetic data, creates task-aware benchmarks and optimizes quality-to-cost balance without manual intervention.

Fundamentally, the goal of the new platform is to solve an issue that Databricks users had with existing AI agent development efforts.

“They were flying blind, they had no way to evaluate these agents,” Hanlin Tang, Databricks’ Chief Technology Officer of Neural Networks, told VentureBeat. “Most of them were relying on a kind of manual vibe tracking to see if the agent sounds good enough, but this doesn’t give them the confidence to go into production.”

From research innovation to enterprise AI production scale

Tang was previously the co-founder and CTO of Mosaic, which was acquired by Databricks in 2023 for $1.3 billion.

At Mosaic, much of the research innovation didn’t necessarily have an immediate enterprise impact. That all changed after the acquisition.

“The big light bulb moment for me was when we first launched our product on Databricks, and instantly, overnight, we had, like, thousands of enterprise customers using it,” Tang said.

In contrast, prior to the acquisition, Mosaic would spend months trying to get just a handful of enterprises to try its products. The integration of Mosaic into Databricks has given Mosaic’s research team direct access to enterprise problems at scale.

This enterprise contact revealed new research opportunities. 

“It’s only when you have contact with enterprise customers, you work with them deeply, that you actually uncover kind of interesting research problems to go after,” Tang explained. “Agent Bricks … is, in some ways, kind of an evolution of everything that we’ve been working on at Mosaic now that we’re all fully, fully bricksters.”

Solving the agentic AI evaluation crisis

Enterprise teams face a costly trial-and-error optimization process. Without task-aware benchmarks or domain-specific test data, every agent adjustment becomes an expensive guessing game. Quality drift, cost overruns and missed deadlines follow.

Agent Bricks automates the entire optimization pipeline: the platform takes a high-level task description and enterprise data, then handles the rest automatically.

First, it generates task-specific evaluations and LLM judges. Next, it creates synthetic data that mirrors customer data. Finally, it searches across optimization techniques to find the best configuration.
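
To make that loop concrete, here is a minimal, runnable sketch of what such an automated optimization search could look like. Every name in it is a hypothetical stand-in meant to illustrate the judge-generate-search pattern described above, not the Agent Bricks API.

```python
# Illustrative sketch of an automated agent-optimization loop.
# Every name here is a hypothetical stand-in, not the Agent Bricks API.
import random
from dataclasses import dataclass


@dataclass
class AgentConfig:
    model: str
    strategy: str


def build_judge(task: str):
    """Stand-in for an auto-generated, task-specific LLM judge."""
    def judge(config: AgentConfig, example: str) -> float:
        # A real judge would score the agent's output on this example;
        # a random score keeps the sketch runnable.
        return random.random()
    return judge


def synthesize_examples(docs: list[str], n: int = 20) -> list[str]:
    """Stand-in for domain-specific synthetic data generation."""
    return [f"synthetic case {i} derived from {len(docs)} docs" for i in range(n)]


def optimize_agent(task: str, docs: list[str]) -> AgentConfig:
    judge = build_judge(task)             # 1. task-specific evaluation
    eval_set = synthesize_examples(docs)  # 2. synthetic benchmark data
    candidates = [                        # 3. search the configuration space
        AgentConfig("small-llm", "few-shot prompting"),
        AgentConfig("large-llm", "zero-shot"),
        AgentConfig("small-llm", "fine-tuned"),
    ]
    return max(candidates, key=lambda c: sum(judge(c, e) for e in eval_set))


print(optimize_agent("extract fields from supplier PDFs", ["doc_a", "doc_b"]))
```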

“The customer describes the problem at a high level and they don’t go into the low level details, because we take care of those,” Tang said. “The system generates synthetic data and builds custom LLM judges specific to each task.”

The platform offers four agent configurations (a hypothetical configuration sketch follows the list):

  • Information Extraction: Converts documents (PDFs, emails) into structured data. A retail organization, for example, could use it to pull product details from supplier PDFs, even with complex formatting.
  • Knowledge Assistant: Provides accurate, cited answers from enterprise data. Manufacturing technicians, for example, can get instant answers from maintenance manuals without digging through binders.
  • Custom LLM: Handles text transformation tasks such as summarization and classification. A healthcare organization, for example, could customize models that summarize patient notes for clinical workflows.
  • Multi-Agent Supervisor: Orchestrates multiple agents for complex workflows. A financial services firm, for example, could coordinate agents for intent detection, document retrieval and compliance checks.
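
As a rough illustration, a high-level, declarative interface to these configurations might look something like the sketch below. The keys, values and table names are invented for illustration only, not Databricks’ published API.

```python
# Hypothetical, declarative-style definitions for the four configurations.
# Keys, values and table names are invented for illustration.

extraction_agent = {
    "type": "information_extraction",
    "instructions": "Pull product name, SKU and price from supplier PDFs",
    "source_table": "catalog.supply_chain.supplier_pdfs",  # assumed name
    "output_schema": {"product": "string", "sku": "string", "price": "double"},
}

assistant_agent = {
    "type": "knowledge_assistant",
    "instructions": "Answer technician questions with citations",
    "source_table": "catalog.ops.maintenance_manuals",  # assumed name
}

supervisor = {
    "type": "multi_agent_supervisor",
    "agents": [extraction_agent, assistant_agent],
    "routing": "intent_detection",
}
```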

Agents are great, but don’t forget about data

Building and evaluating agents is a core part of making AI enterprise ready, but it’s not the only part that’s needed.

Databricks positions Mosaic Agent Bricks as the AI consumption layer sitting atop its unified data stack. At the Data + AI Summit, Databricks also announced the general availability of its Lakeflow data engineering platform, which was first previewed in 2024.

Lakeflow solves the data preparation challenge. It unifies three critical data engineering journeys that previously required separate tools. Ingestion handles getting both structured and unstructured data into Databricks. Transformation provides efficient data cleaning, reshaping and preparation. Orchestration manages production workflows and scheduling.
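
Lakeflow’s declarative pipelines build on Databricks’ Delta Live Tables model, so a pipeline definition looks roughly like the following sketch. The table and path names are placeholders, and the code assumes the Databricks pipeline runtime, which supplies the `spark` session and the `dlt` module.

```python
# Sketch of a declarative pipeline in the Delta Live Tables style that
# Lakeflow's pipelines build on. Paths and table names are placeholders;
# the Databricks pipeline runtime supplies `spark` and the `dlt` module.
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Raw supplier records ingested from cloud storage")
def suppliers_raw():
    # Ingestion: Auto Loader picks up new files as they land.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/supply_chain/landing/")  # placeholder path
    )


@dlt.table(comment="Cleaned supplier records ready for downstream agents")
def suppliers_clean():
    # Transformation: drop malformed rows and stamp ingestion time.
    return (
        dlt.read_stream("suppliers_raw")
        .filter(F.col("sku").isNotNull())
        .withColumn("ingested_at", F.current_timestamp())
    )
```

Orchestration, the third journey, is configured on the pipeline itself rather than written as code: scheduling and dependency management come from the platform.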

The workflow connection is direct: Lakeflow prepares enterprise data through unified ingestion and transformation, then Agent Bricks builds optimized AI agents on that prepared data. 

“We help get the data into the platform, and then you can do ML, BI and AI analytics,” Bilal Aslam, Senior Director of Product Management at Databricks, told VentureBeat.

Going beyond data ingestion, Mosaic Agent Bricks also benefits from the governance features of Databricks’ Unity Catalog, including access controls and data lineage tracking. This integration ensures that agent behavior respects enterprise data governance without additional configuration.

Agent Learning from Human Feedback eliminates prompt stuffing

One of the common approaches to guiding AI agents today is the system prompt. Tang referred to the resulting practice of ‘prompt stuffing,’ where users shove all kinds of guidance into a prompt in the hope that the agent will follow it.

Agent Bricks introduces a new concept called Agent Learning from Human Feedback, which automatically adjusts system components based on natural-language guidance. It addresses what Tang calls the prompt stuffing problem: according to him, the approach often fails because agent systems have multiple components that need adjustment, and a single prompt can’t reach them all.

The approach mirrors reinforcement learning from human feedback (RLHF), but it operates at the level of the agent system rather than individual model weights: natural-language guidance is interpreted automatically and applied to the appropriate components.

The system handles two core challenges. First, natural language guidance can be vague. For example, what does ‘respect your brand’s voice’ actually mean? Second, agent systems contain numerous configuration points. Teams struggle to identify which components need adjustment.

The system eliminates the guesswork about which agent components need adjustment for specific behavioral changes.
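
As a toy illustration of the difference between the two approaches, consider the sketch below. The component names and keyword routing are invented for illustration; a real system would use an LLM, not keyword matching, to interpret the feedback.

```python
# Toy contrast between prompt stuffing and routed feedback.
# Component names and keyword routing are invented for illustration.

# Prompt stuffing: every piece of guidance piles into one system prompt
# and the agent is left to sort out which parts apply where.
system_prompt = (
    "You are a support agent. Respect the brand voice. Always cite your "
    "sources. Never discuss pricing. Keep answers under 100 words."
)

# Feedback routing: guidance is interpreted, then applied to the specific
# component of the agent system it actually concerns.
FEEDBACK_ROUTES = {
    "voice": "generator_prompt",  # e.g. "respect your brand's voice"
    "cite": "retriever_config",   # e.g. "always cite the manual"
    "short": "decoding_params",   # e.g. "keep answers short"
}


def apply_feedback(feedback: str) -> str:
    # A real system would use an LLM to interpret the feedback;
    # keyword matching keeps the sketch self-contained.
    for keyword, component in FEEDBACK_ROUTES.items():
        if keyword in feedback.lower():
            return f"adjust {component}: {feedback}"
    return f"adjust generator_prompt: {feedback}"


print(apply_feedback("Please cite the maintenance manual in every answer"))
```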

“This we believe will help agents become more steerable,” Tang said.

Technical advantages over existing frameworks

There is no shortage of agentic AI development frameworks and tools in the market today. Among the growing list of vendor options are tools from LangChain, Microsoft and Google.

Tang argued that what makes Mosaic Agent Bricks different is the optimization. Rather than requiring manual configuration and tuning, Agent Bricks incorporates multiple research techniques automatically: TAO, in-context learning, prompt optimization and fine-tuning.

When it comes to agent-to-agent communication, there are a few options in the market today, including Google’s Agent2Agent protocol. According to Tang, Databricks is currently exploring various agent protocols and hasn’t committed to a single standard.

Currently, Agent Bricks handles agent-to-agent communication through two primary methods (a minimal invocation sketch follows the list):

  1. Exposing agents as endpoints that can be wrapped in different protocols.
  2. Using a multi-agent supervisor that is MCP (Model Context Protocol) aware.
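
The first method follows Databricks’ standard model-serving pattern: an agent exposed as a serving endpoint can be called over REST and then wrapped in whatever protocol a caller prefers. In the sketch below, the endpoint name and payload shape are assumptions; the real request schema depends on how the agent endpoint is defined.

```python
# Calling an agent exposed as a Databricks serving endpoint over REST.
# The endpoint name and payload shape are assumptions; check the
# endpoint's schema for the actual request contract.
import os

import requests

workspace = os.environ["DATABRICKS_HOST"]  # e.g. https://my-workspace.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]
endpoint = "my-knowledge-assistant"        # hypothetical endpoint name

resp = requests.post(
    f"{workspace}/serving-endpoints/{endpoint}/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"messages": [{"role": "user",
                        "content": "What torque spec does pump P-101 use?"}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

A thin wrapper service could then expose the same endpoint behind Agent2Agent, MCP or any other protocol a partner system expects.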

Strategic implications for enterprise decision-makers

To lead the way in AI, enterprises need the right technologies in place to evaluate agent quality and effectiveness.

Deploying agents without evaluation isn’t going to lead to an optimal outcome, and neither will deploying agents without a solid data foundation. When considering agent development technologies, decision-makers need proper mechanisms for evaluating the available options.

The Agent Learning from Human Feedback approach is also noteworthy for enterprise decision makers, as it offers a way to steer agent behavior with natural-language guidance rather than endless prompt rewrites.

For enterprises looking to lead in AI agent deployment, this development means evaluation infrastructure is no longer a blocking factor. Organizations can focus resources on use case identification and data preparation rather than building optimization frameworks.

Original article: https://venturebeat.com/ai/why-most-enterprise-ai-agents-never-reach-production-and-how-databricks-plans-to-fix-it/