Project Castellana: A Safety-First Implementation of a VC Agent
Implementing an AI Agent with Open-Source Frameworks and Single-Agent Safety
AI agents are rapidly transforming how software is built. We are now in the agentic era, and software engineering has changed with it, so I wanted to create a project that would teach me how to write this kind of software securely.
Software ate the world, AI is now eating software, and venture capital is no exception. At our firm, writing memos is a core part of our investment process. These memos explain our analysis, due diligence, and investment thesis, regardless of the startup’s stage.
To support this process, I’ve been using tools like Perplexity to assist with market analysis. It has significantly reduced research time from around one week to just a few hours. However, while Perplexity is great for accelerating research, it doesn’t meet all the requirements needed for seamless integration into our internal investment workflows.
This led to Project Castellana, a prototype AI agent that can help write investment memos, built with safety engineering principles from day one.
The Problem: How Do We Actually Build Useful, Safe AI Agents?
To build a functioning AI agent, we need a few key components:
An agentic framework – a software development kit (SDK) that lets us orchestrate interactions between tools and large language models (LLMs).
A clear role and task division – defining what each agent in the system should do.
Tools – custom-built or external tools that each agent can use to complete its tasks safely and accurately.
Some of the most popular open-source agentic frameworks include:
LangChain
LlamaIndex
CrewAI
AgentStack
For Project Castellana, I chose CrewAI because it allows for structured, multi-agent collaboration in a modular way.
from crewai import Agent
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from crewai_tools import EXASearchTool
The Agent Architecture
The system follows a hierarchical multi-agent approach, where each agent has a well-defined responsibility:
Strategic Advisor Agent
Goal: Oversee and coordinate the crew’s work, ensuring high-quality, relevant, and non-generic outputs.
Context: Acts as an experienced project manager focused on aligning output with market-specific investment needs.
def get_strategy_advisor(trace_id=None):
    return create_agent(
        role='Project Manager',
        goal='Efficiently manage the crew and ensure high-quality task completion with a focus on ensuring that the results are very specific and relevant and not generic and too zoom out',
        backstory="""You're an experienced project manager, skilled in overseeing complex projects and guiding teams to success. Your role is to coordinate the efforts of the crew members, ensuring that each task is completed on time and that the results are relevant and specific to the market.""",
        tools=[],
        trace_id=trace_id,
        agent_name='strategy_advisor'
    )
Competitor Research Agent
Goal: Identify and analyze real startups in defined AI subsegments.
Context: Specialized in spotting emerging, verifiable startups excluding well-known players like Google, Meta, Anthropic, OpenAI, etc.
def get_competitor_analyst(trace_id=None):
    return create_agent(
        role='AI Startup Intelligence Specialist',
        goal='Identify and analyze relevant AI startups within specific AI subsegment markets',
        backstory="""Expert in mapping competitive landscapes for specific AI verticals. Specialized in identifying real, named emerging startups and scale-ups rather than tech giants like IBM, OpenAI, Google, META, Anthropic, HuggingFace. Known for finding verifiable information about startups' funding, technology, and market focus.""",
        tools=[exa_search_tool],
        trace_id=trace_id,
        agent_name='competitor_analyst'
    )
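Both agents are built through a shared create_agent helper that isn't shown here. The following is a minimal sketch of what it could look like, assuming it simply wraps crewai's Agent and the Portkey-aware LLM factory (get_portkey_llm) introduced later in the monitoring section; the exact wiring in Castellana may differ.

def create_agent(role, goal, backstory, tools, trace_id=None, agent_name=None):
    # Hypothetical helper: each agent gets its own traced LLM instance so its
    # calls can be attributed to a specific run and agent name
    return Agent(
        role=role,
        goal=goal,
        backstory=backstory,
        tools=tools,
        llm=get_portkey_llm(trace_id=trace_id, agent_name=agent_name),
        verbose=True
    )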
Tools the Agents Use
To support the above agents, I developed the following tools:
Market Size Tool – Estimates the total addressable market for a given segment.
def estimate_market_size(data: str) -> str:
    # Placeholder implementation: returns a formatted description of the input
    # rather than a real numeric estimate
    return f"Estimated market size based on: {data}"

market_size_tool = Tool(
    name="Market Size Estimator",
    func=estimate_market_size,
    description="Estimates market size based on provided data."
)
CAGR Calculator – Automatically computes compound annual growth rates from public or private data sources.
def calculate_cagr(initial_value: float, final_value: float, num_years: int) -> float:
    # CAGR = (final / initial) ** (1 / years) - 1
    cagr = (final_value / initial_value) ** (1 / num_years) - 1
    return cagr

# Note: LangChain's Tool wrapper expects a single input; a multi-argument
# function like this one is typically exposed via StructuredTool.from_function instead
cagr_tool = Tool(
    name="CAGR Calculator",
    func=calculate_cagr,
    description="Calculates CAGR given initial value, final value, and number of years."
)
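As a quick sanity check of the formula, a market growing from $1.0B to $2.0B over three years works out to roughly 26% per year:

# (2.0 / 1.0) ** (1/3) - 1 is about 0.26, i.e. a ~26% compound annual growth rate
print(f"{calculate_cagr(1.0, 2.0, 3):.1%}")  # -> 26.0%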
Search Tool (via Exa) – Allows agents to access real-time web search results, optimized for sourcing startup-specific information.
class CustomEXASearchTool(EXASearchTool):
    def __init__(self):
        # Constrain searches to a fixed publication window and filter out
        # incumbents so results skew toward emerging startups
        super().__init__(
            type='neural',
            use_autoprompt=True,
            startPublishedDate='2021-10-01T00:00:00.000Z',
            endPublishedDate='2023-10-31T23:59:59.999Z',
            excludeText=['OpenAI', 'Anthropic', 'Google', 'Mistral', 'Microsoft', 'Nvidia', 'general AI market', 'overall AI industry', 'IBM'],
            numResults=10
        )

exa_search_tool = CustomEXASearchTool()
Embedding Safety Engineering Principles in Project Castellana
The objective of Project Castellana is to build the agentic system with safety engineering principles from the start, so that its AI agents are reliable and deployable in high-stakes professional contexts like investment decision-making.
Risk Decomposition
Project Castellana starts by identifying potential failure points:
Data inaccuracy (e.g., hallucinated market size)
Non-compliant output (e.g., biased or misleading content)
Oversight failures (e.g., one agent missing red flags)
These are broken down in terms of likelihood, severity, and exposure, allowing the design to target the most impactful risks early.
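To make that decomposition concrete, the failure points can be kept in a lightweight risk register and ranked by a simple priority score. The entries and scores below are illustrative placeholders, not Castellana's actual assessments:

# Illustrative risk register; scores in [0, 1] are placeholders, not real assessments
RISK_REGISTER = {
    "hallucinated_market_size": {"likelihood": 0.3, "severity": 0.9, "exposure": 0.7},
    "non_compliant_output": {"likelihood": 0.2, "severity": 0.8, "exposure": 0.5},
    "missed_red_flag": {"likelihood": 0.4, "severity": 0.9, "exposure": 0.6},
}

def risk_priority(entry: dict) -> float:
    # Simple product heuristic: rank which risks to mitigate first
    return entry["likelihood"] * entry["severity"] * entry["exposure"]

for name, entry in sorted(RISK_REGISTER.items(), key=lambda kv: risk_priority(kv[1]), reverse=True):
    print(f"{name}: priority {risk_priority(entry):.2f}")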
Safe Design Principles
Redundancy
Agent outputs are set up to support cross-verification of key findings by triggering human-in-the-loop reviews of the sources the agent used.
Separation of Duties
The multi-agent structure ensures no single agent performs all tasks. Each agent has a tightly scoped responsibility, which limits cascading failure risks.
Principle of Least Privilege
Agents only have access to the tools and data relevant to their roles. For instance, the Strategic Advisor cannot directly query Exa—it relies on outputs from specialized agents.
Fail-Safes (In Progress)
Future iterations may include uncertainty estimates that flag outputs for human review if the confidence falls below a defined threshold.
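A minimal sketch of such a fail-safe, assuming each draft finding carries a confidence score (the 0.8 threshold is an arbitrary example, not a calibrated value):

CONFIDENCE_THRESHOLD = 0.8  # example value, to be calibrated against real outputs

def gate_finding(finding: str, confidence: float) -> dict:
    # Route anything below the threshold to a human reviewer instead of the memo
    if confidence < CONFIDENCE_THRESHOLD:
        return {"status": "needs_human_review", "finding": finding, "confidence": confidence}
    return {"status": "accepted", "finding": finding, "confidence": confidence}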
Transparency
Outputs include tool provenance (e.g., “Market Size sourced from X, calculated via Y”), and internal reasoning steps can be logged and reviewed. This improves human interpretability.
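One way to carry that provenance through the pipeline is to attach it to every finding as structured metadata rather than free text. This is a sketch of the idea, not the format Castellana currently uses, and the example values are hypothetical:

from dataclasses import dataclass

@dataclass
class SourcedFinding:
    claim: str         # the statement that will appear in the memo
    source: str        # e.g. the Exa result URL the number came from
    tool: str          # e.g. "Market Size Estimator" or "CAGR Calculator"
    retrieved_at: str  # ISO timestamp, so stale data is visible at review time

finding = SourcedFinding(
    claim="Example market-size claim",
    source="https://example.com/report",
    tool="Market Size Estimator",
    retrieved_at="2024-01-01T00:00:00Z",
)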
Defense in Depth
The system is being designed to include multiple validation layers before an output is accepted into a memo—agent-level verification, tool-level checks, and optional human review.
Systemic Safety and Accident Models
Rather than focusing solely on the reliability of individual components—such as the Get Competitors Agent—Project Castellana is being developed with systemic risk in mind: the kinds of failures that emerge not from a single malfunction, but from the interactions and dependencies between agents, tools, and user feedback loops.
This mirrors safety models used in high-stakes domains like aviation, where accidents typically arise from a chain of events rather than one isolated breakdown. In complex systems, failures rarely occur in isolation; they are often the result of cascading errors, misaligned assumptions, or silent coordination breakdowns.
Castellana applies principles from systems engineering and accident modeling to proactively manage these risks, ensuring the entire agentic workflow behaves robustly and predictably—even under pressure.
Here's how:
1. Agent-to-Agent Communication Monitoring
Each agent in Castellana operates with a well-defined role, but their outputs are often inputs for others. For example, the Get Competitors Agent provides findings to the Strategic Advisor, who integrates them into the memo. Systemic risks arise if:
The Get Competitors Agent misinterprets the prompt and outputs incomplete data.
The Strategic Advisor assumes the data is comprehensive and doesn't seek corroboration.
To counteract this, Castellana introduces explicit handoff protocols, where agents pass metadata along with their output (e.g., source quality, timestamp, uncertainty), giving downstream agents richer context to assess validity.
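A sketch of what such a handoff envelope could look like; the field names are illustrative and not part of CrewAI itself:

from datetime import datetime, timezone

def make_handoff(agent_name: str, payload: str, sources: list[str], uncertainty: float) -> dict:
    # Wrap an agent's output with the metadata the next agent needs to judge it
    return {
        "agent": agent_name,
        "payload": payload,
        "sources": sources,          # where the claims came from
        "uncertainty": uncertainty,  # self-reported, 0 = confident, 1 = guessing
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

handoff = make_handoff(
    agent_name="competitor_analyst",
    payload="Three named startups in the AI code-review subsegment ...",
    sources=["https://example.com/startup-a"],
    uncertainty=0.2,
)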
2. Tool-Agent Interaction Governance
Agents rely on external tools—like Exa for search or a CAGR calculator—for critical data. Systemic risk surfaces when tools fail silently, return outdated data, or are misused. For example:
If Exa delivers results from 2020 without date metadata, an agent might incorrectly interpret them as current.
A parsing error in the Market Size Tool could propagate false estimates across the memo.
Castellana addresses this by:
Adding tool wrappers that enforce input/output validation and context tagging (a sketch follows after this list).
Logging all tool interactions so anomalies can be traced post-hoc.
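As a rough illustration of such a wrapper, the decorator below logs every call and rejects empty outputs, applied here to the market-size stub from earlier. It is a sketch of the idea, not the wrapper used in Castellana:

import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("castellana.tools")

def governed_tool(func):
    # Wrap a tool function so every call is logged and empty results fail loudly
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("tool=%s args=%s kwargs=%s", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        if result is None or result == "":
            raise ValueError(f"{func.__name__} returned an empty result")
        logger.info("tool=%s result=%s", func.__name__, result)
        return result
    return wrapper

@governed_tool
def estimate_market_size_governed(data: str) -> str:
    return estimate_market_size(data)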
3. Tail Events and Black Swans
Even if 99% of memos are accurate, the 1% that are confidently wrong pose significant reputational or financial risk. Black swan scenarios could include:
A flawed valuation that makes it into a partner meeting
A hallucinated startup cited as a key competitor
An inappropriate thesis generated from faulty data
By embracing the precautionary principle and horizon scanning (e.g., agents flagging “unknown unknowns” or anomalous outputs), Castellana aims to mitigate such risks even if they can’t be predicted.
Implementation Gaps and Next Steps
While the structure and intent of Project Castellana align strongly with safety engineering principles, not all principles are fully implemented yet. For instance:
Fail-safe mechanisms and confidence thresholds are being explored.
Redundancy and defense in depth are currently manual but will be automated.
Comprehensive logging and explainability will require further development.
Application of Single-Agent Safety
Beyond classical safety engineering, the AI safety literature describes agent-specific concerns such as monitoring, robustness, alignment, and systemic safety. Here’s how these apply to Project Castellana:
Monitoring
Monitoring involves identifying hazards, reducing exposure, understanding internal representations, detecting anomalies, and increasing transparency.
Project Castellana already emphasizes transparency as a safety feature, with outputs indicating the tool provenance (e.g., “Market Size sourced from X”) to improve human interpretability and accountability.
To support monitoring and observability, Project Castellana uses Portkey.ai, a platform for managing and monitoring LLM-based agents in production. Portkey provides telemetry, error tracking, and prompt/response inspection capabilities that align with the monitoring and systemic safety goals described above. This operational layer helps bridge theory (AI safety principles) and practice (safe deployment of Castellana agents).
import os

try:
    from portkey_ai import createHeaders, PORTKEY_GATEWAY_URL
    PORTKEY_AVAILABLE = True
except ImportError:
    PORTKEY_AVAILABLE = False
    print("Portkey not available, falling back to direct OpenAI usage")

def get_portkey_llm(trace_id=None, span_id=None, agent_name=None):
    if PORTKEY_AVAILABLE:
        # Route requests through the Portkey gateway so traces and spans
        # are attached to each agent's LLM calls
        headers = createHeaders(
            provider="openai",
            api_key=os.getenv("PORTKEY_API_KEY"),
            trace_id=trace_id,
        )
        if span_id:
            headers['x-portkey-span-id'] = span_id
        if agent_name:
            headers['x-portkey-span-name'] = f'Agent: {agent_name}'
        return ChatOpenAI(
            model="gpt-4o",
            base_url=PORTKEY_GATEWAY_URL,
            default_headers=headers,
            api_key=os.getenv("OPENAI_API_KEY")
        )
    else:
        # Fallback to direct OpenAI usage
        return ChatOpenAI(
            model="gpt-4",
            api_key=os.getenv("OPENAI_API_KEY")
        )
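For illustration, wiring this into a run might look like the following; the uuid-based trace id is just an example, not Castellana's actual convention:

import uuid

# Illustrative usage: one trace id per memo run, shared across the crew
trace_id = str(uuid.uuid4())
llm = get_portkey_llm(trace_id=trace_id, agent_name='competitor_analyst')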
Future enhancements could include:
Developing benchmarks and evaluations to assess the accuracy and quality of investment memo outputs.
Implementing anomaly detection to flag unexpected or potentially hazardous agent behavior.
Exploring mechanistic interpretability to better understand agents’ decision processes, though this remains a challenging area.
Robustness
Robustness addresses vulnerabilities in AI systems, including resistance to adversarial examples and Trojans.
Project Castellana acknowledges key risks like data inaccuracies and non-compliant outputs.
It applies redundancy (cross-verifying information across sources) and defense in depth (multiple validation layers, such as automated consistency checks and human-in-the-loop reviews), both critical in mitigating robustness failures.
Further steps could involve:
Ensuring adversarial robustness for the models and tools used.
Auditing against Trojans, especially if open-source or externally trained models are incorporated.
Alignment
Alignment is about ensuring that AI agents act in line with human intent, avoiding deceptive or unintended behavior.
Castellana uses separation of duties and the principle of least privilege to constrain agent behavior.
A Strategic Advisor Agent oversees outputs for quality and specificity, supporting high-level alignment with the memo-writing goal.