The BeeAI Security Analyst
While CLI agents are powerful, sometimes a visual dashboard is essential for monitoring complex, multi-step reasoning processes. The BeeAI Analyst combines the raw power of local LLMs with a reactive FastAPI web interface.
Below you can see a simulation of the actual web interface. Watch as the agent receives a high-level security mandate, plans its research using the ThinkTool, gathers information via DuckDuckGo and Wikipedia, and produces a strategic final report.
🛡️ BeeAI Analyst (FastAPI)
This "Glass Box" approach allows operators to trust the AI's conclusions by verifying the sources (Wikipedia, DuckDuckGo) and reasoning steps (ThinkTool) that led to those results.
Under the Hood
The BeeAI Analyst is designed for performance and privacy. Unlike cloud-based agents, this entire stack runs locally on your machine, ensuring no sensitive data leaves your network.
1. FastAPI & Async Architecture
The backend is powered by FastAPI and Uvicorn, leveraging Python's asyncio to handle multiple concurrent connections without blocking. We use Server-Sent Events (SSE) to stream the agent's thinking process to the frontend in real time, giving the user immediate feedback.
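As a rough sketch of the streaming side, the snippet below shows how SSE frames can be produced from an async generator. The event name, payload shape, and `stream_agent_steps` helper are illustrative assumptions, not the project's actual API; in the real app the generator would be wrapped in FastAPI's `StreamingResponse(..., media_type="text/event-stream")`.

```python
import asyncio
import json

def sse_frame(event: str, payload: dict) -> str:
    """Format one Server-Sent Event frame: event line, data line, blank line.
    The browser's EventSource receives each frame as a discrete message."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

async def stream_agent_steps(steps):
    """Hypothetical generator: yield each reasoning step as it is produced.
    In the real backend this would be fed by the agent's event callbacks."""
    for step in steps:
        yield sse_frame("thought", {"text": step})
        await asyncio.sleep(0)  # let the event loop flush the frame

async def main():
    # Collect the frames a client would receive for two example steps.
    return [f async for f in stream_agent_steps(["plan research", "query DuckDuckGo"])]
```

Because each step is flushed as soon as it is yielded, the dashboard can render the agent's reasoning incrementally instead of waiting for the final answer.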
2. Resource Management with Semaphores
Running large language models (LLMs) locally is GPU-intensive. To prevent out-of-memory (OOM) errors, the system guards inference with an asyncio semaphore (gpu_semaphore). It acts as traffic control: only one heavy inference task occupies the GPU at a time, while other requests queue and wait their turn.
3. The BeeAI Framework
At its core lies the BeeAI Framework. It orchestrates the agent's lifecycle:
- ThinkTool: Allows the agent to pause and plan its next steps.
- Research Tools: Integration with DuckDuckGo, Wikipedia, and OpenMeteo for real-world data.
- Memory: An unconstrained memory store lets the agent retain the full conversation context throughout the session.
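The lifecycle the framework orchestrates can be pictured as a think/act loop. The sketch below is a conceptual illustration only, not the BeeAI Framework API: the `think` function stands in for the ThinkTool, the search string stands in for a research tool call, and the memory list stands in for the session's context store.

```python
def think(goal: str, memory: list) -> str:
    """Stand-in for the planning step: decide the next action
    from the goal and what has already been done."""
    return "search" if "searched" not in memory else "answer"

def run_agent(goal: str) -> list:
    memory = []      # session-wide context: every step is kept
    transcript = []
    while True:
        step = think(goal, memory)
        if step == "search":
            # Research-tool stand-in (DuckDuckGo/Wikipedia in the real agent).
            transcript.append(f"search results for {goal!r}")
            memory.append("searched")
        else:
            transcript.append(f"final report on {goal!r}")
            return transcript
```

The real framework adds tool schemas, error handling, and streaming events around this loop, but the plan → gather → report cycle shown in the dashboard follows the same shape.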
4. Local LLM via Ollama
The intelligence is provided by Ollama, running a specially tuned gemma-agent model. By using the OpenAI-compatible endpoint, we can swap out the underlying models (Llama 3, Mistral, Gemma) without changing a single line of application code.
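To make the model-swap point concrete, here is a minimal sketch of building an OpenAI-style chat request against Ollama's compatibility endpoint. The `build_chat_request` helper and the system prompt are assumptions for illustration; only the model string changes between models, and the application code around it stays identical.

```python
# Ollama exposes an OpenAI-compatible API at this base URL by default;
# any OpenAI-style client pointed here (with a dummy API key) will work.
OLLAMA_BASE_URL = "http://localhost:11434/v1"

def build_chat_request(model: str, user_prompt: str) -> dict:
    """Build an OpenAI-style /chat/completions payload (illustrative helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a security analyst."},
            {"role": "user", "content": user_prompt},
        ],
        "stream": True,  # stream tokens so the UI can relay them over SSE
    }

# Swapping the underlying model is a one-string change:
req = build_chat_request("gemma-agent", "Summarise recent CVE trends.")
# build_chat_request("llama3", ...) or ("mistral", ...) hit the same endpoint
```

This is what "without changing a single line of application code" means in practice: the endpoint, message format, and streaming flag are identical across models.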
5. RAG & Document Intelligence with Docling
The Analyst features advanced Retrieval-Augmented Generation (RAG) capabilities. Users can upload various file formats (PDF, DOCX, images) which are processed using Docling for high-quality text extraction. The content is then partitioned and stored in a local vector database, enabling the agent to provide context-aware answers based on your private documents.
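The partition-and-retrieve half of that pipeline can be sketched with toy components. In the real system, Docling performs the extraction and a proper embedding model plus vector database replace the bag-of-words vectors used below; the chunk size and scoring are illustrative assumptions.

```python
import math
from collections import Counter

def partition(text: str, chunk_words: int = 50) -> list:
    """Split extracted document text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (a real system would
    use a neural embedding model here)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks are what gets injected into the agent's prompt, which is how answers stay grounded in your private documents rather than the model's training data.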
The full source code, including the FastAPI server, agent configuration, and frontend templates, is available on GitHub.
View on GitHub