The BeeAI Security Analyst
While CLI agents are powerful, sometimes a visual dashboard is essential for monitoring complex, multi-step reasoning processes. The BeeAI Analyst combines the raw power of local LLMs with a reactive FastAPI web interface.
Below you can see a simulation of the actual web interface. Watch as the agent receives a high-level security mandate, plans its research using the ThinkTool, gathers information via DuckDuckGo and Wikipedia, and produces a strategic final report.
🛡️ BeeAI Analyst (FastAPI)
This "Glass Box" approach allows operators to trust the AI's conclusions by verifying the sources (Wikipedia, DuckDuckGo) and reasoning steps (ThinkTool) that led to those results.
Under the Hood
The BeeAI Analyst is designed for performance and privacy. Unlike cloud-based agents, this entire stack runs locally on your machine, ensuring no sensitive data leaves your network.
1. FastAPI & Async Architecture
The backend is powered by FastAPI and Uvicorn, leveraging Python's asyncio to handle multiple concurrent connections without blocking. We use Server-Sent Events (SSE) to stream the agent's thinking process to the frontend in real time, giving the user immediate feedback.
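As a rough sketch of the streaming side, the snippet below shows how SSE frames can be produced from an async generator. The event name, payload shape, and `stream_agent_steps` helper are illustrative assumptions, not the project's actual API; in the real app the generator would be wrapped in FastAPI's `StreamingResponse(..., media_type="text/event-stream")`.

```python
import asyncio
import json

def sse_frame(event: str, payload: dict) -> str:
    """Format one Server-Sent Event frame: event line, data line, blank line.
    The browser's EventSource receives each frame as a discrete message."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

async def stream_agent_steps(steps):
    """Hypothetical generator: yield each reasoning step as it is produced.
    In the real backend this would be fed by the agent's event callbacks."""
    for step in steps:
        yield sse_frame("thought", {"text": step})
        await asyncio.sleep(0)  # let the event loop flush the frame

async def main():
    # Collect the frames a client would receive for two example steps.
    return [f async for f in stream_agent_steps(["plan research", "query DuckDuckGo"])]
```

Because each step is flushed as soon as it is yielded, the dashboard can render the agent's reasoning incrementally instead of waiting for the final answer.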
2. Resource Management with Semaphores
Running large language models (LLMs) locally is GPU-intensive. To prevent out-of-memory (OOM) errors, the system guards inference with an asyncio semaphore (gpu_semaphore). It acts as traffic control: only one heavy inference task occupies the GPU at a time, while other requests queue and wait their turn.
3. The BeeAI Framework
At its core lies the BeeAI Framework. It orchestrates the agent's lifecycle:
- ThinkTool: Allows the agent to pause and plan its next steps.
- Research Tools: Integration with DuckDuckGo, Wikipedia, and OpenMeteo for real-world data.
- Memory: An unconstrained memory store lets the agent retain the full conversation context throughout the session.
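The lifecycle the framework orchestrates can be pictured as a think/act loop. The sketch below is a conceptual illustration only, not the BeeAI Framework API: the `think` function stands in for the ThinkTool, the search string stands in for a research tool call, and the memory list stands in for the session's context store.

```python
def think(goal: str, memory: list) -> str:
    """Stand-in for the planning step: decide the next action
    from the goal and what has already been done."""
    return "search" if "searched" not in memory else "answer"

def run_agent(goal: str) -> list:
    memory = []      # session-wide context: every step is kept
    transcript = []
    while True:
        step = think(goal, memory)
        if step == "search":
            # Research-tool stand-in (DuckDuckGo/Wikipedia in the real agent).
            transcript.append(f"search results for {goal!r}")
            memory.append("searched")
        else:
            transcript.append(f"final report on {goal!r}")
            return transcript
```

The real framework adds tool schemas, error handling, and streaming events around this loop, but the plan → gather → report cycle shown in the dashboard follows the same shape.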
4. Local LLM via Ollama
The intelligence is provided by Ollama, running a specially tuned gemma-agent model. By using the OpenAI-compatible endpoint, we can swap out the underlying models (Llama 3, Mistral, Gemma) without changing a single line of application code.
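To make the model-swap point concrete, here is a minimal sketch of building an OpenAI-style chat request against Ollama's compatibility endpoint. The `build_chat_request` helper and the system prompt are assumptions for illustration; only the model string changes between models, and the application code around it stays identical.

```python
# Ollama exposes an OpenAI-compatible API at this base URL by default;
# any OpenAI-style client pointed here (with a dummy API key) will work.
OLLAMA_BASE_URL = "http://localhost:11434/v1"

def build_chat_request(model: str, user_prompt: str) -> dict:
    """Build an OpenAI-style /chat/completions payload (illustrative helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a security analyst."},
            {"role": "user", "content": user_prompt},
        ],
        "stream": True,  # stream tokens so the UI can relay them over SSE
    }

# Swapping the underlying model is a one-string change:
req = build_chat_request("gemma-agent", "Summarise recent CVE trends.")
# build_chat_request("llama3", ...) or ("mistral", ...) hit the same endpoint
```

This is what "without changing a single line of application code" means in practice: the endpoint, message format, and streaming flag are identical across models.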
5. RAG & Document Intelligence with Docling
The Analyst features advanced Retrieval-Augmented Generation (RAG) capabilities. Users can upload various file formats (PDF, DOCX, images) which are processed using Docling for high-quality text extraction. The content is then partitioned and stored in a local vector database, enabling the agent to provide context-aware answers based on your private documents.
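The partition-and-retrieve half of that pipeline can be sketched with toy components. In the real system, Docling performs the extraction and a proper embedding model plus vector database replace the bag-of-words vectors used below; the chunk size and scoring are illustrative assumptions.

```python
import math
from collections import Counter

def partition(text: str, chunk_words: int = 50) -> list:
    """Split extracted document text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (a real system would
    use a neural embedding model here)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks are what gets injected into the agent's prompt, which is how answers stay grounded in your private documents rather than the model's training data.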
The full source code, including the FastAPI server, agent configuration, and frontend templates, is available on GitHub.
View on GitHub