Developer Builds Open-Source AI NVR System via Vibe Coding
Developer Ships Open-Source AI Surveillance System Built Entirely With Vibe Coding
A solo developer has released AI-NVR, an open-source AI-powered network video recorder system built almost entirely through vibe coding — the increasingly popular practice of using AI assistants to generate entire software projects. Created during China's May Day holiday break, the project demonstrates both the power and the limitations of letting large language models write your code, while also filling a genuine gap in the smart home surveillance market.
The project, available now on GitHub, connects standard IP cameras to vision AI models for intelligent scene analysis and real-time notifications — something that existing consumer-grade camera systems and even popular open-source alternatives like Frigate NVR struggle to deliver with sufficient flexibility.
Key Takeaways
- AI-NVR is a fully open-source surveillance system that uses vision language models to analyze camera feeds in real time
- The entire project was built using vibe coding over a 5-day holiday, consuming approximately 4.92 billion tokens in AI-assisted development
- It supports multiple simultaneous cameras with multi-frame analysis sent to configurable vision models
- Compatible vision models include LFM-2.5-VL-1.6B and Qwen3.5-0.8B, with support for adding custom models
- Notifications are delivered via webhooks, enabling integration with messaging platforms like Slack, Discord, or Telegram
- The developer acknowledges the system's effectiveness is heavily dependent on the underlying vision model's capabilities
A Quirky Origin Story Highlights a Real Problem
The project's origin story is as entertaining as it is relatable. The developer discovered that their Malinois dog, kept in an outdoor area, was being mounted by stray dogs at night. Their existing TP-Link camera system offered basic smart detection, but it lacked the granularity needed for the specific scenario: detecting when 2 or more dogs appeared in a designated zone simultaneously and sending an immediate alert.
This is a use case that perfectly illustrates the gap between consumer-grade 'smart' cameras and truly intelligent surveillance. Most affordable IP cameras can detect motion or even recognize human shapes, but asking them to count specific animals in a defined region and trigger context-aware notifications is far beyond their capabilities.
The developer first turned to Frigate, the popular open-source NVR system that integrates with Home Assistant. However, they found it difficult to configure for their specific needs. Rather than spending weeks writing a custom solution from scratch, they turned to AI-assisted development.
Vibe Coding at Scale: 4.92 Billion Tokens in 5 Days
The term vibe coding — coined by Andrej Karpathy in early 2025 — describes the practice of describing what you want to an AI coding assistant and letting it generate the implementation. This project represents one of the more ambitious examples of the approach applied to a complete, functional system.
Over the 5-day May Day holiday, the developer engaged in what they describe as 'high-intensity AI exploitation,' burning through roughly 4.92 billion tokens across their interactions with coding AI models. The result is a working system that, while admittedly rough around the edges, solves the core problem and is extensible enough for others to build upon.
This token consumption figure is noteworthy. At current API pricing — for example, GPT-4o's input rate of approximately $2.50 per million tokens — that volume of interaction could represent significant costs, though the developer likely used a combination of models at varying price points. It underscores both the accessibility and the hidden costs of vibe coding for non-trivial projects.
How AI-NVR Works Under the Hood
The system's architecture is designed around a straightforward but powerful pipeline. Here's how it operates:
- Camera Integration: AI-NVR connects to standard IP cameras (like TP-Link models) via RTSP streams, supporting multiple simultaneous feeds
- Multi-Frame Capture: Rather than analyzing single snapshots, the system captures multiple frames and sends them together to the vision model for richer context
- Vision Model Analysis: The frames are processed by configurable vision language models (VLMs) that can understand and describe what's happening in the scene
- Signal System: The AI's analysis updates internal 'signals' — essentially state variables that track conditions of interest (e.g., 'number of dogs detected')
- Webhook Notifications: When signals meet defined thresholds, the system fires webhook-based alerts that can integrate with virtually any notification service
The multi-frame approach is particularly clever. By sending several frames at once, the system gives the vision model temporal context — making it easier to distinguish between a dog walking through the frame and a dog that has stopped and is interacting with another animal.
Vision Model Performance Varies Significantly
The developer tested several compact vision language models and shared candid assessments of their performance. Two models stood out as particularly useful:
- LFM-2.5-VL-1.6B: A lightweight vision-language model that the developer found surprisingly capable for basic scene understanding tasks
- Qwen3.5-0.8B: An even smaller model from Alibaba's Qwen family that offers reasonable performance at minimal computational cost
Both models are small enough to run on consumer hardware, which is critical for a surveillance application that needs to process frames continuously. However, the developer is transparent about the system's limitations, noting that 'the NVR's effectiveness is mediocre overall — its capability is primarily determined by the vision model being used.'
This honest assessment highlights a broader truth about the current state of edge AI: while small vision models have improved dramatically, they still struggle with nuanced scene understanding compared to larger models like GPT-4o or Claude's vision capabilities. The trade-off between model size, inference speed, and accuracy remains one of the central challenges in deploying AI for real-time applications.
Where AI-NVR Fits in the Smart Surveillance Landscape
The home surveillance market is evolving rapidly, with AI capabilities becoming a key differentiator. Major players like Google Nest, Ring (Amazon), and Arlo all offer some degree of intelligent detection, but their capabilities are typically limited to predefined categories — people, packages, vehicles, and animals.
On the open-source side, Frigate NVR has emerged as the leading solution, integrating Google's Coral TPU for object detection and working seamlessly with Home Assistant. However, Frigate primarily uses traditional object detection models (like YOLO variants) rather than vision language models, which limits its ability to understand complex scenes or respond to natural-language queries about what's happening in a feed.
AI-NVR takes a fundamentally different approach by leveraging vision language models — the same technology that powers multimodal AI assistants. This means users can potentially describe detection scenarios in natural language rather than configuring rigid detection zones and object classes. The trade-off is higher computational cost and potentially slower inference times.
Other projects in this space include:
- Frigate NVR: Object detection focused, YOLO-based, Home Assistant integration
- Blue Iris: Commercial Windows-based NVR with AI plugin support
- Viseron: Open-source NVR with machine learning capabilities
- Scrypted: Home automation platform with NVR features and HomeKit support
What This Means for Developers and Hobbyists
AI-NVR represents a compelling template for a new category of AI applications: systems where vision language models replace traditional computer vision pipelines. For developers interested in building smart home or IoT solutions, the project offers several takeaways.
First, the barrier to building functional AI systems continues to drop. A single developer, armed with AI coding assistants, produced a working multi-camera surveillance system in under a week. The code may need polish, but the core functionality works.
Second, small vision models are becoming genuinely useful for edge applications. Models under 2 billion parameters can now handle basic scene understanding tasks that would have required much larger models just a year ago. As these models continue to improve — particularly with optimizations for on-device inference — applications like AI-NVR will only get more capable.
Third, the webhook-based notification approach is smart architectural design. By decoupling the detection logic from the notification delivery, the system can integrate with any platform without requiring built-in support for each one.
Looking Ahead: Room to Grow
The developer acknowledges that AI-NVR has 'obvious room for optimization' and plans to continue improving it when time permits. Several areas stand out as promising directions for future development:
Model performance will likely improve as newer, more capable small VLMs are released. The rapid pace of development in the open-source model community — with organizations like Meta, Alibaba, and Mistral regularly releasing improved models — means that AI-NVR's capabilities could improve significantly just by swapping in better models.
Community contributions could also accelerate development. The project's open-source nature means other developers facing similar niche surveillance needs can contribute improvements, add camera compatibility, or optimize the inference pipeline.
The project also raises interesting questions about the future of vibe coding as a development methodology. While the 4.92 billion tokens consumed suggest the process was far from effortless, the end result — a functional, open-source system built by a single person in 5 days — would have been nearly impossible through traditional development approaches in the same timeframe.
For anyone interested in experimenting with AI-powered surveillance, the project is available at github.com/2234839/ai-nvr. Just be prepared: as the developer learned, the hardest part isn't building the system — it's finding a vision model smart enough to tell you when your dog is in trouble.
📌 Source: GogoAI News (www.gogoai.xin)
🔗 Original: https://www.gogoai.xin/article/developer-builds-open-source-ai-nvr-system-via-vibe-coding
⚠️ Please credit GogoAI when republishing.