Build a Persistent AI Assistant on GCP with Telegram

💡 A step-by-step look at combining GCE, Hermes Agent, and Telegram to create an always-on AI assistant that remembers everything.

The Dream of an Always-On AI Agent Is Now Within Reach

For years, the dream of a truly autonomous, always-on AI assistant has felt just out of reach — a concept relegated to science fiction or limited by the fragile, stateless nature of most chat interfaces. We have grown accustomed to assistants that forget us the moment we close the browser tab. But a growing wave of developers is now proving that with the right cloud infrastructure, open-source agent frameworks, and messaging platforms, building a persistent AI companion is not only possible — it is surprisingly practical.

The architecture in question combines three powerful components: Google Compute Engine (GCE) for always-on hosting, the Hermes Agent framework for intelligent task execution, and Telegram as the universal messaging interface. Together, they form a stack that turns ephemeral chatbot interactions into a durable, stateful AI experience.

Why Persistence Matters

Most AI assistants today operate in a request-response loop. You ask a question, get an answer, and the context vanishes. This stateless design is fine for one-off queries, but it falls apart when you want an agent that tracks your projects, remembers your preferences, follows up on tasks, or monitors data streams over days and weeks.

Persistence changes the game entirely. A persistent agent maintains memory across sessions, can be triggered by scheduled events or external signals, and accumulates knowledge over time. It shifts the AI from being a reactive tool to a proactive collaborator.

The Stack: GCE + Hermes Agent + Telegram

Google Compute Engine: The Foundation

Google Compute Engine provides the virtual machine infrastructure that keeps the agent alive 24/7. Unlike serverless functions that spin up and down, a GCE instance runs continuously, making it ideal for long-running processes like an AI agent.

Developers can start with a modest e2-micro or e2-small instance and scale up as workloads grow. At on-demand rates an e2-micro costs roughly $6 to $7 per month and an e2-small about twice that, with the exact figure depending on region. GCE also offers persistent disks, which means the agent's memory, logs, and configuration files survive reboots and maintenance events. Google's network generally gives the VM low-latency connectivity to external APIs, and integration with other GCP services like Cloud Storage, BigQuery, and Secret Manager adds further flexibility.

The key advantage here is simplicity. You SSH into a VM, set up your environment, and deploy. There is no complex container orchestration required for a single-agent setup.

Hermes Agent: The Brain

The Hermes Agent framework serves as the cognitive engine of the system. Built around the open-source Hermes series of large language models — originally fine-tuned by Nous Research — the framework supports tool use, function calling, and structured reasoning out of the box.

What makes Hermes particularly well-suited for persistent agents is its emphasis on agentic behavior. Rather than simply generating text, Hermes models are trained to decompose tasks, call external APIs, manage multi-step workflows, and maintain conversational context across interactions. When paired with a vector database or simple JSON-based memory store on the GCE persistent disk, the agent can recall past conversations, user preferences, and incomplete tasks.

Developers typically integrate Hermes through frameworks like LangChain, LlamaIndex, or custom Python orchestration scripts. The agent loop follows a familiar pattern: receive a message, retrieve relevant memory, reason about the appropriate action, execute tools if needed, store new information, and respond.
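The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Hermes Agent implementation: the `Decision` and `Agent` classes, the keyword-overlap retrieval, and the `stub_decide` function standing in for the model call are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    """What the model decided for one turn (hypothetical structure)."""
    reply: str
    tool: Optional[str] = None  # name of a tool to call, if any
    tool_input: str = ""


@dataclass
class Agent:
    # decide() stands in for the LLM call; it gets the message and recalled memory.
    decide: Callable[[str, List[str]], Decision]
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: List[str] = field(default_factory=list)

    def recall(self, message: str, limit: int = 3) -> List[str]:
        """Naive retrieval: past entries sharing at least one word with the message."""
        words = set(message.lower().split())
        hits = [m for m in self.memory if words & set(m.lower().split())]
        return hits[-limit:]

    def handle(self, message: str) -> str:
        context = self.recall(message)            # retrieve relevant memory
        decision = self.decide(message, context)  # reason about the action
        reply = decision.reply
        if decision.tool in self.tools:           # execute a tool if one was requested
            result = self.tools[decision.tool](decision.tool_input)
            reply = f"{reply} {result}".strip()
        self.memory.append(f"user: {message}")    # store new information
        self.memory.append(f"agent: {reply}")
        return reply                              # respond


def stub_decide(message: str, context: List[str]) -> Decision:
    """Placeholder for the model: asks for the clock tool when the user mentions time."""
    if "time" in message.lower():
        return Decision(reply="It is", tool="clock")
    return Decision(reply=f"Noted. I recall {len(context)} related entries.")
```

In a real deployment `decide` would wrap a call to the model and `memory` would be backed by a store on the persistent disk rather than an in-process list, but the order of the steps stays the same.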

Telegram: The Interface

Telegram might seem like an unconventional choice for an AI interface, but it is quietly becoming one of the most popular front ends for personal AI agents. With over 900 million monthly active users, Telegram offers a bot API that is free, well-documented, and remarkably powerful.

A Telegram bot can receive text, images, files, and voice messages. It supports inline keyboards and group interactions; because the Bot API sends messages immediately, any scheduling is handled by the bot's own code. Most importantly, it is available on every major platform (iOS, Android, Windows, macOS, Linux, and the web), meaning the user can reach their AI assistant from virtually anywhere without installing a custom app.

Setting up a Telegram bot takes minutes. Developers register a bot through Telegram's BotFather, receive an API token, and connect it to a Python listener (typically using the python-telegram-bot library) running on the GCE instance. Every incoming message triggers the Hermes Agent pipeline, and the response flows back through Telegram.
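To show what such a listener does underneath, here is a dependency-free sketch that talks to the Bot API directly with the standard library. The article's suggested python-telegram-bot library wraps the same `getUpdates` and `sendMessage` methods; the helper names here (`method_url`, `call`, `extract_text_messages`, `poll_forever`) are invented for the example, and error handling and retries are deliberately left out.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def method_url(token: str, method: str) -> str:
    """Bot API endpoints have the form https://api.telegram.org/bot<token>/<method>."""
    return f"{API_ROOT}/bot{token}/{method}"


def call(token: str, method: str, params: dict, timeout: float = 40.0) -> dict:
    """POST form-encoded parameters to a Bot API method and decode the JSON reply."""
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(method_url(token, method), data=data, timeout=timeout) as resp:
        return json.load(resp)


def extract_text_messages(updates: list) -> list:
    """Return (update_id, chat_id, text) for each update that carries a text message."""
    out = []
    for update in updates:
        message = update.get("message") or {}
        if "text" in message:
            out.append((update["update_id"], message["chat"]["id"], message["text"]))
    return out


def poll_forever(token: str, respond) -> None:
    """Long-polling loop: fetch updates, answer each text message, advance the offset."""
    offset = 0
    while True:
        reply = call(token, "getUpdates", {"offset": offset, "timeout": 30})
        results = reply.get("result", [])
        for _update_id, chat_id, text in extract_text_messages(results):
            call(token, "sendMessage", {"chat_id": chat_id, "text": respond(text)})
        if results:
            # Acknowledge everything received, including non-text updates.
            offset = results[-1]["update_id"] + 1
```

Calling `poll_forever(token, respond)` with the BotFather token and a function that maps incoming text to a reply would start the loop; `respond` is where the agent pipeline plugs in.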

Architecture Walkthrough

Here is how the components connect in practice:

  1. User sends a message via Telegram on their phone or desktop.
  2. Telegram delivers the message to the bot on the GCE VM, either by pushing it to a webhook or by returning it to a long-polling listener.
  3. The agent pipeline activates: the message is parsed, relevant memory is retrieved from the persistent disk, and the Hermes model processes the input with full context.
  4. Tool execution occurs if needed — the agent might query a weather API, search the web, check a calendar, or write to a database.
  5. Memory is updated: new facts, task states, and conversation history are written to the local store.
  6. The response is sent back through the Telegram API to the user.

This loop runs continuously, and because the GCE instance stays up between conversations, the agent is always listening. Scheduled cron jobs can also trigger proactive behaviors (daily summaries, reminders, or data monitoring alerts) without any user input.
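As one illustration of a cron-triggered behavior, the sketch below picks out reminders that have come due and formats them as a single message. The `Reminder` structure and both function names are hypothetical; a script run by cron would load reminders from the agent's store, call these functions, and send any non-empty result through the Telegram API.

```python
from datetime import datetime
from typing import List, NamedTuple


class Reminder(NamedTuple):
    due: datetime
    text: str
    sent: bool = False


def due_reminders(reminders: List[Reminder], now: datetime) -> List[Reminder]:
    """Reminders whose due time has passed and that have not been sent yet."""
    return [r for r in reminders if not r.sent and r.due <= now]


def format_digest(reminders: List[Reminder]) -> str:
    """Render due reminders as one message, oldest first; empty string if none."""
    if not reminders:
        return ""
    lines = [f"- {r.text}" for r in sorted(reminders, key=lambda r: r.due)]
    return "Reminders due:\n" + "\n".join(lines)
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the selection logic easy to test independently of cron.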

Key Implementation Considerations

Memory Management

Persistent memory is the feature that separates this architecture from typical chatbots. Developers have several options: a simple SQLite database on the persistent disk, a local ChromaDB or FAISS instance for vector-based semantic retrieval, or even a managed solution like Google Cloud Firestore for distributed state.

The choice depends on scale. For a personal assistant serving one user, a local SQLite database with a few hundred megabytes of vector embeddings is more than sufficient. For multi-user deployments, a managed database makes more sense.
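For the single-user case, a memory store on SQLite can be very small. The following sketch uses Python's built-in `sqlite3` module; the table layout and the function names are assumptions for illustration, and retrieval is a plain substring match rather than the vector search mentioned above.

```python
import sqlite3
import time
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the memory database; pass a file path on the persistent disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memory (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id INTEGER NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL,
               created_at REAL NOT NULL
           )"""
    )
    return conn


def remember(conn: sqlite3.Connection, user_id: int, role: str, content: str) -> None:
    """Append one entry; the with-block commits on success and rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO memory (user_id, role, content, created_at) VALUES (?, ?, ?, ?)",
            (user_id, role, content, time.time()),
        )


def recall(conn: sqlite3.Connection, user_id: int, keyword: str, limit: int = 5) -> List[str]:
    """Most recent entries for this user whose content contains the keyword."""
    rows = conn.execute(
        "SELECT content FROM memory WHERE user_id = ? AND content LIKE ? "
        "ORDER BY id DESC LIMIT ?",
        (user_id, f"%{keyword}%", limit),
    ).fetchall()
    return [row[0] for row in rows]
```

The default `":memory:"` path is only convenient for experiments; pointing `open_store` at a file on the persistent disk is what lets the history survive a restart.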

Security and Secrets

Running an always-on agent means API keys, tokens, and credentials must be handled carefully. Google Cloud Secret Manager integrates natively with GCE and provides a secure way to store and rotate sensitive values. Telegram bot tokens, LLM API keys, and third-party service credentials should never be hardcoded.

Additionally, developers should restrict the Telegram bot to authorized user IDs only. A simple allowlist in the message handler prevents unauthorized users from interacting with the agent.
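An allowlist check of this kind might look like the sketch below. The environment variable name `ALLOWED_USER_IDS` and both function names are hypothetical; the only Telegram-specific detail relied on is that an incoming update's message carries the sender under `from` with a numeric `id`.

```python
import os
from typing import FrozenSet, Mapping


def load_allowlist(env: Mapping[str, str] = os.environ) -> FrozenSet[int]:
    """Parse a comma-separated list of user IDs, e.g. ALLOWED_USER_IDS="1111,2222"."""
    raw = env.get("ALLOWED_USER_IDS", "")
    return frozenset(int(part) for part in raw.split(",") if part.strip())


def is_authorized(update: dict, allowlist: FrozenSet[int]) -> bool:
    """True only if the update's sender ID is on the allowlist."""
    sender = (update.get("message") or {}).get("from") or {}
    return sender.get("id") in allowlist
```

An empty or missing allowlist denies everyone, which is the safer default for a bot that can call tools on the owner's behalf.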

Cost Optimization

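The beauty of this stack is its affordability. A preemptible or spot GCE instance can reduce compute costs by 60-91% compared to on-demand pricing, with the trade-off that Compute Engine can reclaim it at any time, so this option suits agents that can tolerate interruptions and resume from state on the persistent disk. For agents that do not require GPU inference (those calling external LLM APIs like OpenAI, Anthropic, or Google's Gemini), even the smallest VM tier works well.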

If running the Hermes model locally is preferred, a GCE instance with an NVIDIA T4 GPU can handle inference for smaller quantized models. The GPU itself costs roughly $0.35/hour on-demand, on top of the VM's own charge, or significantly less with committed use discounts.
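To put the hourly figure in monthly terms, the arithmetic can be worked through as follows. The $0.35/hour rate and the 60-91% discount range are the figures quoted in this section, not live prices; 730 is the average number of hours in a month.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12 months


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost in dollars of running at hourly_rate for the given number of hours."""
    return round(hourly_rate * hours, 2)


def discounted(cost: float, discount: float) -> float:
    """Cost after a fractional discount, e.g. 0.60 for 60% off."""
    return round(cost * (1 - discount), 2)


t4_always_on = monthly_cost(0.35)           # GPU running around the clock
t4_four_hours = monthly_cost(0.35, 4 * 30)  # GPU running four hours a day
```

Under these assumptions an always-on T4 comes to about $255 per month for the GPU alone, while four hours a day comes to about $42, which is why calling an external LLM API from a small VM is usually the cheaper starting point.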

Reliability

To ensure the agent survives crashes, developers should use systemd service units or process managers like supervisord to automatically restart the Python process. Note that the 99.99% figure in the Compute Engine SLA covers instances deployed across multiple zones; a single VM carries a lower uptime commitment, so startup scripts should re-initialize the agent whenever the VM itself is restarted.
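A systemd unit for this purpose could look like the fragment below. The service name, user, paths, and entry-point script are all hypothetical placeholders; only the directives themselves are standard systemd options.

```ini
# /etc/systemd/system/assistant.service  (hypothetical path and names)
[Unit]
Description=Persistent Telegram AI assistant
After=network-online.target
Wants=network-online.target

[Service]
# Hypothetical user, directory, and entry point; adjust to the actual deployment.
User=assistant
WorkingDirectory=/opt/assistant
EnvironmentFile=/etc/assistant/env
ExecStart=/usr/bin/python3 /opt/assistant/bot.py
# Restart the process after a crash, waiting five seconds between attempts.
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Running `systemctl enable --now assistant.service` would then start the agent and register it to start again on every boot, which also covers the case where the VM itself is restarted.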

Real-World Use Cases

This architecture is already powering a range of creative applications:

  • Personal productivity agents that track tasks, summarize emails, and send daily briefings via Telegram.
  • Research assistants that monitor arXiv for new papers matching specific keywords and deliver digests.
  • DevOps monitors that watch server health metrics and alert the user through Telegram with actionable suggestions.
  • Language learning companions that maintain long-running conversational contexts and adapt to the learner's progress over weeks.
  • Trading and market watchers that analyze price movements and notify users of significant events.

The Bigger Picture

This GCE-Hermes-Telegram stack represents a broader trend in the AI industry: the shift from centralized, platform-dependent AI experiences toward personal, self-hosted agents. As open-source models continue to close the gap with proprietary offerings, and as cloud infrastructure becomes ever more affordable, the barriers to building your own persistent AI assistant are dropping fast.

Google Cloud Platform's ecosystem makes this particularly seamless. GCE provides the compute, Persistent Disk provides the memory, Secret Manager provides the security, and Cloud Logging provides observability — all within a single console. Pair that with the Hermes framework's agentic capabilities and Telegram's ubiquitous reach, and you have a production-ready personal AI stack that costs less than a streaming subscription.

What Comes Next

The evolution of this architecture is already underway. Developers are experimenting with multi-agent systems where several Hermes-powered agents collaborate on a single GCE instance, each with specialized roles. Others are integrating voice input through Telegram's voice message API and speech-to-text models, creating a hands-free AI assistant experience.

As Google continues to expand GCP's AI-native services — including Vertex AI integration and TPU access — the ceiling for what a single persistent agent can accomplish will only rise. The always-on, never-forgetting AI assistant is no longer a distant vision. It is a weekend project.