Building JARVIS: AI Agent for Raspberry Pi

On January 1st I woke up with a simple idea: what if I could talk to Claude through Telegram, and it could actually do things for me—not just answer questions?


What I Actually Built

Here's what JARVIS can do:

  • Respond via voice or text — I send a voice message, it transcribes and responds. I can ask for audio replies too.
  • Remember things across conversations — "Remember I like oat milk" actually persists.
  • Run shell commands — Safe ones, in a sandbox.
  • Publish blog posts — It can write and publish to my blog via API.
  • Update itself — I can push code from my phone via Claude Code, and JARVIS pulls the changes and restarts. No SSH required.

All of this runs on 512MB of RAM, inside a locked-down Docker container with no host access—except for a single trigger file that signals "please update me."


The "Aha" Moment

The magic clicked when I realized Claude's tool-use isn't just for answering questions. It's for taking action.

When I message JARVIS, here's what happens:

Me: "Remember to buy coffee tomorrow"
     ↓
JARVIS receives message via Telegram
     ↓
Claude thinks: "I should use the remember tool"
     ↓
Tool executes: saves "buy coffee" to my todo list
     ↓
JARVIS: "Got it, I'll remind you about coffee"

Claude isn't just generating text. It's deciding which tool to use, calling it, reading the result, and responding. That's the difference between a chatbot and an agent.


For the Curious: How It Works

The core loop:

  1. Message comes in from Telegram
  2. Add it to conversation history (keeps last 50 messages for context)
  3. Send to Claude with a list of available tools
  4. Claude either responds with text OR requests a tool
  5. If tool requested → execute it → send result back to Claude → repeat
  6. Final response goes back to Telegram
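The loop above can be sketched in a few lines of TypeScript. This is a minimal sketch, not the real SDK integration: `model` stands in for the Claude API call and is injected so the loop can run without a network, and tools are a plain name-to-function map.

```typescript
// Minimal agent loop sketch. The model either answers with text or asks
// for a tool; tool results are appended to history and the model is
// called again until it produces a final text reply.
type ToolCall = { name: string; input: Record<string, unknown> };
type ModelReply =
  | { type: "text"; text: string }
  | { type: "tool_use"; call: ToolCall };

type Model = (history: string[]) => ModelReply;
type Tools = Record<string, (input: Record<string, unknown>) => string>;

const MAX_HISTORY = 50; // keep the last 50 messages for context

function runAgent(history: string[], incoming: string, model: Model, tools: Tools): string {
  history.push(`user: ${incoming}`);
  while (history.length > MAX_HISTORY) history.shift();

  // Cap the number of tool round-trips so a confused model can't loop forever.
  for (let turn = 0; turn < 10; turn++) {
    const reply = model(history);
    if (reply.type === "text") {
      history.push(`assistant: ${reply.text}`);
      return reply.text;
    }
    // Execute the requested tool and feed its result back to the model.
    const tool = tools[reply.call.name];
    const result = tool ? tool(reply.call.input) : `unknown tool: ${reply.call.name}`;
    history.push(`tool(${reply.call.name}): ${result}`);
  }
  return "(gave up after too many tool calls)";
}
```

With a stubbed model that first requests `remember` and then answers, the loop makes exactly one tool round-trip before returning text.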

The tools JARVIS has:

Tool                 What it does
bash                 Runs safe shell commands (ls, cat, date, git; no rm or sudo)
remember / recall    Persistent memory across sessions
create_note          Publishes markdown to my blog
self_update          Pulls latest code from GitHub and restarts
speak                Sends audio response via text-to-speech
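Each of these is declared to Claude as a tool definition. Here's a sketch of what one looks like in the Anthropic tool-use format; the exact schema fields (`category`, `item`) are illustrative, not the project's actual definitions.

```typescript
// Shape of one tool as sent to Claude. The description tells the model
// when to pick the tool; input_schema is standard JSON Schema.
const rememberTool = {
  name: "remember",
  description: "Persist a fact or todo item across conversations.",
  input_schema: {
    type: "object" as const,
    properties: {
      category: { type: "string", enum: ["todo", "today", "memory", "posts"] },
      item: { type: "string", description: "The text to store" },
    },
    required: ["category", "item"],
  },
};
```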

The stack:

  • Claude TypeScript Agent SDK for tool orchestration
  • Telegram bot via grammY framework
  • Groq for voice transcription (Whisper) and TTS (PlayAI)
  • Docker for isolation
  • Raspberry Pi Zero 2 W as the host (512MB RAM, ARM64)

Project structure:

jarvis/
├── src/
│   ├── index.ts              # Entry point
│   ├── agent/
│   │   ├── agent.ts          # Claude SDK integration
│   │   ├── memory.ts         # Conversation history
│   │   └── tools/            # bash, file, memory, update, speak...
│   ├── telegram/
│   │   ├── bot.ts            # grammY client
│   │   ├── handlers.ts       # Message routing
│   │   └── middleware.ts     # Auth + rate limiting
│   └── security/
│       └── whitelist.ts      # User validation
├── docker/
│   ├── Dockerfile            # Production (ARM64, hardened)
│   └── docker-compose.yml
└── scripts/
    ├── auto-update.sh        # Polls git + watches trigger
    └── jarvis-updater.service # systemd unit

How it all connects:

┌──────────────────────────────────────────────────┐
│          Raspberry Pi Zero 2 W (512MB)           │
│                                                  │
│  ┌─────────────────────────────────────────────┐ │
│  │            Docker Container                 │ │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────────┐  │ │
│  │  │ grammY  │→ │  Agent  │→ │ Claude API  │  │ │
│  │  │Telegram │  │  Loop   │  │ (tools)     │  │ │
│  │  └─────────┘  └─────────┘  └─────────────┘  │ │
│  │       ↓            ↓                        │ │
│  │  ┌─────────────────────────────────────┐    │ │
│  │  │  Tools: bash, memory, speak, etc.  │    │ │
│  │  └─────────────────────────────────────┘    │ │
│  └──────────────────┬──────────────────────────┘ │
│                     │ writes                     │
│                     ▼                            │
│  ┌─────────────────────────────────────────────┐ │
│  │  /var/jarvis/update-trigger                 │ │
│  └─────────────────────────────────────────────┘ │
│                     ▲                            │
│                     │ watches                    │
│  ┌─────────────────────────────────────────────┐ │
│  │  jarvis-updater.service (systemd)           │ │
│  │  → git pull → docker compose restart        │ │
│  └─────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘

The Self-Update Trick

This is my favorite part. The bot runs in a sandboxed container with no host access—so how can it update itself?

The answer: a trigger file.

The container can write to exactly one location on the host: /var/jarvis/update-trigger. Meanwhile, a systemd service on the host watches that file.

Me: "Update yourself"
     ↓
Claude calls self_update() tool
     ↓
Tool writes timestamp to /var/jarvis/update-trigger
     ↓
Host's jarvis-updater.service detects the file change
     ↓
Runs: git pull → npm install → npm run build → docker compose restart
     ↓
JARVIS comes back online with the new code

The bot can't execute arbitrary code on the host. It can only say "please update me now." The host decides whether to honor that.
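The container side of this is almost trivially small, which is the point. Here's a sketch of what the `self_update` tool body could look like; the trigger path is parameterized here so it can run anywhere, but in production it would be /var/jarvis/update-trigger.

```typescript
import { writeFileSync } from "node:fs";

// The self_update tool does nothing but write a timestamp to the one
// host-mounted path the container may touch. The host-side updater
// notices the change and runs the actual deploy; the container never
// executes anything on the host.
function requestSelfUpdate(triggerPath: string): string {
  const stamp = new Date().toISOString();
  writeFileSync(triggerPath, stamp + "\n");
  return `update requested at ${stamp}`;
}
```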

There's also a polling fallback. The updater checks GitHub every 60 seconds anyway:

# In auto-update.sh daemon mode
while true; do
    git fetch origin main
    if [ "$(git rev-parse HEAD)" != "$(git rev-parse origin/main)" ]; then
        git pull && docker compose up -d --build
    fi
    sleep "$POLL_INTERVAL"
done

So I have two paths:

  1. Push to GitHub → Pi polls and auto-deploys within 60 seconds
  2. Tell JARVIS to update → Instant trigger via Telegram

Why this matters: I can push code from my phone using Claude Code iOS, and JARVIS picks it up automatically. No SSH, no manual deploys, no laptop required.


Voice Flow

When I send a voice message:

  1. Telegram sends the audio file
  2. JARVIS downloads it and sends to Groq's Whisper API
  3. Transcribed text goes to Claude
  4. If I said "reply with voice," Claude uses the speak tool
  5. PlayAI generates audio, JARVIS sends it back
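Step 2 is a single multipart upload. Groq exposes an OpenAI-compatible audio endpoint; the URL and model name below are assumptions to check against Groq's current docs, and the request construction is split out from the `fetch` so it can be inspected without network access.

```typescript
// Sketch of building the Whisper transcription request against Groq's
// OpenAI-compatible API. Endpoint URL and model name are assumptions.
const GROQ_TRANSCRIBE_URL = "https://api.groq.com/openai/v1/audio/transcriptions";

function buildTranscribeRequest(audio: Blob, apiKey: string) {
  const form = new FormData();
  form.append("file", audio, "voice.ogg"); // Telegram voice notes arrive as OGG/Opus
  form.append("model", "whisper-large-v3");
  return {
    url: GROQ_TRANSCRIBE_URL,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    } as RequestInit,
  };
}

// Usage sketch:
//   const { url, init } = buildTranscribeRequest(blob, process.env.GROQ_API_KEY!);
//   const { text } = await (await fetch(url, init)).json();
```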

I added this feature while my car was driving itself. One voice message to JARVIS: "Add a feature so you can reply with audio when I ask." Worked first try.


Security (Because I'm Not Crazy)

Running an AI with shell access sounds terrifying. Here's how I locked it down:

Access Control:

  • Whitelist only — Only my Telegram ID can talk to it
  • Rate limited — Token bucket: 10 requests, 0.5/sec refill
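A token bucket with those numbers (burst of 10, refilling at 0.5 tokens per second) fits in a small class. This is my own sketch rather than the project's middleware; the clock is injected so the logic is deterministic and testable.

```typescript
// Token-bucket rate limiter: capacity 10, refill 0.5 tokens/sec.
// Each allowed request consumes one token; tokens accrue with elapsed time.
class TokenBucket {
  private tokens: number;
  private last: number;
  constructor(
    private capacity = 10,
    private refillPerSec = 0.5,
    now = Date.now(), // injectable clock (ms) for testing
  ) {
    this.tokens = capacity;
    this.last = now;
  }
  allow(now = Date.now()): boolean {
    const elapsed = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```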

Sandboxed Execution:

  • Command whitelist — Only these: echo, ls, pwd, cat, head, tail, wc, date, whoami, git
  • Path restricted — File operations limited to /app, /tmp, /app/data
  • No network tools — No curl, wget, or anything that could exfiltrate data
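The bash-tool guard can be as simple as checking the first word against the whitelist and rejecting shell metacharacters outright, so nothing can chain a second command past the check. A sketch (my own, assuming this two-step validation):

```typescript
// Only the listed binaries may run, and any metacharacter that could
// chain commands, spawn a subshell, or redirect output is rejected.
const ALLOWED = new Set([
  "echo", "ls", "pwd", "cat", "head", "tail", "wc", "date", "whoami", "git",
]);

function isCommandAllowed(cmd: string): boolean {
  if (/[;&|`$<>]/.test(cmd)) return false; // no chaining, subshells, redirects
  const first = cmd.trim().split(/\s+/)[0];
  return ALLOWED.has(first);
}
```

Path restrictions are enforced separately, so `cat` passing this check still can't read outside the allowed directories.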

Docker Hardening:

  • Non-root user — Container runs as UID 1001, not root
  • Read-only filesystem — Container can't modify its own code
  • Dropped capabilities — Minimal Linux capabilities
  • Single write point — /var/jarvis/update-trigger is the only host path the container can write to

Audit Trail:

  • Every tool call logged with timestamps
  • Every message logged (encrypted at rest)

The self-update mechanism is the only bridge between container and host, and it's one-way: the bot can request an update, but it can't control what code gets deployed. That always comes from the git remote.


The Memory System

Each user gets their own categorized memory:

/app/data/memory/{userId}/
  ├── todo/items.json
  ├── today/items.json
  ├── memory/items.json
  └── posts/items.json

Example:

Me: "Remember that I prefer dark mode"
→ Saved to memory/items.json with timestamp

Me: "What are my preferences?"
→ Claude calls recall(), gets the data, responds naturally

JARVIS can also move items between categories—"move 'buy coffee' from todo to today."
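The storage layer underneath this is just JSON files on disk, one per user per category. A minimal sketch, assuming the layout above (the base directory is parameterized here; in production it would be /app/data/memory):

```typescript
import { mkdirSync, readFileSync, writeFileSync, existsSync } from "node:fs";
import { join } from "node:path";

// Per-user, per-category item files: {base}/{userId}/{category}/items.json
type Item = { text: string; savedAt: string };

function remember(base: string, userId: string, category: string, text: string): void {
  const dir = join(base, userId, category);
  mkdirSync(dir, { recursive: true });
  const file = join(dir, "items.json");
  const items: Item[] = existsSync(file) ? JSON.parse(readFileSync(file, "utf8")) : [];
  items.push({ text, savedAt: new Date().toISOString() });
  writeFileSync(file, JSON.stringify(items, null, 2));
}

function recall(base: string, userId: string, category: string): Item[] {
  const file = join(base, userId, category, "items.json");
  return existsSync(file) ? JSON.parse(readFileSync(file, "utf8")) : [];
}
```

Moving an item between categories is then just a recall, a filter, and two writes.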


"Think Hard" Mode

Most messages use Claude Haiku (fast, cheap). But sometimes I need real reasoning:

Me: "think hard: analyze my spending patterns from last month"

The "think hard:" prefix switches to Claude Opus. Same tools, more intelligence.
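The routing is a one-line prefix check before the API call. A sketch; the model IDs here are placeholders, not the exact versioned names the project uses:

```typescript
// Route "think hard:" messages to the expensive model, everything else
// to the fast one, and strip the prefix before sending the prompt.
const FAST_MODEL = "claude-haiku";  // placeholder model ID
const SMART_MODEL = "claude-opus";  // placeholder model ID

function routeMessage(text: string): { model: string; prompt: string } {
  const m = text.match(/^think hard:\s*/i);
  return m
    ? { model: SMART_MODEL, prompt: text.slice(m[0].length) }
    : { model: FAST_MODEL, prompt: text };
}
```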


What's Next

Right now JARVIS is reactive—it waits for me to message it. The next version will be proactive:

  • Break down goals into subtasks
  • Execute multi-step plans without supervision
  • Learn from outcomes and adjust

The loop becomes: Observe → Plan → Act → Reflect → Learn → Repeat

That's when things get really interesting.


Try It Yourself

The full code is on GitHub:

https://github.com/acrosa/alfajor-raspberry

You'll need:

TELEGRAM_BOT_TOKEN=xxx
ANTHROPIC_API_KEY=xxx
TELEGRAM_ALLOWED_USERS=your_telegram_id
# Optional: GROQ_API_KEY for voice
# Optional: COLLECTED_NOTES_API_KEY for blog publishing

Then:

docker compose up

The Takeaway

The gap between "AI that answers questions" and "AI that does things" is smaller than I thought. Claude's tool-use, a Raspberry Pi, and a weekend of hacking got me an agent that:

  • Lives in my pocket (via Telegram)
  • Remembers everything I tell it
  • Can update its own code
  • Runs securely in a sandbox

We're at the very beginning of personal AI agents. This is a proof of concept, but the pattern scales. What would you build if your AI could take action?


Built with Claude, deployed on a Pi, controlled from a Tesla. The future is weird and I'm here for it.
