Building a Personal YouTube Summarizer with n8n, Telegram, and Azure AI

The Problem

We all have those moments: someone shares a 45-minute YouTube video, and you want to know if it's worth your time. I wanted a quick way to get summaries of YouTube videos without manually watching them or copy-pasting URLs into various tools.

The solution? A personal Telegram bot that instantly summarizes any YouTube video I send it.

What I Built

I created an n8n workflow that:

Receives YouTube links via a Telegram bot
Extracts video transcripts using a custom Flask API
Sends the transcript to Azure OpenAI (GPT-4o-mini) for summarization
Returns a concise 3-5 point summary back to Telegram

Important: I added authentication to ensure only I can use the bot—this is a personal learning project, not a public service, and I wanted to avoid unexpected API costs.

The Architecture

Components

Telegram Bot: The user interface. I send YouTube links here and receive summaries.

n8n Workflow: The orchestration layer connecting all services.

Custom YouTube Transcript API: A Flask API using yt-dlp to extract video transcripts, running in Docker.

Azure OpenAI: GPT-4o-mini model for generating intelligent summaries.

The Workflow

Here's how the workflow operates:

Webhook Trigger: Receives POST requests from Telegram when I send a message
Authentication Check: Validates that the message comes from my Telegram chat ID
Video ID Extraction: JavaScript code extracts the YouTube video ID from various URL formats
Transcript Fetching: HTTP request to my local transcript API at http://youtube-transcript-api:5000/transcript
AI Summarization: Azure OpenAI processes the transcript and generates 3-5 key points
Telegram Reply: Sends the formatted summary back to me
Webhook Response: Acknowledges receipt to Telegram
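
For orientation, here is a rough sketch of the item the Webhook Trigger hands to the later steps. The field paths (body.message.text and body.message.chat.id) match the expressions used in this post; the concrete values and the helper name readMessage are made up for illustration.

```javascript
// Example webhook item, trimmed to the fields this workflow reads.
// All values are placeholders, not real IDs.
const sampleItem = {
  body: {
    update_id: 100000001,
    message: {
      message_id: 42,
      chat: { id: 123456789, type: "private" },
      date: 1700000000,
      text: "https://youtu.be/dQw4w9WgXcQ",
    },
  },
};

// Hypothetical helper: collect the two values the rest of the workflow needs.
// Missing fields come back as null instead of throwing.
function readMessage(item) {
  const message = item.body && item.body.message;
  return {
    chatId: message && message.chat ? message.chat.id : null,
    text: message && typeof message.text === "string" ? message.text : null,
  };
}

console.log(readMessage(sampleItem));
```

Updates that carry no text message (a sticker, a photo, an edited message) yield null for text, which is what the authentication step relies on.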

Key Implementation Details

Authentication

The most critical part was limiting access to just me. I used an n8n "If" node that checks two conditions:

Message text exists (it's not an image or sticker)
Chat ID equals my personal Telegram ID

// Conditions in the n8n If node (both must be true)
{{ $json.body.message.text }} exists
{{ $json.body.message.chat.id }} equals YOUR_CHAT_ID

If either condition fails, the workflow stops—no transcript is fetched, no API calls are made, no costs incurred.
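
The same check can be written as plain JavaScript, which is handy for testing the logic outside n8n. This is a minimal sketch, not the workflow's actual node: the function name isAuthorized and the ALLOWED_CHAT_ID value are placeholders.

```javascript
// Mirrors the two If-node conditions: text exists AND chat ID matches.
// ALLOWED_CHAT_ID is a placeholder; substitute your own numeric chat ID.
const ALLOWED_CHAT_ID = 123456789;

function isAuthorized(body, allowedChatId = ALLOWED_CHAT_ID) {
  const message = body && body.message;
  if (!message) return false; // update types without a message object
  const hasText = typeof message.text === "string" && message.text.length > 0;
  const chatMatches = Boolean(message.chat) && message.chat.id === allowedChatId;
  return hasText && chatMatches;
}

// A text message from the allowed chat passes; anything else is rejected.
console.log(isAuthorized({ message: { chat: { id: 123456789 }, text: "hi" } }));
console.log(isAuthorized({ message: { chat: { id: 555 }, text: "hi" } }));
```

The comparison is strict, so the chat ID has to be a number here, as it is in Telegram's payload.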

Handling YouTube URLs

YouTube URLs come in many formats: youtube.com/watch?v=, youtu.be/, embeds, etc. I wrote a JavaScript code node to extract video IDs reliably:

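// n8n Code node: pull the 11-character video ID out of the message text
const url = $input.first().json.body.message.text;
const videoIdMatch = url.match(/(?:youtu\.be\/|youtube\.com\/watch\?v=|youtube\.com\/embed\/|youtube\.com\/v\/)([a-zA-Z0-9_-]{11})/);
const videoId = videoIdMatch ? videoIdMatch[1] : null; // null when no YouTube link is found
// Return the ID as an item so downstream nodes can read it
return [{ json: { videoId } }];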
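
The pattern above expects v= to be the first query parameter and does not cover /shorts/ links. A more tolerant variant, sketched here as plain JavaScript, parses each link with the standard URL class instead. The function name extractVideoId and the list of accepted hosts and path prefixes are assumptions for illustration. Unlike the regex, this version only recognizes links that start with http:// or https://, and it has not been tested inside an n8n Code node.

```javascript
// Sketch: find the first YouTube link in a message and return its video ID.
const ID_PATTERN = /^[a-zA-Z0-9_-]{11}$/;

function extractVideoId(text) {
  // Collect anything that looks like an http(s) URL.
  const candidates = text.match(/https?:\/\/\S+/g) || [];
  for (const candidate of candidates) {
    let url;
    try {
      url = new URL(candidate);
    } catch {
      continue; // not a parseable URL
    }
    // Normalize common subdomains so one host check covers them.
    const host = url.hostname.replace(/^(www\.|m\.|music\.)/, "");
    const parts = url.pathname.split("/").filter(Boolean);
    let id = null;
    if (host === "youtu.be") {
      id = parts[0];
    } else if (host === "youtube.com" || host === "youtube-nocookie.com") {
      if (parts[0] === "watch") id = url.searchParams.get("v");
      else if (["embed", "v", "shorts", "live"].includes(parts[0])) id = parts[1];
    }
    if (id && ID_PATTERN.test(id)) return id;
  }
  return null; // no recognizable YouTube link
}

console.log(extractVideoId("watch this https://www.youtube.com/watch?feature=share&v=dQw4w9WgXcQ"));
```

Returning null for unrecognized input lets the workflow branch to an error reply instead of calling the transcript API with a bad ID.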

The Transcript API

I built a simple Flask API that wraps yt-dlp for transcript extraction. It runs in Docker and connects to n8n via a shared Docker network:

networks:
  n8n_n8n-network:
    external: true

This allows n8n to call it at http://youtube-transcript-api:5000 without exposing ports externally.
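
As a rough illustration of the request side, the sketch below builds the URL and validates the response. The base URL comes from this post and the full_text field is the one the summarization prompt reads; the video_id query parameter name and both function names are assumptions, so adjust them to whatever your API actually expects.

```javascript
// Base URL from the post; resolvable only inside the shared Docker network.
const TRANSCRIPT_API = "http://youtube-transcript-api:5000/transcript";

// ASSUMPTION: the API takes the ID as a `video_id` query parameter.
function buildTranscriptUrl(videoId) {
  const url = new URL(TRANSCRIPT_API);
  url.searchParams.set("video_id", videoId);
  return url.toString();
}

// Pull the transcript text out of the API response, failing loudly if absent.
function readTranscript(payload) {
  if (!payload || typeof payload.full_text !== "string" || payload.full_text.trim() === "") {
    throw new Error("Transcript API returned no full_text");
  }
  return payload.full_text;
}

console.log(buildTranscriptUrl("dQw4w9WgXcQ"));
```

Failing early on an empty transcript avoids sending a blank prompt to the model and paying for a meaningless summary.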

Pro tip: YouTube occasionally shows "Sign in to confirm you're not a bot" errors. The solution? Export your browser cookies using a browser extension and mount them into the Docker container.

Azure OpenAI Integration

I used n8n's Azure OpenAI Chat Model node with GPT-4o-mini. The prompt is straightforward:

System: You are a helpful assistant that summarizes YouTube videos.
User: Summarize this YouTube transcript in 3-5 key points. Transcript:
{{ $json.full_text }}

The model responds with a clean, formatted summary that gets sent directly to Telegram.
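
For reference, the prompt corresponds roughly to the chat messages built below. The character cap is an assumption added for illustration, a crude stand-in for a token limit on very long transcripts, and not something the n8n node is known to do; buildMessages and maxChars are hypothetical names.

```javascript
const SYSTEM_PROMPT = "You are a helpful assistant that summarizes YouTube videos.";
const USER_PREFIX = "Summarize this YouTube transcript in 3-5 key points. Transcript:\n";

// ASSUMPTION: cap the transcript by characters to keep the request bounded.
function buildMessages(fullText, maxChars = 100000) {
  const transcript =
    fullText.length > maxChars
      ? fullText.slice(0, maxChars) + "\n[transcript truncated]"
      : fullText;
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: USER_PREFIX + transcript },
  ];
}

console.log(buildMessages("First line of a transcript..."));
```

Marking the truncation in the prompt tells the model that the text is incomplete, so it is less likely to treat an abrupt ending as the video's conclusion.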

What I Learned

n8n is Powerful but Requires Attention to Detail

Referencing data between nodes using expressions like ={{ $('Webhook Trigger').item.json.body.message.chat.id }} takes practice
The visual workflow makes debugging much easier than code-only solutions

Docker Networking Matters

Getting the transcript API and n8n to communicate required understanding Docker networks. Once I added them to the same network, everything clicked.

Authentication is Essential

Even for personal projects, adding a simple chat ID check prevents accidental usage and runaway costs.

AI Models are Surprisingly Good at Summarization

In my usage, GPT-4o-mini has consistently produced high-quality summaries that capture the essence of videos, and I have not caught it inventing content.

Running Costs

This is effectively free for my usage:

n8n: Self-hosted, no cost
Azure OpenAI: Pay-per-token
Telegram: Free
Docker: Local resources

With authentication in place, I control exactly when the AI is called.
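
To sanity-check the pay-per-token part, a back-of-the-envelope estimate is enough. The function below is a sketch; the token counts and prices in the example call are placeholders, not real Azure OpenAI prices, so plug in the current rates for your deployment.

```javascript
// Rough per-video cost estimate for a pay-per-token model.
// Prices are parameters on purpose: look them up in the current price list.
function estimateCostUsd(inputTokens, outputTokens, pricePerMillionInput, pricePerMillionOutput) {
  return (
    (inputTokens / 1e6) * pricePerMillionInput +
    (outputTokens / 1e6) * pricePerMillionOutput
  );
}

// Placeholder numbers only: 10,000 transcript tokens in, 200 summary tokens out,
// at hypothetical prices of $1.00 and $2.00 per million tokens.
console.log(estimateCostUsd(10000, 200, 1.0, 2.0).toFixed(4));
```

Input tokens dominate because the whole transcript is sent while the summary is only a few lines.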

Future Improvements

Some ideas I'm considering:

Add support for videos without transcripts using Whisper
Store summaries in a database for future reference
Add commands like /summary vs /detailed for different summary lengths
Support playlist summarization

Try It Yourself

The full code is available on GitHub. You'll need:

n8n instance (Docker recommended)
Telegram bot token
Azure OpenAI API key
Docker for the transcript API

Import the workflow JSON, set up your credentials, update the chat ID to yours, and you're good to go!


Conclusion

This project was an excellent way to dive deeper into n8n's capabilities and learn about integrating various APIs and AI models. The real win was building something genuinely useful for my daily workflow—I now use this bot multiple times a week.

If you're learning n8n or exploring AI integrations, I highly recommend building something similar. The hands-on experience of connecting real services is invaluable.

This is a demo project built for learning purposes. The authentication ensures it remains a personal tool rather than a public service.