Personal project · Full-Stack · 2025
GistClips
Four-service platform: Go API gateway, Python AI worker, Go notification worker, Next.js client — each runtime owns the work it does best.
Summary
A playlist context research platform: users define a learning or research goal, attach it to a YouTube playlist, and GistClips builds a searchable AI knowledge base across all videos in that collection. The result is a chat interface for cross-video research, context-aware insights, and weekly synthesis digests — with quick per-video summaries available as a fallback.
The platform is four services in a Go workspace: a Go Fiber API gateway, a Python ingest worker, a Go notification worker, and a Next.js dashboard. Each service owns exactly the work it is suited for. The boundary between Go and Python is SQS — Go publishes processing jobs; Python consumes them, runs the AI pipeline (LangChain + Google GenAI + vector embeddings), and stores results in MongoDB and a vector database. When a playlist finishes processing, the notification worker picks up the event and delivers completion alerts.
What shipped: playlist + context management, async video ingestion via SQS, transcript extraction (youtube-transcript-api / yt-dlp), generalized content extraction as a reusable foundation layer, context-aware insight generation with LangChain and Google Generative AI, vector embeddings for semantic search, Redis bloom filters for deduplication, and a Go notification worker for job completion events.
Architecture Decisions
Why Go + Python instead of Python for everything
The options considered: Python for all services (FastAPI for both HTTP and the AI workers), Go for all services (ruled out by the immature AI/ML ecosystem), or a Go API gateway with Python workers.
The constraint: LangChain, the Google GenAI client, and every major vector database SDK are Python-first. There are no equivalent Go libraries with the same maturity. Writing AI pipelines in Go would mean reimplementing or wrapping libraries that exist and are maintained in Python.
The decision: Go handles the HTTP API and orchestration. Python handles the AI work. The boundary is SQS — Go publishes a playlist processing job to the queue; the Python ingest worker consumes it, runs transcript extraction, generalized content extraction, and context-aware insight generation, then stores results. A separate Go notification worker listens for completion events and delivers alerts.
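The contract at the SQS boundary is the job message itself: Go serializes it, Python deserializes it. A minimal sketch of what that shared payload could look like, from the Python side. The field names here are illustrative assumptions, not the project's actual schema.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PlaylistJob:
    """Hypothetical shape of the playlist-processing job the Go API
    publishes to SQS and the Python ingest worker consumes."""
    job_id: str
    playlist_id: str
    context_id: str       # the user's research goal attached to the playlist
    video_ids: list[str]  # resolved up front so the worker can fan out per video

def encode(job: PlaylistJob) -> str:
    """Serialize to the JSON body of an SQS message (what the Go side emits)."""
    return json.dumps(asdict(job))

def decode(body: str) -> PlaylistJob:
    """Parse an SQS message body back into a job (what the Python worker does)."""
    return PlaylistJob(**json.loads(body))
```

Keeping the contract this small is what makes the two-runtime split workable: neither side imports the other's code, they only agree on a JSON shape.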
The trade-off: Two runtimes, two deployment configurations, two sets of dependencies. The pipeline is harder to trace end-to-end across language boundaries. The notification worker is a third Go service rather than folding completion events into the API — more moving parts, but cleaner separation.
What I'd change: Nothing structural. This is the only defensible architecture for AI/ML work at this point in the ecosystem. The four-service split reflects real boundaries: HTTP concerns, AI processing concerns, and notification concerns are genuinely different workloads.
Why a two-layer content extraction pipeline
The options considered: Generate context-aware insights directly per video, or split into a generalized extraction step first and a context-aware step second.
The constraint: Multiple users can attach different research goals to the same video. If context-aware insight generation is the only layer, the same video must be reprocessed (and re-billed for AI API calls) once per user context.
The decision: Split into two layers. The first pass produces a generalized content extraction for each video — transcripts, topics, key points — stored and reused as a foundation. The second pass takes that foundation and applies the user's specific context to generate aligned insights. A video processed once can serve any number of contexts without re-fetching or re-transcribing.
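The economics of the split come down to caching the expensive layer. A toy sketch of the two-layer flow, with in-memory stand-ins for the real transcript and LLM calls (all names here are illustrative):

```python
# video_id -> cached generalized extraction (the expensive, reusable layer)
general_cache: dict[str, dict] = {}
calls = {"general": 0}  # counts how often the expensive pass actually runs

def generalized_extraction(video_id: str) -> dict:
    """Expensive pass: transcript, topics, key points. Runs once per video."""
    if video_id not in general_cache:
        calls["general"] += 1  # stand-in for transcript + AI API cost
        general_cache[video_id] = {
            "video_id": video_id,
            "topics": ["placeholder topic"],
            "key_points": ["placeholder point"],
        }
    return general_cache[video_id]

def context_insights(video_id: str, context: str) -> dict:
    """Cheap pass: applies one user's research goal to the cached foundation."""
    base = generalized_extraction(video_id)
    return {"video_id": video_id, "context": context,
            "source_topics": base["topics"]}

# Two users attach different goals to the same video:
context_insights("abc123", "learn distributed systems")
context_insights("abc123", "research queueing theory")
```

After both calls, the expensive layer has run exactly once; every additional context reuses the cached foundation.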
The trade-off: Two pipeline stages instead of one. More storage for the intermediate generalized layer. The generalized layer must be rich enough to serve diverse contexts — if it's too thin, context-aware insights will be shallow.
What I'd change: Add a cache invalidation strategy for the generalized layer when a video is significantly updated.
Why Redis bloom filters for deduplication
The options considered: Check MongoDB for an existing document before processing (a database lookup per submission), keep a Redis set of processed video IDs, or use a Redis bloom filter.
The constraint: The same video can appear in multiple playlists or be resubmitted. Processing it twice wastes AI API calls and compute. The check must be fast — it happens on every ingest job before any work begins.
The decision: Redis bloom filter, a probabilistic data structure that answers "have I seen this before?" in O(1) with no false negatives. If it says unseen, the video is definitely unprocessed.
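The project uses Redis's bloom filter; the in-memory sketch below just illustrates the mechanics behind the "no false negatives" guarantee: adding an item sets k bit positions, and a lookup only reports "seen" if all k bits are set.

```python
import hashlib

class BloomFilter:
    """Minimal in-memory bloom filter sketch (illustration only; the
    production check runs against Redis, not this class)."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 10):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str) -> list[int]:
        # Derive k positions from two halves of one SHA-256 digest
        # (the Kirsch-Mitzenmacher double-hashing construction).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def seen(self, item: str) -> bool:
        # False positives are possible (other items may have set these
        # bits); false negatives are not (add() always sets all k bits).
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

A video ID that was added always reports seen; an ID that was never added almost always reports unseen, with a small false-positive chance that shrinks as the bit array grows.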
The trade-off: Bloom filters produce false positives at a configurable rate. A genuinely new video might occasionally be identified as a duplicate and skipped. At a 0.1% false positive rate, this is an acceptable trade-off for the deduplication benefit.
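The false positive rate is a sizing knob, not a fixed property. The standard formulas give the required bit-array size m and hash count k for a target rate; for example, at 0.1% and a hypothetical one million video IDs:

```python
import math

def bloom_params(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Standard bloom filter sizing:
    bits   m = -n * ln(p) / (ln 2)^2
    hashes k = (m / n) * ln 2
    """
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round((m / n_items) * math.log(2)))
    return m, k

# One million processed videos at a 0.1% false positive rate:
m, k = bloom_params(1_000_000, 0.001)
# roughly 14.4 million bits (~1.8 MB) and 10 hash functions
```

A couple of megabytes of Redis memory covers a million-video dedup set, which is why the trade-off against per-submission database lookups is so favorable.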
What I'd change: Nothing. The false positive rate is tunable and the cost of occasionally skipping a new video is far lower than the cost of reprocessing every duplicate.
Why SQS decouples submission from processing
The options considered: Process synchronously in the HTTP request, invoke the Python worker directly from Go via gRPC, or publish to SQS.
The constraint: Transcript extraction, AI summarization, and vector embedding generation take seconds to minutes per video. An HTTP request cannot wait that long. A playlist of 20 videos could take 10–20 minutes to process end-to-end.
The decision: The Go API accepts the playlist processing request, publishes to SQS, and returns immediately. The Python ingest worker consumes messages and runs the full pipeline. When processing completes, the Go notification worker delivers completion events to the user.
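The shape of that flow, reduced to its essentials: the API call returns a job id immediately, and processing happens on a separate loop. An in-memory deque stands in for SQS here; all names are illustrative, and the real consumer would also handle visibility timeouts and retries.

```python
from collections import deque
import uuid

queue: deque[dict] = deque()   # stand-in for the SQS queue
completed: list[str] = []      # stand-in for completion events

def submit_playlist(playlist_id: str) -> str:
    """API side: publish the job and return immediately with a job id
    (the HTTP response would be a 202 Accepted)."""
    job_id = str(uuid.uuid4())
    queue.append({"job_id": job_id, "playlist_id": playlist_id})
    return job_id

def worker_drain() -> None:
    """Ingest-worker side: consume jobs, run the pipeline, emit a
    completion event for the notification worker to deliver."""
    while queue:
        job = queue.popleft()
        # ... transcript extraction, insight generation, storage ...
        completed.append(job["job_id"])

job_id = submit_playlist("PLxyz")
assert not completed  # submission returned before any processing ran
worker_drain()
```

The API never blocks on the pipeline, which is what keeps a 20-minute playlist job from tying up an HTTP connection.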
The trade-off: The user does not get immediate results. The submission → results flow requires a push notification for completion. Adding the notification worker as a separate service keeps the API stateless but adds another deployment unit.
What I'd change: The notification worker currently handles email alerts. A future version would add WebSocket support in the API for real-time dashboard updates, reducing the polling load on the client.