shantanu.dev

Client project · Full-Stack · Jan

EdTech ERP Platform

Each institution's data must never touch another's. Row-level multi-tenancy was rejected. Database-per-tenant with JWT-embedded context was the answer.

Node.js · TypeScript · MongoDB · Docker · JWT

Summary

A B2B SaaS ERP for private coaching institutions — student enrollment, batch management, attendance tracking, fee collection with installment plans, test results, and a lead-to-student sales funnel. Multiple institutions run on the same platform with complete data isolation between them.

The core challenge was multi-tenancy. A student at Institution A must never see Institution B's data, even on a shared platform. The naive approach — a tenant_id column on every table — works but puts the isolation burden on every query. A single missing WHERE tenant_id = ? clause exposes all tenant data. The approach taken was stricter: each institution gets its own MongoDB database. Cross-tenant data access is physically impossible from the application layer.

What shipped: 4 TypeScript microservices (User/Auth, Academic, Student, Lead), database-per-tenant isolation, a custom PRN ID system encoding tenant + role + year into a single ID, a pooled connection manager with LRU eviction, and Docker multi-stage builds with non-root execution.

Architecture Decisions

Why database-per-tenant over shared schema

The options considered: Shared database with tenant_id columns on every table, schema-per-tenant (one schema per institution in the same MongoDB instance), database-per-tenant (one MongoDB database per institution).

The constraint: Data isolation must be guaranteed at the infrastructure level, not the application level. A tenant_id approach means every query must include the correct filter — one missing clause exposes the entire platform's data. A database boundary cannot be bypassed by a query bug.

The decision: Each institution gets its own MongoDB database (DB001, DB002, ...). The connection manager routes each request to the correct database based on the tenant context in the JWT.

The trade-off: Connection management complexity. 100 institutions means up to 100 concurrent database connections. This required building a custom pooled connection manager with LRU eviction and reference counting to keep connections within the process limit.
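The connection manager can be sketched generically (names here are illustrative, not the project's actual API; the real version would wrap something like mongoose.createConnection). A Map's insertion order doubles as LRU order, and only connections with zero in-flight requests are evictable:

```typescript
// Sketch of a pooled, LRU-evicting connection manager with reference
// counting. Hypothetical names; the real version wraps MongoDB connections.
type Closer = { close(): void };

class TenantConnectionPool<C extends Closer> {
  // Map insertion order = least- to most-recently-used.
  private pool = new Map<string, { conn: C; refs: number }>();

  constructor(
    private maxConnections: number,
    private factory: (dbSlug: string) => C,
  ) {}

  // Acquire a connection for a tenant, creating it on first use.
  acquire(dbSlug: string): C {
    let entry = this.pool.get(dbSlug);
    if (entry) {
      // Re-insert to mark as most recently used.
      this.pool.delete(dbSlug);
    } else {
      this.evictIfFull();
      entry = { conn: this.factory(dbSlug), refs: 0 };
    }
    entry.refs++;
    this.pool.set(dbSlug, entry);
    return entry.conn;
  }

  // Release when a request finishes; only zero-ref connections are evictable.
  release(dbSlug: string): void {
    const entry = this.pool.get(dbSlug);
    if (entry && entry.refs > 0) entry.refs--;
  }

  private evictIfFull(): void {
    if (this.pool.size < this.maxConnections) return;
    // Walk from least to most recently used; evict the first idle connection.
    for (const [slug, entry] of this.pool) {
      if (entry.refs === 0) {
        entry.conn.close();
        this.pool.delete(slug);
        return;
      }
    }
    // Every pooled connection has in-flight requests; a real system
    // would queue the request or apply backpressure here.
    throw new Error("connection pool exhausted");
  }

  get size(): number {
    return this.pool.size;
  }
}
```

The reference counting matters: without it, the LRU policy could close a connection that still has a query mid-flight.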

What I'd change: The database naming convention (DB001, DB002) is opaque. Use human-readable slugs (e.g., institution-name-prod) from the start. Migrating from numeric identifiers to slugs after institutions are live is painful.

Why embed tenant context in the JWT

The options considered: Store only a user ID in the JWT and resolve tenant context from a central registry on every authenticated request, embed the database slug directly in the JWT at login time, pass tenant context as a request header.

The constraint: Every authenticated request needs to know which database to query. A central registry lookup adds a round-trip database query to every API call — effectively doubling the query count for the entire platform.

The decision: The database slug (dbSlug) is embedded in the JWT at login time. The auth middleware extracts it from the token — zero additional queries per request.
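The routing step can be sketched as follows. A real middleware must verify the token's signature first (e.g. with a JWT library); this sketch shows only the zero-query routing, where dbSlug comes straight from the decoded payload. The claim shape and function names are illustrative:

```typescript
// Sketch: extract tenant context from a JWT payload with no DB lookup.
// Signature verification is omitted here but is mandatory in practice.
interface TenantClaims {
  sub: string;    // user id
  dbSlug: string; // tenant database to route this request to
}

function decodeClaims(token: string): TenantClaims {
  const payload = token.split(".")[1];
  if (!payload) throw new Error("malformed token");
  const json = Buffer.from(payload, "base64url").toString("utf8");
  const claims = JSON.parse(json) as TenantClaims;
  if (!claims.dbSlug) throw new Error("token missing tenant context");
  return claims;
}

// Demo only: build an unsigned token the way a login handler embeds claims.
function demoToken(claims: TenantClaims): string {
  const enc = (obj: object) =>
    Buffer.from(JSON.stringify(obj)).toString("base64url");
  return `${enc({ alg: "none", typ: "JWT" })}.${enc(claims)}.`;
}
```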

The trade-off: If an institution's database is renamed or migrated to a new slug, all existing JWTs become invalid and all users must re-login. Token invalidation is all-or-nothing.

What I'd change: Include a tokenVersion field in the JWT. On database migration, increment the version server-side. Requests with an outdated version are rejected with a 401, triggering a re-login. This limits the disruption to users of the migrated institution only.
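The versioning check is simple. A minimal sketch, assuming the current version per tenant is held in process memory (or a cache) rather than queried per request, so the zero-query property is preserved:

```typescript
// Sketch of per-tenant token versioning (hypothetical names). Incrementing
// a tenant's version invalidates only that tenant's outstanding tokens.
const currentTokenVersion = new Map<string, number>([["DB001", 2]]);

function isTokenCurrent(claims: { dbSlug: string; tokenVersion: number }): boolean {
  const current = currentTokenVersion.get(claims.dbSlug) ?? 0;
  return claims.tokenVersion >= current; // stale token -> 401 -> re-login
}
```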

Why the PRN ID encodes tenant + role + year

The options considered: UUID per user (random, opaque), auto-increment per table (predictable but requires a DB call to generate), custom encoded ID scheme.

The constraint: The system must identify which tenant a user belongs to from their ID alone, without a database query. A student presenting their ID card at reception should allow the system to immediately route to the correct database — before any authentication.

The decision: PRN format: [DB-number 2 digits][role-code 1 digit][year 2 digits][sequence 7 digits]. Example: 019240000001 = DB01, student role (9), year 2024, first student in that institution. The database name is derivable from the ID with a single string operation.
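The scheme can be sketched in a few lines (function names are illustrative, not the project's actual API):

```typescript
// Sketch of the PRN scheme: [DB 2 digits][role 1][year 2][sequence 7].
interface PRN {
  dbNumber: number; // maps to the tenant database (e.g. 1 -> DB01)
  roleCode: number; // e.g. 9 = student
  year: number;     // admission year (stored as two digits)
  sequence: number; // per-tenant running number
}

function buildPRN(p: PRN): string {
  return (
    String(p.dbNumber).padStart(2, "0") +
    String(p.roleCode) +
    String(p.year % 100).padStart(2, "0") +
    String(p.sequence).padStart(7, "0")
  );
}

// The tenant database is recoverable from the ID alone -- no lookup needed.
function parsePRN(prn: string): PRN & { dbName: string } {
  if (!/^\d{12}$/.test(prn)) throw new Error("PRN must be 12 digits");
  return {
    dbNumber: Number(prn.slice(0, 2)),
    roleCode: Number(prn.slice(2, 3)),
    year: Number(prn.slice(3, 5)),
    sequence: Number(prn.slice(5)),
    dbName: `DB${prn.slice(0, 2)}`,
  };
}
```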

The trade-off: The ID format is coupled to the system's internal database naming structure. Changing the database naming scheme requires migrating all existing PRN IDs.

What I'd change: The encoding was the right call. The only improvement would be better inline documentation of the encoding scheme — the logic was clear at the time but required re-reading the code to understand months later.

Why direct function calls between services instead of HTTP

The options considered: HTTP REST calls between services (true microservices), message queue between services, shared function library with direct imports (monorepo pattern).

The constraint: All 4 services are in a TypeScript monorepo with shared types. HTTP calls between services add network overhead, require service discovery, and need error handling for network failures — significant complexity for a small team.

The decision: Services import each other's service classes and call their methods directly as TypeScript imports. No HTTP, no queue, no network overhead within the monorepo.
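The pattern looks like this (shown in one file for illustration; in the monorepo these would be separate packages sharing types, and the class and method names are hypothetical):

```typescript
// academic-service exposes a plain class.
class BatchService {
  private batches = new Map<string, string[]>(); // batchId -> student PRNs

  addStudent(batchId: string, prn: string): void {
    const roster = this.batches.get(batchId) ?? [];
    roster.push(prn);
    this.batches.set(batchId, roster);
  }

  roster(batchId: string): string[] {
    return this.batches.get(batchId) ?? [];
  }
}

// student-service imports it and calls it directly -- a TypeScript function
// call, not an HTTP round-trip. Failures surface as exceptions, not network
// errors, which is exactly the coupling the trade-off below describes.
class EnrollmentService {
  constructor(private batches: BatchService) {}

  enroll(prn: string, batchId: string): void {
    // ...persist the student record in the tenant database, then:
    this.batches.addStudent(batchId, prn);
  }
}
```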

The trade-off: Services are tightly coupled at the code level. The student service cannot be deployed without the academic service. Independent scaling per service is not possible. This contradicts the isolation principle of microservices architecture.

What I'd change: This was a pragmatic choice for the project scale and timeline, and it worked. The honest lesson: this is "microservices" in name and "modular monolith" in deployment — and that's the right call at this scale. For a larger team or higher scale, HTTP or message-passing would justify the overhead.