sajjad.ai AI Platform

System Architecture, RAG Pipeline & Multi-Model Integration

sajjad.ai is an enterprise-ready, multilingual AI platform built from scratch. It lets users run conversational AI, query large documentation sets via Retrieval-Augmented Generation (RAG), and run content generation workflows in English, Arabic, and Urdu.

Technical Stack

.NET 8 Web APIs (ASP.NET Core)
React 18 (TypeScript)
OpenAI API
DeepSeek API
Ollama Local LLM Orchestration
Vector Database (RAG)
SQL Server / EF Core
Clean Architecture
Redis (Caching)

System Architecture

The system follows Clean Architecture principles, dividing concerns into Domain, Application, Infrastructure, and API presentation layers. Because vendor-specific code stays out of the core layers, models can be swapped and cloud databases or local servers integrated without touching the business logic.

Key Engineering Solutions

Real-Time SSE Streaming

Problem: With a standard HTTP request-response call, the client receives nothing until the model has finished generating, so long completions from OpenAI or DeepSeek leave the page waiting with no visible output.

Solution: Configured ASP.NET Core Web API controllers with `Response.ContentType = "text/event-stream"` and serialized structured chunks on the fly. Model output is piped to the React frontend as it is generated, which allows a smooth typing animation with minimal latency.
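The streaming flow above can be sketched as a pair of pure functions: one that frames a chunk the way a `text/event-stream` response does, and an incremental parser for the receiving side. This is a minimal illustration in TypeScript, not the platform's code; the `TokenChunk` fields, `toSseFrame`, and `SseParser` are hypothetical names, and the parser assumes `\n` line endings and single-event `data:` frames.

```typescript
// Hypothetical chunk shape; the platform's real payload fields are not documented.
interface TokenChunk {
  sessionId: string;
  delta: string; // the newly generated text fragment
  done: boolean; // true on the final chunk
}

// Server side: serialize one chunk as an SSE frame ("data: <json>" + blank line).
function toSseFrame(chunk: TokenChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Client side: network reads can split a frame anywhere, so the unfinished
// tail is kept in a buffer until its blank-line terminator arrives.
class SseParser {
  private buffer = "";

  push(text: string): TokenChunk[] {
    this.buffer += text;
    const frames = this.buffer.split("\n\n");
    this.buffer = frames.pop() ?? ""; // last piece is incomplete (or empty)
    const chunks: TokenChunk[] = [];
    for (const frame of frames) {
      const data = frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trimStart())
        .join("\n");
      if (data.length > 0) chunks.push(JSON.parse(data) as TokenChunk);
    }
    return chunks;
  }
}

// Usage: two frames delivered in two arbitrary network reads.
const wire =
  toSseFrame({ sessionId: "s1", delta: "Hel", done: false }) +
  toSseFrame({ sessionId: "s1", delta: "lo", done: true });
const parser = new SseParser();
const received = [...parser.push(wire.slice(0, 25)), ...parser.push(wire.slice(25))];
const typedText = received.map((c) => c.delta).join("");
```

Appending each `delta` to the visible message as it arrives is what produces the typing effect; the `done` flag gives the UI a clear point to stop the cursor animation.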

Multi-Model Orchestration Layer

Problem: Interacting with different vendor APIs (OpenAI, DeepSeek, and local Ollama) typically requires different clients, schemas, and payload specifications.

Solution: Defined a unified `ILLMService` interface registered through standard dependency injection. The system resolves the target model at runtime from the client configuration, and the interface hides vendor differences in chat history handling, system prompt formatting, and connection setup.
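To make the orchestration pattern concrete, here is a minimal sketch in TypeScript rather than the platform's C#. Only the `ILLMService` name comes from the description above; its members, the `StubLLMService` class, and the `LLMRegistry` resolver are assumptions made for illustration, and the stub stands in for real vendor clients.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Assumed shape: the real interface's members are not documented.
interface ILLMService {
  readonly provider: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Stub standing in for an OpenAI, DeepSeek, or Ollama client.
class StubLLMService implements ILLMService {
  readonly provider: string;

  constructor(provider: string) {
    this.provider = provider;
  }

  async complete(messages: ChatMessage[]): Promise<string> {
    const last = messages[messages.length - 1];
    return `[${this.provider}] ${last ? last.content : ""}`;
  }
}

// Hypothetical resolver: picks the implementation named in the configuration.
class LLMRegistry {
  private services = new Map<string, ILLMService>();

  register(service: ILLMService): this {
    this.services.set(service.provider, service);
    return this;
  }

  resolve(provider: string): ILLMService {
    const service = this.services.get(provider);
    if (!service) throw new Error(`No LLM service registered for "${provider}"`);
    return service;
  }
}

// Usage: callers depend only on ILLMService, never on a vendor client.
const registry = new LLMRegistry()
  .register(new StubLLMService("openai"))
  .register(new StubLLMService("deepseek"))
  .register(new StubLLMService("ollama"));
const configuredProvider = "ollama"; // would come from client configuration
const llm = registry.resolve(configuredProvider);
```

In a .NET dependency injection container the same idea is usually expressed with keyed or named registrations; the point of the sketch is only that the calling code never changes when the configured provider does.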

Hybrid RAG Chunking & Metadata Injection

Problem: Vector databases retrieve chunks by semantic similarity, but an isolated chunk often loses the broader context of its source, such as the section it came from in a complex spreadsheet or an indexed PDF.

Solution: Developed a custom C# parser that reads PDF, DOCX, and plain-text datasets. Document streams are split into overlapping chunks (256-token window with 32-token overlap), and each chunk is tagged with file metadata (headers, page indices) before embeddings are generated. Because every chunk carries its source context into the index, search precision in the vector search layer improves.
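The windowing arithmetic is easy to get wrong, so a small sketch may help. With a 256-token window and 32-token overlap, each window starts 224 tokens after the previous one. The TypeScript below is an illustration under stated assumptions, not the platform's C# parser: tokens are taken as a pre-split string array (a real pipeline would use the embedding model's tokenizer), the window is fixed rather than dynamic, and `chunkTokens`, `withMetadata`, and the tag format are hypothetical.

```typescript
interface ChunkMetadata {
  fileName: string;
  header: string;
  page: number;
}

interface Chunk {
  text: string;
  startToken: number; // index of the chunk's first token in the document
}

// Sliding window: consecutive chunks share `overlap` tokens.
function chunkTokens(tokens: string[], windowSize = 256, overlap = 32): Chunk[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new Error("require 0 <= overlap < windowSize");
  }
  const stride = windowSize - overlap; // 224 with the defaults
  const chunks: Chunk[] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push({
      text: tokens.slice(start, start + windowSize).join(" "),
      startToken: start,
    });
    if (start + windowSize >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Prefix the chunk with its source context before it is embedded.
// The bracketed tag format is an assumption for illustration.
function withMetadata(chunk: Chunk, meta: ChunkMetadata): string {
  return `[file: ${meta.fileName} | section: ${meta.header} | page: ${meta.page}]\n${chunk.text}`;
}

// Usage: a 600-token document yields windows starting at tokens 0, 224, and 448.
const sampleTokens = Array.from({ length: 600 }, (_, i) => `t${i}`);
const sampleChunks = chunkTokens(sampleTokens);
const embeddingInputs = sampleChunks.map((c) =>
  withMetadata(c, { fileName: "handbook.pdf", header: "Onboarding", page: 1 })
);
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding roughly 14% more tokens (256 / 224).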

System Capabilities

Real-Time Chat

Supports multi-session chat states.
Instant responsiveness via Server-Sent Events.
Prompt engineering templates for developer use.

Document Interrogation

PDF and DOCX text extraction.
High-accuracy vector query matching.
Answers strictly grounded in the retrieved context.
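The last two capabilities above can be sketched together: rank stored chunks against a query embedding, then build a prompt that restricts the model to the retrieved passages. This TypeScript is a hedged illustration only. The platform's vector database performs the real search, cosine similarity is an assumed metric, and `StoredChunk`, `topK`, `buildGroundedPrompt`, and the prompt wording are hypothetical.

```typescript
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force ranking; a vector database does this with an index instead.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Assumed prompt wording: instructs the model to answer from context only.
function buildGroundedPrompt(question: string, passages: StoredChunk[]): string {
  const context = passages.map((p, i) => `[${i + 1}] ${p.text}`).join("\n");
  return [
    "Answer using only the context below.",
    "If the context does not contain the answer, say that you do not know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Usage with toy two-dimensional embeddings.
const store: StoredChunk[] = [
  { id: "a", text: "Refunds are issued within 14 days.", embedding: [1, 0] },
  { id: "b", text: "Offices close on public holidays.", embedding: [0, 1] },
  { id: "c", text: "Refund requests require a receipt.", embedding: [0.9, 0.1] },
];
const hits = topK([1, 0], store, 2);
const groundedPrompt = buildGroundedPrompt("How long do refunds take?", hits);
```

An explicit refusal instruction is a common way to keep answers grounded, though it reduces rather than eliminates unsupported output.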

Explore Live

The platform is active and fully functional. You can review its live components directly at:

Launch sajjad.ai
Discuss Architecture on LinkedIn