System Architecture, RAG Pipeline & Multi-Model Integration
sajjad.ai is an enterprise-ready, multilingual AI software platform built from scratch. It lets users run conversational AI, query large documentation sets via Retrieval-Augmented Generation (RAG), and run content generation workflows in English, Arabic, and Urdu.
Technical Stack
.NET Core 8 Web APIs, React 18 (TypeScript), OpenAI API, DeepSeek API, Ollama Local LLM Orchestration, Vector Database (RAG), SQL Server / EF Core, Clean Architecture, Redis (Caching)
System Architecture
The system follows Clean Architecture principles, dividing concerns into Domain, Application, Infrastructure, and API presentation layers. This enables plug-and-play model updates and simple integrations with cloud databases or local servers.
Key Engineering Solutions
Real-Time SSE Streaming
Problem: A standard HTTP request-response call returns nothing until the model has finished generating, so a long completion from OpenAI or DeepSeek leaves the user waiting on a blocked page.
Solution: Configured .NET Core Web API controllers to set `Response.ContentType = "text/event-stream"` and serialize structured chunks on the fly. Each chunk is sent to the React frontend as soon as the model produces it, so the reply renders as a smooth typing animation instead of appearing only after the full completion.
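On the React side, the streamed body has to be split back into individual events before tokens can be appended to the chat window. The following is a minimal sketch of that step, not the platform's actual client code: the `SseParser` class, the `appendTokens` helper, and the `{ "token": ... }` payload shape are all assumptions made for illustration. It handles only `data:` fields and `\n` or `\r\n` line endings.

```typescript
// Assumed payload shape for one streamed chunk (hypothetical).
interface TokenChunk {
  token: string;
}

// Incremental parser for a text/event-stream body.
class SseParser {
  private buffer = "";

  // Accepts the next piece of the response body and returns the data payload
  // of every event that is now complete (terminated by a blank line).
  feed(text: string): string[] {
    // Normalize after concatenation so a "\r\n" split across two pieces is still caught.
    this.buffer = (this.buffer + text).replace(/\r\n/g, "\n");
    const events: string[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      // Keep only "data:" lines; comments and other fields are ignored here.
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""));
      if (dataLines.length > 0) {
        events.push(dataLines.join("\n"));
      }
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// Appends the token from each JSON payload to the text shown so far.
function appendTokens(current: string, payloads: string[]): string {
  let text = current;
  for (const payload of payloads) {
    const chunk = JSON.parse(payload) as TokenChunk;
    text += chunk.token;
  }
  return text;
}

// Usage: the first piece ends mid-event, so nothing is emitted until the second arrives.
const parser = new SseParser();
let message = "";
message = appendTokens(message, parser.feed('data: {"token":"Hel'));
message = appendTokens(message, parser.feed('lo"}\n\ndata: {"token":" world"}\n\n'));
// message is now "Hello world"
```

In a browser, the pieces passed to `feed` would typically come from reading the `fetch` response body with a stream reader and a `TextDecoder`. Buffering matters because network chunk boundaries do not line up with event boundaries.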
Multi-Model Orchestration Layer
Problem: Interacting with different vendor APIs (OpenAI, DeepSeek, and local Ollama) typically requires different clients, schemas, and payload specifications.
Solution: Architected a unified `ILLMService` interface registered through standard dependency injection. The system resolves the target provider at runtime from the client configuration, and each implementation hides its vendor's chat-history format, system prompt conventions, and connection setup.
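The shape of this abstraction can be illustrated with a small TypeScript analogue. The real `ILLMService` is a C# interface whose members are not shown here, so everything below (`LlmService`, `ChatMessage`, `LlmRegistry`, and the `EchoService` stand-in that calls no vendor API) is hypothetical and only demonstrates the pattern of one interface with runtime resolution by provider name.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// One contract for every provider; callers never see vendor-specific payloads.
interface LlmService {
  readonly provider: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Stand-in provider that echoes the last user message instead of calling an API.
class EchoService implements LlmService {
  readonly provider: string;

  constructor(provider: string) {
    this.provider = provider;
  }

  async complete(messages: ChatMessage[]): Promise<string> {
    const lastUser = [...messages].reverse().find((m) => m.role === "user");
    return `[${this.provider}] ${lastUser ? lastUser.content : ""}`;
  }
}

// Maps a configured provider name to its implementation at runtime.
class LlmRegistry {
  private readonly services = new Map<string, LlmService>();

  register(service: LlmService): void {
    this.services.set(service.provider, service);
  }

  resolve(provider: string): LlmService {
    const service = this.services.get(provider);
    if (!service) {
      throw new Error(`No LLM service registered for "${provider}"`);
    }
    return service;
  }
}

const registry = new LlmRegistry();
registry.register(new EchoService("openai"));
registry.register(new EchoService("deepseek"));
registry.register(new EchoService("ollama"));

// The caller picks a provider from configuration and uses the same call either way.
const configuredProvider = "ollama";
const reply = registry
  .resolve(configuredProvider)
  .complete([{ role: "user", content: "Hello" }]);
```

Adding a provider under this design means writing one new implementation and registering it; no calling code changes.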
Hybrid RAG Chunking & Metadata Injection
Problem: Vector databases retrieve chunks by semantic similarity, but a chunk taken in isolation often loses the broader context of a complex file or indexed PDF, such as the section or page it came from.
Solution: Developed a custom C# parser that reads PDF, DOCX, and plain-text datasets. Document streams are split into overlapping chunks (256-token window with 32-token overlap), and each chunk is tagged with file metadata (headers, page indices) before its embedding is generated. Because every embedded chunk carries its source context, precision in the vector search layer improves.
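With a 256-token window and a 32-token overlap, each new chunk starts 256 - 32 = 224 tokens after the previous one. The sketch below shows that sliding window and a metadata prefix in TypeScript; it is an illustration under stated assumptions, not the platform's C# parser. Tokens are assumed to be already split (a real pipeline would use the embedding model's tokenizer), and the `ChunkMetadata` fields and the bracketed tag format are hypothetical.

```typescript
interface ChunkMetadata {
  fileName: string;
  header: string;
  page: number;
}

interface Chunk {
  start: number; // index of the chunk's first token in the source
  text: string;
}

// Splits tokens into windows of `windowSize`, each sharing `overlap` tokens
// with the previous window.
function chunkTokens(tokens: string[], windowSize = 256, overlap = 32): Chunk[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new RangeError("require windowSize > 0 and 0 <= overlap < windowSize");
  }
  const step = windowSize - overlap;
  const chunks: Chunk[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push({ start, text: tokens.slice(start, start + windowSize).join(" ") });
    if (start + windowSize >= tokens.length) {
      break; // this window already reached the end of the document
    }
  }
  return chunks;
}

// Prefixes the chunk text with its source context before it is embedded.
function withMetadata(chunk: Chunk, meta: ChunkMetadata): string {
  return `[file: ${meta.fileName}] [section: ${meta.header}] [page: ${meta.page}]\n${chunk.text}`;
}

// Small-scale usage: window of 4, overlap of 2, so the step is 2.
const demo = chunkTokens(["a", "b", "c", "d", "e"], 4, 2);
// demo texts: "a b c d", "c d e"
const embedded = demo.map((chunk) =>
  withMetadata(chunk, { fileName: "guide.pdf", header: "Setup", page: 3 })
);
```

This sketch attaches one metadata record to every chunk for simplicity; a chunk that spans a page or section break would need a rule for which page and header to record.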
System Capabilities
Real-Time Chat
Supports multi-session chat states.
Instant responsiveness via Server-Sent Events.
Prompt engineering templates for developer use.
Document Interrogation
PDF and DOCX text extraction.
High-accuracy vector query matching.
Answers strictly grounded in the retrieved context.
Explore Live
The platform is active and fully functional. You can review its live components directly at: