We are building a compact, proprietary LLM designed to internalize client-provided knowledge in various formats (links, documents, media, etc.). Our model:
* Runs entirely on-premise, ensuring data privacy;
* Excels at working with proprietary knowledge;
* Mitigates RAG limitations (context size, retrieval relevance);
* Is cost-efficient due to its small size.
What You Will Do:
* Build prototypes and microservices with a Python backend and React frontend integration;
* Develop and maintain diverse pipelines (data acquisition, data processing, feature engineering, data organization, generation, training, etc.);
* Design and implement cloud-based solutions using AWS services (S3, Cognito, Athena, DynamoDB);
* Create and optimize SQL queries for data extraction and transformation;
* Work with classic and non-classic dialogue systems for conversational AI;
* Integrate with third-party products;
* Build infrastructure using various protocols (REST, WebSocket, MCP, A2A);
* Apply MLOps practices to streamline model development and deployment;
* Containerize applications using Docker for consistent deployment across environments.
Requirements:
* 2+ years of Python development experience;
* Data engineering skills across multiple pipeline types (acquisition, processing, analysis, generation);
* Proficiency with SQL and database optimization;
* Experience with AWS cloud services (S3, Cognito, Athena, DynamoDB);
* Hands-on experience with dialogue systems / conversational AI;
* Familiarity with React for frontend development;
* Strong understanding of statistical concepts and data quality assessment;
* Experience with Docker containerization and Git version control;
* Knowledge of microservices architecture and implementation experience;
* Proactive mindset, strong interest in LLMs, and quick prototyping skills;
* Flexibility and fearlessness with new tools, tech stacks, and unusual challenges.
Nice to Have:
* Experience with open-source LLMs (e.g., Mistral, LLaMA, OpenChat);
* Advanced background in ML/NLP projects;
* Experience with annotation tools, dataset building, and model evaluation workflows;
* CI/CD pipeline implementation experience.