
[VCK] Senior DevOps Engineer AWS / AI Infrastructure
Software Mind
Posted about 17 hours ago
Job Description
About the Project
Software Mind is building a private, tenant-isolated AI assistant for the real estate title and settlement industry. The platform is a retrieval-first (RAG) system that ingests historical email, documents, and structured metadata into a per-tenant vector index, and serves grounded, cited, expert-weighted answers through a chat-style Q&A interface with single sign-on and full audit logging.
The platform is AWS-native, with a Python/FastAPI backend, a Vue.js frontend, an OpenSearch/Pinecone vector store, and OpenAI/Anthropic/Bedrock as LLM providers. You will join a senior, cross-functional, LATAM-based team where hands-on AI delivery experience, not just familiarity, is the baseline expectation.
You will stand up and own the cloud infrastructure and CI/CD foundation that the entire project runs on. Your work is on the critical path from day one: delivery begins with environment provisioning. You design for tenant isolation, observability, and security from the outset, not as an afterthought. This role requires prior experience operating infrastructure for production AI or LLM-based workloads.
Your Responsibilities
Provision and configure a dedicated VPC and segmented cloud environment on AWS
Build the baseline CI/CD pipeline, then maintain and evolve it across all delivery phases
Configure and manage the vector store infrastructure (OpenSearch/Pinecone on AWS)
Set up and manage the observability stack: CloudWatch, X-Ray, alerting thresholds, and LLM-specific monitoring
Implement infrastructure-as-code for all environments (dev, staging, production) using Terraform or CDK
Manage secrets, KMS encryption key configuration, and tenant-scoped access controls
Configure LLM provider connectivity (OpenAI / Anthropic / Amazon Bedrock enterprise tier, zero-data-retention)
Define and implement an environment promotion strategy aligned with the 2-week sprint cadence
Support incremental ingestion pipeline infrastructure requirements and nightly scheduling
Qualifications
Must-Have Skills & Experience
6+ years in DevOps or cloud infrastructure engineering; strong AWS specialisation required
Infrastructure-as-code: Terraform, CloudFormation, or AWS CDK
CI/CD tooling: GitHub Actions, AWS CodePipeline, or equivalent
Core AWS services: VPC, ECS, Lambda, S3, DynamoDB, API Gateway, Cognito, CloudWatch, X-Ray
Experience designing and operating multi-tenant cloud environments with tenant-level data isolation
AI Experience (Required, Not Optional)
At least one project operating infrastructure for a production AI/ML or LLM-integrated system, not just general cloud workloads
Experience configuring and managing vector store infrastructure (OpenSearch, Pinecone, Weaviate, or equivalent) in a production environment
Familiarity with LLM provider APIs (OpenAI, Anthropic, or Amazon Bedrock) in a production/enterprise configuration, including zero-data-retention tier setup
Understanding of AI-specific observability concerns: token usage monitoring, latency profiling for LLM calls, and model response logging