About the Role

VectorGrid is a Series A climate-tech company on a mission to accelerate Europe's energy transition. Its real-time grid optimization platform helps energy traders and network operators balance supply and demand across renewable-heavy grids, reducing curtailment, cutting costs, and keeping the lights on.

Backed by Northzone and EQT Ventures, VectorGrid operates across the Netherlands, Germany, and the Nordics, with plans to expand into the UK and Iberian markets by 2027. The engineering team is 22 strong, and this role reports directly to the CTO.

As Lead AI Systems Engineer, you'll sit at the intersection of machine learning and infrastructure — architecting the systems that take forecasting models from Jupyter notebooks to production services making real-time decisions on critical energy infrastructure. This is not a research role. You'll be building production-grade ML pipelines that must be fast, reliable, and auditable.

The stack: Python and Rust services on AWS (EKS), real-time data ingestion via Apache Flink, model serving through a custom inference layer built on ONNX Runtime, and time-series storage in TimescaleDB. You'll inherit solid foundations and a team of three ML engineers who need technical leadership, not micromanagement.

This is a hybrid role based at VectorGrid's Amsterdam Zuidas office, with the team in the office two to three days per week. The company offers a competitive base salary, equity with a clear liquidity path, a €3,000 annual learning budget, and relocation support for candidates outside the Netherlands.

Responsibilities

Architect and maintain the end-to-end ML platform — from feature engineering and model training pipelines through to real-time inference serving at sub-200ms latency
Design high-availability deployment patterns for AI services that operate on critical energy infrastructure — these systems cannot go down during peak grid events
Build and improve the real-time data ingestion layer (Apache Flink) processing 500,000+ grid telemetry events per second from SCADA systems and smart meters
Collaborate with the ML research team to productionise forecasting models — translating prototype notebooks into versioned, monitored, and rollback-capable services
Establish and enforce MLOps best practices — model versioning, A/B testing, drift detection, automated retraining triggers, and lineage tracking
Lead the technical direction of the AI infrastructure squad (3 ML engineers), setting standards through architecture reviews, pairing, and written technical guidance
Work closely with energy domain experts to ensure model outputs are interpretable and auditable for regulatory compliance across European energy markets
Own observability and performance monitoring for all AI services — defining SLIs/SLOs, building dashboards, and driving down inference costs

Requirements

Essential

7+ years in software engineering with at least 3 years focused on ML infrastructure, MLOps, or AI platform engineering
Strong production experience with Python — you write clean, tested, well-structured Python, not just scripts and notebooks
Hands-on experience deploying and operating ML models in real-time production environments (not just batch inference)
Deep understanding of model serving frameworks — ONNX Runtime, TensorFlow Serving, Triton, or similar
Experience with stream processing systems (Apache Flink, Kafka Streams, or Apache Beam) for real-time feature computation
Comfortable with cloud infrastructure on AWS or GCP — Kubernetes, Terraform, CI/CD pipelines
Proven ability to lead a small technical team — setting direction, unblocking people, and raising the bar through code review and mentorship
Excellent communication skills in English (VectorGrid's working language); Dutch is not required

Nice to Have

Experience with Rust for performance-critical systems or data processing
Background in energy, utilities, or other critical infrastructure domains
Familiarity with time-series databases (TimescaleDB, InfluxDB, QuestDB)
Understanding of forecasting techniques — time-series models, probabilistic forecasting, or reinforcement learning for control systems
Contributions to ML infrastructure open-source projects (MLflow, Kubeflow, Feast, etc.)