Systems Security & AI Infrastructure

Thanos Ariyanayagam

Software Engineer & Tech Lead @ Google

Architecting planet-scale systems at the intersection of virtualization and AI. I build the high-performance, secure infrastructure that powers Google’s most critical workloads, including Gemini and Waymo.

About Me

I engineer the foundational infrastructure that powers the modern AI era. As a Tech Lead at Google, I specialize in designing planet-scale systems where uncompromising security meets extreme performance. My work directly enables the reliable, high-speed execution of Google’s most critical workloads, including Gemini and Waymo.

My expertise lies at the leading edge of virtualization, systems programming, and high-performance computing. Whether I'm architecting a next-generation C++ node runtime that saves tens of thousands of engineering hours, leading fleet-wide isolation initiatives, or writing a patent-pending compiler extension for 3D deep learning models, I thrive on turning highly ambiguous architectural challenges into robust, scalable systems.

I don't just write code; I build resilient engines for scale. Armed with deep expertise in modern C++, distributed systems, and low-level hardware-software co-design, my drive is to push the boundaries of what's possible with today's hardware to unlock tomorrow's technological leaps.

Experience

Google

Aug 2024 — Present

Software Engineer III & Tech Lead

Nov 2025 — Present

Leading node-level runtime architecture across AI/ML Infrastructure and virtualization frameworks. Key focus areas:

  • Borg Isolation Commitment: Fortified ecosystem security by leading the transparent migration of untrusted workloads into isolated VMs, rigorously balancing security with system efficiency.
  • TI-VM Virtualized Runtime: Architecting and optimizing the future of virtualized execution environments across the fleet.
  • Confidential AI/ML Infrastructure: Leading node-level runtime architecture for a highly confidential project directly supporting Google's rapid AI growth.
  • AI Compute Fungibility: Maximized global fleet capacity by engineering seamless compute fungibility (CPU/GPU/TPU) for high-demand ML workloads, unlocking scale for Gemini.

Software Engineer II

Aug 2024 — Nov 2025
  • Identified and architected significant performance improvements for virtualized workloads across Google's fleet, leading to infrastructure cost savings equivalent to 56.7+ SWE years.
  • Designed and implemented a novel VM Package Hotplugging system that plugs and unplugs NVMe devices from VMs without relying on passthrough systems or in-VM dependencies. The design delivers native host-level performance with minimal in-VM overhead and resolves complex filesystem bottlenecks.
  • Led the architectural design and C++ implementation of the VM resource management parity initiative, enabling seamless telemetry extraction and in-place VM updates to execute complex, untrusted ML workloads at scale.

Software Engineering Intern

Google
May 2023 — Aug 2023

Engineered a high-throughput, thread-safe C++ API in Borglet, using lock-free programming to minimize contention. Designed a novel hierarchical data-sharing solution for modularizing fleet-wide autonomous services within the Borg cluster manager.

Software Engineer Intern

Intel Corporation
Sept 2022 — Apr 2023

Authored a patent-pending C++ compiler extension for 3D deep learning models on proprietary hardware. Boosted ML inference throughput by 11% via advanced compiler optimizations (operator fusion, memory layout transformations, mixed-precision INT8).

Software Developer Intern

Google
May 2022 — Aug 2022

Deployed a highly optimized C++/Go ensembling library that boosted a production NLP API's F1 score by 4.28% and reduced annotation errors by 50%. Built a scalable ML experimentation pipeline on GCP.

STEP Intern

Google
May 2021 — Aug 2021

Enhanced C++ infrastructure connecting to Spanner for ML anomaly detection, resulting in an 8% increase in bad-actor suspensions and a 10% reduction in scan quota.

Projects

OpenStreetMap GIS

C++

Achieved a >56% runtime improvement in a custom GIS by designing a concurrent architecture that uses thread pools to run asynchronous tasks, parallelizing data parsing and pathfinding heuristics.
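As a rough illustration of the thread-pool pattern mentioned above, here is a minimal C++17 sketch. It is not the project's code: `ThreadPool`, `submit`, and `parallel_sum` are hypothetical names, and summing chunks of integers stands in for the real parsing and pathfinding work.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers pull jobs from a shared queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Drains any queued jobs, then joins every worker.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a callable and returns a future for its result.
    template <typename F>
    auto submit(F task) -> std::future<std::invoke_result_t<F&>> {
        using R = std::invoke_result_t<F&>;
        auto packaged = std::make_shared<std::packaged_task<R()>>(std::move(task));
        std::future<R> result = packaged->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push([packaged] { (*packaged)(); });
        }
        wake_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> queue_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;
};

// Stand-in workload: split the input into chunks and sum them in parallel.
long long parallel_sum(const std::vector<int>& data, std::size_t chunks, ThreadPool& pool) {
    if (chunks == 0) chunks = 1;
    const std::size_t step = (data.size() + chunks - 1) / chunks;
    std::vector<std::future<long long>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += step) {
        const std::size_t end = std::min(begin + step, data.size());
        parts.push_back(pool.submit([&data, begin, end] {
            return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                   data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        }));
    }
    long long total = 0;
    for (auto& part : parts) total += part.get();  // blocks until each chunk is done
    return total;
}
```

Creating one `ThreadPool` and reusing it across calls to `parallel_sum` avoids paying thread start-up cost per task, which is the usual reason to prefer a pool over spawning a thread for each unit of work.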

NEPIADA Drone Swarms

Python / RL

Designed novel multi-agent reinforcement learning algorithms built on DQN and PPO that outperformed state-of-the-art methods in adversarial, partial-information environments simulating drone swarm behavior.

AI Reversi Engine

C

Developed a top-5% ranked Reversi AI in C. Implemented an aggressively optimized Minimax algorithm featuring alpha-beta pruning, transposition tables (memoization), and advanced move ordering to drastically reduce search space under tight constraints.
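For readers unfamiliar with the techniques named above, the following is a small illustrative sketch of minimax (in negamax form) with alpha-beta pruning and a transposition table. It is not the engine's code: it is written in C++ rather than C, and it assumes a toy subtraction game (players alternately take 1 to 3 stones, and whoever takes the last stone wins) in place of Reversi. The names `negamax`, `solve`, `TTEntry`, and `Bound` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>

// Whether a cached value is exact or only a bound from a pruned search.
enum class Bound : std::uint8_t { Exact, Lower, Upper };

struct TTEntry {
    int value;
    Bound bound;
};

// Returns +1 if the side to move wins with best play, -1 otherwise.
// Toy game (an assumption, standing in for Reversi): take 1-3 stones per turn,
// and the player who takes the last stone wins.
int negamax(int stones, int alpha, int beta, std::unordered_map<int, TTEntry>& table) {
    if (stones == 0) return -1;  // opponent took the last stone: side to move lost

    const int alpha_orig = alpha;
    if (auto it = table.find(stones); it != table.end()) {
        const TTEntry& entry = it->second;
        if (entry.bound == Bound::Exact) return entry.value;
        if (entry.bound == Bound::Lower) alpha = std::max(alpha, entry.value);
        else beta = std::min(beta, entry.value);
        if (alpha >= beta) return entry.value;  // cached bound already decides this node
    }

    int best = -2;  // below any reachable score
    // Placeholder move ordering: try the largest take first.
    for (int take = 3; take >= 1; --take) {
        if (take > stones) continue;
        const int score = -negamax(stones - take, -beta, -alpha, table);
        best = std::max(best, score);
        alpha = std::max(alpha, score);
        if (alpha >= beta) break;  // alpha-beta cutoff: remaining moves cannot matter
    }

    // Record how trustworthy the result is for later lookups.
    Bound bound = Bound::Exact;
    if (best <= alpha_orig) bound = Bound::Upper;
    else if (best >= beta) bound = Bound::Lower;
    table[stones] = {best, bound};
    return best;
}

// Solves one position with a fresh table; scores lie in [-1, 1], so that is the root window.
int solve(int stones) {
    std::unordered_map<int, TTEntry> table;
    return negamax(stones, -1, 1, table);
}
```

In this toy game the side to move loses exactly when the stone count is a multiple of 4, so `solve(4)` returns -1 and `solve(5)` returns +1. A real Reversi engine would add a depth limit, a heuristic evaluation at the search horizon, and a position hash as the table key.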

Technical Arsenal

Languages

  • Modern C++ (17/20/23)
  • Python
  • Go
  • C
  • SQL & Bash

Infrastructure

  • Borg & Kubernetes
  • GCP / AWS
  • Docker
  • Spanner / BigQuery
  • gRPC / Protocol Buffers

AI & ML

  • TensorFlow
  • PyTorch
  • RLlib
  • Model Pruning / INT8
  • Scikit-Learn

Core Domains

  • Systems Design
  • Compiler Optimization
  • High-Performance Computing
  • Multi-threading
  • Lock-Free Programming