Platform Engineer - Infra & Security Team
Atlys
Job description
🎯 Atlys' mission is to enable every person on earth to travel freely.
At Atlys, we believe that the path to creating a more open world is by making it efficient to travel. Travelers cite visas as the most frustrating pain point, and we're starting by automating that completely. We're looking for talented people who are interested in building the future of travel alongside us.
Building technology to increase global movement liquidity will be one of the most exciting developments in decades. If you are curious why the smartest people want to work at Atlys, read this post.
We’re looking for engineers with grit and vision who want to build a modern platform to make travelling efficient and delightful. This role is critical in achieving our goals - expanding coverage to support more destinations worldwide, automating entire processes and reducing support volume to offer a truly self-serve experience. We talk to customers daily, ship code several times a day, and measure every little interaction!
Job requirements
We're hiring a Platform Engineer to join our Infra & Security team. This is a hands-on infrastructure role. You'll work directly on our GKE clusters, CI/CD pipelines, and observability stack to keep our systems reliable and scalable as we process visas across multiple countries.
This isn't a checkbox operations role. This is a hands-on engineering position where you'll be building infrastructure, troubleshooting production issues, and driving reliability improvements across our platform.
The Job
Infrastructure (20%)
Kubernetes Operations: Manage and optimize our GKE clusters - deployments, scaling, resource management, and cluster upgrades
Database Administration: Maintain our self-hosted PostgreSQL instances - performance tuning, backup strategies, storage management, and high availability
Cloud Infrastructure: Manage and improve our GCP stack - networking, IAM, storage, and compute resources
Troubleshooting: Debug production issues across the stack - from container networking to database performance
CI/CD & Observability (30%)
Pipeline Ownership: Build and improve CI/CD pipelines for fast, reliable deployments
Pipeline Security: Implement guardrails in CI/CD - container vulnerability scanning, policy enforcement, and deployment checks
Monitoring Stack: Maintain and improve our Grafana, Loki, and Prometheus setup
Observability Expansion: Enable new capabilities like distributed tracing to improve debugging and performance analysis
Alerting & Visibility: Build meaningful alerts and improve visibility across our stack - this includes making platform-relevant changes within services
Reliability & Security (20%)
DR/BCP: Implement and maintain disaster recovery and business continuity procedures
Migrations: Plan and execute infrastructure migrations - database upgrades, cluster migrations, service moves
Vulnerability Management: Identify and remediate vulnerabilities in VMs and container images across our infrastructure
Internal Tooling & Automation (20%)
Developer Productivity: Create internal tools to improve developer workflows and productivity
Automation: Write scripts in Python/Bash to automate repetitive tasks and improve operational efficiency
Centralized Services: Manage and own centralized services used across engineering teams
General (10%)
Ad-hoc Requests: Support engineering teams with infrastructure-related requests as they come up
Collaboration: Work closely with developers to improve their deployment and debugging workflows
The Ideal Candidate
Must-Have
Passionate about Infrastructure: We're looking for someone who enjoys solving complex infrastructure problems and building reliable systems
Ownership Mindset: High ownership and bias for action - you see a problem, you fix it
1.5-3 years of hands-on experience in DevOps, SRE, or Platform Engineering roles
Some development background: Prior experience writing code (even if brief) - you understand how developers work and can make changes within services when needed
Kubernetes experience: You've worked with clusters, debugged pod issues, and understand container orchestration
GCP familiarity: Hands-on experience with GCP services (GKE, VPC, IAM, Cloud Storage, etc.)
Database fundamentals: Experience with PostgreSQL - queries, performance basics, backups
Networking fundamentals: You understand how networking works (DNS, load balancing, firewalls, VPCs) and can troubleshoot connectivity issues
CI/CD understanding: You've built or maintained pipelines and understand DevOps as a culture, not just tooling
Observability experience: You've worked with monitoring and logging tools (Prometheus, Grafana, Loki, or similar)
Scripting ability: Comfortable writing Python and Bash scripts to automate tasks
AI-assisted productivity: Comfortable using AI tools to learn faster and get things done
Strong communication: You'll work closely with developers and need to explain infrastructure decisions clearly
Nice-to-Have
Cost optimization exposure: Identified and implemented cloud cost savings
Why This Role is Unique
Real engineering challenges: Self-hosted databases, custom observability stack, Kubernetes at scale - not just clicking buttons in a cloud console
Direct impact: Your work keeps the platform running for thousands of travelers daily
Startup learning curve: Move fast, wear multiple hats, and grow your skills rapidly
Breadth of work: From CI/CD pipelines to database migrations to internal tooling - no two days are the same
Why Atlys?
Build infrastructure that enables global movement - one of the most exciting developments in decades
Fast-paced, high-trust environment with significant ownership
Competitive compensation and startup equity
Smart colleagues who ship fast