Senior Storage Systems Engineer
Crusoe
Software Engineering
San Francisco, CA, USA
USD 148,500-161,000 / year + Equity
Location: San Francisco, CA - US
Employment Type: Full time
Location Type: On-site
Department: Cloud Engineering
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About the Role:
At Crusoe, we are on a mission to align the future of computing with the future of the climate. As a Senior Storage Systems Engineer, you will be the primary operator of our high-performance data layer. This role focuses on the availability, scaling, and operational excellence of our all-flash storage ecosystems, specifically VAST Data and Pure Storage. You will ensure they deliver the sub-millisecond latency required for world-class AI training and HPC workloads.
You will lead the day-to-day administration of our global storage footprint, serving as the subject matter expert for our flash-based platforms. Your work ensures that our sustainable GPU clusters have the reliable, high-throughput data backbone needed to power the AI revolution.
What You'll Be Working On:
Flash Array Administration: Own the end-to-end management of VAST Data (Universal Storage) and Pure Storage (FlashBlade/FlashArray) environments, including initial setup, volume provisioning, and export management.
Performance Monitoring: Proactively monitor VAST and Pure clusters for IOPS, throughput, and latency bottlenecks, ensuring storage performance stays ahead of GPU demand.
Non-Disruptive Operations: Execute software upgrades (Purity//FB, VAST OS), D-Node/C-Node expansions, and hardware refreshes with zero downtime for our AI customers.
Data Protection: Manage snapshots, replication policies, and data reduction (deduplication/compression) strategies to optimize TCO while ensuring 100% data durability.
Tier 3 Support: Act as the lead technical point of contact for storage incidents, working directly with VAST and Pure support engineering to resolve complex fabric or metadata issues.
Integration & Automation: Use REST APIs and Python to automate provisioning and integrate storage health metrics into our centralized observability stack (Grafana/Prometheus).
What You'll Bring to the Team:
Technical Experience: 5–8+ years of experience in storage administration, including at least 3 years of hands-on experience managing VAST Data or Pure Storage in a production environment.
Protocol Expertise: Deep understanding of NFS over RDMA, SMB, and NVMe-oF, and how they are implemented within VAST and Pure architectures.
Linux Systems Mastery: Strong command of the Linux CLI, specifically for mounting, tuning, and troubleshooting high-performance file systems.
Network Awareness: Understanding of how storage interacts with InfiniBand and RoCE fabrics to ensure low-latency data delivery to GPU nodes.
Scripting Skills: Proficiency in Python, Bash, or similar for automating volume creation, quota management, and reporting via storage APIs.
Operational Discipline: A meticulous approach to capacity planning and documentation, ensuring the environment remains stable as we add petabytes of scale.
Bonus Points:
Experience with Pure1 or VAST VMS/Insight for predictive analytics and capacity forecasting.
Familiarity with Slurm or Kubernetes (CSI) integration with high-performance storage.
Prior experience in large-scale environments (multi-petabyte footprints).
Benefits:
Competitive compensation and equity packages
Restricted Stock Units
Paid time off, paid holidays & leave of absence programs
Comprehensive health, dental & vision insurance
Employer contributions to a Health Savings Account (HSA)
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off
Global travel insurance & emergency assistance
Daily meals allowance
Additional perks & programs specific to location
Compensation Range:
Compensation will be paid in the range of $148,500 - $161,000 + bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.