Long Journey Ventures

Embark on a New Journey

Discover career opportunities within our portfolio of magically weird companies

Major Incident Manager

Crusoe

San Francisco, CA, USA
Posted on Mar 6, 2026

The Major Incident Manager role is critical to maintaining service reliability and preserving customer trust. This position directly affects the company's success by minimizing downtime, managing high-severity incidents, and driving rapid resolution of complex technical issues. You will lead the response to high-visibility incidents and customer escalations, acting as the central point of coordination to drive timely, effective outcomes.

In this role, you’ll spearhead the management of critical incidents from identification through resolution, while continuously improving incident response processes and support readiness. You’ll work cross-functionally with engineering, product, and customer teams to design scalable self-service support workflows, contribute to product improvements, and develop robust incident response strategies. You’ll also play a key role in mentoring team members, delivering training, and building knowledge resources that strengthen both internal teams and customer success.

We’re looking for a technically skilled professional with strong Linux expertise, excellent communication skills, and 4–5 years of customer-facing experience. Prior experience in incident management and on-call rotations is essential.

What You’ll Be Working On

Troubleshoot & Resolve

  • Diagnose and resolve complex technical issues related to InfiniBand, containerization, and distributed training environments
  • Lead high-severity incident response efforts to ensure rapid mitigation and minimal disruption to customer operations
  • Manage customer escalations with professionalism, clarity, and urgency, maintaining stakeholder confidence throughout the incident lifecycle

Implement & Optimize

  • Guide customers through the implementation, configuration, and optimization of high-performance computing (HPC) infrastructure
  • Partner with customers to improve performance, scalability, and efficiency across their environments

Educate & Empower

  • Develop and deliver internal and external training materials, including live training sessions, documentation, and knowledge base articles
  • Provide ongoing enablement to help customers effectively adopt and get the most value from Crusoe's solutions
  • Lead incident response training and preparedness initiatives for internal teams

Collaborate Internally

  • Work closely with engineering and product teams to share customer feedback and operational insights
  • Influence product enhancements and reliability improvements based on real-world incident data
  • Contribute to the continuous improvement of incident management processes and the overall customer experience

What You’ll Bring to the Team

Technical Proficiency

  • Strong hands-on experience with Linux, virtualization, Kubernetes, and managing customer incidents
  • Solid understanding of the TCP/IP stack
  • Working knowledge of Infrastructure-as-Code (IaC) practices

Essential Skills

  • Excellent written and verbal communication skills, with the ability to clearly explain complex technical issues
  • Proven problem-solving mindset with strong diagnostic and analytical abilities
  • 3–5+ years of experience in a team leadership role, serving as a liaison between internal teams and external customers
  • 4–5 years of customer-facing experience in a technical environment
  • Direct experience participating in or leading incident management efforts and on-call rotations

Bonus Skills

  • Experience programming in one or more languages

Benefits & Perks

  • Industry-competitive compensation
  • Restricted Stock Units (RSUs) in a fast-growing, well-funded technology company
  • Comprehensive health insurance options, including HDHP and PPO plans, plus vision and dental coverage for you and your dependents
  • Employer contributions to health savings accounts (HSAs)
  • Paid parental leave
  • Company-paid life insurance, short-term disability, and long-term disability coverage
  • Teladoc access
  • 401(k) plan with a 100% company match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal benefits
  • Company-paid commuter FSA benefit of $200 per month

Bellwethers welcome.

© 2025 Long Journey.