High Performance Computing (HPC) Engineer
Company: GenBio AI
Location: Palo Alto
Posted on: February 18, 2026
Job Description:
Headquartered in Silicon Valley,
we are a newly established start-up, where a collective of
visionary scientists, engineers, and entrepreneurs are dedicated to
transforming the landscape of biology and medicine through the
power of Generative AI. Our team comprises leading minds and
innovators in AI and Biological Science, pushing the boundaries of
what is possible. We are dreamers who reimagine a new paradigm for
biology and medicine. We are committed to decoding biology
holistically and enabling the next generation of life-transforming
solutions. As the first mover in pan-modal Large Biological Models
(LBM), we are pioneering a new era of biomedicine, with our LBM
training leading to ground-breaking advancements and a
transformative approach to healthcare. Our exceptionally strong
R&D team and leadership in LLM and generative AI position us at
the forefront of this revolutionary field. With headquarters in
Silicon Valley, California, and a branch office in Paris, we are
poised to make a global impact. Join us as we embark on this
journey to redefine the future of biology and medicine through the
transformative power of Generative AI.

Job Description:

GPU Cluster Management: Design, deploy, and maintain high-performance
GPU clusters, ensuring their stability, reliability, and scalability.
Monitor and manage cluster resources to maximize utilization and
efficiency.

Distributed/Parallel Training: Implement distributed computing
techniques to enable parallel training of large deep learning models
across multiple GPUs and nodes. Optimize data distribution and
synchronization to achieve faster convergence and reduced training
times.

Performance Optimization: Fine-tune GPU clusters and deep learning
frameworks to achieve optimal performance for specific workloads.
Identify and resolve performance bottlenecks through profiling and
system analysis.

Deep Learning Framework Integration: Collaborate with data scientists
and machine learning engineers to integrate distributed training
capabilities into GenBio AI’s model development and deployment
frameworks.

Scalability and Resource Management: Ensure that the GPU clusters can
scale effectively to handle increasing computational demands. Develop
resource management strategies to prioritize and allocate computing
resources based on project requirements.

Troubleshooting and Support: Troubleshoot and resolve issues related
to GPU clusters, distributed training, and performance anomalies.
Provide technical support to users and resolve technical challenges
efficiently.

Documentation: Create and maintain documentation related to GPU
cluster configuration, distributed training workflows, and best
practices to ensure knowledge sharing and seamless onboarding of new
team members.

Job Requirements:

Master’s or Ph.D. degree in computer science, or a related field with
a focus on High-Performance Computing, Distributed Systems, or Deep
Learning.

2 years of proven experience in managing GPU clusters, including
installation, configuration, and optimization.

Strong expertise in distributed deep learning and parallel training
techniques.

Proficiency in popular deep learning frameworks like PyTorch,
Megatron-LM, DeepSpeed, etc.

Programming skills in Python and experience with GPU-accelerated
libraries (e.g., CUDA, cuDNN).

Knowledge of performance profiling and optimization tools for HPC and
deep learning.

Familiarity with resource management and scheduling systems (e.g.,
SLURM, Kubernetes).

Strong background in distributed systems, cloud computing (AWS, GCP),
and containerization (Docker, Kubernetes).

Join us as we embark on this journey to redefine the future of
biology and medicine.

We are an equal opportunity employer. We celebrate diversity and are
committed to creating an inclusive environment for all employees.

GenBio AI participates in the U.S. Department of Homeland Security’s
E-Verify program to confirm the employment eligibility of all newly
hired employees. For more information on E-Verify, please visit
www.e-verify.gov.

We may use artificial intelligence (AI) tools to support parts of the
hiring process, such as reviewing applications, analyzing resumes, or
assessing responses. These tools assist our recruitment team but do
not replace human judgment. Final hiring decisions are ultimately
made by humans. If you would like more information about how your
data is processed, please contact us.