Site Reliability Engineer-Database-DataBase Tools platform
Company: Alibaba Cloud
Location: Sunnyvale
Posted on: June 1, 2025
Job Description:
We are looking for a Site Reliability Engineer (SRE) specialized in
the database domain to support the stable operation of Alibaba
Cloud's DataBase-Tools platform. This role combines software and
systems engineering to keep the platform reliable and to provide
stable DataBase-Tools services to customers.
Responsibilities
- Ensuring System Stability and High Availability: Responsible
for health checks of components within the database foundational
platform, developing maintenance tools for routine inspections,
identifying and resolving potential risks in advance.
- Development of Operations Platforms and Tools: Design and
implement automated operations platforms. Monitor and maintain
various operational metrics, optimizing the system through data
analysis. Participate in solving issues related to capacity,
performance, and stability in production systems.
- Incident Handling and Emergency Response: During major events
like promotional sales, ensure smooth user experience under massive
peak loads while maintaining cost control. Handle live network
issues, including fault diagnosis, disaster recovery, intelligent
scheduling, elastic scaling, and anti-attack measures.
- Close Collaboration with Development Teams: Work closely with
product teams to promptly identify and optimize technical
architectures, improving service response latency and performance,
and enhancing service availability.
Position Requirements
- Bachelor's degree in Computer Science, or a related technical
field, or equivalent practical experience.
- 4+ years of work experience in Site Reliability Engineering
within the domain of databases or other cloud products.
- Familiar with the basic principles of the Linux kernel and with
common tools and commands, with strong diagnostic and optimization
skills.
- Proficient in at least one of the following languages: Java,
Python, Go, C++, with experience in developing operations and
maintenance tools.
- Familiar with open-source cloud platforms such as Kubernetes,
OpenStack, and CloudFoundry.
- Experience with relational databases like MySQL, SQL Server,
and PostgreSQL, as well as open-source databases and queue products
like Redis, MongoDB, HBase, Cassandra, Kafka, and
Elasticsearch.
- Experience operating large-scale distributed systems, with
proficiency in at least one major cloud platform.
- Excellent problem-solving and analytical skills.
The pay range
for this position at commencement of employment is expected to be
between $133,200/year and $219,600/year. However, base pay offered
may vary depending on multiple individualized factors, including
market location, job-related knowledge, skills, and experience. If
hired, employee will be in an "at-will position" and the Company
reserves the right to modify base salary (as well as any other
discretionary payment or compensation program) at any time,
including for reasons related to individual performance, Company or
individual department/team performance, and market factors.