Job Title: Senior Site Reliability Engineer (SRE)
Location: Bay Area, CA (Onsite / Remote)
Duration: 6+ Month Contract
Job Description:
We are seeking a Senior Site Reliability Engineer focused on observability, Kubernetes, and cloud infrastructure to support our large-scale GCP/AWS/EKS platform. This role is central to improving SLO attainment, logging pipelines, distributed tracing, dashboards, and automated diagnostics across 10,000+ applications running on EKS.
Responsibilities:
Own the observability stack: Prometheus/Grafana, OpenTelemetry, Loki/ELK/Splunk, Jaeger, Alertmanager, and SLO frameworks.
Build intelligent monitoring pipelines and ensure the reliability of metric ingestion, log ingestion, tracing, and analytics systems.
Develop Terraform modules for observability infrastructure, Kubernetes components, cluster add-ons, and monitoring services.
Improve the reliability of AWS/GCP/EKS clusters through automation, performance tuning, capacity modeling, and event-driven remediation.
Build AI-assisted diagnostics for anomaly detection, auto-alert tuning, automated playbooks, and noise reduction.
Partner with Platform Engineering to ensure reliable Istio/service mesh telemetry, API server health monitoring, and node-level insights.
Lead operational readiness, SLO reporting, incident management, and root cause analysis for platform outages.
Qualifications:
6-8 years in SRE, infrastructure, or Kubernetes operations.
Strong knowledge of EKS/ECS/GKE, Kubernetes internals, and cluster operations.
Expertise in observability stacks (Prometheus, OTel, Grafana, ELK, Datadog, Splunk).
Advanced Terraform IaC and automation skills (Python/Go preferred).
Experience with CI/CD, cloud networking, service mesh (Istio), and capacity planning.