Job Description

Position: Site Reliability Engineer (SRE)

Location: Richmond, VA or Plano, TX

Work Model: Hybrid, 3 days onsite per week

Duration: Long-term contract

Job Summary:

We are seeking an experienced Site Reliability Engineer (SRE) to support cloud-native platforms and production systems in a large enterprise environment. This role focuses on ensuring the high availability, reliability, performance, and scalability of mission-critical applications running on AWS.

Strong Preference: Former Capital One engineers. Candidates must be able to provide verifiable Capital One credentials and be eligible for rehire.

Key Responsibilities:

Design, build, and maintain highly reliable, scalable, and resilient systems in AWS
Monitor system health, performance, and availability using SRE best practices
Implement automation to reduce manual operational work
Troubleshoot production incidents and perform root cause analysis (RCA)
Develop and maintain scripts and tools to improve system reliability and efficiency
Partner with application development, platform, and infrastructure teams
Support on-call rotations and incident response as required
Enforce operational excellence, security, and compliance standards

Required Skills & Qualifications:

Former Capital One experience HIGHLY preferred
Must provide credentials for rehire eligibility verification
Strong hands-on experience with AWS (EC2, EKS, Lambda, CloudWatch, IAM, etc.)
Python scripting experience strongly preferred
Bash or other shell scripting experience will also be considered
Experience with Linux-based systems and troubleshooting
Understanding of SRE concepts: SLIs, SLOs, error budgets, monitoring, and alerting
Experience supporting production environments at scale

Preferred Qualifications:

Experience with CI/CD pipelines
Infrastructure as Code (Terraform, CloudFormation)
Containerization and orchestration (Docker, Kubernetes)
Observability tools (Prometheus, Grafana, Datadog, CloudWatch)
Experience working in highly regulated enterprise environments

Job Tags

Long-term contract, 3 days per week

Site Reliability Engineer (SRE) Job at IT America Inc, Plano, TX

Plano, TX

Full Time

2026-04-12

2026-05-12