Site Reliability Engineer (SRE) Job at IT America Inc, Plano, TX


Job Description

Position: Site Reliability Engineer (SRE)

Location: Richmond, VA or Plano, TX

Work Model: Hybrid, 3 days onsite per week

Duration: Long-term contract

Job Summary:

We are seeking an experienced Site Reliability Engineer (SRE) to support cloud-native platforms and production systems in a large enterprise environment. The role focuses on the high availability, reliability, performance, and scalability of mission-critical applications running on AWS.

Strong Preference: Former Capital One engineers. Candidates must be able to provide verifiable Capital One credentials and be eligible for rehire.

Key Responsibilities:

  • Design, build, and maintain highly reliable, scalable, and resilient systems in AWS
  • Monitor system health, performance, and availability using SRE best practices
  • Implement automation to reduce manual operational work
  • Troubleshoot production incidents and perform root cause analysis (RCA)
  • Develop and maintain scripts and tools to improve system reliability and efficiency
  • Partner with application development, platform, and infrastructure teams
  • Support on-call rotations and incident response as required
  • Enforce operational excellence, security, and compliance standards

Required Skills & Qualifications:

  • Prior experience at Capital One highly preferred
  • Must provide credentials for rehire eligibility verification
  • Strong hands-on experience with AWS (EC2, EKS, Lambda, CloudWatch, IAM, etc.)
  • Python scripting experience strongly preferred
  • Bash or other shell scripting experience will also be considered
  • Experience with Linux-based systems and troubleshooting
  • Understanding of SRE concepts: SLIs, SLOs, error budgets, monitoring, and alerting
  • Experience supporting production environments at scale

Preferred Qualifications:

  • Experience with CI/CD pipelines
  • Infrastructure as Code (Terraform, CloudFormation)
  • Containerization and orchestration (Docker, Kubernetes)
  • Observability tools (Prometheus, Grafana, Datadog, CloudWatch)
  • Experience working in highly regulated enterprise environments
