Site Reliability Engineer
Company: Federal Home Loan Banks Office Of Finance
Location: Reston
Posted on: April 1, 2026
Job Description:
FEDERAL HOME LOAN BANKS OFFICE OF FINANCE POSITION DESCRIPTION
POSITION: Site Reliability Engineer
DATE: December 2025
DEPARTMENT: Information Technology
FLSA: Exempt
REPORTS TO: Senior Manager, Data & Platform Engineering
LOCATION: Reston, VA

SUMMARY OF POSITION
The Site Reliability Engineer (“SRE”) is responsible for the optimization and reliability of core technical platforms and platform services, and for exercising significant technical leadership in continuously improving the reliability of services delivered to platform stakeholders. The SRE will champion the overall health of the Office of Finance (“OF”) core technical platforms, lead the response to operational incidents, determine root causes, and propose and implement remediations that ensure overall platform viability.
OF IT platforms and infrastructure are hosted on-premises across three locations: Office Headquarters (Reston, VA), the Primary Data Center Co-Location (Sterling, VA), and the Disaster Recovery Data Center Co-Location (Chicago, IL). A limited set of infrastructure services is also provided by Microsoft Azure (“Azure”). The core technical platform is Red Hat OpenShift, with a variety of platform services, including but not limited to Red Hat AMQ, HashiCorp Vault, and Keycloak, that are consumed by various platform stakeholders. This role will span from the OpenShift platform to services provided by Azure.
We’re proud of the way our teammates have a positive impact on everything we do. Our employees are committed to and exemplify our Core Values:
- Integrity through accountability, consistency, transparency, and trust
- Agility through adaptability, continuous improvement, expertise, and flexibility
- Partnership through collaboration, communication, leadership, and teamwork
- Inclusivity through diversity, relationships, respect, and support

PRINCIPAL RESPONSIBILITIES
- Maintain the overall health and reliability of core technical platforms and platform services to ensure business continuity and high availability.
- Maintain and improve end-to-end observability of the platform so that platform state is understood at all times, in context, with supporting information and data that can be quickly marshalled into action.
- Lead incident response, root-cause analysis, and postmortems that advance the overall health of the system and prevent or diminish recurrence of platform issues.
- Partner with development teams to troubleshoot platform issues, including deployment, routing, and configuration challenges.
- Build and maintain automated deployment pipelines that support engineering, development, and data teams.
- Write, test, and deploy solutions that reduce unneeded human intervention and improve quality.
- Lead the delivery of new platform features, services, and capabilities.
- Prioritize, deliver, and operate new platform capabilities, products, and services.
- Develop and maintain accurate, up-to-date documentation, including but not limited to operational procedures, deployment plans, and incident response plans.
- Participate in the on-call rotation.
- Assist with other job duties as assigned.

PRINCIPAL JOB REQUIREMENTS
- Bachelor's degree in computer science or a related field, or equivalent experience.
- Minimum of 5-7 years of experience in a Site Reliability Engineering and/or Platform Engineering role, with progressively increasing scope of responsibility.
- Extensive hands-on experience with and knowledge of the following technologies:
  - Red Hat OpenShift, inclusive of operators, routing/ingress, and cluster management
  - Azure cloud services and solutions
  - Messaging platforms such as AMQ, Kafka, and Redis
  - HashiCorp Vault
  - Scripting languages such as Bash, Python, Go, and PowerShell
  - Observability tools such as Datadog, Grafana, and Prometheus
- Strong scripting and automation skills in Bash and Python.
- Strong prior experience with observability tools and with connecting trends, incidents, and alerts to actions.
- Prior experience troubleshooting complex production issues using logs, metrics, traces, packet captures, and Kubernetes debugging tools.
- Prior experience working in a heavily audited environment is preferred, with a focus on mitigating risks and ensuring compliance with policies and procedures.
- Knowledge of enterprise-level technologies and concepts.
- Ability to multitask in a dynamic environment while continuing to make progress on longer-term projects.
- Ability to communicate well, both orally and in writing, including producing thorough documentation of all work.
- Ability to conduct independent technical research and share results with management and/or peers.
- Ability to listen to and integrate ideas from different viewpoints, build and maintain respectful relationships, collaborate with others, and resolve conflicts constructively.
- Proof of eligibility to work in the United States.

EQUAL EMPLOYMENT OPPORTUNITY: The Federal Home Loan Banks Office of Finance is committed to equal employment opportunity without regard to race (including traits historically associated with race, such as hair texture, hair type and protective hairstyles), color, religion, sex, pregnancy (including childbirth, lactation, and related medical conditions), national origin or ancestry, ethnic origin, age, physical or mental disability, veteran status, uniformed service member status, military status, sexual orientation, gender identity, status as a parent, marital status, genetic information (including testing and characteristics), citizenship or immigration status, or any other characteristic protected by applicable federal, state, or local law.