Site Reliability Engineer | Scale a Next-Gen SaaS Platform
Location: Melbourne (Hybrid)
About the job
Salient is proud to be partnering with a fast-growing fintech scale-up that’s tackling one of the world’s most pressing challenges: financial crime. Their AI-powered SaaS platform is already trusted by leading banks and financial institutions across Australia, New Zealand, and beyond.
This is your chance to join an engineering-led culture where reliability, scalability, and security are at the heart of everything they build.
As the company continues to scale, they are looking for a Site Reliability Engineer (SRE) to ensure the platform is highly available, performant, and resilient as usage grows globally.
The Role
As a Site Reliability Engineer, you’ll play a pivotal role in balancing innovation with stability:
- Reliability First: Define, measure, and deliver against SLIs, SLOs, and SLAs to guarantee platform reliability.
- Incident Management & Response: Build playbooks, lead incident response, and run blameless post-mortems to continuously improve resilience.
- Observability & Monitoring: Implement advanced monitoring, logging, and alerting (CloudWatch, Prometheus, OpenSearch, ELK) to detect and resolve issues proactively.
- Scalability Engineering: Identify bottlenecks and optimise performance so the platform scales seamlessly with demand.
- Infrastructure as Code: Maintain consistency and repeatability by automating infrastructure using Terraform/CloudFormation.
- Resilience & Continuity: Own disaster recovery, backup strategies, and fault-tolerant designs that safeguard customer data.
- Collaboration with Product & Engineering: Work closely with development teams to embed reliability into the software delivery lifecycle.
- Security by Default: Integrate security best practices into infrastructure and operations.
Experience
We’re looking for engineers with a strong SRE mindset and the technical depth to keep mission-critical platforms running smoothly:
- 5+ years in Site Reliability Engineering, Platform Engineering, or DevOps within high-scale SaaS environments.
- Proven track record of setting and managing SLIs/SLOs/SLAs for production systems.
- Experience in incident management: on-call rotations, root cause analysis, and post-mortem culture.
- Deep AWS knowledge (networking, IAM, autoscaling, resilience patterns) and expertise with Infrastructure as Code (Terraform, CloudFormation).
- Strong background in observability tooling: Prometheus, Grafana, CloudWatch, OpenSearch/ELK, or equivalent.
- Proficiency in scripting/automation (Python, Bash, or Go preferred).
- Exposure to resilience engineering: chaos testing, fault injection, and recovery strategies.
- Strong understanding of cloud security and compliance practices.
- A collaborative engineer who thrives on improving systems, processes, and culture, not just fixing incidents.
- Nice to have: Kubernetes or other container orchestration experience, AWS or SRE certifications, or experience embedding SRE practices into software teams.
If you’re passionate about building secure, scalable platforms and want to join a high-growth fintech on the rise, Salient would love to introduce you.
Why Join
- Make an impact: Build the systems that ensure trust and resilience in financial technology.
- Shape reliability practices: Drive SRE standards, metrics, and culture across the business.
- Career growth: Pathways into technical leadership and ownership of reliability domains.
- Hybrid flexibility: Work from Melbourne HQ and remotely, in a setup that fits your lifestyle.
- Startup energy, enterprise reach: Enjoy autonomy, speed, and impact while working with leading banks across the region.
📩 Apply today and let’s explore how your expertise in Site Reliability Engineering can shape the future of financial technology.
Interested in this role? Reach out to danny@salientgroup.com.au.