Lead DevOps/SRE - Senior Platform/SRE Engineer Job at ZENDA, LLC, New York, NY

  • ZENDA, LLC
  • New York, NY

Job Description

About the Role:

We are seeking an experienced DevOps/SRE engineer to lead the cloud infrastructure strategy for our SaaS product, a collaborative, AI-driven workflow planning and management platform designed for Fortune 1000 clients.

This hybrid role involves architecting and maintaining robust, scalable cloud environments, implementing Infrastructure-as-Code, building automated CI/CD pipelines, and developing essential developer tooling. Additionally, you will be responsible for proactive logging and monitoring, as well as shoring up our environment to ensure top-tier security and compliance.

Key Responsibilities:

  • Cloud Infrastructure Management: Oversee the design, deployment, and maintenance of AWS-based environments. Manage container orchestration via EKS to support our microservices and event-driven workflows. Ensure the infrastructure is highly available, scalable, and secure.
  • Infrastructure as Code (IaC): Develop and maintain IaC scripts using Terraform. Automate the provisioning and configuration of AWS resources and EKS clusters.
  • Automated CI/CD Pipelines: Build, maintain, and enhance CI/CD pipelines using GitHub Actions. Integrate automated testing and continuous deployment processes tailored to microservices and event-driven triggers.
  • Developer Tooling & Build Scripts: Create and maintain custom tools and build scripts to streamline development processes. Document and support internal tooling and best practices for microservices development.
  • Logging, Monitoring & Incident Management: Design and implement comprehensive logging, monitoring, and alerting systems utilizing AWS managed services (such as CloudWatch) or complementary tools like Prometheus, Grafana, and ELK. Proactively monitor the performance of our event-driven platform to ensure rapid detection and resolution of issues. Lead post-incident reviews and drive continuous improvements.
  • Security and Compliance: Strengthen the security posture of our AWS environment and EKS clusters. Implement security best practices and ensure compliance with industry standards, including SOC 2. Collaborate with security consultants to perform audits, manage vulnerabilities, and enforce corrective actions.
  • System Reliability and Performance: Develop strategies for system resilience, performance tuning, and capacity planning. Optimize the performance of microservices and event-driven workflows to meet enterprise-grade requirements.
  • Collaboration and Mentorship: Work closely with cross-functional development and design teams to align infrastructure operations with business goals. Mentor junior team members and foster a culture of continuous improvement and innovation.

Required Qualifications:

  • Experience: 7+ years of experience in DevOps, Site Reliability Engineering (SRE), or related roles for large-scale, enterprise-grade applications, with a focus on multi-tenant, multi-user environments.
  • Technical Leadership: Proven track record of technical leadership in guiding and mentoring engineering teams. Strong ability to drive architectural direction, enforce best practices, and inspire continuous improvement across the team.
  • Technical Expertise:
      • Proven expertise in managing AWS cloud infrastructure and container orchestration with EKS.
      • Experience with containerization technologies (Docker).
      • Extensive experience with microservices architectures and event-driven platforms.
      • Proven expertise in designing and maintaining low-latency, high-concurrency infrastructure to support real-time multi-user collaboration. Strong ability to leverage WebSockets to ensure seamless performance and scalability.
      • Expertise with Infrastructure as Code using Terraform.
      • Demonstrated ability in building and maintaining automated CI/CD pipelines with GitHub Actions.
      • Strong background in logging and monitoring, including experience with AWS managed services (e.g., CloudWatch) and complementary tools (Prometheus, Grafana, ELK).
      • Proficiency in scripting and automation (Bash, Python, or similar).
  • Experience in Regulated Industries: Proven experience working in regulated industries, understanding compliance requirements and ensuring architecture meets industry-specific regulations.
  • Collaboration Skills: Proven experience working with cross-functional teams in an Agile environment, with a focus on alignment across backend and frontend engineering.

You will play a critical role in ensuring low-latency, high-concurrency infrastructure to support real-time multi-user collaboration, leveraging WebSockets, microservices, and event-driven architectures.

Similar Jobs

Nelson Bros Ready Mix

Pneumatic Tanker Driver (R&R Trucking) Job at Nelson Bros Ready Mix

 ...Casual work attire On-the-job training Relaxed atmosphere Lively atmosphere Job Summary: We are seeking a skilled Tanker Driver to join our team. The Tanker Driver will be responsible for transporting various materials safely and efficiently to designated... 

Industrial Metal Supply Co.

Metal Machine Saw Operator Job at Industrial Metal Supply Co.

 ...processes and improve efficiency. Maintain a clean and organized work area, adhering to safety protocols at all times. Utilize...  ...quality for our customers, every time. You will be pulling, packing and wrapping the material while adhering to our Quality System.... 

Insight Global

Certified Pharmacy Technician Job at Insight Global

 ...Apply now! We are hiring Licensed Pharmacy Technicians in Sharonville, OH for a fantastic opportunity with a growing company. We have both Inventory Technician + Dispensing Technician openings. Available Schedules: Mon-Fri 7am-3:30pm EST Mon-Fri 7am-3:30pm... 

Chellecomm Merchant Services

call center trainer Job at Chellecomm Merchant Services

ChelleComm is a rapidly growing merchant services company headquartered in Dallas, TX. We specialize in delivering cutting-edge payment processing solutions to businesses nationwide. Our inside sales operation is the heartbeat of our business, and we're seeking a Call ...

Compunnel Inc.

Client Onboarding/KYC Specialist Job at Compunnel Inc.

 ...Laundering documentation and clarify requirements Execute tasks as per supervisor instructions Requirements: ~ Knowledge of KYC and AML Compliance policies in a U.S. banking environment ~ Familiarity with the Client Onboarding Process and KYC/AML...