Sanas.ai specializes in accent translation and voice enhancement for call centers, aiming to improve communication and understanding between individuals. The company recently secured an investment of $50 million to accelerate its growth and expand its operations.
As part of this growth, the AI team at Sanas.ai faced the challenge of scaling beyond its current size of 7 people.
This case study explores how the team, with the assistance of 3 full-time consultants, efficiently scaled its AI operations and continually improved its MLOps capabilities to meet the demands of a rapidly growing business.
As Sanas.ai gained traction and more clients, the AI team quickly realized the need to scale their operations. They needed to enhance their technology, improve their development processes, and optimize the entire machine learning workflow to handle a higher volume of requests and data.
To address these challenges, the AI team sought external guidance and expertise, engaging our team of 3 experienced consultants.
The first step in scaling the AI team was to expand its size beyond the initial 7 members. Eventum collaborated with the existing team to assess their strengths and weaknesses and identified the key areas where additional talent was required.
Eventum recruited talented ML engineers and data scientists to fill the identified skill gaps.
To ensure smooth collaboration and maintain consistency across the growing team, we introduced a set of best practices. The team adopted tools such as PyTorch Lightning, Hydra, Weights & Biases (wandb), and Poetry, alongside practices like unit testing, linting, formatting, and type checking, to streamline their development processes.
The adoption of best practices contributed to a more efficient and reliable development workflow, reducing errors and minimizing the time spent on debugging.
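A project configured along these lines might use a `pyproject.toml` like the sketch below, which wires Poetry-managed dependencies together with a linter/formatter and a type checker. All names, versions, and settings here are illustrative assumptions, not Sanas.ai's actual configuration:

```toml
# Hypothetical pyproject.toml sketch (package name, versions, and
# tool settings are assumptions for illustration only)
[tool.poetry]
name = "ml-service"
version = "0.1.0"
description = "ML training and serving stack"

[tool.poetry.dependencies]
python = "^3.10"
pytorch-lightning = "^2.0"
hydra-core = "^1.3"
wandb = "^0.16"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0"   # unit testing
ruff = "^0.4"     # linter and formatter
mypy = "^1.10"    # static type checker

[tool.ruff]
line-length = 100

[tool.mypy]
strict = true
```

Pinning development tools in the same lockfile as runtime dependencies means every engineer runs the same linter and type checker versions, which keeps CI results reproducible across the team.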
To meet the increasing computational demands, the AI team needed to enhance their infrastructure and optimize their CI/CD pipeline.
Our consultants recommended setting up CI/CD on GPU and Windows machines that mirrored the production environment, speeding up the development and testing process. This allowed the team to validate changes faster and catch potential issues early in the development cycle.
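A CI setup along these lines could be expressed as a test job fanned out across self-hosted GPU Linux runners and Windows runners, sketched here as a hypothetical GitHub Actions fragment (the runner labels and commands are assumptions, not Sanas.ai's actual pipeline):

```yaml
# Hypothetical CI fragment: run the test suite on runners that
# mirror production (self-hosted GPU Linux and Windows machines)
jobs:
  test:
    strategy:
      matrix:
        runner: [[self-hosted, linux, gpu], [self-hosted, windows]]
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - run: poetry install
      - run: poetry run pytest tests/ --maxfail=1
```

Running tests on hardware that matches production catches GPU-specific and platform-specific failures before merge rather than after deployment.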
Eventum also worked with Sanas.ai to adopt an autoscaling Kubernetes cluster to handle the dynamic workload of machine learning tasks. This elastic infrastructure enabled the team to automatically scale resources based on demand, ensuring efficient resource utilization during peak periods.
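In Kubernetes, this kind of demand-driven scaling is typically expressed with a HorizontalPodAutoscaler that grows and shrinks a deployment against a utilization target, while a cluster autoscaler adds or removes nodes underneath. The manifest below is an illustrative sketch only; the names and thresholds are assumptions, not Sanas.ai's actual configuration:

```yaml
# Illustrative HorizontalPodAutoscaler for an inference deployment
# (names and thresholds are assumptions, not actual production values)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The autoscaler keeps average CPU utilization near 70%, so replicas scale out during traffic peaks and scale back in afterwards, which is what keeps resource utilization efficient.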
With our team's assistance, Sanas.ai implemented cost-saving strategies to make their AI operations more economically sustainable.
The team enabled spot instances, taking advantage of unused cloud resources to save up to 70% on training costs. By utilizing spot instances for non-time-sensitive tasks, they maximized their cloud spending efficiency.
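On a Kubernetes-on-AWS setup, one common way to enable spot capacity is a dedicated spot-backed node group that is tainted so only interruption-tolerant training jobs land on it. The `eksctl` fragment below is a hedged sketch; the instance types, sizes, and labels are assumptions:

```yaml
# Illustrative eksctl ClusterConfig fragment: a spot-backed node group
# for non-time-sensitive training jobs (all names/values are assumptions)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ml-cluster
  region: us-east-1
managedNodeGroups:
  - name: spot-training
    instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge"]
    spot: true          # use spare EC2 capacity at a steep discount
    minSize: 0
    maxSize: 10
    labels:
      workload: training
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule
```

The taint keeps latency-sensitive services off spot nodes, so only jobs that can tolerate interruption (and be retried) benefit from the discounted capacity.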
Furthermore, the AI team migrated their operations to the AWS cloud, leveraging its robust and scalable infrastructure. The migration enabled the team to benefit from AWS's vast array of services and resources, providing them with the flexibility to adapt to future business growth.
To further smooth the path to production, we introduced automatic deployment triggered by git tags. This reduced the risk of manual errors and facilitated seamless rollouts of new features and improvements.
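Tag-triggered deployment typically means the pipeline only runs its deploy job when a version tag is pushed. A hypothetical GitHub Actions workflow illustrating the pattern (registry, image, and deployment names are placeholders, not Sanas.ai's infrastructure):

```yaml
# Hypothetical workflow: deploy to production only on a pushed version tag
on:
  push:
    tags:
      - "v*"
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image tagged with the git tag
        run: |
          docker build -t registry.example.com/app:${GITHUB_REF_NAME} .
          docker push registry.example.com/app:${GITHUB_REF_NAME}
      - name: Roll out to production
        run: kubectl set image deployment/app app=registry.example.com/app:${GITHUB_REF_NAME}
```

Because the image is tagged with the same git tag that triggered the rollout, every production version maps back to an exact commit, which makes rollbacks as simple as re-pushing an older tag.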
The team also adopted Dockerized repositories, enabling them to package their applications and dependencies consistently. This containerization approach simplified deployment and ensured a consistent environment across various stages of the development lifecycle.
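A Dockerized, Poetry-managed repository often follows the layer-caching pattern sketched below; the base image, paths, and entrypoint are assumptions for illustration:

```dockerfile
# Illustrative Dockerfile for a Poetry-managed Python service
# (base image, paths, and entrypoint are assumptions)
FROM python:3.10-slim

WORKDIR /app

# Copy dependency manifests first so this layer is cached across code changes
COPY pyproject.toml poetry.lock ./
RUN pip install --no-cache-dir poetry \
 && poetry config virtualenvs.create false \
 && poetry install --no-interaction --no-root --only main

# Copy the application code last
COPY . .
CMD ["python", "-m", "app.serve"]
```

Installing dependencies in their own layer means routine code changes rebuild in seconds, while the same image runs identically in CI, staging, and production.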
For Extract, Transform, Load (ETL) processes, we recommended using Dagster, making it easier for the team to manage data workflows. Sanas.ai became one of Dagster's first official Trusted ML Partners, showcasing their commitment to excellence in data engineering and data management.
One of the critical achievements that emerged from the collaboration with Eventum was the implementation of Automatic Merge Request development environments.
This approach let developers automatically run any ETL step starting from any point in the graph. It sped up development drastically, as developers could focus on specific parts of the code without waiting for the entire workflow to complete.
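The core idea, running a dependency graph from an arbitrary starting step and skipping everything upstream of it, can be illustrated with a small, library-free sketch (pure Python; all names are hypothetical, and this is not Sanas.ai's implementation):

```python
from graphlib import TopologicalSorter

def run_from(graph, steps, start):
    """Execute `start` and every step downstream of it, in dependency order.

    `graph` maps each step name to the list of steps it depends on;
    `steps` maps each step name to a zero-argument callable.
    """
    # Select the start step plus all steps that (transitively) depend on it
    selected = {start}
    changed = True
    while changed:
        changed = False
        for step, deps in graph.items():
            if step not in selected and selected & set(deps):
                selected.add(step)
                changed = True
    # Execute only the selected subgraph, in topological order
    order = [s for s in TopologicalSorter(graph).static_order() if s in selected]
    for step in order:
        steps[step]()
    return order

# Tiny three-step ETL graph: extract -> transform -> load
graph = {"extract": [], "transform": ["extract"], "load": ["transform"]}
ran = []
steps = {name: (lambda n=name: ran.append(n)) for name in graph}

# Starting at "transform" skips "extract" entirely
run_from(graph, steps, "transform")
```

Starting mid-graph like this only works when upstream outputs are already materialized from a previous run, which is exactly what per-merge-request environments provide.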
Scaling an AI team beyond its initial size is a challenging endeavor, but with the right expertise and best practices, it can be achieved efficiently.
Sanas.ai, with the help of a team of 3 full-time AI specialists, successfully expanded their AI team, improved their MLOps capabilities, and optimized their infrastructure to support their accent translation and voice enhancement services.
By adopting best practices, enhancing their CI/CD pipeline, optimizing costs, and leveraging cloud resources effectively, Sanas.ai positioned itself for continued growth and success in the dynamic and competitive AI industry.
The Automatic Merge Request development environments were a game-changer, accelerating the team's development process and keeping them at the forefront of innovation in AI-driven communication solutions.
See how Eventum helped launch an app into the App Store in 4 months, a generative AI, next-gen social media app currently going viral. Read the Case Study on Plai Labs.
Learn how Eventum helped DeepCell reduce manual oversight by 90%, with R&D roadmapping and mentorship of the ML team. Read the Case Study on DeepCell.
Learn how to hire your AI team, covering role types and expectations, matching positions to project requirements, and interview structures. Read the White Paper.