Last modified: Dec-14-2021, 10:35PM +08
From Manual To Semi-Automatic
Before the concept of “MLOps” emerged, getting even a single machine learning (ML) model into production was tedious and laborious. Every detail pertaining to the inputs, model server, training and inference had to be defined explicitly, so that input tensors met the strict requirements of the user-defined functions that processed them.
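To make "explicitly defined inputs" concrete, here is a minimal stdlib sketch of an input signature check that runs before any user-defined function sees the data. `InputSignature`, `validate` and the use of nested lists as tensors are hypothetical stand-ins chosen for illustration, not a framework API. Real frameworks express the same idea with their own signature objects (TensorFlow's `tf.TensorSpec`, for instance).

```python
import math
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Tuple


@dataclass(frozen=True)
class InputSignature:
    """Hypothetical input contract: `None` in `shape` accepts any size (e.g. batch)."""
    shape: Tuple[Optional[int], ...]
    dtype: type


def infer_shape(value: Any) -> Tuple[int, ...]:
    """Infer a nested list's shape by following its first element at each level."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def flatten(value: Any) -> Iterator[Any]:
    """Yield the leaf elements of a nested list."""
    if isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def validate(value: Any, sig: InputSignature) -> Any:
    """Reject inputs that do not match the signature before any UDF runs."""
    shape = infer_shape(value)
    if len(shape) != len(sig.shape):
        raise ValueError(f"expected rank {len(sig.shape)}, got {len(shape)}")
    for axis, (got, want) in enumerate(zip(shape, sig.shape)):
        if want is not None and got != want:
            raise ValueError(f"axis {axis}: expected size {want}, got {got}")
    leaves = list(flatten(value))
    # Cheap raggedness check: a rectangular input has prod(shape) leaves.
    if len(leaves) != math.prod(shape):
        raise ValueError("ragged input: rows have different lengths")
    for leaf in leaves:
        if not isinstance(leaf, sig.dtype):
            raise TypeError(f"expected {sig.dtype.__name__}, got {leaf!r}")
    return value
```

With `InputSignature(shape=(None, 3), dtype=float)`, a batch such as `[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]` passes, while `[[0.1, 0.2]]` (wrong width) or `[[1, 2, 3]]` (integers) is rejected up front instead of failing deep inside inference.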
To serve even a single model, these predefined configurations have to be kept under version control, because the ML field and its software ecosystem are evolving rapidly. Beyond the model itself, version control has to cover the training data and the software infrastructure used to host the model. A working production pipeline is like a moving train, loading and offloading carriages to keep up with cutting-edge development.
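One lightweight way to tie a model, its training data and its serving infrastructure together is a manifest of content hashes that is committed alongside the code. The sketch below assumes the model and the training data are single files; the manifest fields and the image reference are hypothetical, and dedicated data and model versioning tools handle this far more thoroughly.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Union

PathLike = Union[str, Path]


def sha256_of(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_path: PathLike, data_path: PathLike, image_ref: str) -> Dict:
    """Record exactly which model, data and serving image belong together."""
    return {
        "model": {"path": str(model_path), "sha256": sha256_of(model_path)},
        "training_data": {"path": str(data_path), "sha256": sha256_of(data_path)},
        # Hypothetical container image reference for the model server.
        "serving_image": image_ref,
    }


def write_manifest(manifest: Dict, out_path: PathLike) -> None:
    """Write stable, diff-friendly JSON so changes show up clearly in review."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Any change to the weights, the data or the image tag changes the manifest, so a diff of this one file shows which "carriage" of the train was swapped.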
After the release of the ImageNet dataset, tremendous effort was poured into surpassing the human baseline. By 2015, that baseline had been exceeded through the combination of readily available data, open-source frameworks and modern computing resources that could be bought off the shelf. However, proficiency with these resources was largely restricted to experts and those within the technical community. Developer sanity depended largely on up-to-date documentation or, where documentation was absent, on comments in the source code.
In the mid-2010s, a number of deep learning (DL) frameworks were designed to unify the common primitives for building DL models. These include TensorFlow, Keras, PyTorch, Apache MXNet and many others.
To tackle the problem of productionizing models, one solution explored was the use of Docker containers to package both the dependencies and the model itself as lightweight components that can easily be shared through a public repository. This approach greatly democratized the deployment of DL models to common hosting environments such as public clouds or in-house servers.
The natural next step after adopting Docker containers was to add shell scripts, cron jobs and triggers that automate the entire ML pipeline. Docker-based workflows gave developers access to version-controlled resources locally on their laptops and globally across different time zones.
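A minimal sketch of the kind of trigger described above, assuming a cron job runs one polling tick (for example, every ten minutes) and that new training data lands as files in a directory. `run_tick`, the JSON state file and the `on_new` hook are hypothetical; in practice `on_new` might launch the training container.

```python
import json
from pathlib import Path
from typing import Callable, List


def run_tick(data_dir: Path, state_file: Path,
             on_new: Callable[[List[str]], None]) -> List[str]:
    """One polling tick, meant to be invoked periodically by cron.

    `state_file` persists the names already processed, because each cron
    invocation starts a fresh process. Keep it outside `data_dir`.
    """
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    current = {p.name for p in data_dir.iterdir() if p.is_file()}
    new_files = sorted(current - seen)
    if new_files:
        # Hypothetical hook, e.g. launch the training container on the new files.
        on_new(new_files)
    # Persist only after the hook succeeds, so a failed run is retried next tick.
    state_file.write_text(json.dumps(sorted(seen | current)))
    return new_files
```

Because the state is written only after `on_new` returns, an exception in the hook leaves the same files to be picked up again on the next tick. Keeping the state file outside the watched directory stops it from being mistaken for new data.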
Component-Based Workflow
For organizations that need to scale to millions of containers in production, the de facto solution is a container orchestration platform such as Kubernetes. Such a platform allows hundreds or thousands of engineers to collaborate on different levels of a complex ML system, ranging from low-level implementation of hardware drivers to high-level design of user interfaces such as drag-and-drop block diagrams.
The low-code or no-code approach is an industry effort to reduce the cognitive load of designing complex ML models. Designing and implementing mission-critical models already requires non-trivial engineering effort, so why should deploying them be unnecessarily complex?
Behind the scenes of the component-based workflow lie Kubernetes applications such as Argo Workflows and Tekton, among many others. These applications specify each step in an ML pipeline as a container that can be spun up sequentially or in parallel. The steps can be expressed as a directed acyclic graph (DAG), which can be version controlled and compiled for export to different hardware architectures.
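The DAG idea can be sketched with Python's standard `graphlib` module (Python 3.9+). The step names below are hypothetical, and this is not how Argo Workflows or Tekton are configured (they use their own YAML specifications); it only shows how dependencies determine which containers may run in parallel.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "ingest": set(),
    "validate": {"ingest"},
    "preprocess": {"validate"},
    "train_linear": {"preprocess"},
    "train_tree": {"preprocess"},
    "evaluate": {"train_linear", "train_tree"},
    "upload": {"evaluate"},
}


def execution_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into stages whose members do not depend on each other,
    so each stage could be launched as parallel containers."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(PIPELINE)):
        print(i, stage)
```

Here the two training steps land in the same stage because both depend only on `preprocess`. Lockstep stages are a simplification: a workflow engine can start a step as soon as its own dependencies finish, without waiting for the rest of its stage.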
Initially, models were designed, tuned and crafted by hand and shipped without A/B testing, because deploying new models simply could not keep up with development of the core application (a 4 to 6 week cycle). Now we can churn out dozens of models daily in parallel, with runs triggered by the arrival of new data or scheduled over adjacent or overlapping time windows. Models that pass evaluation are then uploaded to a model repository for further downstream processes.
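The windowed scheduling and the evaluation gate can be sketched as follows. The window sizes, the rule "promote any model whose validation error is at or below a fixed threshold" and the plain dict standing in for the model repository are all assumptions for illustration, not a description of a specific tool.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Window = Tuple[datetime, datetime]


def time_windows(start: datetime, end: datetime,
                 size: timedelta, step: timedelta) -> List[Window]:
    """Training windows inside [start, end].

    step == size yields adjacent (tumbling) windows;
    step < size yields overlapping (sliding) windows.
    """
    if size <= timedelta(0) or step <= timedelta(0):
        raise ValueError("size and step must be positive")
    windows = []
    cursor = start
    while cursor + size <= end:
        windows.append((cursor, cursor + size))
        cursor += step
    return windows


def promote_passing(val_errors: Dict[str, float], threshold: float,
                    registry: Dict[str, float]) -> List[str]:
    """Hypothetical evaluation gate where lower validation error is better.

    Models at or below `threshold` are copied into `registry`, a plain dict
    standing in for a real model repository.
    """
    passed = sorted(name for name, err in val_errors.items() if err <= threshold)
    for name in passed:
        registry[name] = val_errors[name]
    return passed
```

Each window would feed one independent training run, which is how many models can come out of the same data stream in parallel; only the runs that clear the gate reach the repository.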
Cautionary Tales
A majority of Kubernetes applications are relatively new to the scene, and many more are emerging to solve critical issues pertaining to storage, security, networking and other concerns. Choosing the right software stack requires an in-depth technical review of existing solutions along the dimensions of correctness, latency and cost.
At SME (small and medium-sized enterprise) scale, a single competent ML engineer is the bare minimum for a sufficiently complex ML system, which, with default settings, can serve concurrent requests up to the number of CPU cores procured.
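As a rough sanity check on that capacity claim, Little's law relates concurrency, latency and throughput. The sketch assumes one worker per CPU core, CPU-bound requests that each hold a core for their full latency, and no queueing; `max_throughput_rps` is a hypothetical helper, and real figures should come from load testing.

```python
import os
from typing import Optional


def max_throughput_rps(latency_s: float, workers: Optional[int] = None) -> float:
    """Upper bound on requests per second from Little's law (L = lambda * W).

    Assumes one worker per CPU core, CPU-bound requests that each hold a
    core for `latency_s` seconds, and no queueing or I/O overlap.
    """
    if workers is None:
        workers = os.cpu_count() or 1  # os.cpu_count() may return None
    if workers <= 0 or latency_s <= 0:
        raise ValueError("workers and latency_s must be positive")
    # At most `workers` requests are in flight, so lambda = L / W.
    return workers / latency_s
```

For example, 8 workers at 50 ms per request give an upper bound of roughly 160 requests per second; anything beyond that has to queue or needs more cores.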
At enterprise scale, ML engineering is not well suited to being a one-person job; instead, it is spread across different teams, each of which is a subject matter expert in its own domain.
What’s Next
A regression pipeline is currently in the works, targeting TensorFlow.js models deployed in a Flutter application hosted on Firebase. The pipeline is designed to be agnostic to the regression problem domain. Future regression tasks include cryptocurrency market size, health monitoring, renewable energy forecasts and EV tank-to-wheel efficiency (70-90%).
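The actual pipeline targets TensorFlow.js, which this sketch does not touch. It only illustrates what "agnostic to the regression problem domain" can mean in practice: numeric (x, y) pairs go in, and a fitted model plus an error metric come out. One-feature ordinary least squares stands in for the real model, and `fit`, `run_pipeline` and the choice of mean absolute error are illustrative assumptions, not the pipeline's actual code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LinearModel:
    slope: float
    intercept: float

    def predict(self, xs: Sequence[float]) -> List[float]:
        return [self.slope * x + self.intercept for x in xs]


def fit(xs: Sequence[float], ys: Sequence[float]) -> LinearModel:
    """Ordinary least squares for a single feature, in closed form."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, mean_y - slope * mean_x)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def run_pipeline(train: Tuple[Sequence[float], Sequence[float]],
                 test: Tuple[Sequence[float], Sequence[float]]) -> Tuple[LinearModel, float]:
    """Domain-agnostic: only numbers flow through, never domain-specific fields."""
    model = fit(*train)
    test_x, test_y = test
    return model, mean_absolute_error(test_y, model.predict(test_x))
```

Switching from, say, energy forecasts to market data changes only the numbers passed to `run_pipeline`, not the code.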
Other pipelines cover tasks under the remaining pillars of ML:
- classification
- density estimation
- dimensionality reduction
Pipelines for generative models are on the roadmap as well.
The roadmap also includes incorporating accelerators such as GPUs or TPUs into the pipelines to further parallelize existing workflows.