Machine Learning (ML) models are essential for identifying patterns and making reliable predictions. At CloudSEK, our models derive such predictions from data collected across more than 1,000 sources. With over 50 different models running in production, monitoring them is a daunting yet indispensable task.
The ML development life cycle consists of training and testing models, deploying them to production, and monitoring them to maintain and improve accuracy. Without adequate monitoring, predictions become inaccurate, models grow obsolete, and bugs go unnoticed.
CloudSEK’s Data Engineering team works together with Data Scientists to deploy ML models and track their performance continuously. To achieve this, we ensure that the following requirements are fulfilled (a simplified sketch follows the list):
- Model versioning: enabling multiple versions of the same model
- Initializing observer patterns on the incoming data
- Comparing results from different versions
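To make the observer idea concrete: every registered model version can be thought of as a subscriber that receives each incoming document and returns its own prediction. The following is a minimal, hypothetical Python sketch; the class and model names are made up for illustration, and in production each model actually runs in its own container against a message queue.

# Minimal sketch of the observer pattern: every registered model version
# gets a chance to classify each incoming document. Names are hypothetical.
class Dispatcher:
    def __init__(self):
        self.observers = []          # model versions listening for documents

    def register(self, model_version):
        self.observers.append(model_version)

    def dispatch(self, document):
        # Collect predictions from every version so they can be compared later
        return {m.name: m.predict(document) for m in self.observers}


class DummyClassifier:
    """Stand-in for a real versioned classifier."""
    def __init__(self, name):
        self.name = name             # e.g. "classifications_stage_1_clf_v0"

    def predict(self, document):
        return {"answer": "nonthreat", "content_meta": None}


dispatcher = Dispatcher()
dispatcher.register(DummyClassifier("classifications_stage_1_clf_v0"))
dispatcher.register(DummyClassifier("classifications_stage_2_clf_v0"))
dispatcher.register(DummyClassifier("classifications_stage_2_clf_v1"))

results = dispatcher.dispatch({"document_id": "root-001#96bfac5a46"})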
At CloudSEK, different Machine Learning models and their multiple versions classify a document across various stages. The client is then alerted only to the most accurate results, either from the best-performing model or from an ensemble that combines the outputs of different versions.
What constitutes a version upgrade?
At its core, every machine learning module is composed of two parts, and its output depends on both of these components (see the sketch after this list):
- The core ML model weights file, which is generated when the model is trained.
- The surrounding code that handles preprocessing, feature extraction, post-processing, etc.
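Putting the two components together, a versioned module might look roughly like the sketch below. The weights file name, class name, and helper functions are hypothetical, and the sketch assumes the pickled weights expose a scikit-learn-style predict_proba; the point is simply that a version is tied to both the weights file and the surrounding code.

import pickle

MODEL_VERSION = "v1"  # bumped only when weights or surrounding code change significantly

def preprocess(document):
    # Hypothetical feature extraction: lowercase the page text
    return document.get("text", "").lower()

def postprocess(score):
    # Hypothetical thresholding of the raw model score
    return "suspected-defaced-webpage" if score > 0.5 else "nonthreat"

class StageOneClassifier:
    def __init__(self, weights_path="stage_1_clf_v1.pkl"):
        # Component 1: the trained weights file
        with open(weights_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, document):
        # Component 2: the surrounding preprocessing / post-processing code
        features = preprocess(document)
        score = self.model.predict_proba([features])[0][1]
        return {"answer": postprocess(score), "content_meta": None}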
As a rule of thumb, any significant modification to either of these two components qualifies as a version upgrade. Minor changes, bug fixes, or additions of static rules do not trigger an upgrade; they are treated as regular code updates, which we track via Git.
Deploying and Monitoring Models
Machine Learning models are generally hosted in stateless Docker containers. As soon as a container starts, the containerized model begins listening to queues for messages. Each container maintains a configuration file with information about the type of model, its versions, and whether those versions are meant for production.
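As an illustration, the configuration could be represented along these lines; the exact keys are hypothetical, but they capture the model name, the versions the container serves, the queue it listens to, and which version is meant for production.

# Hypothetical shape of the per-container configuration. In practice this
# would live in a config file shipped with the Docker image.
MODEL_CONFIG = {
    "model_name": "classifications_stage_2_clf",
    "versions": ["v0", "v1"],        # versions this container serves
    "production_version": "v0",      # version whose answer is sent to clients
    "queue": "stage_2_documents",    # queue the container listens to
}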
When the Docker image is built, the latest commit hash of the Git repository can be passed to it and set as an environment variable. The diagram below illustrates the data flow between the ML models and their different versions:
When the container runs, data is consumed from a message queue; the model name in the configuration file determines which data is consumed. Once a document is processed, the predictions are returned as a dictionary, which is then persisted to a database.
The ML modules can also return optional metadata with information such as the actual prediction scores, the functions triggered internally, etc.
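Putting the pieces together, the runtime loop of such a container can be sketched roughly as follows. The queue name, the GIT_COMMIT_HASH environment variable, and the choice of pika (RabbitMQ) and pymongo (MongoDB) clients are assumptions made for illustration; the persisted fields mirror the sample document shown below.

import json
import os
from datetime import datetime, timezone

import pika                      # RabbitMQ client, assumed here for illustration
from pymongo import MongoClient  # MongoDB client, assumed here for illustration

COMMIT_HASH = os.environ.get("GIT_COMMIT_HASH")   # set when the image is built
MODEL_KEY = "classifications_stage_2_clf_v1"      # model name + version from the config

collection = MongoClient()["documents"]["classifications"]

class PlaceholderClassifier:
    """Stand-in for a real versioned classifier."""
    def predict(self, document):
        return {"answer": {"reason": None, "type": "nonthreat", "severity": None},
                "content_meta": None}

classifier = PlaceholderClassifier()

def handle_message(channel, method, properties, body):
    document = json.loads(body)
    prediction = classifier.predict(document)
    # Persist the prediction under a key that encodes the model name and version
    collection.update_one(
        {"document_id": document["document_id"]},
        {"$set": {MODEL_KEY: {
            "answer": prediction["answer"],
            "content_meta": prediction.get("content_meta"),  # optional metadata
            "hit_time": datetime.now(timezone.utc),
            "commit_hash": COMMIT_HASH,
        }}},
        upsert=True,
    )
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="stage_2_documents", durable=True)
channel.basic_consume(queue="stage_2_documents", on_message_callback=handle_message)
channel.start_consuming()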
Given below is a sample of a document after processing the results from all the models:
{
    "document_id" : "root-001#96bfac5a46",
    "classifications_stage_1_clf_v0" : {
        "answer" : "suspected-defaced-webpage",
        "content_meta" : null,
        "hit_time" : ISODate("2019-12-24T14:54:09.892Z"),
        "commit_hash" : "6f8e8033"
    },
    "classifications_stage_2_clf_v0" : {
        "answer" : {
            "reason" : null,
            "type" : "nonthreat",
            "severity" : null
        },
        "content_meta" : null,
        "hit_time" : ISODate("2019-12-24T15:40:46.245Z"),
        "commit_hash" : null
    },
    "classifications_stage_2_clf_v1" : {
        "answer" : {
            "reason" : null,
            "type" : "nonthreat",
            "severity" : null
        },
        "content_meta" : null,
        "hit_time" : ISODate("2019-12-24T15:40:46.245Z"),
        "commit_hash" : null
    }
}
How this helps us
This process allows us to find the exact state of every model that classified a particular document. We can roll back between model versions, and a minor change to a value in the configuration file allows us to set the main production model apart from the test models.
A Metabase instance can be leveraged to visualize key metrics and the performance of each classifier on a dashboard. The dashboard can also include details about the documents processed by each model, for example how many documents were classified as category X, category Y, etc. (in the case of classification tasks), and more.
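For instance, a dashboard card that counts how many documents a given version placed into each category can be backed by a simple aggregation. The sketch below is a hypothetical pymongo query using the field names from the sample document above; the actual dashboards are built in Metabase.

from pymongo import MongoClient

collection = MongoClient()["documents"]["classifications"]

# How many documents did stage-2 v1 place into each category?
pipeline = [
    {"$match": {"classifications_stage_2_clf_v1": {"$exists": True}}},
    {"$group": {
        "_id": "$classifications_stage_2_clf_v1.answer.type",
        "count": {"$sum": 1},
    }},
    {"$sort": {"count": -1}},
]

for row in collection.aggregate(pipeline):
    print(row["_id"], row["count"])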
Monitoring also allows data scientists to study and compare the results of the various versions of a model, since the output of each version is retrieved and stored. This gives them a set of documents that reveal where a new model may have influenced the output; these documents are then added to the training data to calibrate the models.
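As a rough sketch of how such a comparison could be pulled from the database, the hypothetical pymongo query below fetches the documents on which the two stage-2 versions disagree, which is exactly the set worth feeding back into the training data.

from pymongo import MongoClient

collection = MongoClient()["documents"]["classifications"]

# Documents where the new stage-2 version disagrees with the old one.
disagreements = collection.find({
    "classifications_stage_2_clf_v0": {"$exists": True},
    "classifications_stage_2_clf_v1": {"$exists": True},
    "$expr": {"$ne": [
        "$classifications_stage_2_clf_v0.answer.type",
        "$classifications_stage_2_clf_v1.answer.type",
    ]},
})

for doc in disagreements:
    print(doc["document_id"])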