MLOps: Integrating Machine Learning Automation with DevOps
What is MLOps?
MLOps stands for Machine Learning + Operations. MLOps is the communication between data scientists and the operations or production team. It's deeply collaborative in nature, designed to eliminate waste, automate as much as possible, and produce richer, more consistent insights with machine learning. ML can be a game changer for a business, but without some form of systemization it can devolve into a science experiment.
MLOps brings business interest back to the forefront of your ML operations. Data scientists work through the lens of organizational interest with clear direction and measurable benchmarks. It’s the best of both worlds.
How is MLOps different from DevOps?
- Data/model versioning != code versioning
- Model reuse is an entirely different case from software reuse, as models need tuning based on the scenario and the data.
- Fine-tuning is needed when a model is reused; applying transfer learning on top of it leads to a training pipeline.
- On-demand retraining is required, as models decay over time.
Here we integrate MLOps with DevOps.
Let's get started:
~ Task Description overview:
1. Create a container image that has Python3 and Keras or NumPy installed, using a Dockerfile.
2. When we launch this image, it should automatically start training the model inside the container.
3. Create a job chain of Job1, Job2, Job3, Job4 and Job5 using the Build Pipeline plugin in Jenkins.
4. Job1: Pull the GitHub repo automatically when a developer pushes to GitHub.
5. Job2: By looking at the code or program file, Jenkins should automatically start the container image that has the respective machine-learning software/interpreter installed, deploy the code, and start training (e.g., if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).
6. Job3: Train the model and report its accuracy or metrics.
7. Job4: If the accuracy is less than 80%, tweak the machine-learning model architecture.
8. Job5: Retrain the model, or notify that the best model has been created.
9. Create one extra job, Job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off.
~ Solution overview:
The first step is to create a Docker image that sets up the training environment. Write the following code in any text editor and save it with the name "Dockerfile"; that is the default name docker build looks for, so saving it under any other name will not work without extra options. Here is the Dockerfile you can use to create your own Docker image.
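The Dockerfile itself is not reproduced in this write-up, so below is a minimal sketch of what such an image could look like. The base image, package list and versions are assumptions; since the host in this setup runs RHEL8, a CentOS/UBI base installed with yum would work just as well.

# Sketch of a training image: Python 3 plus the ML libraries the jobs rely on.
FROM python:3.9-slim

# Keras (via TensorFlow), scikit-learn and NumPy for the training scripts
RUN pip install --no-cache-dir numpy pandas scikit-learn tensorflow keras

# The Jenkins jobs bind-mount the project directory here
WORKDIR /home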
After saving the Dockerfile, run the following command to build your Docker image:
docker build -t <name of the image>:<version/tag> <location of dockerfile>
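For example, to build the image with the name and tag used by the Jenkins jobs later in this article (assuming the Dockerfile is in the current directory):

docker build -t mlops:v1 .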
Now the image is ready; using this image we can create the training environments.
The next step is to create a GitHub repository where the developer or coder will upload the files or datasets used for training and prediction.
After you create the repository, clone it onto your PC and add a post-commit script in .git/hooks/ so that whenever you commit, the files are automatically pushed to GitHub. Following is the code for the post-commit script; paste it into any text editor and make sure you save it in the .git/hooks folder of your repository with the name post-commit.
#!/bin/bash
git push
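One thing to note: Git only runs hook scripts that are marked executable, so after saving the file make it executable:

chmod +x .git/hooks/post-commit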
With that, the whole GitHub and RHEL setup is done. Let's move on to the Jenkins part, where we create the different Jenkins jobs.
Job1: mlops_job1
Pull the GitHub repo automatically when a developer pushes to GitHub.
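The Job1 configuration is not reproduced here, but a typical setup is to point the job at the GitHub repo (Source Code Management → Git), enable Poll SCM or a GitHub webhook as the build trigger, and then copy the checked-out workspace into the directory that the later jobs mount into the containers. A minimal sketch of the "Execute shell" build step, assuming /root/root/shubham/MLOps-project is that shared directory:

# Copy the freshly pulled workspace to the directory shared with the containers
sudo mkdir -p /root/root/shubham/MLOps-project
sudo cp -rvf * /root/root/shubham/MLOps-project/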
Job2: mlops_job2
By looking at the code or program file, Jenkins should automatically start the container image that has the respective machine-learning software/interpreter installed, deploy the code, and start training (e.g., if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).
As you can see, Job2 depends on Job1, so we set the build trigger "Build after other projects are built" and point it at mlops_job1.
# If the pushed code uses Keras, launch the CNN container; otherwise launch the sklearn container.
if sudo cat /root/root/shubham/MLOps-project/mlops1.py | grep keras
then
    if sudo docker ps | grep cnn
    then
        echo "Already Running"
    else
        sudo docker run -dit -v /root/root/shubham/MLOps-project:/home --name cnn mlops:v1
    fi
else
    if sudo docker ps | grep sklearn
    then
        echo "Already Running"
    else
        sudo docker run -dit -v /root/root/shubham/MLOps-project:/home --name sklearn mlops:v1
    fi
fi
Job3: mlops_job3
Train the model and report its accuracy or metrics.
As you can see, Job3 is executed after Job2; it is triggered by Job2 in the same way.
# Run the training script inside whichever container Job2 started.
if sudo cat /root/root/shubham/MLOps-project/mlops1.py | grep keras
then
    sudo docker exec cnn python3 /home/mlops1.py
else
    sudo docker exec sklearn python3 /home/mlops1.py
fi
Here we use an "Execute shell" build step to run the Python file; it trains the model and reports the accuracy. In my case the dataset had already been downloaded in the background; when you start it for the first time you will see the images being downloaded. I used the MNIST dataset, which is provided out of the box by Keras. The file mlops1.py reads all of its hyperparameter values from a file, so the next time the model is put back into training it can pick up the changed values.
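The article does not reproduce mlops1.py itself, but the idea is that the script reads its hyperparameters from a file rather than hard-coding them, so a later job can tweak those values and retrain. A rough sketch of that pattern, assuming a hyperparameters.txt file with key=value lines, the Keras MNIST dataset, and that the script also writes the final accuracy to accuracy.txt for the later jobs (the file names and layer choices here are assumptions):

# Sketch of mlops1.py's hyperparameter-driven training (assumed file names and format)
from tensorflow import keras

# Read key=value pairs, e.g. "epochs=2" and "filters=32"
params = {}
with open("/home/hyperparameters.txt") as f:
    for line in f:
        key, value = line.strip().split("=")
        params[key] = int(value)

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

model = keras.Sequential([
    keras.layers.Conv2D(params["filters"], (3, 3), activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=params["epochs"], validation_data=(x_test, y_test))

loss, acc = model.evaluate(x_test, y_test)
model.save("/home/mnist_cnn.h5")

# Record the accuracy where the later jobs can read it
with open("/home/accuracy.txt", "w") as f:
    f.write(str(int(acc * 100)))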
Job4: mlops_job4
If the accuracy is less than 80%, tweak the machine-learning model architecture.
Here we use the file "rebuild.py" to check the accuracy; it runs the epochs and iterations and reports the final accuracy. If the accuracy is greater than 80%, the job runs "success.py", which sends a success mail; if the accuracy is less than 80%, it runs "failure.py", which sends a failure mail.
# Run rebuild.py inside the running container; it is assumed to print the accuracy
# as a plain integer percentage on stdout (see the sketch below).
if sudo cat /root/root/shubham/MLOps-project/mlops1.py | grep keras
then
    accuracy=$(sudo docker exec cnn python3 /home/rebuild.py)
else
    accuracy=$(sudo docker exec sklearn python3 /home/rebuild.py)
fi

if (( $accuracy >= 80 ))
then
    sudo python3 /root/root/shubham/MLOps-project/success.py
    # Exiting non-zero here presumably stops the downstream retrain job from
    # being triggered once the target accuracy has been reached.
    exit 1
else
    sudo python3 /root/root/shubham/MLOps-project/failure.py
fi
In my case the model reached more than 80% accuracy, and success.py sent a mail to the developer notifying them of the successful completion of the model.
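For the shell arithmetic above to work, rebuild.py must print the accuracy as a plain integer and nothing else on stdout. However rebuild.py actually retrains the model internally, the important part is that final output step; a small self-contained sketch of it (the saved-model path is an assumption carried over from the mlops1.py sketch above):

# Tail end of rebuild.py (sketch): evaluate the last saved model and print the
# accuracy as an integer percentage, so the Jenkins shell step can capture and compare it.
from tensorflow import keras

(_, _), (x_test, y_test) = keras.datasets.mnist.load_data()
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

model = keras.models.load_model("/home/mnist_cnn.h5")
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(int(acc * 100))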
Job5: mlops_job5
Retrain the model, or notify that the best model has been created.
# Show the latest recorded accuracy, notify that a rebuild/retrain is needed,
# and trigger another build through the Jenkins remote-build URL.
sudo cat /root/root/shubham/MLOps-project/accuracy.txt
sudo python3 /root/root/shubham/MLOps-project/rebuild_failed_notifier.py
sudo curl "http://192.168.43.56/view/mlops-project/job/mlops_job5/build?token=mlops_job1"
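For the curl call above to work, the triggered job must have "Trigger builds remotely (e.g., from scripts)" enabled in its configuration with the matching authentication token. Depending on your Jenkins security settings you may also need to authenticate the request; for example (the user and API token here are placeholders):

curl --user admin:API_TOKEN "http://192.168.43.56/view/mlops-project/job/mlops_job5/build?token=mlops_job1"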
Job6: mlops_job6
If the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off.
This code starts the container again if it is not running; because the project directory is bind-mounted into the container, the training state saved there survives the restart.
# Monitor job: if a training container is not running, start it again.
# The bind-mounted project directory preserves the last trained model,
# so a restarted container picks up from where training left off.
if sudo docker ps | grep cnn
then
    echo "everything is fine"
else
    sudo docker start cnn || sudo docker run -dit -v /root/root/shubham/MLOps-project:/home --name cnn mlops:v1
fi
if sudo docker ps | grep sklearn
then
    echo "Everything is good"
else
    sudo docker start sklearn || sudo docker run -dit -v /root/root/shubham/MLOps-project:/home --name sklearn mlops:v1
fi
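Since Job6 acts as a monitor, it also makes sense to run it on a schedule rather than only once at the end of the chain. One option (an assumption, not shown in the original setup) is to add a "Build periodically" trigger to mlops_job6 with a cron expression such as:

*/5 * * * *

so the check runs every five minutes.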
That is the full, complete cycle of all the jobs. It is a continuous process, as you can see from the code above, where each job triggers the next: Job1 → Job2 → Job3 → Job4 → Job5 → Job6.
Build Pipeline view of the job chain
This was all about the jobs in Jenkins.
Conclusion/Miscellaneous:
To complete the whole task we used tools like Git, Jenkins, a VM running RHEL8, Docker and some Python code. Integrating machine learning with DevOps in this way makes life simpler for the coder or developer and saves a lot of time.
Github repo for all the files : → https://github.com/shubhamkhandelwal523/MLOps-project.git
LinkedIn URL : → https://www.linkedin.com/posts/shubham-khandelwal-a04613144_mlops-integrating-the-machine-learning-activity-6670344581525188610-h_R7
I would like to thank World Record Holder @VimalDaga for sharing such good knowledge about MLOps. I created this project under the mentorship of Vimal Daga sir. It was a great task and very interesting.
Thanks for reading, I hope you liked it.