What is Kubeflow? It is a platform for developing, building, training, deploying and monitoring Machine Learning (ML) models in production. It supports multiple architectures and multiple clouds, so you can run entire ML pipelines on it.

ML model development and deployment involve several iterative processes, e.g. data preparation, feature engineering, training, testing, hyper-parameter tuning, nested cross-validation, feature selection and dealing with overfitting, and the list continues! Over time, the underlying data can also change and, as a result, the predictive performance of the model can degrade. Hence, once the model is in production, you have to ensure that it keeps producing the results you expect.
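To make the monitoring point concrete, here is a minimal sketch in plain Python of what a "has my model degraded?" check might look like. This is not Kubeflow's API: the function names, the sample labels and the 0.05 tolerance are all assumptions chosen for illustration. A real setup would pull labelled production data from your own logging or monitoring stack.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def has_degraded(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """True when recent accuracy is more than `tolerance` below the baseline.

    `tolerance` is a hypothetical threshold; pick one that suits your use-case.
    """
    return (baseline_accuracy - recent_accuracy) > tolerance


# Made-up labels and predictions, purely for illustration.
labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
preds_at_release = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # 9 of 10 correct
preds_this_week = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1]   # 6 of 10 correct

baseline = accuracy(labels, preds_at_release)
recent = accuracy(labels, preds_this_week)

if has_degraded(baseline, recent):
    print("Model performance has dropped: consider retraining.")
```

Accuracy is used here only because it is easy to read. The same comparison works with whichever metric you tracked when the model was first validated.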

If you’ve developed loosely coupled micro-services on a laptop/desktop using Docker, want to get real business benefit from them, and also need to distribute training with TensorFlow or PyTorch, then you may want to seriously consider Kubeflow to develop, build, train, deploy, monitor and scale your app/model for production use-cases. Kubeflow, which also provides a UI, can accelerate your ML model/app development and lets you deploy your end-to-end ML pipelines on all major cloud platforms, e.g. AWS, GCP and Azure.

Finally, I’m keen to find out more about its AI Explainability feature, which is currently in alpha!