Why we need TensorFlow Extended (TFX) — and how to get it in 3 steps

Roman Kazinnik
3 min read · Mar 2, 2021

There are two main considerations when adopting TFX: value and cost. I want to demonstrate the value of TFX: how it helps with production-level experimentation and with adopting the best standards for model and data validation.

The cost of using TFX is that it locks your Machine Learning stack into TensorFlow for tasks such as model training and data transformation. Moving to TensorFlow from in-memory Pandas, scikit-learn, and RStudio can be a non-trivial endeavor, so seeing the potential value of TFX clearly is very important. So, here it is!

DevOps and Data Science disconnect, a.k.a. ‘But it worked on my laptop!’

First, I explain the ‘But it worked on my laptop’ problem and how TensorFlow Extended helps solve it. After that, I show a 3-step example of how to transition to a unified end-to-end Machine Learning workflow.

Here is the 3-step recipe for transitioning to end-to-end Machine Learning.

Step-1: Migrate data input and model to TensorFlow

This usually involves moving from a Python Pandas DataFrame to a TensorFlow data pipeline, and migrating from a scikit-learn model to a TensorFlow model.
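A minimal sketch of that migration, assuming an illustrative two-column dataset (the column names, model shape, and hyperparameters are mine, not from the post):

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# In-memory DataFrame, as the data might look before the migration.
df = pd.DataFrame({
    "age": np.random.uniform(1, 80, 100).astype("float32"),
    "fare": np.random.uniform(0, 500, 100).astype("float32"),
    "survived": np.random.randint(0, 2, 100),
})

# tf.data replaces direct DataFrame access: batching, shuffling,
# and prefetching now happen inside the TensorFlow input pipeline.
features = df[["age", "fare"]].values
labels = df["survived"].values
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=100)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# A small Keras model stands in for the scikit-learn estimator.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(dataset, epochs=1, verbose=0)
```

Once both the input pipeline and the model are expressed in TensorFlow, the rest of the workflow can be handed to TFX unchanged.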

Step-2: tensorflow-transform to convert heterogeneous data into numerical TensorFlow input

The tensorflow-transform module creates transformations from heterogeneous data inputs to numerical outputs, including one-hot encoding and bucketing, as well as synthesizing new numerical features. These numerical outputs are later consumed by TensorFlow input layers when creating TensorFlow features.
Example: a Titanic dataset that reproduces 99% prediction accuracy, with tensorflow-transform and TensorFlow modeling.

Together with Maximiliano Teruel (maximiliano.teruel@azumo.co) and Arturo Vallone (arturo@azumo.co); runs locally, with no TFX dependencies imported:

https://github.com/romankazinnik/romankazinnik_blog/blob/master/TFX_KFP/module.py

Notice how TFX is completely abstracted away from TensorFlow feature creation and modeling.
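To illustrate the kinds of transforms Step-2 describes, here is a runnable sketch that uses Keras preprocessing layers as a stand-in for the tensorflow-transform analyzers (`tft.bucketize`, `tft.compute_and_apply_vocabulary`), since tft analyzers only execute inside a Transform/Beam run; the Titanic-style column names and bucket boundaries are my assumptions, not the linked repo's code:

```python
import numpy as np
import tensorflow as tf

# Bucketing: a numeric 'age' column becomes integer bucket indices
# (the tft equivalent is tft.bucketize).
ages = np.array([[4.0], [22.0], [38.0], [61.0], [80.0]], dtype="float32")
bucketize = tf.keras.layers.Discretization(bin_boundaries=[18.0, 40.0, 65.0])
age_bucket = bucketize(ages)  # bucket index per passenger

# One-hot encoding: a string 'sex' column becomes a vocabulary index,
# then a one-hot vector (the tft equivalent is
# tft.compute_and_apply_vocabulary followed by a one-hot layer).
sexes = np.array([["male"], ["female"], ["female"], ["male"], ["male"]])
lookup = tf.keras.layers.StringLookup(vocabulary=["male", "female"])
onehot = tf.keras.layers.CategoryEncoding(
    num_tokens=lookup.vocabulary_size(),  # includes the OOV token
    output_mode="one_hot",
)
sex_onehot = onehot(lookup(sexes))
```

In a real TFX pipeline these transforms live inside a `preprocessing_fn` consumed by the Transform component, as in the module.py linked above.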

Step-3: Create TensorFlow Extended (TFX) pipeline

A TFX pipeline is a thin architectural layer that makes ML experiments ‘live’. TFX wraps non-production TensorFlow code with production-scale components: Create Schema, Data Validation, Train and Push Model, Model Evaluation, and Serve Model.

  1. Generic pipeline: TFX pipeline code is generic and can be reused as-is for multiple models.
  2. Production: the goal of a TFX pipeline is a production-ready deployment, which includes components such as Data Validation, Model Evaluation, and Model Inference.
  3. Kubeflow: by wrapping a TensorFlow model with TFX, one can use Machine Learning platform tools such as Kubeflow and Kubernetes.
  4. Abstraction layer: TFX is an abstraction layer over TensorFlow, which means TensorFlow models can be developed independently of TFX in any preferred local or cloud environment. When a model is ready for production, Step-1 and Step-2 make it run in TFX, and in turn with Kubeflow and Kubernetes.
  5. Single contributor: every step from model experiments to production TFX model deployment can be carried out by a single Data Scientist or Machine Learning Engineer.

Example: wrap the TensorFlow model with TFX (module_tfx.py) and run the TFX pipeline locally: https://github.com/romankazinnik/romankazinnik_blog/blob/master/TFX_KFP/module_tfx.py
https://github.com/romankazinnik/romankazinnik_blog/blob/master/TFX_KFP/tfx-e2e.ipynb

The diagram below illustrates the difference between the two paths to production.

Enjoyed it or hated it? Let me know with a comment, get in touch on Twitter, or follow me on Medium.

Originally published at https://www.romankazinnik.com on March 2, 2021.
