TensorFlow 101

Shan-Hung Wu & DataLab
Fall 2023

Why TensorFlow?

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications. TensorFlow has several benefits:

  1. Easy model building
  2. Robust ML production anywhere
  3. Powerful experimentation for research

We are going to use TensorFlow as our framework in the following lectures. In this lab, you will learn how to install TensorFlow and get a better understanding by implementing a classical deep learning algorithm.

Environment setup

Software requirements

The following NVIDIA® software must be installed on your system:

  1. NVIDIA® GPU driver
  2. CUDA® Toolkit
  3. cuDNN SDK
  4. TensorRT (optional, to improve inference performance)

Please refer to the TensorFlow website, GPU Support section, for more details and the latest information. Check the versions of the above software carefully: each TensorFlow version has strict requirements on the NVIDIA® software versions.

Install CUDA with apt

This section shows how to install CUDA® 10 (TensorFlow >= 1.13.0) on Ubuntu 16.04 and 18.04. These instructions may work for other Debian-based distros.

Ubuntu 18.04 (CUDA 10.1)

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-450
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.5.32-1+cuda10.1  \
    libcudnn7-dev=7.6.5.32-1+cuda10.1


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

Ubuntu 16.04 (CUDA 10.1)

# Add NVIDIA package repositories
# Add HTTPS support for apt-key
sudo apt-get install gnupg-curl
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1604_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
# Issue with driver install requires creating /usr/lib/nvidia
sudo mkdir /usr/lib/nvidia
sudo apt-get install --no-install-recommends nvidia-418
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.4.38-1+cuda10.1  \
    libcudnn7-dev=7.6.4.38-1+cuda10.1


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends \
    libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

Install CUDA with Anaconda

You can also install CUDA with Anaconda using the following command:

conda install cudnn=7.6.5=cuda10.1_0

which will install the CUDA Toolkit and cuDNN SDK. After that, the only thing you have to install manually is the NVIDIA driver. Once the CUDA Toolkit is installed, you can check the CUDA version with the following command:
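# Check the CUDA version (assumes the toolkit's nvcc is on your PATH)
nvcc --version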

You can also check GPU utilization after installing the GPU driver with the following command:
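# Show the driver version and per-GPU utilization
nvidia-smi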

Install TensorFlow 2

TensorFlow is tested and supported on 64-bit Ubuntu, Windows, macOS, and Raspbian systems; please refer to the TensorFlow website for the exact supported versions.

We can simply install TensorFlow using Python's pip package manager with the following commands:

# Requires the latest pip
pip install --upgrade pip

# Current stable release for CPU and GPU
pip install tensorflow

It is recommended to install TensorFlow in a virtual environment; for more details, please refer to Install TensorFlow with pip.

To test whether TensorFlow is installed successfully and to confirm that TensorFlow is using the GPU, execute the following code.
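import tensorflow as tf

print(tf.__version__)                          # the installed TensorFlow version
print(tf.config.list_physical_devices('GPU'))  # non-empty if TensorFlow sees a GPU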

Google Colab

Installing all dependencies of TensorFlow can be a pain. Google Colab provides a Jupyter notebook environment that requires no setup, with a free GPU. The types of GPUs that are available in Colab vary over time; they often include NVIDIA K80s, T4s, P4s, and P100s.

There is no way to choose what type of GPU you can connect to in Colab at any given time.

You can easily access Google Colab via your Google account and play around with TensorFlow. For example, TensorFlow 2 quickstart for beginners.

However, there are a few constraints when using Google Colab: runtimes are limited in duration and memory, and you cannot choose which GPU you are assigned.

Google announced a new service called Colab Pro ($9.99/month) at the beginning of 2020, which provides faster GPUs, longer runtimes, and more memory compared with Colab. Other choices for acquiring GPUs are Cloud GPU and Cloud TPU, but they are more expensive. As a result, we highly recommend using TensorFlow on your own machine, while Colab is still a good place to start.

TensorFlow 2 quickstart

Originally developed by Google Brain, TensorFlow is an open source library which provides a variety of functions and classes used to build machine learning algorithms.

TensorFlow has various benefits, including easy model building, robust ML production anywhere, and powerful experimentation for research, as listed earlier.

Later on, we will try to:

  1. Load a dataset via tf.keras.datasets
  2. Build the same model via the Sequential API, the Functional API, and model subclassing
  3. Speed up computation with tf.function
  4. Customize gradient flow with tf.custom_gradient

Limit GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, we use the tf.config.experimental.set_visible_devices method.

In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. One option is to turn on memory growth by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as needed for the runtime allocations: it starts out allocating very little memory, and as the program gets run and more GPU memory is needed, we extend the GPU memory region allocated to the TensorFlow process.
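For example, the following typical pattern turns on memory growth for every visible GPU (note that memory growth must be set before any GPU is initialized):

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)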

Load dataset via tf.keras.datasets

Currently, tf.keras.datasets supports seven datasets, including:

  1. boston_housing
  2. cifar10
  3. cifar100
  4. fashion_mnist
  5. imdb
  6. mnist
  7. reuters

In this lab, we will use the MNIST dataset to demo how to build a neural network. MNIST contains 70,000 images of hand-written digits, 60,000 for training and 10,000 for testing, each $28×28$ pixels, in grayscale with pixel values from 0 to 255.

Load and prepare the MNIST dataset. Convert the samples from integers to floating-point numbers:
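mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from integers in [0, 255] to floats in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0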

Build model via Sequential API

When to use?

The Sequential API is the simplest way to build a model, and it is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

Build the tf.keras.Sequential model by stacking layers. Choose an optimizer and loss function for training:
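model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

# Run the untrained model on the first sample; the output is a vector of 10 logits
predictions = model(x_train[:1]).numpy()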

For each example the model returns a vector of "logits" or "log-odds" scores, one for each class.

The tf.nn.softmax function converts these logits to "probabilities" for each class:
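# Convert the logits of the first example into per-class probabilities
print(tf.nn.softmax(predictions).numpy())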

Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.

The losses.SparseCategoricalCrossentropy loss takes a vector of logits and a True index and returns a scalar loss for each example.
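loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Loss of the untrained model on the first example
print(loss_fn(y_train[:1], predictions).numpy())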

This loss is equal to the negative log probability of the true class: It is zero if the model is sure of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to -tf.math.log(1/10) ~= 2.3.
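Before training, configure the model with the optimizer and the loss function chosen above:

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])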

The Model.summary method prints a string summary of the network, which is quite useful for examining the model architecture before training.
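model.summary()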

The Model.fit method adjusts the model parameters to minimize the loss:
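model.fit(x_train, y_train, epochs=5)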

The Model.evaluate method checks the model's performance, usually on a "validation set" or "test set".
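model.evaluate(x_test, y_test, verbose=2)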

The image classifier is now trained to ~98% accuracy on this dataset.

Build model via Functional API

When to use?

The Keras Functional API is a way to create models that are more flexible than the tf.keras.Sequential API. The functional API can handle models with non-linear topology, shared layers, and even multiple inputs or outputs.

The main idea is that a deep learning model is usually a directed acyclic graph (DAG) of layers. So the functional API is a way to build graphs of layers.

Consider the following model:

(input: (28, 28)-dimensional vectors)
        ↓
[Flatten]
        ↓
[Dense (128 units, relu activation)]
        ↓
[Dropout]
        ↓
[Dense (10 units)]
        ↓
(output: logits of a probability distribution over 10 classes)

This is a basic graph with four layers. To build this model using the functional API, start by creating an input node:
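inputs = tf.keras.Input(shape=(28, 28))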

The shape of each sample is set as (28, 28); the Flatten layer will reshape each sample into a 784-dimensional vector. The batch size is always omitted, since only the shape of each sample is specified. The inputs object that is returned contains information about the shape and dtype of the input data that you feed to your model.

You create a new node in the graph of layers by calling a layer on this inputs object. The "layer call" action is like drawing an arrow from "inputs" to the layer you created. You're "passing" the inputs to the Flatten layer, and you get x as the output.
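x = tf.keras.layers.Flatten()(inputs)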

Let's add a few more layers to the graph of layers:
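x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(10)(x)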

At this point, you can create a Model by specifying its inputs and outputs in the graph of layers:
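model = tf.keras.Model(inputs=inputs, outputs=outputs)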

Training, evaluation, and inference work exactly in the same way for models built using the functional API as for Sequential models.

Build model via Model Subclassing

When to use?

The final method to implement a model architecture is called model subclassing. Model subclassing is fully customizable and enables you to implement your own custom forward pass of the model. However, this flexibility and customization comes at a cost: model subclassing is harder to use than the Sequential API or the Functional API.

So, if the model subclassing method is so hard to use, why bother using it at all?

Exotic architectures or custom layer/model implementations, especially those utilized by researchers, can be extremely challenging, if not impossible, to implement using the standard Sequential or Functional APIs. Instead, researchers wish to have control over every nuance of the network and training process, and that's exactly what model subclassing provides.

Use tf.data to batch and shuffle the dataset:
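train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).shuffle(10000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)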

Build the tf.keras model using the Keras model subclassing API:
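Here is a sketch that mirrors the architecture built above; the class and attribute names are illustrative:

class MyModel(tf.keras.Model):
  def __init__(self):
    super(MyModel, self).__init__()
    self.flatten = tf.keras.layers.Flatten()
    self.d1 = tf.keras.layers.Dense(128, activation='relu')
    self.dropout = tf.keras.layers.Dropout(0.2)
    self.d2 = tf.keras.layers.Dense(10)

  def call(self, x, training=False):
    # The custom forward pass: Flatten -> Dense -> Dropout -> Dense
    x = self.flatten(x)
    x = self.d1(x)
    x = self.dropout(x, training=training)
    return self.d2(x)

model = MyModel()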

Choose an optimizer and loss function for training.
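loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()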

Select metrics to measure the loss and the accuracy of the model. These metrics accumulate the values over epochs and then print the overall result.
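train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')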

To train the model, we can use tf.GradientTape to record operations for automatic differentiation.
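@tf.function
def train_step(images, labels):
  with tf.GradientTape() as tape:
    # Record the forward pass so the tape can differentiate through it
    predictions = model(images, training=True)
    loss = loss_object(labels, predictions)
  # Compute d(loss)/d(weight) for every trainable weight
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

  train_loss(loss)
  train_accuracy(labels, predictions)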

One of the most important and powerful features of a deep learning framework is automatic differentiation. As we saw in Neural Networks from Scratch, building neural networks manually requires a strong grasp of the backpropagation algorithm, where we have to calculate the derivative of everything by hand. That is manageable when there are few operations and the model architecture is relatively simple; when we have tens of millions of operations or a much more complicated architecture, computing gradients for each weight by hand becomes a nightmare.

TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. This is useful for implementing machine learning algorithms such as backpropagation for training neural networks. In short, you can regard tape.gradient(loss, model.trainable_variables) as computing

$$\frac{\partial{L}}{\partial{W_{i,j}}}$$

For more details, please refer to Introduction to Gradients and Automatic Differentiation.

Notice that a function decorated with tf.function will be compiled into a callable TensorFlow graph automatically. This allows the TensorFlow runtime to apply optimizations and exploit parallelism to boost computation performance. We will talk more about tf.function in a later section.

Let's train and evaluate the model first.
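First define an analogous test_step, then run the training loop:

@tf.function
def test_step(images, labels):
  predictions = model(images, training=False)
  t_loss = loss_object(labels, predictions)

  test_loss(t_loss)
  test_accuracy(labels, predictions)

EPOCHS = 5

for epoch in range(EPOCHS):
  # Reset the metrics at the start of each epoch
  train_loss.reset_states()
  train_accuracy.reset_states()
  test_loss.reset_states()
  test_accuracy.reset_states()

  for images, labels in train_ds:
    train_step(images, labels)

  for images, labels in test_ds:
    test_step(images, labels)

  print(f'Epoch {epoch + 1}, '
        f'Loss: {train_loss.result()}, '
        f'Accuracy: {train_accuracy.result() * 100}, '
        f'Test Loss: {test_loss.result()}, '
        f'Test Accuracy: {test_accuracy.result() * 100}')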

Sequential API, Functional API, and Model Subclassing

Should you use the Keras functional API to create a new model, or just subclass the Model class directly? In general, the functional API is higher-level, easier and safer, and has a number of features that subclassed models do not support. However, model subclassing provides greater flexibility when building models that are not easily expressible as directed acyclic graphs of layers. For example, you could not implement a Tree-RNN with the functional API and would have to subclass Model directly.

Choosing between the functional API or Model subclassing isn't a binary decision that restricts you into one category of models. All models in the tf.keras API can interact with each other, whether they're Sequential models, functional models, or subclassed models that are written from scratch.

For an in-depth look at the differences between the functional API and model subclassing, read What are Symbolic and Imperative APIs in TensorFlow 2.0?.

Better performance with tf.function

In TensorFlow 2, eager execution is turned on by default. The user interface is intuitive and flexible (running one-off operations is much easier and faster), but this can come at the expense of performance and deployability.

You can use tf.function to make graphs out of your programs. It is a transformation tool that creates Python-independent dataflow graphs out of your Python code. Here we will briefly introduce how tf.function works under the hood so you can use it effectively.

The main takeaways and recommendations are:

  1. Debug in eager mode, then decorate with @tf.function.
  2. Don't rely on Python side effects like object mutation or list appends.
  3. tf.function works best with TensorFlow ops; NumPy and Python calls are converted to constants.

Let's create two functions with the same operations: f_eager and f_graph represent the function run in eager and graph mode, respectively.
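Here is a timing sketch; the matrix size and iteration counts are illustrative:

import timeit

def power(x, y):
  # Many small matmul ops: the case where graph mode helps most
  result = tf.eye(10, dtype=tf.dtypes.int32)
  for _ in range(y):
    result = tf.matmul(x, result)
  return result

f_eager = power                 # runs eagerly, op by op
f_graph = tf.function(power)    # compiled into a dataflow graph

x = tf.random.uniform(shape=[10, 10], minval=-1, maxval=2, dtype=tf.dtypes.int32)

print('Eager execution:', timeit.timeit(lambda: f_eager(x, 100), number=1000))
print('Graph execution:', timeit.timeit(lambda: f_graph(x, 100), number=1000))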

As you can see, functions decorated with @tf.function can be faster than eager code, especially for graphs with many small ops. But for graphs with a few expensive ops (like convolutions), you may not see much speedup.

If your code uses multiple functions, you don't need to annotate them all: any function called from an annotated function will also run in graph mode. Therefore, you only need to decorate the outermost function.

Debugging

In general, debugging code is easier in eager mode than inside tf.function. You should ensure that your code executes error-free in eager mode before decorating it with tf.function. To assist in the debugging process, you can call tf.config.run_functions_eagerly(True) to globally disable tf.function, and tf.config.run_functions_eagerly(False) to reenable it.

When tracking down issues that only appear within tf.function, here are some tips:

  1. Plain Python print calls execute only during tracing, which helps you track down when a function gets (re)traced.
  2. tf.print calls execute every time and can help you track down intermediate values during execution.
  3. tf.config.run_functions_eagerly(True) lets you temporarily run the function eagerly so that standard debugging tools work.

Python side effects

Python side effects like printing, appending to lists, and mutating globals only happen the first time you call a Function with a set of inputs. Afterwards, the traced tf.Graph is reexecuted, without executing the Python code.

The general rule of thumb is to only use Python side effects to debug your traces. Otherwise, TensorFlow ops like tf.Variable.assign, tf.print, and tf.summary are the best way to ensure your code will be traced and executed by the TensorFlow runtime with each call.
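For example:

@tf.function
def f(x):
  print('Traced with', x)       # Python side effect: runs only while tracing
  tf.print('Executed with', x)  # TensorFlow op: runs on every call

f(1)  # first call with this input traces the function, so both lines print
f(1)  # reuses the existing trace: only tf.print fires
f(2)  # a new Python value triggers a retrace: both lines print again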

As we can see, the print function behaves unexpectedly, firing only while the function is traced, while the tf.print calls execute every time. Another gotcha, shown below, is mutating globals.
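For instance, with a hypothetical function that appends to a global list:

external_list = []

@tf.function
def append_to_list(x):
  # The append runs only at trace time; x is then a symbolic tensor
  external_list.append(x)

append_to_list(tf.constant(1))
append_to_list(tf.constant(1))
print(len(external_list))  # 1, not 2: the second call reused the trace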

In addition, many Python features, such as generators and iterators, rely on the Python runtime to keep track of state. In general, while these constructs work as expected in eager mode, many unexpected things can happen inside a Function.

For more details about tf.function and Autograph, including how conditionals and loops work and how to trace code and debug inside tf.function, please refer to tutorial and video.

Customize gradient flow by tf.custom_gradient

tf.custom_gradient is a decorator to define a function with a custom gradient. This decorator allows fine-grained control over the gradients of a sequence of operations, which may be useful for multiple reasons, including providing a more efficient or numerically stable gradient.

For example, consider the following function that commonly occurs in the computation of cross entropy and log likelihoods: $$y=\log_e(1+e^x)$$

The derivative of y is:

$$\frac{dy}{dx}=\frac{e^x}{1+e^x}=1-\frac{1}{1+e^x}$$

Theoretically, if x = 100, dy/dx should be 1. However, due to numerical instability, the gradient of this function evaluated at x = 100 is NaN. For example:
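def log1pexp(x):
  return tf.math.log(1 + tf.exp(x))

x = tf.constant(100.)
with tf.GradientTape() as tape:
  tape.watch(x)  # watch the constant so the tape records operations on it
  y = log1pexp(x)

# tf.exp(100.) overflows to inf, so the gradient evaluates to nan
print(tape.gradient(y, x))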

The gradient expression can be analytically simplified to provide numerical stability:
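@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(upstream):
    # dy/dx = 1 - 1/(1 + e^x), which stays finite for large x
    return upstream * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad

x = tf.constant(100.)
with tf.GradientTape() as tape:
  tape.watch(x)
  y = log1pexp(x)

print(tape.gradient(y, x))  # => 1.0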
