Fall 2018

In this lab, we will first go through some basic concepts of TensorFlow, including graphs, sessions, datasets, and TensorBoard. At the end, we introduce a word2vec model as an application that covers the material taught in this lab.

In order to use NVIDIA GPUs to train your model, CUDA and CuDNN are required. The installation guide can be found here.

There are several ways to install TensorFlow, which can be found here. One way is to install TensorFlow in a conda virtual environment. First, we create a new environment called `tensorflow`:

`> conda create -n tensorflow`

Then we activate the environment:

```
> source activate tensorflow (Linux or Mac)
> activate tensorflow (Windows)
```

According to the TensorFlow official webpage, it is recommended to install TensorFlow with `pip install` rather than `conda install`. Since the conda package is community supported, not officially supported, we will stick to `pip install`.

First, make sure that `pip3` is installed:

`> pip3 -V`

Install TensorFlow with `pip install`:

`> pip3 install tensorflow-gpu # Python 3.n; GPU support`

Then we can verify the installation by entering a short program in the Python interactive shell.

`> python`

Type in the following program:

```
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
```

Originally developed by Google Brain, TensorFlow is an open source library which provides a variety of functions and classes used to conduct machine learning.

The benefits of using TensorFlow include:

- Python API
- Portability: can be used on multiple CPUs or GPUs as well as on mobile devices
- Flexibility: can run on different devices e.g. Raspberry Pi, Android, iOS, Windows, Linux
- Visualization: visualize the training process via TensorBoard
- Checkpoints: manage trained models
- Auto-differentiation
- Large community

In TensorFlow, the definition of computations is separated from their execution. First, we specify the operations by building a data flow graph in Python. Next, TensorFlow runs the graph with a `Session` using optimized C++ code. Let's import tensorflow first.

In [1]:

```
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "3" # choose which GPU you want to use
from tempfile import gettempdir
import urllib.request
import zipfile
import numpy as np
import random
import matplotlib.pyplot as plt
import tensorflow as tf
print(tf.__version__)
```

A computational graph is a series of TensorFlow operations arranged into a graph. The graph is composed of two types of objects:

- `tf.Operation`: the nodes of the graph. Operations describe calculations that consume and produce tensors.
- `tf.Tensor`: the edges of the graph. These represent the values that will flow through the graph. Most TensorFlow functions return `tf.Tensor`s.

Note: `tf.Tensor` objects do not have values; they are just handles to elements in the computation graph.

To evaluate tensors, instantiate a `tf.Session` object, informally known as a session. A **session** encapsulates the state of the TensorFlow runtime, and runs TensorFlow operations.

In [2]:

```
# build a naive graph
a_tensor = tf.constant(3., name="const3")
b_tensor = tf.constant(4., name="const4")
out_tensor = tf.add(a_tensor, b_tensor)
print(a_tensor, b_tensor, out_tensor, sep="\n")
```

Notice that printing the tensors does not output the values `3.0`, `4.0`, and `7.0` as you might expect. The above statements only build the computation graph. These `tf.Tensor` objects just represent the results of the operations that will be run.

We need a `tf.Session` to run the graph.

In [3]:

```
sess = tf.Session() # create a session
a, b, c = sess.run([a_tensor, b_tensor, out_tensor])
print("a = {} \nb = {} \nc = {}".format(a, b, c))
```

A **tensor** is a generalization of vectors and matrices to potentially higher dimensions.

When writing a TensorFlow program, the main object you manipulate and pass around is the `tf.Tensor`. A `tf.Tensor` object represents a partially defined computation that will eventually produce a value. TensorFlow programs work by first building a graph of `tf.Tensor` objects, detailing how each tensor is computed based on the other available tensors, and then by running parts of this graph to achieve the desired results.

A `tf.Tensor` has the following properties:

- a data type (`tf.float32`, `tf.int32`, or `tf.string`, for example)
- a shape

The **rank** of a tensor refers to the number of dimensions it has.

The **shape** of a tensor specifies the array's length along each dimension.

```
3. # a rank 0 tensor; a scalar with shape [],
[1., 2., 3.] # a rank 1 tensor; a vector with shape [3]
[[1., 2., 3.], [4., 5., 6.]] # a rank 2 tensor; a matrix with shape [2, 3]
[[[1., 2., 3.]], [[7., 8., 9.]]] # a rank 3 tensor with shape [2, 1, 3]
```
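Rank and shape can be checked directly with NumPy, whose `ndim` and `shape` attributes mirror TensorFlow's notions of rank and shape. A quick sketch of the four example tensors above (illustration only, not part of the lab code):

```python
import numpy as np

# NumPy analogues of the four example tensors above
scalar = np.array(3.)                              # rank 0, shape ()
vector = np.array([1., 2., 3.])                    # rank 1, shape (3,)
matrix = np.array([[1., 2., 3.], [4., 5., 6.]])    # rank 2, shape (2, 3)
cube = np.array([[[1., 2., 3.]], [[7., 8., 9.]]])  # rank 3, shape (2, 1, 3)

for t in (scalar, vector, matrix, cube):
    print("rank = {}, shape = {}".format(t.ndim, t.shape))
```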

Some types of tensors are special. The main ones are:

- `tf.constant`
- `tf.Variable`
- `tf.placeholder`

We can create constants by passing lists or scalar values into the `tf.constant` function:
`tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False)`

In [4]:

```
# constant of 1d tensor (vector)
a = tf.constant([2, 2], dtype=tf.int32, name="vector")
# constant of 2x2 tensor (matrix)
b = tf.constant([[0, 1], [2, 3]], name="matrix")
print(a, b, sep="\n")
```

We can also create tensors of a specific value.

In [5]:

```
# a matrix with filled zeros
c = tf.zeros([2, 3], tf.int32, name="zeros_matrix") # [[0, 0, 0], [0, 0, 0]]
# a matrix with filled ones
d = tf.ones([2, 3], tf.int32, name="ones_matrix") # [[1, 1, 1], [1, 1, 1]]
# create a tensor filled zeros/ones, with shape and type as input_tensor
input_tensor = tf.constant([[1,1], [2,2], [3,3]], dtype=tf.float32)
e = tf.zeros_like(input_tensor, name="zeros_like_matrix") # [[0, 0], [0, 0], [0, 0]]
f = tf.ones_like(input_tensor, name="ones_like_matrix") # [[1, 1], [1, 1], [1, 1]]
print(c, d, e, f, sep="\n")
```

Unlike a constant, a variable can be assigned to, so its value can be changed. Also, a constant's value is stored in the graph definition, whereas a variable's value is stored separately. To declare a variable, we call `tf.get_variable`.

In [6]:

```
# create a vector variable
vec_var = tf.get_variable(name="vector", shape=[3],
                          initializer=tf.ones_initializer)
# create a matrix variable
mat_var = tf.get_variable(name="matrix", shape=[5, 3],
                          initializer=tf.random_normal_initializer)
print(vec_var, mat_var, sep="\n")
```

Or we can create variables by calling `tf.Variable`

.

In [7]:

```
# instance of `tf.Variable`
var = tf.Variable(2, name="scalar")
# we can assign new value to a variable
var_times_two = var.assign(var * 2) # an operation that assigns value var*2 to var
print(var, var_times_two, sep="\n")
```

In [8]:

```
# a constant's value is not changeable
# the following code will cause an error
c = tf.constant(0.)
c.assign(1.)
```

Values that never change should be stored as `tf.constant`, while trainable weights should be of type `tf.Variable`.
Before you can use a variable, it must be initialized. If you are programming in the low-level TensorFlow API (that is, you are explicitly creating your own graphs and sessions), you must explicitly initialize the variables.

To initialize all trainable variables in one go, before training starts, call `tf.global_variables_initializer()`. This function returns a single operation responsible for initializing all variables in the `tf.GraphKeys.GLOBAL_VARIABLES` collection. Running this operation initializes all variables.

In [9]:

```
variable_init_op = tf.global_variables_initializer() # an operation
sess.run(variable_init_op) # initialize the variable
```

As it stands, this graph is not especially interesting because it always produces a constant result. A graph can be parameterized to accept external inputs, known as **placeholders**. A **placeholder** is a promise to provide a value later, like a function argument.

`tf.placeholder(dtype, shape=None, name=None)`

In [10]:

```
a_placeholder = tf.placeholder(tf.float32, shape=[None, 3], name="a")
print(a_placeholder)
```

A **placeholder** should be provided a value when executed by a `tf.Session`.

In [11]:

```
sess = tf.Session()
a = sess.run(a_placeholder, feed_dict={a_placeholder: [[1, 2, 3], [4, 5, 6]]})
print(a)
```

In [12]:

```
# Build a basic graph that demos the basic tensorflow concepts
with tf.Graph().as_default() as g:
    # a constant tensor with rank = 0
    scalar_tensor = tf.constant(5., name="scalar")
    # a vector tensor with rank = 1, filled with random values
    vector_tensor = tf.random_normal(shape=[5], name="vector")
    # tensorflow supports broadcasting
    broadcast_with_scalar = vector_tensor + scalar_tensor
    # use a placeholder to feed values at runtime
    x_input = tf.placeholder(tf.float32, shape=[None, 5], name="input")
    feature_dims = x_input.shape[1]
    # a matrix variable with rank = 2
    matrix_variable = tf.get_variable("matrix",
                                      shape=[feature_dims, 2],
                                      initializer=tf.ones_initializer)
    mul_with_matrix = tf.matmul(x_input, matrix_variable, name="output")
    var_init_op = tf.global_variables_initializer()
```
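The `vector_tensor + scalar_tensor` line above relies on broadcasting, which follows NumPy-style rules: the rank-0 scalar is stretched to match the rank-1 vector. A minimal NumPy sketch of the same operation (illustration only, not part of the lab graph):

```python
import numpy as np

vector = np.random.randn(5)  # shape (5,)
scalar = 5.0                 # a rank-0 value
# the scalar is broadcast across every element of the vector
broadcast = vector + scalar
print(broadcast.shape)  # (5,)
```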

In [13]:

```
feed_data = np.random.randint(5, size=[5, 5])
# Within the scope of a session, we can evaluate tensors in the default graph
with tf.Session(graph=g) as sess:
    sess.run(var_init_op)  # must initialize the variables
    scalar, vector, broadcast = sess.run([scalar_tensor,
                                          vector_tensor,
                                          broadcast_with_scalar])
    print("[scalar]\n {} \n[vector]\n {} \n[broadcast]\n {}".format(scalar,
                                                                    vector,
                                                                    broadcast))
    x, m, out = sess.run([x_input, matrix_variable, mul_with_matrix],
                         feed_dict={x_input: feed_data})
    print("[input]\n {} \n[matrix]\n {} \n[output]\n {}".format(x, m, out))
```

The computations you'll use TensorFlow for, like training a massive deep neural network, can be complex and confusing. To make it easier to understand, debug, and optimize TensorFlow programs, TensorFlow includes a suite of visualization tools called **TensorBoard**. You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data like images that pass through it.

Create a `writer` with `tf.summary.FileWriter` to write the graph to a file.

In [14]:

```
graph_dir = "graphs/demo"
os.makedirs(graph_dir, exist_ok=True)
with tf.Graph().as_default() as g:
    const_a = tf.constant(1., shape=[1, 5], name="const_a")
    const_b = tf.add(const_a, 5, name="const_b")
    var_c = tf.get_variable("var_c", shape=[5, 3])
    const_d = tf.matmul(const_b, var_c, name="const_d")
    # create a writer
    writer = tf.summary.FileWriter(graph_dir, tf.get_default_graph())
```

To run TensorBoard, use the following command:

`tensorboard --logdir=path/to/log-directory`

where `logdir` points to the directory where the `FileWriter` serialized its data. If this `logdir` directory contains subdirectories with serialized data from separate runs, TensorBoard will visualize the data from all of those runs. Once TensorBoard is running, navigate your web browser to `localhost:6006` to view the TensorBoard.

TensorBoard provides more than that. You can check here for more information.

Training a deep learning model may take a few hours or even a few days. The learned weights should be saved periodically so that you can restore them for further applications. In TensorFlow, all trainable variables can be saved in checkpoints: binary files that map variable names to tensor values.

In this part, we will show you how to save and restore a model. TensorFlow provides the class `tf.train.Saver` to do this work. Its constructor adds `save` and `restore` ops to the graph for all, or a specified list of, the variables in the graph. The `Saver` object provides methods to run these ops, specifying paths for the checkpoint files to write to or read from.

[**Notes**] You can check here for more than that.

Create a `Saver` with `tf.train.Saver()` to manage all variables in the model. The following cell shows how to call the `tf.train.Saver.save` method to save variables to checkpoint files.

In [15]:

```
ckpt_dir = "checkpoints/demo"
os.makedirs(ckpt_dir, exist_ok=True)
with tf.Graph().as_default() as g:
    const_a = tf.constant(2, tf.int32, [5])
    var_b = tf.get_variable("var_b", dtype=tf.int32, shape=[5],
                            initializer=tf.zeros_initializer)  # variable
    const_c = var_b + const_a
    var_d = tf.get_variable("var_d", shape=[3],
                            initializer=tf.ones_initializer)  # variable
    print("[Graph]", const_a, var_b, const_c, var_d, sep="\n")
    print("\n[Trainable variables]", *tf.trainable_variables(), sep="\n")
    init_op = tf.global_variables_initializer()
    # Declare a saver object to save checkpoints
    saver = tf.train.Saver()
with tf.Session(graph=g) as sess:
    # Initialize variables
    sess.run(init_op)
    # Do some work with the model
    a, b, c, d = sess.run([const_a, var_b, const_c, var_d])
    print("\n[Value]", a, b, c, d, sep="\n")
    # Save the variables to disk
    save_path = saver.save(sess, os.path.join(ckpt_dir, "model.ckpt"))
    print("\n[Model saved in path: {}]".format(save_path))
```

Inspecting checkpoints is a useful debugging technique. Sometimes you may notice a conflict between your model and its checkpoints; in that case, you can use the function below to inspect what is actually stored in the checkpoint.

In [16]:

```
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file
# tensor_name: name of the tensor in the checkpoint file to print
# all_tensors: boolean indicating whether to print all tensors
# all_tensor_names: boolean indicating whether to print all tensor names
print_tensors_in_checkpoint_file(save_path,
                                 tensor_name="",
                                 all_tensors=True,
                                 all_tensor_names=False)
```

The `tf.train.Saver` object not only saves variables to checkpoint files, it also restores variables. Note that when you restore variables you do not have to initialize them beforehand. For example, the following snippet demonstrates how to call the `tf.train.Saver.restore` method to restore variables from checkpoint files:

In [17]:

```
with tf.Session(graph=g) as sess:
    saver.restore(sess, save_path)
    a, b, c, d = sess.run([const_a, var_b, const_c, var_d])
    print("\n[Value]", a, b, c, d, sep="\n")
```

In [18]:

```
# more robust: look up the latest checkpoint in the directory
with tf.Session(graph=g) as sess:
    ckpt = tf.train.get_checkpoint_state(ckpt_dir)
    print(ckpt)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
    a, b, c, d = sess.run([const_a, var_b, const_c, var_d])
    print("\n[Value]", a, b, c, d, sep="\n")
```

The `tf.data` API enables you to build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random perturbations to each image, and merge randomly selected images into a batch for training. The pipeline for a text model might involve extracting symbols from raw text data, converting them to embedding identifiers with a lookup table, and batching together sequences of different lengths. The `tf.data` API makes it easy to deal with large amounts of data, different data formats, and complicated transformations.

The `tf.data` API introduces two new abstractions to TensorFlow:

- A `tf.data.Dataset` represents a sequence of elements, in which each element contains one or more `tf.Tensor` objects. For example, in an image pipeline, an element might be a single training example, with a pair of tensors representing the image data and a label. There are two distinct ways to create a dataset:
  - Creating a **source** (e.g. `Dataset.from_tensor_slices()`) constructs a dataset from one or more `tf.Tensor` objects.
  - Applying a **transformation** (e.g. `Dataset.batch()`) constructs a dataset from one or more `tf.data.Dataset` objects.
- A `tf.data.Iterator` provides the main way to extract elements from a dataset. The operation returned by `Iterator.get_next()` yields the next element of a `Dataset` when executed, and typically acts as the interface between input pipeline code and your model. The simplest iterator is a "one-shot iterator", which is associated with a particular `Dataset` and iterates through it once. For more sophisticated uses, the `Iterator.initializer` operation enables you to reinitialize and parameterize an iterator with different datasets, so that you can, for example, iterate over training and validation data multiple times in the same program.

In [19]:

```
tf.reset_default_graph()
## Pseudo dataset
dataset_size = 20
# 20 examples, each example has 5 features
data = np.random.rand(dataset_size, 5)
# this dataset has 3 labels
label = np.random.randint(low=0, high=3, size=dataset_size)
```

In [20]:

```
batch_size = 7
## Create a `dataset` by `from_tensor_slices`
training_dataset = tf.data.Dataset.from_tensor_slices((data, label))
print("[Original dataset] \n", training_dataset)
training_dataset = training_dataset.batch(batch_size)
print("\n[Transformed dataset] \n", training_dataset)
## Create an `iterator` to extract elements from the `dataset`
training_iterator = training_dataset.make_initializable_iterator()
x_input, y_label = training_iterator.get_next()
print("\n[Iterator] \n", training_iterator)
print("\n[Elements extracted by iterator] \n", x_input, "\n", y_label)
```

With `x_input` and `y_label`, we can build a graph on top of them.

Here we skip the model-building process and simply demonstrate how these tensors are evaluated.

In [21]:

```
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # avoid occupying the GPU's full memory
with tf.Session(config=config) as sess:
    sess.run(training_iterator.initializer)  # initialize the iterator
    step = 0  # record the steps
    try:
        while True:
            x_, y_ = sess.run([x_input, y_label])
            step += 1
            print("{} batch - {} examples".format(step, len(y_)))
            print(x_, y_)
    except tf.errors.OutOfRangeError:
        pass
```

This is the basic usage of a `dataset` and an `iterator` to consume data. The `tf.data` API provides compact methods for consuming data; see the official "Importing Data" guide for a more comprehensive tutorial.

We have learned the essential concepts of TensorFlow. In this part, we introduce two methods to build a basic model with one fully connected layer. There are many ways to build a neural network; here we guide you through a low-level and a high-level method. Although the high-level method provides compact utilities in `tf.layers`, it is better to first understand the concepts behind a deep learning model using the low-level method.

In [22]:

```
# setting
feature_dims = 784 # example with 784 features
neurons = 1024 # fully connected layer with 1024 neurons
classes = 10 # 10 classes classification problem
```

- Construct the layers in the neural network from scratch.
- Define `weights` and `bias` as trainable variables.

In [23]:

```
def fully_connected_layer(x_inputs, out_dim, name='fc'):
    """ Low-level method.
    x_inputs: a batch of examples [batch_size, feature_dims]
    out_dim: number of neurons in this layer
    """
    in_dim = x_inputs.shape[-1]  # feature_dims
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        weights = tf.get_variable("weights", shape=[in_dim, out_dim])
        bias = tf.get_variable("bias", shape=[out_dim])
        out = tf.matmul(x_inputs, weights) + bias
    return out
```

In [24]:

```
tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=[None, feature_dims])
fc = fully_connected_layer(x, neurons, "fc")
out = fully_connected_layer(fc, classes, "logits")
print("[Output tensor]", fc, out, sep="\n")
print("\n[Variables] ", *tf.trainable_variables(), sep="\n")
```

- Construct the layers in the neural network using `tf.layers`, a high-level API provided by TensorFlow.
- It contains many useful methods and arguments.

In [25]:

```
tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=[None, feature_dims])
fc = tf.layers.dense(x, neurons, activation=tf.nn.relu, name="fc")
out = tf.layers.dense(fc, classes,
activation=tf.nn.softmax, name="logits")
print("[Output tensor]", fc, out, sep="\n")
print("\n[Variables] ", *tf.trainable_variables(), sep="\n")
```

Build a **Graph** that:

- Defines the **dataset** and **iterator**
- Builds the **model**
- Defines the **loss**
- Defines the **optimizer**
- Defines any other tensors or operations you need (optional)

Execute a **session** to:

- Initialize the variables
- Run the target tensors and operations

In [26]:

```
## template
with tf.Graph().as_default() as g:
    """ Define dataset and iterator """
    with tf.name_scope("data"):
        pass
    """ Build the model """
    with tf.name_scope("model"):
        pass
    """ Define the loss """
    with tf.name_scope("loss"):
        pass
    """ Define the optimizer """
    with tf.name_scope("optimizer"):
        pass
    """ Other tensors or operations you need """
    with tf.name_scope("accuracy"):
        pass
with tf.Session(graph=g) as sess:
    """ Initialize the variables """
    """ Run the target tensors and operations """
    pass
```

Word2Vec is a computationally-efficient model that learns to embed words into vectors. The goal is to map words that have similar meanings close to each other.

When dealing with words, a straightforward approach is to treat each word as a discrete symbol, for instance, `cat` as `2` and `dog` as `1`. However, these symbols carry no information about the original words, making it impossible to infer the relationship between cats and dogs (both are four-legged animals and both are pets) from the symbols alone. Hence, to successfully learn the relationship between them, we might need a large amount of training data.

On the other hand, **vector space models (VSMs)**, which represent words as vectors, can help overcome these obstacles. This is based on a key observation that **semantically similar words are often used interchangeably in different contexts**. For example, the words `cat` and `dog` may both appear in the context "___ is my favorite pet." When feeding `cat` and `dog` into a neural network to predict their nearby words, these two words will likely share the same or similar hidden representations in order to predict the same or similar nearby words.
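The contrast between discrete symbols and vectors can be made concrete with cosine similarity: under a one-hot encoding every pair of distinct words is equally dissimilar, while dense vectors can place related words close together. The embedding values below are made up purely for illustration:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# one-hot codes for a 3-word vocabulary: car=0, dog=1, cat=2
one_hot = np.eye(3)
print(cosine(one_hot[2], one_hot[1]))  # cat vs dog -> 0.0
print(cosine(one_hot[2], one_hot[0]))  # cat vs car -> 0.0, no distinction

# hypothetical dense embeddings, as if learned from context
embed = {"cat": np.array([0.9, 0.8]),
         "dog": np.array([0.8, 0.9]),
         "car": np.array([0.9, -0.7])}
print(cosine(embed["cat"], embed["dog"]))  # high: similar contexts
print(cosine(embed["cat"], embed["car"]))  # low: different contexts
```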

Word2Vec comes in two variants: **Skip-Gram** and **CBOW (Continuous Bag-Of-Words)**. Algorithmically, these models are similar: CBOW predicts the target word from its neighborhood (context), whereas Skip-Gram does the inverse, predicting the context words from the target word. For example, take the sentence `the quick brown fox jumped over the lazy dog`. Defining the context words as the word to the left and the word to the right of the target word, CBOW will be trained on the dataset

`([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox)...`

where CBOW tries to predict the target word `quick` from the context words in brackets `[the, brown]`, predict `brown` from `[quick, fox]`, and so on. With Skip-Gram, the dataset instead becomes

`(quick, the), (quick, brown), (brown, quick), (brown, fox), ...`

where Skip-Gram predicts the context words `the` and `brown` from the target word `quick`. Statistically, CBOW smooths over a lot of the distributional information (by treating an entire context as one example). For the most part, this turns out to be useful for smaller datasets. Skip-Gram, on the other hand, treats each context-target pair as a new observation and has been shown to capture the semantics better when we have a large dataset.
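The two training datasets above can be generated with a few lines of plain Python. This sketch uses a fixed window of one word on each side (a simplified stand-in for the lab's random-window generator defined later):

```python
sentence = "the quick brown fox jumped over the lazy dog".split()
window = 1  # one context word on each side of the target

cbow_pairs, skipgram_pairs = [], []
for i, target in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + window + 1]
    # CBOW example: (context words, target word)
    cbow_pairs.append((context, target))
    # Skip-Gram examples: one (target, context word) pair per context word
    for c in context:
        skipgram_pairs.append((target, c))

print(cbow_pairs[1])       # (['the', 'brown'], 'quick')
print(skipgram_pairs[:3])  # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
```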

(Figure: the Skip-Gram and CBOW model architectures)

Note that the tasks described above are only used to train the neural network; we don't actually use the network for the task it was trained on. What we want is the weights of the hidden layer, the "embedding matrix".

For the rest of the tutorial, we will focus on the Skip-Gram model.

Before we start implementing a skip-gram model, we want to introduce an important technique that reduces the computational effort, called **noise contrastive estimation (NCE)**.

Let's motivate the idea with a naive method. If we wanted to build a skip-gram model directly, we could use the following snippet.

In [ ]:

```
# Naive idea; this is not runnable.
vocabulary_size = 10000
embedding_size = 128
batch_size = 64
with tf.Graph().as_default() as g:
    center_words = tf.placeholder(tf.int32, [batch_size])
    target_words = tf.placeholder(tf.int32, [batch_size])
    encode_matrix = tf.get_variable("encoder",
                                    shape=[vocabulary_size, embedding_size])
    decode_matrix = tf.get_variable("decoder",
                                    shape=[embedding_size, vocabulary_size])
    embedding = tf.matmul(center_words, encode_matrix)
    logits = tf.matmul(embedding, decode_matrix)
    output = tf.nn.softmax(logits)
```

As we can see, the network has a large number of parameters, and the softmax layer is computationally intensive.

Is there any way to reduce this cost?

**NCE comes to the rescue** for the softmax problem.

Like most neural networks, a Skip-Gram model is trained using the maximum likelihood(ML) principle:

$$ \arg\min_{\Theta}\sum_{i=1}^{N}{-\log\mathrm{P}(\boldsymbol{y}^{(i)}\,|\,\boldsymbol{x}^{(i)},\Theta)} $$
In a multiclass task where $y=1,\cdots,V$ ($V$ being the vocabulary size), it is natural to use $V$ **softmax units** in the output layer. That is, the activation $a_i^{(L)}$ of each unit $z_i^{(L)}$ at the last layer (layer $L$) outputs one dimension of the softmax function, a generalization of the logistic sigmoid:

$$ a_i^{(L)} = \frac{\exp(z_i^{(L)})}{\sum_{k=1}^{V}\exp(z_k^{(L)})} $$

The objective function then becomes:

$$\arg\min_{\Theta}\sum_{i}-\log\prod_{j}\left(\frac{\exp(z_{j}^{(L)})}{\sum_{k=1}^{V}\exp(z_{k}^{(L)})}\right)^{\mathbb{1}(y^{(i)}=j)}=\arg\min_{\Theta}\sum_{i}\left[-z_{y^{(i)}}^{(L)}+\log\sum_{k=1}^{V}\exp(z_{k}^{(L)})\right]$$

The expensive part is the normalizing sum over all $V$ vocabulary words. On the other hand, for feature learning in word2vec we do not need a full probabilistic model. The CBOW and skip-gram models are instead trained using a binary classification objective (logistic regression) to discriminate the real target words $w_t$ from $k$ imaginary (noise) words $\tilde{w}$ in the same context.

Since we are sampling from two distributions, the correct word is sampled from the true distribution $P$ according to the context $c$, while the noise words are sampled from a noise distribution $Q$; in practice, a uniform or unigram distribution is often used.

The cost function then becomes:

$$ C(\Theta) = -\sum_{i}\left[\log\frac{\exp(z_{y^{(i)}}^{(L)})}{\exp(z_{y^{(i)}}^{(L)})+kQ(y^{(i)})} + \sum_{j=1}^{k}\log\left(1-\frac{\exp(z_{\tilde{w}_j}^{(L)})}{\exp(z_{\tilde{w}_j}^{(L)})+kQ(\tilde{w}_j)}\right)\right] $$
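To see the saving concretely, the NumPy sketch below compares a full softmax cross-entropy, which must sum over all $V$ scores, with an NCE-style estimate that evaluates only the true word and $k$ sampled noise words. The scores here are random, made-up values for illustration; in the lab, `tf.nn.nce_loss` does this work for us:

```python
import numpy as np

rng = np.random.RandomState(0)
V, k = 10000, 64              # vocabulary size, number of noise samples
z = rng.randn(V)              # output-layer scores z^(L) for one example
y = 123                       # index of the true target word
q = 1.0 / V                   # noise distribution Q: uniform here

# full softmax cross-entropy: requires the sum over all V scores
full_loss = -z[y] + np.log(np.sum(np.exp(z)))

def p_true(score):
    # probability that a word with this score came from the data, not the noise
    return np.exp(score) / (np.exp(score) + k * q)

# NCE-style loss: only the true word and k noise words are evaluated
noise = rng.randint(V, size=k)
nce_loss = -(np.log(p_true(z[y])) + np.sum(np.log(1 - p_true(z[noise]))))

print(full_loss, nce_loss)  # both positive; NCE touches k + 1 terms, not V
```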

The dataset we use is **text8**, which is the first 100 MB of cleaned text of the English Wikipedia dump on Mar. 3, 2006. While 100MB is not enough to train really good embeddings, we can still see some interesting relations. Splitting the text by blank space, we can find that there are 17,005,207 tokens in total.

To generate batches for training, we use the functions defined below. First, we read the data into memory and build the vocabulary from the most commonly seen words. Meanwhile, we build two dictionaries: one that translates words to indices and another that does the reverse. Then, every word in the text selected as a center word is paired with one of its context words. Finally, a Python generator yields batches of center-target pairs.

In [28]:

```
# Download the data.
DOWNLOAD_URL = 'http://mattmahoney.net/dc/'
DATA_FOLDER = "data"
FILE_NAME = "text8.zip"
EXPECTED_BYTES = 31344016

def make_dir(path):
    """ Create a directory if there isn't one already. """
    try:
        os.mkdir(path)
    except OSError:
        pass

def download(file_name, expected_bytes):
    """ Download the text8 dataset if it's not already downloaded. """
    local_file_path = os.path.join(DATA_FOLDER, file_name)
    if os.path.exists(local_file_path):
        print("Dataset ready")
        return local_file_path
    file_name, _ = urllib.request.urlretrieve(
        DOWNLOAD_URL + file_name, local_file_path)
    file_stat = os.stat(local_file_path)
    if file_stat.st_size == expected_bytes:
        print('Successfully downloaded the file', file_name)
    else:
        raise Exception(
            'File ' + file_name +
            ' might be corrupted. You should try downloading it with a browser.')
    return local_file_path

make_dir(DATA_FOLDER)
file_path = download(FILE_NAME, EXPECTED_BYTES)
```

In [29]:

```
# Read the data into a list of strings.
def read_data(file_path):
    """ Read data into a list of tokens. """
    with zipfile.ZipFile(file_path) as f:
        # tf.compat.as_str() converts the input into a string
        data = tf.compat.as_str(f.read(f.namelist()[0])).split()
    return data

vocabulary = read_data(file_path)
print('Data size', len(vocabulary))
```

In [30]:

```
import collections

# Build the dictionaries and replace rare words with the UNK token.
def build_dataset(words, n_words):
    """ Create two dictionaries and counts of occurring words.
    - word_to_id: map of words to their codes
    - id_to_word: map of codes to words (inverse of word_to_id)
    - count: map of words to counts of occurrences
    """
    count = [['UNK', -1]]
    count.extend(collections.Counter(words).most_common(n_words - 1))
    word_to_id = dict()  # (word, id)
    # record word ids
    for word, _ in count:
        word_to_id[word] = len(word_to_id)
    id_to_word = dict(zip(word_to_id.values(), word_to_id.keys()))  # (id, word)
    return word_to_id, id_to_word, count

def convert_words_to_id(words, dictionary, count):
    """ Replace each word in the dataset with its index in the dictionary. """
    data_w2id = []
    unk_count = 0
    for word in words:
        index = dictionary.get(word, 0)
        if index == 0:
            unk_count += 1
        data_w2id.append(index)
    count[0][1] = unk_count
    return data_w2id, count
```

In [31]:

```
"""Filling 4 global variables:
# data_w2id - list of codes (integers from 0 to vocabulary_size-1).
This is the original text but words are replaced by their codes
# count - map of words(strings) to count of occurrences
# word_to_id - map of words(strings) to their codes(integers)
# id_to_word - maps codes(integers) to words(strings)
"""
vocabulary_size = 50000
word_to_id, id_to_word, count = build_dataset(vocabulary, vocabulary_size)
data_w2id, count = convert_words_to_id(vocabulary, word_to_id, count)
del vocabulary # reduce memory.
```

In [32]:

```
print('Most common words (+UNK)', count[:5])
print('Sample data: {}'.format(data_w2id[:10]))
print([id_to_word[i] for i in data_w2id[:10]])
```

In [33]:

```
# utility functions
def generate_sample(center_words, context_window_size):
    """ Form training pairs according to the skip-gram model. """
    for idx, center in enumerate(center_words):
        context = random.randint(1, context_window_size)
        # get the targets before the center word
        for target in center_words[max(0, idx - context): idx]:
            yield center, target
        # get the targets after the center word
        for target in center_words[idx + 1: idx + context + 1]:
            yield center, target

def batch_generator(data, skip_window, batch_size):
    """ Group a numeric stream into batches and yield them as NumPy arrays. """
    single_gen = generate_sample(data, skip_window)
    while True:
        center_batch = np.zeros(batch_size, dtype=np.int32)
        target_batch = np.zeros([batch_size, 1], dtype=np.int32)
        for idx in range(batch_size):
            center_batch[idx], target_batch[idx] = next(single_gen)
        yield center_batch, target_batch
```

In [34]:

```
## some training settings
training_steps = 1000
skip_step = 100
graph_dir = "graphs/word2vec_simple"
ckpt_dir = "checkpoints/word2vec_simple"
os.makedirs(ckpt_dir, exist_ok=True)  # make sure the checkpoint directory exists
## some hyperparameters
batch_size = 128
embed_size = 128
num_sampled = 64
learning_rate = 1.0

## generator for `tf.data.Dataset`
def gen():
    """ Return a python generator that generates batches. """
    yield from batch_generator(data_w2id, 2, batch_size)

## model
def word2vec(dataset):
    """ 1. Build the graph """
    with tf.name_scope("data"):
        # a one-shot iterator doesn't need to be initialized
        iterator = dataset.make_one_shot_iterator()
        # get the input and output
        center_words, target_words = iterator.get_next()
    with tf.name_scope('embed'):
        embedding_matrix = tf.get_variable("embedding_matrix",
                                           shape=[vocabulary_size, embed_size])
        embedding = tf.nn.embedding_lookup(embedding_matrix,
                                           center_words, name='embedding')
    with tf.name_scope('loss'):
        initializer = tf.truncated_normal_initializer(stddev=1.0 / (embed_size ** 0.5))
        nce_weight = tf.get_variable('nce_weight',
                                     shape=[vocabulary_size, embed_size],
                                     initializer=initializer)
        nce_bias = tf.get_variable('nce_bias', shape=[vocabulary_size],
                                   initializer=tf.zeros_initializer)
        # define the loss function to be the NCE loss
        loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                             biases=nce_bias,
                                             labels=target_words,
                                             inputs=embedding,
                                             num_sampled=num_sampled,
                                             num_classes=vocabulary_size), name='loss')
    with tf.name_scope('optimizer'):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    ## store checkpoints
    saver = tf.train.Saver()

    """ 2. Execute a session """
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # avoid occupying the GPU's full memory
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        ckpt = tf.train.get_checkpoint_state(ckpt_dir)
        # if a checkpoint exists, restore from it
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        # used to calculate the average loss over the last skip_step steps
        total_loss = 0.0
        writer = tf.summary.FileWriter(graph_dir, sess.graph)
        for index in range(1, training_steps + 1):
            try:
                loss_batch, _ = sess.run([loss, optimizer])
                total_loss += loss_batch
                if index % skip_step == 0:
                    print('Average loss at step {}: {:5.1f}'.format(
                        index, total_loss / skip_step))
                    total_loss = 0.0
                    saver.save(sess,
                               os.path.join(ckpt_dir, "model"),
                               index)
            except tf.errors.OutOfRangeError:
                pass
        writer.close()
```

In [35]:

```
tf.reset_default_graph()
dataset = tf.data.Dataset.from_generator(gen, (tf.int32, tf.int32),
                                         (tf.TensorShape([batch_size]),
                                          tf.TensorShape([batch_size, 1])))
word2vec(dataset)
```


- Submit to iLMS. Filename:
`Lab10_{student_id}.ipynb`

- Deadline: 2018/10/25 15:30

**[Problem Definition]**

- Build a model to challenge this task.
- Input: binary data of shape `[2]`
- Output: the XOR result of the input, shape `[1]`

**[Requirements]**

- Show the code of the **graph** and **session** you build.
- Use the **low-level API method** (define the weights and biases from scratch as above).
- Show the accuracy on these 4 examples (it should be `100%`).
- Show the **values of the weights and biases** you used.

**[Notes]**

- The model architecture is not constrained.
- XOR is a simple nonlinear function that a neural network can fit rapidly.
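To see why a tiny network suffices, here is an illustrative NumPy forward pass through a 2-2-1 ReLU network with hand-picked weights. These particular values are just one of many solutions that represent XOR, not the ones your trained model has to find:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

# hand-picked weights for a 2-2-1 network that computes XOR
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])     # input -> hidden
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0], [-2.0]])  # hidden -> output
b2 = np.array([0.0])

x = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=np.float64)
y = relu(x @ W1 + b1) @ W2 + b2
print(y.ravel())  # -> [1. 1. 0. 0.], matching xor_label below
```

The hidden layer's ReLU is what makes this possible: a single linear layer cannot represent XOR, which is why the task is a classic sanity check for nonlinearity.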

In [36]:

```
# xor task
xor_data = np.array([[1, 0],
                     [0, 1],
                     [1, 1],
                     [0, 0]])
xor_label = np.array([[1], [1], [0], [0]])
```

In [37]:

```
with tf.Graph().as_default() as g:
    x_input = tf.placeholder(tf.float32, [None, 2])
    y_label = tf.placeholder(tf.float32, [None, 1])
    # start building your model and meet the requirements
    # from here
    pass

with tf.Session(graph=g) as sess:
    # start running the session and meet the requirements
    # from here
    pass
```

**[Problem Definition]**

- Build a **fully connected network** to challenge this task.
- Input: an image of shape `[784]`
- Output: the digit of this image, shape `[10]`

**[Requirements]**

- Show the code of the **graph** and **session** you build.
- Use `tf.data.Dataset` and `tf.data.Iterator` to extract the data.
- Use the low-level API method to build the model (define the weights and biases from scratch as above).
- The accuracy on `mnist.test` should be at least `95%`.

**[Notes]**

- The **hyperparameters** are **not constrained** (e.g. `num_neurons_in_one_layer`, `how_many_layers`, `learning_rates`, `training_epochs`, `batch_size`).
- The **optimizer** is **not constrained**.
- `mnist.train` is all you may use to train the model; `mnist.validation` is only for validation.
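Before wiring the input pipeline into `tf.data`, it can help to prototype the batching logic in plain NumPy. The sketch below (with fake arrays whose shapes mimic `mnist.train`; the data and the `minibatches` helper are illustrative, not part of the lab API) shuffles once and yields full `(image, label)` batches:

```python
import numpy as np

def minibatches(images, labels, batch_size, seed=0):
    """Shuffle once, then yield (image_batch, label_batch) pairs of full batches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    for start in range(0, len(images) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

# fake data standing in for mnist.train (shapes only, not real digits)
images = np.random.rand(1000, 784).astype(np.float32)
labels = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 1000)]

x_batch, y_batch = next(minibatches(images, labels, batch_size=128))
print(x_batch.shape, y_batch.shape)  # -> (128, 784) (128, 10)
```

`tf.data.Dataset.from_tensor_slices((images, labels)).shuffle(...).batch(128)` plays the same role in the graph; the NumPy version just makes the shapes easy to check interactively.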

In [38]:

```
from tensorflow.examples.tutorials.mnist import input_data
# load mnist data
mnist = input_data.read_data_sets("data/mnist", one_hot=True)
```

In [39]:

```
print("training data size: {}".format(mnist.train.num_examples))
print("validation data size: {}".format(mnist.validation.num_examples))
print("testing data size: {}".format(mnist.test.num_examples))
print("Shape of image: {}".format(mnist.train.images[0].shape))
print("Shape of label: {}".format(mnist.train.labels[0].shape))
```
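Because the data was loaded with `one_hot=True`, each label is a length-10 vector with a single 1, and `np.argmax` recovers the digit (this is what the plotting cell below uses for its titles). A quick standalone illustration:

```python
import numpy as np

# a one-hot label like those in mnist.train.labels, here for the digit 3
label = np.zeros(10, dtype=np.float32)
label[3] = 1.0

digit = int(np.argmax(label))
print(digit)  # -> 3
```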

In [40]:

```
n = 5
plt.figure(figsize=(18, 18))
for i in range(n):
    plt.subplot(n, n, i + 1)
    plt.imshow(mnist.train.images[i].reshape(28, 28), cmap='gray')
    plt.title(np.argmax(mnist.train.labels[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()
```

In [41]:

```
with tf.Graph().as_default() as g:
    ## meet the requirements
    ## then you will
    pass

with tf.Session(graph=g) as sess:
    ## just do it
    pass
```
