LSTM neural network for sequence learning
Published on 2017-11-26 22:00
22 min read
Category: Artificial Intelligence
In 1996, during my last year of high school, I borrowed a book about neural networks from a friend. It explained how a two-layer perceptron network could learn the XOR function. Back then I tried implementing the formulas and was able to do the feed-forward calculations; the training algorithm, however, still eluded me. Being able to perform forward calculations was already very exciting. I created a Windows 95 screen saver which would fill the screen with the output of a randomized neural network. The output images were very interesting, especially when replacing the activation functions of the network with exotic ones such as sin(x), abs(x), etc. (Although I lost the source code, you can still download it here.)
At the time it seemed that neural networks were just another statistical method to interpolate data. Furthermore, limited training data and the problem of vanishing gradients limited their usefulness. Fast forward to 2017: massive amounts of training data and computing power are available, and a number of relatively small improvements in the basic neural network algorithms have made it possible to train networks consisting of many more layers. These so-called deep neural networks have fueled progress and interest in Artificial Intelligence development.
One particular innovation that caught my attention is the LSTM neural network architecture. This architecture solves the issue of vanishing gradients for Recurrent Neural Networks (RNNs). LSTM networks are especially suited to analyzing sequences and time series (a small sketch of a single LSTM cell step follows the links below). Some interesting links:
- article about text generation kernel code
- fake news generator
- LSTM architecture
- LSTM explanation
- Modeling attention
- Convolutional network for speech synthesis
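To make the architecture a bit more concrete before moving to TensorFlow, here is a minimal NumPy sketch of a single LSTM cell step, written out from the standard gate equations (the names lstm_step, W and b are my own illustration, not taken from any library):
import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.
    x: input vector, h_prev/c_prev: previous hidden and cell state,
    W and b: dicts with one weight matrix and bias vector per gate."""
    z = np.concatenate([x, h_prev])          # stack input and previous hidden state
    f = sigmoid(np.dot(W['f'], z) + b['f'])  # forget gate: what to erase from the cell state
    i = sigmoid(np.dot(W['i'], z) + b['i'])  # input gate: what to write to the cell state
    g = np.tanh(np.dot(W['g'], z) + b['g'])  # candidate values to write
    o = sigmoid(np.dot(W['o'], z) + b['o'])  # output gate: what to expose as the new hidden state
    c = f * c_prev + i * g                   # new cell state: the long-term memory path
    h = o * np.tanh(c)                       # new hidden state: the cell output
    return h, c
Because the cell state c is only updated through element-wise gating, gradients can flow across many time steps without vanishing.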
In this first test I wanted to gain some experience by implementing a sine wave predictor using TensorFlow. It's a toy example: due to the periodic nature of the sine wave, the train, dev, and test sets overlap, which limits the possibilities to check whether the network can generalize.
This notebook can be downloaded from my git repository.
import plotly
from plotly.graph_objs import Scatter, Layout
import numpy as np
import tensorflow as tf
import sys
plotly.offline.init_notebook_mode(connected=True)
import IPython.display
Training data
The following cell generates the training data. I decided to add some noise to the sine wave, which forces some regularization.
sample_length = 50001
time_per_sample = 0.01
signal_time = np.linspace(num=sample_length,start = 0, stop = sample_length * time_per_sample )
signal_amp = np.sin(signal_time*2*np.pi) + np.random.normal(size=sample_length)*0.02
#np.sin(2+signal_time*1.7*np.pi)*0.5 + \
#np.sin(1+signal_time*2.2*np.pi) + \
#plot part of the signal, just to see what's in there
s_i = 0
e_i = s_i + 100
x = plotly.offline.iplot({
"data": [Scatter(x=signal_time[s_i:e_i],y=signal_amp[s_i:e_i])],
"layout": Layout(title="")
})
#Setup general hyper parameters
#Unroll the RNN to sequence_length timesteps
sequence_length = 100
#The number of timesteps to predict
prediction_length = 1
#The number of features per input time step
input_feature_count = 1
#The number of features per prediction
output_feature_count = 1
#the number of LSTM nodes per layer of the network
hidden_count_per_layer = [16,16]
tf.reset_default_graph()
#inputs is a vector of (batch_size, sequence_length, feature_count)
inputs = tf.placeholder(tf.float32,
[None, sequence_length, input_feature_count],
name = 'inputs')
#targets holds the training target for each example.
#It will be filled with the value of the next time step.
#Shape (batch_size, output feature count)
targets = tf.placeholder(tf.float32,
[None, output_feature_count],
name = 'targets')
#Apply dropout regularization: keep_prob is the probability of keeping a connection.
#The default of 1.0 disables dropout unless a different value is fed.
keep_prob = tf.placeholder_with_default(1.0, shape=(), name = 'keep')
#Learning rate for the AdamOptimizer
learning_rate = tf.placeholder(tf.float32, name = 'learning_rate')
Defining the multi-layer LSTM network
Define a network by creating a number of layers. In most examples I found, all layers use an equal number of nodes. In this example you can specify the number of neurons per layer through the 'hidden_count_per_layer' array.
layers = []
for hidden_count in hidden_count_per_layer:
    layer = tf.nn.rnn_cell.LSTMCell(hidden_count, state_is_tuple = True)
    #wrap the cell so that dropout (controlled by keep_prob) is applied to its inputs
    layer_with_dropout = tf.nn.rnn_cell.DropoutWrapper(layer,
                                                       input_keep_prob=keep_prob,
                                                       output_keep_prob=1.0)
    layers.append(layer_with_dropout)
hidden_network = tf.nn.rnn_cell.MultiRNNCell(layers, state_is_tuple = True)
Packing/Unpacking the LSTM network state
'state_is_tuple = True' means that the LSTM state data structure will be a tuple. Although inconvenient to work with, this seems to be the future default. I will introduce some functions which help to work more easily with these state tuples.
In order to use the LSTM network to generate a predicted sequence of arbitrary length, you need to store the state of the network. The output state after predicting a sample should be fed back into the network when predicting the next sample.
The LSTM implementation in TensorFlow uses an LSTMStateTuple(c,h) data structure. The idea is to pack this LSTMStateTuple(c,h) into a 2D vector of size (batch_size, states).
There were some challenges implementing these packing/unpacking functions. In particular, you want to avoid making them dependent on a specific batch size: while building the computation graph the batch size should be None.
There is a pointer on how to use dynamic batch_sizes and packing/unpacking states here. I made some changes to clarify these functions.
def get_network_state_size(network):
    """Returns the number of state variables in the network"""
    states = 0
    for layer_size in network.state_size:
        states += layer_size[0] # LSTMStateTuple element c
        states += layer_size[1] # LSTMStateTuple element h
    return states
def pack_state_tuple(state_tuple, indent=0):
    """Returns a (batch_size, network_state_size) matrix of the states in the network.
    state_tuple = the states obtained from _ , state = tf.nn.dynamic_rnn(...)
    """
    if isinstance(state_tuple, tf.Tensor) or not hasattr(state_tuple, '__iter__'):
        #The LSTMStateTuple contains 2 Tensors
        return state_tuple
    else:
        l = []
        #an unpacked LSTM network state is a tuple with one element per layer; each element is an LSTMStateTuple
        #state_tuple is either the tuple of LSTMStateTuples or an LSTMStateTuple (via recursive call)
        for item in state_tuple:
            # item is either an LSTMStateTuple (top level call)
            # or it is an element of the LSTMStateTuple (first recursive call)
            i = pack_state_tuple(item, indent+2)
            l.append(i)
        #convert the list of [Tensor(bsz,a), Tensor(bsz,b), ...] into one long Tensor (bsz, a+b+...)
        return tf.concat(l,1)
def unpack_state_tuple(state_tensor, sizes):
    """The inverse of pack: given a packed states vector of (batch_size, x), return the LSTMStateTuple
    data structure that can be used as initial state for tf.nn.dynamic_rnn(...)
    sizes is the network state size list (cell.state_size)
    """
    def _unpack_state_tuple( sizes_, offset_, indent):
        if isinstance(sizes_, tf.Tensor) or not hasattr(sizes_, '__iter__'):
            #take a small slice (batch size, c or h size of the LSTMStateTuple) of the packed state vector of shape (batch size, network states)
            return tf.reshape(state_tensor[:, offset_ : (offset_ + sizes_) ], (-1, sizes_)), offset_ + sizes_
        else:
            result = []
            #Top level: sizes_ is a tuple with one element per layer, each element is an LSTMStateTuple(c size, h size)
            #Recursive call: sizes_ is an LSTMStateTuple
            for size in sizes_:
                #size is an LSTMStateTuple (top level)
                #or size is c size or h size (recursive call)
                s, offset_ = _unpack_state_tuple( size, offset_, indent+2)
                result.append(s)
            if isinstance(sizes_, tf.nn.rnn_cell.LSTMStateTuple):
                #end of recursive call
                #Build an LSTMStateTuple using the c size and h size elements in the result list
                return tf.nn.rnn_cell.LSTMStateTuple(*result), offset_
            else:
                # end of top level call
                # create a tuple with one element per layer. The result is a list of LSTMStateTuples
                return tuple(result), offset_
    return _unpack_state_tuple( sizes, 0, 0)[0]
Testing the packing/unpacking functions
Next I wrote a check to see if the pack and unpack functions are indeed each other's inverses. The vectors should be packed/unpacked in the correct order. The idea is to create a 'packed' vector containing the values 0..n, then unpack and repack it. The output should be equal to the original vector.
#Test pack and unpack
#create a placeholder in which we can feed packed states (a vector of (batch_size, states)) as initial_state
state_packed_in = tf.placeholder(
tf.float32,
(None,get_network_state_size(hidden_network)),
name="state_packed_1")
#Unpack the packed states
state_unpacked_out = unpack_state_tuple(state_packed_in,hidden_network.state_size)
#Repack the unpacked states
state_packed_out = pack_state_tuple(state_unpacked_out)
inputs_batch_size = 40
a_batch_of_inputs = np.zeros((inputs_batch_size, sequence_length, input_feature_count))
#create an initial state vector and fill it with test data
an_initial_state = np.zeros((inputs_batch_size*get_network_state_size(hidden_network),1))
an_initial_state[:,0] = np.linspace(start=0,stop=an_initial_state.shape[0]-1,num=an_initial_state.shape[0])
#reshape it as a packed state
an_initial_state_packed = np.reshape(an_initial_state, (inputs_batch_size,get_network_state_size(hidden_network)))
init=tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
up,p = sess.run([state_unpacked_out, state_packed_out], feed_dict={state_packed_in: an_initial_state_packed})
# compare the original packed states with the ones that were unpacked and then repacked
diff = an_initial_state_packed - p
# should return 0
print("diff",np.sum(np.abs(diff)))
diff 0.0
Initial state
Create a placeholder for the initial packed states. This makes it possible to supply the initial states to the LSTM network as a simple vector. Then add an unpack operation to the computation graph. This outputs the initial state as an LSTMStateTuple structure which can be used by the dynamic RNN function later on.
sz = get_network_state_size(hidden_network)
print("states in network", sz)
initial_state_packed = tf.placeholder(
tf.float32,
(None,sz),
name="initial_state")
state_unpacked = unpack_state_tuple(initial_state_packed,hidden_network.state_size)
states in network 64
Forward propagation
Define the forward calculations using the dynamic_rnn function. This function takes and returns the network state in unpacked format.
#out_weights=tf.Variable(tf.random_normal([hidden_count_per_layer[-1],output_feature_count]))
#out_bias=tf.Variable(tf.random_normal([output_feature_count]))
print("inputs ",inputs.shape)
outputs, state_unpacked_network_out = tf.nn.dynamic_rnn(hidden_network, inputs, initial_state = state_unpacked, dtype=tf.float32) #, initial_state=rnn_tuple_state, )
state_packed_network_out = pack_state_tuple(state_unpacked_network_out)
print("packed state", state_packed_network_out.shape)
print("outputs before transpose", outputs.shape)
outputs = tf.transpose(outputs, [1, 0, 2])
print("outputs after transpose", outputs.shape)
#last_output = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
last_output = outputs[outputs.shape[0]-1,:,:]
print("last output", last_output.shape)
#out_size = target.get_shape()[2].value
predictions = tf.contrib.layers.fully_connected(last_output, output_feature_count, activation_fn=None)
print("prediction", predictions.shape)
print("targets", targets.shape)
inputs (?, 100, 1)
packed state (?, 64)
outputs before transpose (?, 100, 16)
outputs after transpose (100, ?, 16)
last output (?, 16)
prediction (?, 1)
targets (?, 1)
Backward pass, training
Define the loss as the sum of the squared differences between the last output (the prediction) and the target.
loss = tf.reduce_sum(tf.squared_difference(predictions, targets))
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
Defining the train, dev and test set
Generally you would define 3 sets:
- A set to train on: Train set
- A set to tune the hyper parameters on: Dev set
- A set to test the generalization performance of the network: Test set
In the case of a sine wave this is of limited use: the dev and test sets overlap with the train set because of the periodic nature of the sine wave. I added noise to the source signal to make the train, dev and test sets at least partly independent.
start_indices = np.linspace(
0,
sample_length-sequence_length-prediction_length-1,
sample_length-sequence_length-prediction_length-1, dtype= np.int32)
#When you have many examples you can get away with tiny dev and test sets.
dev_size_perc = 0.20
test_size_perc = 0.20
batch_size = 128 #512
dev_size = int(np.floor(start_indices.shape[0] * dev_size_perc))
test_size = int(np.floor(start_indices.shape[0] * test_size_perc))
train_size = start_indices.shape[0] - test_size - dev_size
train_batch_count = int(np.floor(train_size / batch_size))
dev_batch_count = int(np.floor(dev_size / batch_size))
test_batch_count = int(np.floor(test_size / batch_size))
print("dataset size %d" %(start_indices.shape[0]))
print("%d Examples (%d batches) in train set" %(train_size, train_batch_count))
print("%d Examples (%d batches) in dev set" %(dev_size,dev_batch_count))
print("%d Examples (%d batches) in test set" %(test_size,test_batch_count))
dataset size 49899
29941 Examples (233 batches) in train set
9979 Examples (77 batches) in dev set
9979 Examples (77 batches) in test set
Creating batches
The network will be trained using mini-batches. This speeds up training because a training step is performed after each mini-batch, in contrast to updating only after presenting the complete training set.
#A batch of examples can start at an arbitrary index in the source signal.
# Shuffle the indices to fill the train, dev and test sets with different sequences
np.random.shuffle (start_indices)
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]
def get_batch(batch_index, indexes, size=batch_size):
batch_start_indexes = indexes[batch_index*size:batch_index*size+size]
batch_inputs = np.zeros((size,sequence_length, input_feature_count))
batch_targets = np.zeros((size,prediction_length))
for i in range(size):
se = batch_start_indexes[i]
part = signal_amp[se:se+sequence_length]
batch_inputs[i,0:sequence_length,0] = part
        batch_targets[i,0] = signal_amp[se+sequence_length] #the value of the next time step after the input sequence
return batch_inputs,batch_targets
batch_inputs,batch_targets = get_batch(train_batch_count-1,train_indices)
print(batch_inputs.shape,batch_targets.shape)
example_inputs = batch_inputs[0,:,:]
example_targets = batch_targets[0,:]
print(example_inputs.shape)
#plot a single example
b_i = 1
b_s = batch_inputs[b_i,0:sequence_length,0]
plotly.offline.iplot({
"data": [Scatter(y=b_s)],
"layout": Layout(title="")
})
(128, 100, 1) (128, 1)
(100, 1)
Test training using a single batch
In the next cell I check whether I can train the network on one single batch, just to verify that the optimizer is indeed able to train the network. Successful training should decrease the loss; in the output you will see the loss decreasing (first column).
np.random.shuffle (start_indices)
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]
zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))
init=tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
np.random.shuffle (train_indices)
batch_inputs,batch_targets = get_batch(0, train_indices)
print("batch input shape", batch_inputs.shape)
#v_outputs, v_state = sess.run([outputs,state], feed_dict={inputs: batch_inputs, targets: batch_targets})
v_predictions, v_state_unpacked = sess.run([predictions, state_unpacked_network_out],
feed_dict={
inputs: batch_inputs,
targets: batch_targets,
initial_state_packed: zero_state_packed
})
print(v_predictions.shape)
print(v_predictions[0],batch_targets[0])
for i in range(0,120):
v_predictions, v_outputs, v_state_unpacked, v_loss, v_opt = sess.run(
[predictions, outputs, state_unpacked_network_out, loss, opt],
feed_dict={
learning_rate: 0.02,
inputs: batch_inputs,
targets: batch_targets,
state_unpacked: v_state_unpacked
}) #})
if i % 10 == 0:
print(v_loss,v_predictions[0],batch_targets[0])
batch input shape (128, 100, 1)
(128, 1)
[ 0.04703456] [-0.66767944]
69.1535 [ 0.04703381] [-0.66767944]
2.39038 [-0.71438432] [-0.66767944]
0.769588 [-0.72163767] [-0.66767944]
0.238517 [-0.71686089] [-0.66767944]
0.133357 [-0.67176348] [-0.66767944]
0.130659 [-0.67786527] [-0.66767944]
0.102174 [-0.66787326] [-0.66767944]
0.0939764 [-0.68473738] [-0.66767944]
0.0897313 [-0.6789692] [-0.66767944]
0.0882396 [-0.6785149] [-0.66767944]
0.0865416 [-0.68145192] [-0.66767944]
0.0851147 [-0.67975515] [-0.66767944]
Training and Testing
Finally we can train and test the network. The training consists of epochs during which all training batches are presented; the network is optimized immediately after each batch. After each epoch the loss over the dev set is calculated and printed.
Next a graph is plotted which shows an example of the network predicting a sine wave. The prediction is based on first 'priming' the network by presenting part of a sine.
After training for a number of epochs, a final prediction is generated over a longer time period.
np.random.shuffle (start_indices)
#create a randomized train, dev and test set
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]
#initialization of the network states for a single mini batch, by setting them to zero
#You could also initialize by using random states.
batch_zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))
epoch_count = 10
#Store the performance over the dev set in loss_results
loss_results = np.zeros((epoch_count,2))
def get_loss(set_name, batch_count, example_set_indices):
    """Calculate the average loss per example over all batches in a set"""
    epoch_loss = 0.0
    for example_index in range(batch_count):
batch_inputs,batch_targets = get_batch(example_index, example_set_indices)
batch_loss = sess.run(loss,feed_dict={
inputs:batch_inputs,
targets:batch_targets,
initial_state_packed: batch_zero_state_packed
})
if example_index % 20 == 0:
print(" %s results batch %d, loss %s" %( set_name, example_index, str(batch_loss)))
epoch_loss += batch_loss
return epoch_loss / len(example_set_indices)
def generate_graph(graph_size=200):
"""Use the network to generate a graph"""
#The network will be primed using prime_size samples of the original signal
prime_size = 50
prime_signal_start_i = 0
#put prime_size samples of the original signal in tmp_signal
orig_signal = np.zeros((graph_size,1))
tmp_signal = np.zeros((graph_size,1))
tmp_signal[0:prime_size,0] = signal_amp[prime_signal_start_i:(prime_signal_start_i+prime_size)]
orig_signal[0:graph_size,0] = signal_amp[prime_signal_start_i:(prime_signal_start_i+graph_size)]
#create a sequence for a batch_size of 1
seq = np.zeros((1,sequence_length,1))
seq_state_packed = np.zeros((1, get_network_state_size(hidden_network)))
_state_unpacked = None
#generate the graph
for end in range(prime_size, graph_size):
#get a sequence to present to the network
seq[0,:,0] = tmp_signal.take(range((end-sequence_length),end), mode='wrap')
#get a prediction
seq_state_packed , _prediction = sess.run(
[state_packed_network_out, predictions[0,0]],
feed_dict={
initial_state_packed: seq_state_packed,
inputs: seq})
#put the prediction in the graph
tmp_signal[end,0] = _prediction
sys.stdout.write('.')
sys.stdout.flush()
print("")
plotly.offline.iplot({
"data": [Scatter(name="predicted",y=tmp_signal[:,0]),Scatter(name="original",y=orig_signal[:,0])],
"layout": Layout(title="")})
init=tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for epoch in range(0,epoch_count):
print("Epoch %d" %(epoch))
#in every epoch go through the training set in a different order
np.random.shuffle (train_indices)
print("Train")
for ti in range(train_batch_count):
batch_inputs,batch_targets = get_batch(ti, train_indices)
#train the network
#I reset the state to zero for each batch.
batch_train_loss, _ = sess.run([loss, opt],
feed_dict={
learning_rate: 0.00005,
inputs: batch_inputs,
targets: batch_targets,
initial_state_packed: batch_zero_state_packed
})
sys.stdout.write('.')
sys.stdout.flush()
print("")
epoch_train_loss = get_loss("Train", train_batch_count, train_indices)
print("Training results epoch %d, loss %s" %( epoch, str(epoch_train_loss)))
epoch_dev_loss = get_loss("Dev", dev_batch_count, dev_indices)
print("Dev results epoch %d, loss %s" %( epoch, str(epoch_dev_loss)))
loss_results[epoch,0] = epoch_train_loss
loss_results[epoch,1] = epoch_dev_loss
ti += 1
generate_graph()
#generate a last long graph
generate_graph(graph_size=1000)
plotly.offline.iplot({
"data": [Scatter(name="loss train",y=loss_results[:,0]),Scatter(name="loss dev",y=loss_results[:,1])],
"layout": Layout(title="")})
Epoch 0
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 54.9094
Train results batch 20, loss 52.7554
Train results batch 40, loss 57.1101
Train results batch 60, loss 56.6699
Train results batch 80, loss 52.1481
Train results batch 100, loss 55.9796
Train results batch 120, loss 58.3971
Train results batch 140, loss 44.4456
Train results batch 160, loss 48.3201
Train results batch 180, loss 48.1845
Train results batch 200, loss 55.2113
Train results batch 220, loss 54.7782
Training results epoch 0, loss 0.420098819824
Dev results batch 0, loss 51.6062
Dev results batch 20, loss 53.5696
Dev results batch 40, loss 50.2322
Dev results batch 60, loss 51.0714
Dev results epoch 0, loss 0.407184930349
......................................................................................................................................................
Epoch 1
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 21.4379
Train results batch 20, loss 20.0213
Train results batch 40, loss 21.2412
Train results batch 60, loss 19.3035
Train results batch 80, loss 22.9997
Train results batch 100, loss 22.1121
Train results batch 120, loss 19.6073
Train results batch 140, loss 20.3425
Train results batch 160, loss 19.6809
Train results batch 180, loss 25.0082
Train results batch 200, loss 22.1662
Train results batch 220, loss 24.211
Training results epoch 1, loss 0.163680443564
Dev results batch 0, loss 19.6441
Dev results batch 20, loss 20.6591
Dev results batch 40, loss 21.2059
Dev results batch 60, loss 19.5042
Dev results epoch 1, loss 0.15843713808
......................................................................................................................................................
Epoch 2
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 1.25657
Train results batch 20, loss 1.24823
Train results batch 40, loss 1.5213
Train results batch 60, loss 1.51712
Train results batch 80, loss 1.37371
Train results batch 100, loss 1.52443
Train results batch 120, loss 1.04627
Train results batch 140, loss 1.12556
Train results batch 160, loss 1.01674
Train results batch 180, loss 1.34206
Train results batch 200, loss 1.23301
Train results batch 220, loss 1.15482
Training results epoch 2, loss 0.00969723457763
Dev results batch 0, loss 1.21576
Dev results batch 20, loss 1.2668
Dev results batch 40, loss 1.44446
Dev results batch 60, loss 1.07501
Dev results epoch 2, loss 0.00952362719155
......................................................................................................................................................
Epoch 3
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.179961
Train results batch 20, loss 0.153622
Train results batch 40, loss 0.207009
Train results batch 60, loss 0.189542
Train results batch 80, loss 0.156345
Train results batch 100, loss 0.205141
Train results batch 120, loss 0.145814
Train results batch 140, loss 0.153639
Train results batch 160, loss 0.185179
Train results batch 180, loss 0.170148
Train results batch 200, loss 0.184466
Train results batch 220, loss 0.198975
Training results epoch 3, loss 0.00135172683512
Dev results batch 0, loss 0.182201
Dev results batch 20, loss 0.171388
Dev results batch 40, loss 0.174947
Dev results batch 60, loss 0.157753
Dev results epoch 3, loss 0.00133814723993
......................................................................................................................................................
Epoch 4
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0727791
Train results batch 20, loss 0.0689376
Train results batch 40, loss 0.0716537
Train results batch 60, loss 0.0817372
Train results batch 80, loss 0.0861643
Train results batch 100, loss 0.0728834
Train results batch 120, loss 0.0586975
Train results batch 140, loss 0.065918
Train results batch 160, loss 0.0647993
Train results batch 180, loss 0.0598816
Train results batch 200, loss 0.0765543
Train results batch 220, loss 0.0749957
Training results epoch 4, loss 0.000547558254865
Dev results batch 0, loss 0.0717915
Dev results batch 20, loss 0.0704993
Dev results batch 40, loss 0.0613825
Dev results batch 60, loss 0.0708818
Dev results epoch 4, loss 0.00054083534219
......................................................................................................................................................
Epoch 5
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0688844
Train results batch 20, loss 0.0541076
Train results batch 40, loss 0.065995
Train results batch 60, loss 0.056871
Train results batch 80, loss 0.0703863
Train results batch 100, loss 0.0622354
Train results batch 120, loss 0.0632361
Train results batch 140, loss 0.0497587
Train results batch 160, loss 0.0681106
Train results batch 180, loss 0.0735378
Train results batch 200, loss 0.0551927
Train results batch 220, loss 0.0586691
Training results epoch 5, loss 0.000470599653114
Dev results batch 0, loss 0.0599357
Dev results batch 20, loss 0.060271
Dev results batch 40, loss 0.0517052
Dev results batch 60, loss 0.0637929
Dev results epoch 5, loss 0.000465033017603
......................................................................................................................................................
Epoch 6
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.053541
Train results batch 20, loss 0.0699349
Train results batch 40, loss 0.0553604
Train results batch 60, loss 0.0620093
Train results batch 80, loss 0.0647623
Train results batch 100, loss 0.0518999
Train results batch 120, loss 0.0632549
Train results batch 140, loss 0.0673768
Train results batch 160, loss 0.0645557
Train results batch 180, loss 0.0533836
Train results batch 200, loss 0.0603966
Train results batch 220, loss 0.0555538
Training results epoch 6, loss 0.000448670988889
Dev results batch 0, loss 0.0567565
Dev results batch 20, loss 0.0563003
Dev results batch 40, loss 0.0496434
Dev results batch 60, loss 0.0620466
Dev results epoch 6, loss 0.000444287580372
......................................................................................................................................................
Epoch 7
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0564441
Train results batch 20, loss 0.0662476
Train results batch 40, loss 0.0497547
Train results batch 60, loss 0.0620312
Train results batch 80, loss 0.0595999
Train results batch 100, loss 0.0605202
Train results batch 120, loss 0.0603222
Train results batch 140, loss 0.0491505
Train results batch 160, loss 0.0579644
Train results batch 180, loss 0.0600865
Train results batch 200, loss 0.0482739
Train results batch 220, loss 0.0515458
Training results epoch 7, loss 0.000437071640663
Dev results batch 0, loss 0.0552906
Dev results batch 20, loss 0.0541909
Dev results batch 40, loss 0.0487828
Dev results batch 60, loss 0.061333
Dev results epoch 7, loss 0.000433660682277
......................................................................................................................................................
Epoch 8
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.058103
Train results batch 20, loss 0.0501114
Train results batch 40, loss 0.0478091
Train results batch 60, loss 0.0674866
Train results batch 80, loss 0.0530147
Train results batch 100, loss 0.0548249
Train results batch 120, loss 0.0628589
Train results batch 140, loss 0.0645167
Train results batch 160, loss 0.0632368
Train results batch 180, loss 0.0476007
Train results batch 200, loss 0.0471682
Train results batch 220, loss 0.0493235
Training results epoch 8, loss 0.00043170396977
Dev results batch 0, loss 0.055131
Dev results batch 20, loss 0.0531145
Dev results batch 40, loss 0.0483831
Dev results batch 60, loss 0.0610029
Dev results epoch 8, loss 0.000428519506533
......................................................................................................................................................
Epoch 9
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0510024
Train results batch 20, loss 0.0479287
Train results batch 40, loss 0.0584473
Train results batch 60, loss 0.0485359
Train results batch 80, loss 0.0615271
Train results batch 100, loss 0.0570304
Train results batch 120, loss 0.0464874
Train results batch 140, loss 0.0617962
Train results batch 160, loss 0.0486729
Train results batch 180, loss 0.0585998
Train results batch 200, loss 0.045459
Train results batch 220, loss 0.0659634
Training results epoch 9, loss 0.000429875803059
Dev results batch 0, loss 0.0545206
Dev results batch 20, loss 0.0513194
Dev results batch 40, loss 0.0483249
Dev results batch 60, loss 0.0622594
Dev results epoch 9, loss 0.000427906684893
......................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Conclusion, next steps
When playing around with the parameters I found that the network often performed better on the dev set than on the train set! Clearly, overlapping dev and train sets are nonsensical. Also, the network seems to optimize for predicting the amplitude but not the frequency. The frequency can probably be predicted better by training on more than only one value (last_output).
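One rough sketch of that idea (the names seq_targets, seq_predictions and seq_loss are mine, not part of the notebook above): project every timestep of the dynamic_rnn result to a prediction and compare it against the input sequence shifted by one sample, so that every timestep contributes to the loss.
#Sketch only: assumes the (batch_size, sequence_length, hidden) outputs tensor
#returned by tf.nn.dynamic_rnn above, before the transpose.
seq_targets = tf.placeholder(tf.float32,
                             [None, sequence_length, output_feature_count],
                             name='seq_targets')
#One dense projection shared across all timesteps
seq_predictions = tf.layers.dense(outputs, output_feature_count, activation=None)
#Average the squared error over every timestep instead of only the last one
seq_loss = tf.reduce_mean(tf.squared_difference(seq_predictions, seq_targets))
seq_opt = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(seq_loss)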
This example gave me a good overview of some TensorFlow features. More generally there are so many hyper parameters to choose when building a network architecture:
- basic parameters: network size, learning rate, drop-out, optimization method
- how to choose initial state
- predict one sample, or multiple samples
- loss function
It would be interesting to automatically tune the hyper-parameters as well. Maybe using genetic algorithms?
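As a first step in that direction, even a plain random search helps. Below is a minimal sketch; train_and_eval is a hypothetical function that would wrap the training loop above, train one configuration and return its dev loss:
import random
#Hypothetical search space; the names mirror the hyper parameters used above
search_space = {
    'learning_rate': [0.01, 0.001, 0.0001, 0.00005],
    'hidden_count_per_layer': [[16, 16], [32, 32], [64]],
    'keep_prob': [1.0, 0.9, 0.8],
    'sequence_length': [50, 100, 200],
}
def sample_config():
    """Draw one random hyper parameter configuration"""
    return {name: random.choice(values) for name, values in search_space.items()}
def random_search(train_and_eval, trials=20):
    """Keep the configuration with the lowest dev loss"""
    best_config, best_loss = None, float('inf')
    for _ in range(trials):
        config = sample_config()
        dev_loss = train_and_eval(config)
        if dev_loss < best_loss:
            best_config, best_loss = config, dev_loss
    return best_config, best_loss
A genetic algorithm would extend this by mutating and recombining the best configurations instead of sampling them independently.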
I am planning to use the approach in this article to process sampled sound waves. Things that cross my mind:
- Apply it to raw audio
- Sample the microphone via WebAudio, send the samples to the notebook via WebSocket, analyze them and feed the result back
- Implement a phase vocoder and input the frequency features instead of raw audio
- Achieve something like this
- Process a MIDI file
- Generate text
- Train on a multi-feature sequence (e.g. audio and corresponding text)
Stay tuned....