LSTM neural network for sequence learning
Published on 2017-11-26 22:00
22 min read
Category: Artificial Intelligence
In 1996, during my last year of high school, I borrowed a book about neural networks from a friend. It explained how a two-layer perceptron network could learn the XOR function. Back then I tried implementing the formulas and was able to do the feed-forward calculations; the training algorithm, however, still eluded me. Being able to perform forward calculations was already very exciting. I created a Windows 95 screen saver which would fill the screen with the output of a randomized neural network. The output images were very interesting, especially when replacing the activation functions of the network with exotic ones such as sin(x), abs(x), etc. (Although I lost the source code, you can still download it here.)
At the time it seemed that neural networks were just another statistical method to interpolate data. Furthermore, limited training data and the problem of vanishing gradients limited their usefulness. Fast forward to 2017: massive amounts of training data and computing power are available, and a number of relatively small improvements in the basic neural network algorithms have made it possible to train networks consisting of many more layers. These so-called deep neural networks have fueled progress and interest in Artificial Intelligence development.
One particular innovation that caught my attention is the LSTM neural network architecture. This architecture solves the issue of vanishing gradients for Recurrent Neural Networks (RNNs). LSTM networks are especially suited to analyzing sequences and time series (a small sketch of a single LSTM cell step follows the links below). Some interesting links:
- article about text generation kernel code
- fake news generator
- LSTM architecture
- LSTM explanation
- Modeling attention
- Convolutional network for speech synthesis
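To make the architecture a bit more concrete before moving to TensorFlow, here is a minimal NumPy sketch of a single LSTM cell step, written out from the standard gate equations (the names lstm_step, W and b are my own illustration, not taken from any library):
import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.
    x: input vector, h_prev/c_prev: previous hidden and cell state,
    W and b: dicts with one weight matrix and bias vector per gate."""
    z = np.concatenate([x, h_prev])          # stack input and previous hidden state
    f = sigmoid(np.dot(W['f'], z) + b['f'])  # forget gate: what to erase from the cell state
    i = sigmoid(np.dot(W['i'], z) + b['i'])  # input gate: what to write to the cell state
    g = np.tanh(np.dot(W['g'], z) + b['g'])  # candidate values to write
    o = sigmoid(np.dot(W['o'], z) + b['o'])  # output gate: what to expose as the new hidden state
    c = f * c_prev + i * g                   # new cell state: the long-term memory path
    h = o * np.tanh(c)                       # new hidden state: the cell output
    return h, c
Because the cell state c is only updated through element-wise gating, gradients can flow across many time steps without vanishing.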
In this first test I wanted to gain some experience by implementing a sine wave predictor using TensorFlow. It's a toy example: due to the periodic nature of the sine wave, the train, dev, and test sets overlap, which limits the possibilities to check whether the network can generalize.
This notebook can be downloaded from my git repository.
import plotly
from plotly.graph_objs import Scatter, Layout
import numpy as np
import tensorflow as tf
import sys
plotly.offline.init_notebook_mode(connected=True)
import IPython.display
Training data
The following cell generates the training data. I decided to add some noise to the sine wave, which forces some regularization.
sample_length = 50001
time_per_sample = 0.01
signal_time = np.linspace(num=sample_length,start = 0, stop = sample_length * time_per_sample )
signal_amp = np.sin(signal_time*2*np.pi) + np.random.normal(size=sample_length)*0.02
#np.sin(2+signal_time*1.7*np.pi)*0.5 + \
#np.sin(1+signal_time*2.2*np.pi) + \
#plot part of the signal, just to see what's in there
s_i = 0
e_i = s_i + 100
x = plotly.offline.iplot({
"data": [Scatter(x=signal_time[s_i:e_i],y=signal_amp[s_i:e_i])],
"layout": Layout(title="")
})
#Setup general hyper parameters
#Unroll the RNN to sequence_length timesteps
sequence_length = 100
#The number of timesteps to predict
prediction_length = 1
#The number of features per input time step
input_feature_count = 1
#The number of features per prediction
output_feature_count = 1
#the number of LSTM nodes per layer of the network
hidden_count_per_layer = [16,16]
tf.reset_default_graph()
#inputs is a vector of (batch_size, sequence_length, feature_count)
inputs = tf.placeholder(tf.float32,
[None, sequence_length, input_feature_count],
name = 'inputs')
#targets holds the training target for each example.
#It will be filled with the value of the next time step.
#Shape (batch_size, output feature count)
targets = tf.placeholder(tf.float32,
[None, output_feature_count],
name = 'targets')
#Apply dropout regularization: keep_prob is the probability of keeping a connection.
#The default of 1.0 disables dropout unless a different value is fed.
keep_prob = tf.placeholder_with_default(1.0, shape=(), name = 'keep')
#Learning rate for the AdamOptimizer
learning_rate = tf.placeholder(tf.float32, name = 'learning_rate')
Defining the multi-layer LSTM network
Define a network by creating a number of layers. In most examples I found, all layers use an equal number of nodes. In this example you can specify the number of neurons per layer through the 'hidden_count_per_layer' array.
layers = []
for hidden_count in hidden_count_per_layer:
    layer = tf.nn.rnn_cell.LSTMCell(hidden_count, state_is_tuple = True)
    #wrap the cell so that dropout (controlled by keep_prob) is applied to its inputs
    layer_with_dropout = tf.nn.rnn_cell.DropoutWrapper(layer,
                                                       input_keep_prob=keep_prob,
                                                       output_keep_prob=1.0)
    layers.append(layer_with_dropout)
hidden_network = tf.nn.rnn_cell.MultiRNNCell(layers, state_is_tuple = True)
Packing/Unpacking the LSTM network state
'state_is_tuple = True' means that the LSTM state data structure will be a tuple. Although inconvenient to work with, this seems to be the future default. I will introduce some functions which help to work more easily with these state tuples.
In order to use the LSTM network to generate a predicted sequence of arbitrary length, you need to store the state of the network. The output state after predicting a sample should be fed back into the network when predicting the next sample.
The LSTM implementation in TensorFlow uses an LSTMStateTuple(c,h) data structure. The idea is to pack this LSTMStateTuple(c,h) into a 2D vector of size (batch_size, states).
There were some challenges implementing these packing/unpacking functions. In particular, you want to avoid making them dependent on a specific batch size: while building the computation graph the batch size should be None.
There is a pointer on how to use dynamic batch_sizes and packing/unpacking states here. I made some changes to clarify these functions.
def get_network_state_size(network):
    """Returns the number of state variables in the network"""
    states = 0
    for layer_size in network.state_size:
        states += layer_size[0] # LSTMStateTuple element c
        states += layer_size[1] # LSTMStateTuple element h
    return states
def pack_state_tuple(state_tuple, indent=0):
    """Returns a (batch_size, network_state_size) matrix of the states in the network.
    state_tuple = the states obtained from _ , state = tf.nn.dynamic_rnn(...)
    """
    if isinstance(state_tuple, tf.Tensor) or not hasattr(state_tuple, '__iter__'):
        #The LSTMStateTuple contains 2 Tensors
        return state_tuple
    else:
        l = []
        #an unpacked LSTM network state is a tuple with one element per layer; each element is an LSTMStateTuple
        #state_tuple is either the tuple of LSTMStateTuples or an LSTMStateTuple (via recursive call)
        for item in state_tuple:
            # item is either an LSTMStateTuple (top level call)
            # or it is an element of the LSTMStateTuple (first recursive call)
            i = pack_state_tuple(item, indent+2)
            l.append(i)
        #convert the list of [Tensor(bsz,a), Tensor(bsz,b), ...] into one long Tensor (bsz, a+b+...)
        return tf.concat(l,1)
def unpack_state_tuple(state_tensor, sizes):
    """The inverse of pack: given a packed states vector of (batch_size, x), return the LSTMStateTuple
    data structure that can be used as initial state for tf.nn.dynamic_rnn(...)
    sizes is the network state size list (cell.state_size)
    """
    def _unpack_state_tuple( sizes_, offset_, indent):
        if isinstance(sizes_, tf.Tensor) or not hasattr(sizes_, '__iter__'):
            #take a small slice (batch size, c or h size of the LSTMStateTuple) of the packed state vector of shape (batch size, network states)
            return tf.reshape(state_tensor[:, offset_ : (offset_ + sizes_) ], (-1, sizes_)), offset_ + sizes_
        else:
            result = []
            #Top level: sizes_ is a tuple with one element per layer, each element is an LSTMStateTuple(c size, h size)
            #Recursive call: sizes_ is an LSTMStateTuple
            for size in sizes_:
                #size is an LSTMStateTuple (top level)
                #or size is c size or h size (recursive call)
                s, offset_ = _unpack_state_tuple( size, offset_, indent+2)
                result.append(s)
            if isinstance(sizes_, tf.nn.rnn_cell.LSTMStateTuple):
                #end of recursive call
                #Build an LSTMStateTuple using the c size and h size elements in the result list
                return tf.nn.rnn_cell.LSTMStateTuple(*result), offset_
            else:
                # end of top level call
                # create a tuple with one element per layer. The result is a list of LSTMStateTuples
                return tuple(result), offset_
    return _unpack_state_tuple( sizes, 0, 0)[0]
Testing the packing/unpacking functions
Next I wrote a check to see if the pack and unpack functions are indeed each other's inverses. The vectors should be packed/unpacked in the correct order. The idea is to create a 'packed' vector containing the values 0..n, then unpack and repack it. The output should be equal to the original vector.
#Test pack and unpack
#create a placeholder in which we can feed packed states (a vector of (batch_size, states)) as initial_state
state_packed_in = tf.placeholder(
tf.float32,
(None,get_network_state_size(hidden_network)),
name="state_packed_1")
#Unpack the packed states
state_unpacked_out = unpack_state_tuple(state_packed_in,hidden_network.state_size)
#Repack the unpacked states
state_packed_out = pack_state_tuple(state_unpacked_out)
inputs_batch_size = 40
a_batch_of_inputs = np.zeros((inputs_batch_size, sequence_length, input_feature_count))
#create an initial state vector and fill it with test data
an_initial_state = np.zeros((inputs_batch_size*get_network_state_size(hidden_network),1))
an_initial_state[:,0] = np.linspace(start=0,stop=an_initial_state.shape[0]-1,num=an_initial_state.shape[0])
#reshape it as a packed state
an_initial_state_packed = np.reshape(an_initial_state, (inputs_batch_size,get_network_state_size(hidden_network)))
init=tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
up,p = sess.run([state_unpacked_out, state_packed_out], feed_dict={state_packed_in: an_initial_state_packed})
# compare the original packed states with the ones that were unpacked and then repacked
diff = an_initial_state_packed - p
# should return 0
print("diff",np.sum(np.abs(diff)))
diff 0.0
Initial state
Create a placeholder for the initial packed states. This makes it possible to supply the initial states to the LSTM network as a simple vector. Then add an unpack operation to the computation graph. This outputs the initial state as an LSTMStateTuple structure which can be used by the dynamic RNN function later on.
sz = get_network_state_size(hidden_network)
print("states in network", sz)
initial_state_packed = tf.placeholder(
tf.float32,
(None,sz),
name="initial_state")
state_unpacked = unpack_state_tuple(initial_state_packed,hidden_network.state_size)
states in network 64
Forward propagation
Define the forward calculations using the dynamic_rnn function. This function takes and returns the network state in unpacked format.
#out_weights=tf.Variable(tf.random_normal([hidden_count_per_layer[-1],output_feature_count]))
#out_bias=tf.Variable(tf.random_normal([output_feature_count]))
print("inputs ",inputs.shape)
outputs, state_unpacked_network_out = tf.nn.dynamic_rnn(hidden_network, inputs, initial_state = state_unpacked, dtype=tf.float32) #, initial_state=rnn_tuple_state, )
state_packed_network_out = pack_state_tuple(state_unpacked_network_out)
print("packed state", state_packed_network_out.shape)
print("outputs before transpose", outputs.shape)
outputs = tf.transpose(outputs, [1, 0, 2])
print("outputs after transpose", outputs.shape)
#last_output = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
last_output = outputs[outputs.shape[0]-1,:,:]
print("last output", last_output.shape)
#out_size = target.get_shape()[2].value
predictions = tf.contrib.layers.fully_connected(last_output, output_feature_count, activation_fn=None)
print("prediction", predictions.shape)
print("targets", targets.shape)
inputs (?, 100, 1)
packed state (?, 64)
outputs before transpose (?, 100, 16)
outputs after transpose (100, ?, 16)
last output (?, 16)
prediction (?, 1)
targets (?, 1)
Backward pass, training
Define the loss as the sum of the squared differences between the last output (the prediction) and the target.
loss = tf.reduce_sum(tf.squared_difference(predictions, targets))
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
Defining the train, dev and test set
Generally you would define 3 sets:
- A set to train on: Train set
- A set to tune the hyper parameters on: Dev set
- A set to test the generalization performance of the network: Test set
In the case of a sine wave this is of limited use: the dev and test sets overlap with the train set because of the periodic nature of the sine wave. I added noise to the source signal to make the train, dev and test sets at least partly independent.
start_indices = np.linspace(
0,
sample_length-sequence_length-prediction_length-1,
sample_length-sequence_length-prediction_length-1, dtype= np.int32)
#When you have many examples you can get away with tiny dev and test sets.
dev_size_perc = 0.20
test_size_perc = 0.20
batch_size = 128 #512
dev_size = int(np.floor(start_indices.shape[0] * dev_size_perc))
test_size = int(np.floor(start_indices.shape[0] * test_size_perc))
train_size = start_indices.shape[0] - test_size - dev_size
train_batch_count = int(np.floor(train_size / batch_size))
dev_batch_count = int(np.floor(dev_size / batch_size))
test_batch_count = int(np.floor(test_size / batch_size))
print("dataset size %d" %(start_indices.shape[0]))
print("%d Examples (%d batches) in train set" %(train_size, train_batch_count))
print("%d Examples (%d batches) in dev set" %(dev_size,dev_batch_count))
print("%d Examples (%d batches) in test set" %(test_size,test_batch_count))
dataset size 49899
29941 Examples (233 batches) in train set
9979 Examples (77 batches) in dev set
9979 Examples (77 batches) in test set
Creating batches
The network will be trained using mini-batches. This speeds up training because a training step is performed after each mini-batch, in contrast to updating only after presenting the complete training set.
#A batch of examples can start at an arbitrary index in the source signal.
# Shuffle the indices to fill the train, dev and test sets with different sequences
np.random.shuffle (start_indices)
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]
def get_batch(batch_index, indexes, size=batch_size):
batch_start_indexes = indexes[batch_index*size:batch_index*size+size]
batch_inputs = np.zeros((size,sequence_length, input_feature_count))
batch_targets = np.zeros((size,prediction_length))
for i in range(size):
se = batch_start_indexes[i]
part = signal_amp[se:se+sequence_length]
batch_inputs[i,0:sequence_length,0] = part
        batch_targets[i,0] = signal_amp[se+sequence_length] #the value of the next time step after the input sequence
return batch_inputs,batch_targets
batch_inputs,batch_targets = get_batch(train_batch_count-1,train_indices)
print(batch_inputs.shape,batch_targets.shape)
example_inputs = batch_inputs[0,:,:]
example_targets = batch_targets[0,:]
print(example_inputs.shape)
#plot a single example
b_i = 1
b_s = batch_inputs[b_i,0:sequence_length,0]
plotly.offline.iplot({
"data": [Scatter(y=b_s)],
"layout": Layout(title="")
})
(128, 100, 1) (128, 1)
(100, 1)
Test training using a single batch
In the next cell I check whether I can train the network on one single batch, just to verify that the optimizer is indeed able to train the network. Successful training should decrease the loss; in the output you will see the loss decreasing (first column).
np.random.shuffle (start_indices)
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]
zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))
init=tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
np.random.shuffle (train_indices)
batch_inputs,batch_targets = get_batch(0, train_indices)
print("batch input shape", batch_inputs.shape)
#v_outputs, v_state = sess.run([outputs,state], feed_dict={inputs: batch_inputs, targets: batch_targets})
v_predictions, v_state_unpacked = sess.run([predictions, state_unpacked_network_out],
feed_dict={
inputs: batch_inputs,
targets: batch_targets,
initial_state_packed: zero_state_packed
})
print(v_predictions.shape)
print(v_predictions[0],batch_targets[0])
for i in range(0,120):
v_predictions, v_outputs, v_state_unpacked, v_loss, v_opt = sess.run(
[predictions, outputs, state_unpacked_network_out, loss, opt],
feed_dict={
learning_rate: 0.02,
inputs: batch_inputs,
targets: batch_targets,
state_unpacked: v_state_unpacked
}) #})
if i % 10 == 0:
print(v_loss,v_predictions[0],batch_targets[0])
batch input shape (128, 100, 1)
(128, 1)
[ 0.04703456] [-0.66767944]
69.1535 [ 0.04703381] [-0.66767944]
2.39038 [-0.71438432] [-0.66767944]
0.769588 [-0.72163767] [-0.66767944]
0.238517 [-0.71686089] [-0.66767944]
0.133357 [-0.67176348] [-0.66767944]
0.130659 [-0.67786527] [-0.66767944]
0.102174 [-0.66787326] [-0.66767944]
0.0939764 [-0.68473738] [-0.66767944]
0.0897313 [-0.6789692] [-0.66767944]
0.0882396 [-0.6785149] [-0.66767944]
0.0865416 [-0.68145192] [-0.66767944]
0.0851147 [-0.67975515] [-0.66767944]
Training and Testing
Finally we can train and test the network. The training consists of epochs during which all training batches are presented; the network is optimized immediately after each batch. After each epoch the loss over the dev set is calculated and printed.
Next a graph is plotted which shows an example of the network predicting a sine wave. The prediction is based on first 'priming' the network by presenting part of a sine.
After training for a number of epochs, a final prediction is generated over a longer time period.
np.random.shuffle (start_indices)
#create a randomized train, dev and test set
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]
#initialization of the network states for a single mini batch, by setting them to zero
#You could also initialize by using random states.
batch_zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))
epoch_count = 10
#Store the performance over the dev set in loss_results
loss_results = np.zeros((epoch_count,2))
def get_loss(set_name, batch_count, example_set_indices):
    """Calculate the average loss per example over all batches in a set"""
    epoch_loss = 0.0
    for example_index in range(batch_count):
batch_inputs,batch_targets = get_batch(example_index, example_set_indices)
batch_loss = sess.run(loss,feed_dict={
inputs:batch_inputs,
targets:batch_targets,
initial_state_packed: batch_zero_state_packed
})
if example_index % 20 == 0:
print(" %s results batch %d, loss %s" %( set_name, example_index, str(batch_loss)))
epoch_loss += batch_loss
return epoch_loss / len(example_set_indices)
def generate_graph(graph_size=200):
"""Use the network to generate a graph"""
#The network will be primed using prime_size samples of the original signal
prime_size = 50
prime_signal_start_i = 0
#put prime_size samples of the original signal in tmp_signal
orig_signal = np.zeros((graph_size,1))
tmp_signal = np.zeros((graph_size,1))
tmp_signal[0:prime_size,0] = signal_amp[prime_signal_start_i:(prime_signal_start_i+prime_size)]
orig_signal[0:graph_size,0] = signal_amp[prime_signal_start_i:(prime_signal_start_i+graph_size)]
#create a sequence for a batch_size of 1
seq = np.zeros((1,sequence_length,1))
seq_state_packed = np.zeros((1, get_network_state_size(hidden_network)))
_state_unpacked = None
#generate the graph
for end in range(prime_size, graph_size):
#get a sequence to present to the network
seq[0,:,0] = tmp_signal.take(range((end-sequence_length),end), mode='wrap')
#get a prediction
seq_state_packed , _prediction = sess.run(
[state_packed_network_out, predictions[0,0]],
feed_dict={
initial_state_packed: seq_state_packed,
inputs: seq})
#put the prediction in the graph
tmp_signal[end,0] = _prediction
sys.stdout.write('.')
sys.stdout.flush()
print("")
plotly.offline.iplot({
"data": [Scatter(name="predicted",y=tmp_signal[:,0]),Scatter(name="original",y=orig_signal[:,0])],
"layout": Layout(title="")})
init=tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for epoch in range(0,epoch_count):
print("Epoch %d" %(epoch))
#in every epoch go through the training set in a different order
np.random.shuffle (train_indices)
print("Train")
for ti in range(train_batch_count):
batch_inputs,batch_targets = get_batch(ti, train_indices)
#train the network
#I reset the state to zero for each batch.
batch_train_loss, _ = sess.run([loss, opt],
feed_dict={
learning_rate: 0.00005,
inputs: batch_inputs,
targets: batch_targets,
initial_state_packed: batch_zero_state_packed
})
sys.stdout.write('.')
sys.stdout.flush()
print("")
epoch_train_loss = get_loss("Train", train_batch_count, train_indices)
print("Training results epoch %d, loss %s" %( epoch, str(epoch_train_loss)))
epoch_dev_loss = get_loss("Dev", dev_batch_count, dev_indices)
print("Dev results epoch %d, loss %s" %( epoch, str(epoch_dev_loss)))
loss_results[epoch,0] = epoch_train_loss
loss_results[epoch,1] = epoch_dev_loss
ti += 1
generate_graph()
#generate a last long graph
generate_graph(graph_size=1000)
plotly.offline.iplot({
"data": [Scatter(name="loss train",y=loss_results[:,0]),Scatter(name="loss dev",y=loss_results[:,1])],
"layout": Layout(title="")})
Epoch 0
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 54.9094
Train results batch 20, loss 52.7554
Train results batch 40, loss 57.1101
Train results batch 60, loss 56.6699
Train results batch 80, loss 52.1481
Train results batch 100, loss 55.9796
Train results batch 120, loss 58.3971
Train results batch 140, loss 44.4456
Train results batch 160, loss 48.3201
Train results batch 180, loss 48.1845
Train results batch 200, loss 55.2113
Train results batch 220, loss 54.7782
Training results epoch 0, loss 0.420098819824
Dev results batch 0, loss 51.6062
Dev results batch 20, loss 53.5696
Dev results batch 40, loss 50.2322
Dev results batch 60, loss 51.0714
Dev results epoch 0, loss 0.407184930349
......................................................................................................................................................
Epoch 1
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 21.4379
Train results batch 20, loss 20.0213
Train results batch 40, loss 21.2412
Train results batch 60, loss 19.3035
Train results batch 80, loss 22.9997
Train results batch 100, loss 22.1121
Train results batch 120, loss 19.6073
Train results batch 140, loss 20.3425
Train results batch 160, loss 19.6809
Train results batch 180, loss 25.0082
Train results batch 200, loss 22.1662
Train results batch 220, loss 24.211
Training results epoch 1, loss 0.163680443564
Dev results batch 0, loss 19.6441
Dev results batch 20, loss 20.6591
Dev results batch 40, loss 21.2059
Dev results batch 60, loss 19.5042
Dev results epoch 1, loss 0.15843713808
......................................................................................................................................................
Epoch 2
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 1.25657
Train results batch 20, loss 1.24823
Train results batch 40, loss 1.5213
Train results batch 60, loss 1.51712
Train results batch 80, loss 1.37371
Train results batch 100, loss 1.52443
Train results batch 120, loss 1.04627
Train results batch 140, loss 1.12556
Train results batch 160, loss 1.01674
Train results batch 180, loss 1.34206
Train results batch 200, loss 1.23301
Train results batch 220, loss 1.15482
Training results epoch 2, loss 0.00969723457763
Dev results batch 0, loss 1.21576
Dev results batch 20, loss 1.2668
Dev results batch 40, loss 1.44446
Dev results batch 60, loss 1.07501
Dev results epoch 2, loss 0.00952362719155
......................................................................................................................................................
Epoch 3
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.179961
Train results batch 20, loss 0.153622
Train results batch 40, loss 0.207009
Train results batch 60, loss 0.189542
Train results batch 80, loss 0.156345
Train results batch 100, loss 0.205141
Train results batch 120, loss 0.145814
Train results batch 140, loss 0.153639
Train results batch 160, loss 0.185179
Train results batch 180, loss 0.170148
Train results batch 200, loss 0.184466
Train results batch 220, loss 0.198975
Training results epoch 3, loss 0.00135172683512
Dev results batch 0, loss 0.182201
Dev results batch 20, loss 0.171388
Dev results batch 40, loss 0.174947
Dev results batch 60, loss 0.157753
Dev results epoch 3, loss 0.00133814723993
......................................................................................................................................................
Epoch 4
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0727791
Train results batch 20, loss 0.0689376
Train results batch 40, loss 0.0716537
Train results batch 60, loss 0.0817372
Train results batch 80, loss 0.0861643
Train results batch 100, loss 0.0728834
Train results batch 120, loss 0.0586975
Train results batch 140, loss 0.065918
Train results batch 160, loss 0.0647993
Train results batch 180, loss 0.0598816
Train results batch 200, loss 0.0765543
Train results batch 220, loss 0.0749957
Training results epoch 4, loss 0.000547558254865
Dev results batch 0, loss 0.0717915
Dev results batch 20, loss 0.0704993
Dev results batch 40, loss 0.0613825
Dev results batch 60, loss 0.0708818
Dev results epoch 4, loss 0.00054083534219
......................................................................................................................................................
Epoch 5
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0688844
Train results batch 20, loss 0.0541076
Train results batch 40, loss 0.065995
Train results batch 60, loss 0.056871
Train results batch 80, loss 0.0703863
Train results batch 100, loss 0.0622354
Train results batch 120, loss 0.0632361
Train results batch 140, loss 0.0497587
Train results batch 160, loss 0.0681106
Train results batch 180, loss 0.0735378
Train results batch 200, loss 0.0551927
Train results batch 220, loss 0.0586691
Training results epoch 5, loss 0.000470599653114
Dev results batch 0, loss 0.0599357
Dev results batch 20, loss 0.060271
Dev results batch 40, loss 0.0517052
Dev results batch 60, loss 0.0637929
Dev results epoch 5, loss 0.000465033017603
......................................................................................................................................................
Epoch 6
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.053541
Train results batch 20, loss 0.0699349
Train results batch 40, loss 0.0553604
Train results batch 60, loss 0.0620093
Train results batch 80, loss 0.0647623
Train results batch 100, loss 0.0518999
Train results batch 120, loss 0.0632549
Train results batch 140, loss 0.0673768
Train results batch 160, loss 0.0645557
Train results batch 180, loss 0.0533836
Train results batch 200, loss 0.0603966
Train results batch 220, loss 0.0555538
Training results epoch 6, loss 0.000448670988889
Dev results batch 0, loss 0.0567565
Dev results batch 20, loss 0.0563003
Dev results batch 40, loss 0.0496434
Dev results batch 60, loss 0.0620466
Dev results epoch 6, loss 0.000444287580372
......................................................................................................................................................
Epoch 7
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0564441
Train results batch 20, loss 0.0662476
Train results batch 40, loss 0.0497547
Train results batch 60, loss 0.0620312
Train results batch 80, loss 0.0595999
Train results batch 100, loss 0.0605202
Train results batch 120, loss 0.0603222
Train results batch 140, loss 0.0491505
Train results batch 160, loss 0.0579644
Train results batch 180, loss 0.0600865
Train results batch 200, loss 0.0482739
Train results batch 220, loss 0.0515458
Training results epoch 7, loss 0.000437071640663
Dev results batch 0, loss 0.0552906
Dev results batch 20, loss 0.0541909
Dev results batch 40, loss 0.0487828
Dev results batch 60, loss 0.061333
Dev results epoch 7, loss 0.000433660682277
......................................................................................................................................................
Epoch 8
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.058103
Train results batch 20, loss 0.0501114
Train results batch 40, loss 0.0478091
Train results batch 60, loss 0.0674866
Train results batch 80, loss 0.0530147
Train results batch 100, loss 0.0548249
Train results batch 120, loss 0.0628589
Train results batch 140, loss 0.0645167
Train results batch 160, loss 0.0632368
Train results batch 180, loss 0.0476007
Train results batch 200, loss 0.0471682
Train results batch 220, loss 0.0493235
Training results epoch 8, loss 0.00043170396977
Dev results batch 0, loss 0.055131
Dev results batch 20, loss 0.0531145
Dev results batch 40, loss 0.0483831
Dev results batch 60, loss 0.0610029
Dev results epoch 8, loss 0.000428519506533
......................................................................................................................................................
Epoch 9
Train
.........................................................................................................................................................................................................................................
Train results batch 0, loss 0.0510024
Train results batch 20, loss 0.0479287
Train results batch 40, loss 0.0584473
Train results batch 60, loss 0.0485359
Train results batch 80, loss 0.0615271
Train results batch 100, loss 0.0570304
Train results batch 120, loss 0.0464874
Train results batch 140, loss 0.0617962
Train results batch 160, loss 0.0486729
Train results batch 180, loss 0.0585998
Train results batch 200, loss 0.045459
Train results batch 220, loss 0.0659634
Training results epoch 9, loss 0.000429875803059
Dev results batch 0, loss 0.0545206
Dev results batch 20, loss 0.0513194
Dev results batch 40, loss 0.0483249
Dev results batch 60, loss 0.0622594
Dev results epoch 9, loss 0.000427906684893
......................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Conclusion, next steps
When playing around with the parameters I found that the network often performed better on the dev set than on the train set! Clearly, overlapping dev and train sets are nonsensical. Also, the network seems to optimize for predicting the amplitude but not the frequency. The frequency can probably be predicted better by training on more than only one value (last_output).
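One rough sketch of that idea (the names seq_targets, seq_predictions and seq_loss are mine, not part of the notebook above): project every timestep of the dynamic_rnn result to a prediction and compare it against the input sequence shifted by one sample, so that every timestep contributes to the loss.
#Sketch only: assumes the (batch_size, sequence_length, hidden) outputs tensor
#returned by tf.nn.dynamic_rnn above, before the transpose.
seq_targets = tf.placeholder(tf.float32,
                             [None, sequence_length, output_feature_count],
                             name='seq_targets')
#One dense projection shared across all timesteps
seq_predictions = tf.layers.dense(outputs, output_feature_count, activation=None)
#Average the squared error over every timestep instead of only the last one
seq_loss = tf.reduce_mean(tf.squared_difference(seq_predictions, seq_targets))
seq_opt = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(seq_loss)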
This example gave me a good overview of some TensorFlow features. More generally there are so many hyper parameters to choose when building a network architecture:
- basic parameters: network size, learning rate, drop-out, optimization method
- how to choose initial state
- predict one sample, or multiple samples
- loss function
It would be interesting to automatically tune the hyper-parameters as well. Maybe using genetic algorithms?
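As a first step in that direction, even a plain random search helps. Below is a minimal sketch; train_and_eval is a hypothetical function that would wrap the training loop above, train one configuration and return its dev loss:
import random
#Hypothetical search space; the names mirror the hyper parameters used above
search_space = {
    'learning_rate': [0.01, 0.001, 0.0001, 0.00005],
    'hidden_count_per_layer': [[16, 16], [32, 32], [64]],
    'keep_prob': [1.0, 0.9, 0.8],
    'sequence_length': [50, 100, 200],
}
def sample_config():
    """Draw one random hyper parameter configuration"""
    return {name: random.choice(values) for name, values in search_space.items()}
def random_search(train_and_eval, trials=20):
    """Keep the configuration with the lowest dev loss"""
    best_config, best_loss = None, float('inf')
    for _ in range(trials):
        config = sample_config()
        dev_loss = train_and_eval(config)
        if dev_loss < best_loss:
            best_config, best_loss = config, dev_loss
    return best_config, best_loss
A genetic algorithm would extend this by mutating and recombining the best configurations instead of sampling them independently.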
I am planning to use the approach in this article to process sampled sound waves. Things that cross my mind:
- Apply it to raw audio
- Sample the microphone via WebAudio, send the samples to the notebook via WebSocket, analyze them and feed the result back
- Implement a phase vocoder and input the frequency features instead of raw audio
- Achieve something like this
- Process a MIDI file
- Generate text
- Train on a multi-feature sequence (e.g. audio and corresponding text)
Stay tuned....