up-or-down/Image_Style_Transfer
Image style transfer implemented with the basic components of a neural network.
Based on the UCAS course Intelligent Computing Systems.
Basic NN Components layers1
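FullyConnectedLayer:
forward:
$$Y = XW + b$$
where $X \in \mathbb{R}^{N \times m}$ is the input batch, $W \in \mathbb{R}^{m \times n}$ is the weight matrix, $b \in \mathbb{R}^{n}$ is the bias vector, and $Y \in \mathbb{R}^{N \times n}$ is the output of the fully connected layer.
backward:
Define $\nabla_Y L$ as the partial derivative of the loss function $L$ with respect to $Y$. Then
$$\nabla_W L = X^T \nabla_Y L, \quad \nabla_b L = \mathbf{1}^T \nabla_Y L, \quad \nabla_X L = \nabla_Y L \, W^T$$
update parameters:
$$W \leftarrow W - \eta \nabla_W L, \quad b \leftarrow b - \eta \nabla_b L$$
where $\eta$ is the learning rate.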
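A minimal NumPy sketch of a fully connected layer, assuming the batch form $Y = XW + b$ with one sample per row. The class name, method names, and the initialization scale `std` are illustrative choices, not necessarily those used in layers1.

```python
import numpy as np


class FullyConnectedLayer:
    """Illustrative fully connected layer computing Y = X W + b."""

    def __init__(self, num_input, num_output, std=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed initialization: small Gaussian weights, zero bias.
        self.weight = rng.normal(0.0, std, size=(num_input, num_output))
        self.bias = np.zeros(num_output)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return x @ self.weight + self.bias

    def backward(self, top_diff):
        # top_diff is dL/dY with shape (N, num_output).
        self.d_weight = self.x.T @ top_diff
        self.d_bias = top_diff.sum(axis=0)
        return top_diff @ self.weight.T  # dL/dX

    def update(self, lr):
        # Plain gradient descent step.
        self.weight -= lr * self.d_weight
        self.bias -= lr * self.d_bias


fc = FullyConnectedLayer(4, 3)
x = np.ones((2, 4))
y = fc.forward(x)
dx = fc.backward(np.ones_like(y))
```

Caching the input in `forward` keeps `backward` free of extra arguments, at the cost of holding one activation per layer in memory.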
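ReLULayer:
forward:
$$Y(i) = \max(0, X(i))$$
where $X(i)$ is the $i$-th element of the input and $Y(i)$ is the corresponding output.
backward:
The partial derivative of $L$ with respect to $X$:
$$\nabla_X L(i) = \begin{cases} \nabla_Y L(i), & X(i) > 0 \\ 0, & \text{otherwise} \end{cases}$$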
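The ReLU forward and backward passes can be sketched in a few lines of NumPy. The function names are hypothetical, and the gradient at exactly zero is set to zero here, which is one common convention.

```python
import numpy as np


def relu_forward(x):
    # Element-wise max(0, x).
    return np.maximum(0, x)


def relu_backward(x, top_diff):
    # Pass the upstream gradient only where the input was positive.
    return top_diff * (x > 0)


x = np.array([[-1.0, 0.0, 2.0]])
y = relu_forward(x)
dx = relu_backward(x, np.ones_like(x))
```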
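SoftmaxLossLayer:
$$\hat{Y}(i,j) = \frac{e^{X(i,j)}}{\sum_{k} e^{X(i,k)}}$$
where $X(i,j)$ is the input score of sample $i$ for class $j$, and $\hat{Y}(i,j)$ is the predicted probability.
forward:
The loss function of the softmax layer is defined as:
$$L = -\sum_{j} y(j) \ln \hat{y}(j)$$
where $y$ is the one-hot label vector of a single sample and $\hat{y}$ is its predicted probability vector.
Considering batch processing:
$$L = -\frac{1}{N} \sum_{i,j} Y(i,j) \ln \hat{Y}(i,j)$$
where $(i,j)$ is the position in the label matrix $Y$, and every row vector of $Y$ is a one-hot vector.
backward:
The partial derivative of $L$ with respect to $X$:
$$\nabla_x L = \hat{y} - y$$
Considering batch processing:
$$\nabla_X L = \frac{1}{N} (\hat{Y} - Y)$$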
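A minimal sketch of the batched softmax loss and its gradient, assuming integer class labels that are expanded to one-hot rows. The function names are illustrative. Subtracting the row maximum before exponentiation is a standard numerical-stability step that does not change the result.

```python
import numpy as np


def softmax_loss_forward(x, labels):
    # x: (N, C) scores; labels: (N,) integer class ids.
    shifted = x - x.max(axis=1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    prob = exp / exp.sum(axis=1, keepdims=True)
    n = x.shape[0]
    one_hot = np.zeros_like(prob)
    one_hot[np.arange(n), labels] = 1.0
    loss = -np.sum(one_hot * np.log(prob)) / n
    return loss, prob, one_hot


def softmax_loss_backward(prob, one_hot):
    # Gradient of the batch-averaged loss with respect to the scores.
    return (prob - one_hot) / prob.shape[0]


scores = np.zeros((2, 3))
loss, prob, one_hot = softmax_loss_forward(scores, np.array([0, 2]))
grad = softmax_loss_backward(prob, one_hot)
```

With all-zero scores every class has probability 1/3, so the loss equals $\ln 3$, which makes a convenient sanity check.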
Demo1 MNIST Classification
Load MNIST dataset:
```python
import os
import struct

import numpy as np

MNIST_DIR = "../mnist_data"
TRAIN_DATA = "train-images-idx3-ubyte"
TRAIN_LABEL = "train-labels-idx1-ubyte"
TEST_DATA = "t10k-images-idx3-ubyte"
TEST_LABEL = "t10k-labels-idx1-ubyte"


def load_mnist(file_dir, is_images=True):
    # Read binary data
    with open(file_dir, 'rb') as bin_file:
        bin_data = bin_file.read()
    # Analyze file header
    if is_images:
        # Read images: magic number, image count, rows, columns
        fmt_header = '>iiii'
        magic, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, 0)
    else:
        # Read labels: magic number, label count
        fmt_header = '>ii'
        magic, num_images = struct.unpack_from(fmt_header, bin_data, 0)
        num_rows, num_cols = 1, 1
    data_size = num_images * num_rows * num_cols
    mat_data = struct.unpack_from('>' + str(data_size) + 'B', bin_data, struct.calcsize(fmt_header))
    mat_data = np.reshape(mat_data, [num_images, num_rows * num_cols])
    print('Load images from %s, number: %d, data shape: %s' % (file_dir, num_images, str(mat_data.shape)))
    return mat_data


# The data files are expected under MNIST_DIR.
train_images = load_mnist(os.path.join(MNIST_DIR, TRAIN_DATA), True)
train_labels = load_mnist(os.path.join(MNIST_DIR, TRAIN_LABEL), False)
test_images = load_mnist(os.path.join(MNIST_DIR, TEST_DATA), True)
test_labels = load_mnist(os.path.join(MNIST_DIR, TEST_LABEL), False)
```
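To make the header parsing concrete, the sketch below builds a tiny IDX-style image file in memory and decodes it the same way the loader does. The two 2x3 "images" are made-up data. The magic number 2051 is the value used by MNIST image files (label files use 2049).

```python
import struct

import numpy as np

# Two fake 2x3 images with pixel values 0..11 (illustrative data only).
pixels = list(range(12))
blob = struct.pack('>iiii', 2051, 2, 2, 3) + struct.pack('>12B', *pixels)

# Big-endian header: magic number, image count, rows, columns.
magic, num_images, num_rows, num_cols = struct.unpack_from('>iiii', blob, 0)
offset = struct.calcsize('>iiii')  # pixel data starts right after the header
data_size = num_images * num_rows * num_cols
data = struct.unpack_from('>%dB' % data_size, blob, offset)
images = np.reshape(data, [num_images, num_rows * num_cols])
```

Each row of `images` is one flattened image, matching the `[num_images, num_rows * num_cols]` layout returned by `load_mnist`.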
Basic CNN Components layers2
In this section, we use VGG19 instead of VGG16.
| Name | Type | Kernel Size | Stride | Padding Size | Cin | Cout | Output Size (H = W) |
|---|---|---|---|---|---|---|---|
| conv1_1 | Conv | 3 | 1 | 1 | 3 | 64 | 224 |
| conv1_2 | Conv | 3 | 1 | 1 | 64 | 64 | 224 |
| pool1 | MaxPool | 2 | 2 | - | 64 | 64 | 112 |
| conv2_1 | Conv | 3 | 1 | 1 | 64 | 128 | 112 |
| conv2_2 | Conv | 3 | 1 | 1 | 128 | 128 | 112 |
| pool2 | MaxPool | 2 | 2 | - | 128 | 128 | 56 |
| conv3_1 | Conv | 3 | 1 | 1 | 128 | 256 | 56 |
| conv3_2 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
| conv3_3 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
| conv3_4 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
| pool3 | MaxPool | 2 | 2 | - | 256 | 256 | 28 |
| conv4_1 | Conv | 3 | 1 | 1 | 256 | 512 | 28 |
| conv4_2 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
| conv4_3 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
| conv4_4 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
| pool4 | MaxPool | 2 | 2 | - | 512 | 512 | 14 |
| conv5_1 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
| conv5_2 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
| conv5_3 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
| conv5_4 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
| pool5 | MaxPool | 2 | 2 | - | 512 | 512 | 7 |
| fc6 | FCL | - | - | - | 512×7×7 | 4096 | 1 |
| fc7 | FCL | - | - | - | 4096 | 4096 | 1 |
| fc8 | FCL | - | - | - | 4096 | 1000 | 1 |
| softmax | Softmax | - | - | - | - | - | - |
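The last column of the table can be checked with the usual output-size formula $\lfloor (H + 2p - k)/s \rfloor + 1$. The sketch below walks through the five VGG19 blocks starting from a 224×224 input; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, pad):
    # Output height/width of a convolution.
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    # Output height/width of a pooling layer without padding.
    return (size - kernel) // stride + 1


size = 224
sizes = {}
# VGG19 has 2, 2, 4, 4, 4 convolutions in its five blocks.
for block, num_convs in enumerate([2, 2, 4, 4, 4], start=1):
    for i in range(1, num_convs + 1):
        size = conv_out(size, kernel=3, stride=1, pad=1)
        sizes['conv%d_%d' % (block, i)] = size
    size = pool_out(size, kernel=2, stride=2)
    sizes['pool%d' % block] = size

fc6_inputs = 512 * sizes['pool5'] ** 2
```

A 3×3 kernel with stride 1 and padding 1 preserves the spatial size, so only the pooling layers halve it, giving 7×7 after pool5 and 512×7×7 inputs to fc6.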
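ConvolutionalLayer
Convolution kernel $W \in \mathbb{R}^{C_{in} \times k \times k \times C_{out}}$ and bias $b \in \mathbb{R}^{C_{out}}$, where $k$ is the kernel size.
The input feature map $X \in \mathbb{R}^{N \times C_{in} \times H_{in} \times W_{in}}$.
To obtain the expected output size in each layer, the input is zero-padded by $p$ pixels on every side, giving $X_{pad} \in \mathbb{R}^{N \times C_{in} \times (H_{in}+2p) \times (W_{in}+2p)}$.
Apply the convolution operation to $X_{pad}$ with stride $s$:
$$Y(n, c_{out}, h, w) = \sum_{c_{in}, k_h, k_w} W(c_{in}, k_h, k_w, c_{out}) \, X_{pad}(n, c_{in}, hs + k_h, ws + k_w) + b(c_{out})$$
where $H_{out} = \lfloor (H_{in} + 2p - k)/s \rfloor + 1$, and $W_{out}$ is obtained in the same way.
backward:
Define $\nabla_Y L$ as the partial derivative of the loss function $L$ with respect to $Y$. Then
$$\nabla_W L(c_{in}, k_h, k_w, c_{out}) = \sum_{n,h,w} \nabla_Y L(n, c_{out}, h, w) \, X_{pad}(n, c_{in}, hs + k_h, ws + k_w)$$
$$\nabla_b L(c_{out}) = \sum_{n,h,w} \nabla_Y L(n, c_{out}, h, w)$$
$$\nabla_{X_{pad}} L(n, c_{in}, u, v) = \sum_{c_{out}} \sum_{hs + k_h = u} \sum_{ws + k_w = v} \nabla_Y L(n, c_{out}, h, w) \, W(c_{in}, k_h, k_w, c_{out})$$
and $\nabla_X L$ is obtained by removing the padding from $\nabla_{X_{pad}} L$.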
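A naive loop-based NumPy sketch of the convolution forward and backward passes, assuming an `(N, C_in, H, W)` input layout and a `(C_in, k, k, C_out)` kernel layout. The function names and layouts are assumptions for illustration, and the loops favor clarity over speed.

```python
import numpy as np


def conv_forward(x, weight, bias, stride=1, pad=1):
    n, c_in, h_in, w_in = x.shape
    _, k, _, c_out = weight.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))  # zero padding
    h_out = (h_in + 2 * pad - k) // stride + 1
    w_out = (w_in + 2 * pad - k) // stride + 1
    y = np.zeros((n, c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            hs, ws = i * stride, j * stride
            patch = x_pad[:, :, hs:hs + k, ws:ws + k]  # (N, C_in, k, k)
            # Contract over C_in and both kernel axes -> (N, C_out).
            y[:, :, i, j] = np.tensordot(patch, weight, axes=([1, 2, 3], [0, 1, 2])) + bias
    return y


def conv_backward(x, weight, top_diff, stride=1, pad=1):
    n, c_in, h_in, w_in = x.shape
    _, k, _, c_out = weight.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    d_weight = np.zeros_like(weight)
    d_x_pad = np.zeros_like(x_pad)
    for i in range(top_diff.shape[2]):
        for j in range(top_diff.shape[3]):
            hs, ws = i * stride, j * stride
            patch = x_pad[:, :, hs:hs + k, ws:ws + k]
            g = top_diff[:, :, i, j]  # (N, C_out)
            d_weight += np.tensordot(patch, g, axes=([0], [0]))
            d_x_pad[:, :, hs:hs + k, ws:ws + k] += np.tensordot(g, weight, axes=([1], [3]))
    d_bias = top_diff.sum(axis=(0, 2, 3))
    d_x = d_x_pad[:, :, pad:pad + h_in, pad:pad + w_in]  # strip the padding
    return d_x, d_weight, d_bias


x = np.ones((1, 1, 3, 3))
w = np.ones((1, 3, 3, 1))
y = conv_forward(x, w, np.zeros(1))
d_x, d_w, d_b = conv_backward(x, w, np.ones_like(y))
```

With an all-ones input and kernel, each output equals the number of non-padded pixels under the window, which gives an easy manual check.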
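MaxPoolingLayer
The input of max pooling $X \in \mathbb{R}^{N \times C \times H_{in} \times W_{in}}$ and the output $Y \in \mathbb{R}^{N \times C \times H_{out} \times W_{out}}$, with pooling window size $k$ and stride $s$:
$$Y(n, c, h, w) = \max_{0 \le k_h, k_w < k} X(n, c, hs + k_h, ws + k_w)$$
backward:
The gradient $\nabla_Y L(n, c, h, w)$ is passed only to the input position that attained the maximum in its pooling window; all other positions in the window receive zero.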
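A small NumPy sketch of max pooling, assuming an `(N, C, H, W)` layout. The function names are illustrative. In this sketch, tied maxima within a window all receive the gradient, a simplification that an argmax-based implementation would avoid.

```python
import numpy as np


def max_pool_forward(x, k=2, stride=2):
    n, c, h_in, w_in = x.shape
    h_out = (h_in - k) // stride + 1
    w_out = (w_in - k) // stride + 1
    y = np.zeros((n, c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            hs, ws = i * stride, j * stride
            y[:, :, i, j] = x[:, :, hs:hs + k, ws:ws + k].max(axis=(2, 3))
    return y


def max_pool_backward(x, top_diff, k=2, stride=2):
    d_x = np.zeros(x.shape)
    for i in range(top_diff.shape[2]):
        for j in range(top_diff.shape[3]):
            hs, ws = i * stride, j * stride
            window = x[:, :, hs:hs + k, ws:ws + k]
            # Mask marks the position(s) holding the window maximum.
            mask = window == window.max(axis=(2, 3), keepdims=True)
            d_x[:, :, hs:hs + k, ws:ws + k] += mask * top_diff[:, :, i, j][:, :, None, None]
    return d_x


x = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
y = max_pool_forward(x)
d_x = max_pool_backward(x, np.ones_like(y))
```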
Demo2 VGG19-based ImageNet Classification
Load ImageNet dataset:
A pretrained standard model can be obtained from vgg, so the official dataset is not needed.
The code for loading test images is as follows:
```python
import numpy as np
from PIL import Image


def load_image(image_dir):
    # scipy.misc.imread/imresize were removed from recent SciPy releases;
    # Pillow is used here as a stand-in.
    input_image = Image.open(image_dir).convert('RGB')
    input_image = input_image.resize((224, 224), Image.BILINEAR)  # unify the input size
    input_image = np.array(input_image).astype(np.float32)  # convert to float32
    input_image -= image_mean  # image_mean is calculated separately (not shown here)
    input_image = np.reshape(input_image, [1] + list(input_image.shape))  # [N=1, height=224, width=224, channel=3]
    input_image = np.transpose(input_image, [0, 3, 1, 2])  # [N=1, channel=3, height=224, width=224]
    return input_image
```
Result:
The classification result is id=281. The class category for this id is listed here.
Demo3 Image Style Transfer (not real-time) layers3
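Content Loss
Suppose $X^l$ is the layer-$l$ feature map of the style transfer image, and $Y^l$ is the layer-$l$ feature map of the target content image. The content loss can be represented by $X^l$ and $Y^l$:
$$L_{content} = \frac{1}{2NCHW} \sum_{n,c,h,w} \left( X^l(n,c,h,w) - Y^l(n,c,h,w) \right)^2$$
where $N$, $C$, $H$, and $W$ are the batch size, channel count, height, and width of the feature map. The content loss is the average squared Euclidean distance over all positions in the feature maps.
The gradient of the content loss with respect to the feature map can be calculated by:
$$\nabla_{X^l} L_{content}(n,c,h,w) = \frac{1}{NCHW} \left( X^l(n,c,h,w) - Y^l(n,c,h,w) \right)$$
In the experiment, the content feature map is taken from the output of the ReLU layer after conv4_2.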
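A minimal sketch of the content loss and its gradient, assuming the $\frac{1}{2NCHW}$ normalization and `(N, C, H, W)` feature maps. The function names are hypothetical.

```python
import numpy as np


def content_loss_forward(x, y):
    # Mean squared difference between the two feature maps, halved.
    n, c, h, w = x.shape
    return np.sum((x - y) ** 2) / (2 * n * c * h * w)


def content_loss_backward(x, y):
    # Gradient of the loss with respect to x.
    n, c, h, w = x.shape
    return (x - y) / (n * c * h * w)


x = np.ones((1, 2, 2, 2))
y = np.zeros((1, 2, 2, 2))
loss = content_loss_forward(x, y)
grad = content_loss_backward(x, y)
```

The factor of one half in the loss cancels the 2 produced by differentiating the square, which is why the gradient carries only $\frac{1}{NCHW}$.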
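Style Loss
Suppose $X^l$ is the layer-$l$ feature map of the style transfer image, and $Y^l$ is the layer-$l$ feature map of the target style image. In forward propagation, the style feature of the style transfer image $G^l$ and that of the target style image $A^l$ are calculated with the Gram matrix:
$$G^l(n,i,j) = \sum_{h,w} X^l(n,i,h,w) \, X^l(n,j,h,w), \quad A^l(n,i,j) = \sum_{h,w} Y^l(n,i,h,w) \, Y^l(n,j,h,w)$$
where $n\in[1,N]$ indicates one sample, and $i,j\in[1,C]$ each correspond to one channel. The style loss of layer $l$ is:
$$L^l_{style} = \frac{1}{4NC^2H^2W^2} \sum_{n,i,j} \left( G^l(n,i,j) - A^l(n,i,j) \right)^2$$
The overall style loss is the weighted sum of the style loss in each layer:
$$L_{style} = \sum_{l} w_l L^l_{style}$$
In backward propagation, the gradient of $L^l_{style}$ with respect to $X^l$ is:
$$\nabla_{X^l} L^l_{style}(n,i,h,w) = \frac{1}{NC^2H^2W^2} \sum_{j} X^l(n,j,h,w) \left( G^l(n,i,j) - A^l(n,i,j) \right)$$
Based on the content loss and the style loss, the total loss can be represented as:
$$L_{total} = \alpha L_{content} + \beta L_{style}$$
where $\alpha$ and $\beta$ weight the content and style terms.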
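A sketch of the single-layer style loss built on Gram matrices, assuming the $\frac{1}{4NC^2H^2W^2}$ normalization and `(N, C, H, W)` feature maps. Function names are illustrative. The analytic gradient can be compared with a finite-difference estimate as a sanity check.

```python
import numpy as np


def gram(feat):
    # Channel-by-channel inner products, one (C, C) matrix per sample.
    n, c, h, w = feat.shape
    flat = feat.reshape(n, c, h * w)
    return flat @ flat.transpose(0, 2, 1)


def style_loss_forward(x, y):
    n, c, h, w = x.shape
    diff = gram(x) - gram(y)
    return np.sum(diff ** 2) / (4 * n * c ** 2 * h ** 2 * w ** 2)


def style_loss_backward(x, y):
    n, c, h, w = x.shape
    diff = gram(x) - gram(y)  # (N, C, C)
    flat = x.reshape(n, c, h * w)
    # grad[n, i, :] = sum_j diff[n, i, j] * flat[n, j, :]
    grad = diff @ flat / (n * c ** 2 * h ** 2 * w ** 2)
    return grad.reshape(x.shape)


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3, 2, 2))
y = rng.normal(size=(1, 3, 2, 2))
loss = style_loss_forward(x, y)
grad = style_loss_backward(x, y)
```

Because the Gram matrix is symmetric, the two terms from differentiating $G^l$ combine, which turns the $\frac{1}{4}$ in the loss into the factor of 1 in the gradient.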
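Adam Optimizer
To train a neural network, mini-batch stochastic gradient descent is typically used to update the network parameters. In the experiment, the Adam algorithm is used instead because it converges faster.
Parameter updating:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla_\theta L$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) (\nabla_\theta L)^2$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
$$\theta_t = \theta_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where $\theta$ is the quantity being optimized, $\eta$ is the learning rate, $m_t$ is the estimate of the first-order moment of the gradient, and $v_t$ is that of the second-order moment. $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected estimates, and $\beta_1$ and $\beta_2$ are the decay rates of the two moments.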
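A single Adam step can be sketched as below. The function name is hypothetical, and the default hyperparameters are the commonly used ones, not values confirmed by this project. The toy loop minimizes $f(x) = x^2$ to show how the state `(m, v, t)` is carried between steps.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update the biased first and second moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction; t starts at 1.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


x = np.array(1.0)
m = np.zeros_like(x)
v = np.zeros_like(x)
history = []
for t in range(1, 301):
    grad = 2 * x  # gradient of x^2
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
    history.append(float(x))
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.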
Result:
Note: Processing images takes a long time (about one hour per epoch). Model acceleration will be considered in the future.













