
Neural Style Transfer is a machine learning technique that applies the style of a work of art to any photo, producing a new image. TensorFlow, Google's open-source machine learning library, provides the tools and APIs needed to implement it. This article walks through an implementation of Neural Style Transfer in TensorFlow step by step.

Neural Style Transfer is a deep learning technique: it uses deep neural networks to separate the content of one image from the style of another and then recombine them. It was introduced by Gatys et al. in 2015 and has since become a popular research topic in machine learning.


Preparing the Data

We need two images: a content image and a style image. Any images will do; the loader below rescales each one so that its longer side is 512 pixels, which keeps the optimization fast.

from PIL import Image
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

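# Load an image and prepare it for the model
def load_img(path_to_img):
    max_dim = 512
    # Force three channels so grayscale or RGBA files also work
    img = Image.open(path_to_img).convert('RGB')
    long_dim = max(img.size)
    scale = max_dim / long_dim
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
    img = img.resize((round(img.size[0] * scale), round(img.size[1] * scale)), Image.LANCZOS)
    # Scale pixels to [0, 1], the range StyleContentModel and imshow expect
    img = np.array(img, dtype=np.float32) / 255.0

    # Convert the image to a tensor and add a batch dimension
    img = tf.convert_to_tensor(img, dtype=tf.float32)
    img = img[tf.newaxis, :]
    return img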

# Display an image
def imshow(image, title=None):
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)
    plt.imshow(image)
    if title:
        plt.title(title)
    plt.show()

content_path = 'path_to_content_image.jpg'
style_path = 'path_to_style_image.jpg'

content_image = load_img(content_path)
style_image = load_img(style_path)

imshow(content_image, 'Content Image')
imshow(style_image, 'Style Image')
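The resizing arithmetic in load_img can be checked on its own, without opening any file. The sketch below assumes the same rule as the loader (scale so the longer side equals max_dim, then round); the helper name scaled_size is hypothetical and not part of the original code.

```python
def scaled_size(width, height, max_dim=512):
    """Return (width, height) scaled so that the longer side equals max_dim."""
    scale = max_dim / max(width, height)
    return round(width * scale), round(height * scale)

# A landscape and a portrait example
print(scaled_size(1024, 768))
print(scaled_size(300, 600))
```

Note that this rule also enlarges images whose longer side is below 512 pixels, since the scale factor is then greater than 1.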

Building the Neural Style Transfer Model

We will use VGG19, a deep convolutional network pretrained on the ImageNet dataset, to extract features from the content and style images.
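Because the pretrained weights expect a particular input format, images are passed through tf.keras.applications.vgg19.preprocess_input before reaching the network. As a rough illustration of what that step does, here is a NumPy sketch: it reorders RGB to BGR and subtracts the per-channel ImageNet means, with no rescaling. The names vgg_preprocess_sketch and IMAGENET_MEAN_BGR are made up for this example, and the real function should be used in practice.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by the
# "caffe"-style preprocessing of the original VGG weights.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_preprocess_sketch(rgb_0_255):
    """Flip RGB to BGR and subtract the channel means; values are not rescaled."""
    bgr = np.asarray(rgb_0_255, dtype=np.float32)[..., ::-1]
    return bgr - IMAGENET_MEAN_BGR

# A 1x1 pure-red image in the 0..255 range, with a batch dimension
red = np.array([[[[255.0, 0.0, 0.0]]]], dtype=np.float32)
print(vgg_preprocess_sketch(red)[0, 0, 0])
```

This is why the model code multiplies its [0, 1] input by 255 before preprocessing: the mean values are expressed on the 0 to 255 scale.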

Extracting Features

First, we define the layers from which to extract features.

# Layers used to extract content and style features
content_layers = ['block5_conv2'] 
style_layers = ['block1_conv1', 
                'block2_conv1', 
                'block3_conv1', 
                'block4_conv1', 
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

# Build a VGG19 model that returns the outputs of the selected layers
def vgg_layers(layer_names):
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False
    outputs = [vgg.get_layer(name).output for name in layer_names]
    model = tf.keras.Model([vgg.input], outputs)
    return model

style_extractor = vgg_layers(style_layers)
content_extractor = vgg_layers(content_layers)

def style_content_model(style_layers, content_layers):
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False
    style_outputs = [vgg.get_layer(name).output for name in style_layers]
    content_outputs = [vgg.get_layer(name).output for name in content_layers]
    model_outputs = style_outputs + content_outputs
    return tf.keras.Model(vgg.input, model_outputs)
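When one model returns the style and content layers together, the outputs come back as a single list in the order the layer names were given, so they have to be split by position. The sketch below shows that bookkeeping with placeholder strings instead of real feature tensors; split_outputs and fake_outputs are hypothetical names used only for illustration.

```python
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']
content_layers = ['block5_conv2']

def split_outputs(outputs, num_style):
    """Split a combined output list into (style outputs, content outputs)."""
    return outputs[:num_style], outputs[num_style:]

# Style layers first, then content layers: the same order as model_outputs above
layer_names = style_layers + content_layers
fake_outputs = [f"features_of_{name}" for name in layer_names]

style_out, content_out = split_outputs(fake_outputs, len(style_layers))
print(len(style_out), content_out)
```

Keeping the style layers first means the split only needs the number of style layers, with no lookup by name.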

Computing the Loss

To perform style transfer, we compute a loss between the features of the generated image and those of the content and style images. The total loss is the sum of a content loss and a style loss.
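Written out, the quantities computed by the code in this section look as follows. The notation is ours, not the original post's: x is the generated image, c and s are the content and style images, F^l is the feature map of layer l with H_l by W_l spatial positions, G^l is its Gram matrix, the sets S and C are the style and content layers with N_s and N_c elements, and w_s, w_c correspond to style_weight and content_weight.

```latex
G^{l}_{cd}(x) = \frac{1}{H_l W_l} \sum_{i=1}^{H_l} \sum_{j=1}^{W_l} F^{l}_{ijc}(x)\, F^{l}_{ijd}(x)

\mathcal{L}_{\text{style}} = \frac{w_s}{N_s} \sum_{l \in \mathcal{S}} \operatorname{mean}\!\left[ \left( G^{l}(x) - G^{l}(s) \right)^{2} \right]

\mathcal{L}_{\text{content}} = \frac{w_c}{N_c} \sum_{l \in \mathcal{C}} \operatorname{mean}\!\left[ \left( F^{l}(x) - F^{l}(c) \right)^{2} \right]

\mathcal{L} = \mathcal{L}_{\text{style}} + \mathcal{L}_{\text{content}}
```

Here mean[...] averages over all entries of the tensor, matching tf.reduce_mean in the code.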

# Gram matrix, which captures the style of an image
def gram_matrix(input_tensor):
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    return result / num_locations

# Model that returns the content features and the style Gram matrices
class StyleContentModel(tf.keras.models.Model):
    def __init__(self, style_layers, content_layers):
        super(StyleContentModel, self).__init__()
        self.vgg = vgg_layers(style_layers + content_layers)
        self.style_layers = style_layers
        self.content_layers = content_layers
        self.num_style_layers = len(style_layers)
        self.vgg.trainable = False

    def call(self, inputs):
        "Expecting float input in [0,1]"
        inputs = inputs*255.0
        preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
        outputs = self.vgg(preprocessed_input)
        style_outputs, content_outputs = (outputs[:self.num_style_layers], 
                                          outputs[self.num_style_layers:])

        style_outputs = [gram_matrix(style_output) for style_output in style_outputs]

        content_dict = {content_name: value 
                        for content_name, value 
                        in zip(self.content_layers, content_outputs)}

        style_dict = {style_name: value
                      for style_name, value
                      in zip(self.style_layers, style_outputs)}

        return {'content': content_dict, 'style': style_dict}

extractor = StyleContentModel(style_layers, content_layers)

# Weights for the two loss terms
style_weight=1e-2
content_weight=1e4

def style_content_loss(outputs):
    style_outputs = outputs['style']
    content_outputs = outputs['content']
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2) 
                           for name in style_outputs.keys()])
    style_loss *= style_weight / num_style_layers

    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2) 
                             for name in content_outputs.keys()])
    content_loss *= content_weight / num_content_layers
    loss = style_loss + content_loss
    return loss
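To see what the Gram matrix and the weighted loss compute without loading VGG19, the same arithmetic can be reproduced in NumPy on random features. This is only a sketch that mirrors gram_matrix and one term of style_content_loss above; gram_matrix_np and weighted_mse are hypothetical names, and the random array stands in for real layer activations.

```python
import numpy as np

def gram_matrix_np(features):
    """features: (batch, height, width, channels) -> (batch, channels, channels)."""
    result = np.einsum('bijc,bijd->bcd', features, features)
    num_locations = features.shape[1] * features.shape[2]
    return result / num_locations

def weighted_mse(outputs, targets, weight):
    """Per-layer mean squared error, summed, then scaled by weight / number of layers."""
    per_layer = [np.mean((outputs[name] - targets[name]) ** 2) for name in outputs]
    return weight / len(per_layer) * sum(per_layer)

rng = np.random.default_rng(0)
feats = rng.normal(size=(1, 4, 4, 3)).astype(np.float32)  # stand-in for one layer's output
gram = gram_matrix_np(feats)
print(gram.shape)  # (1, 3, 3)
```

The Gram matrix discards the spatial layout and keeps only how strongly each pair of channels fires together, which is why it works as a description of style rather than content.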

Training the Model

We initialize the target image with the content image and optimize its pixels to minimize the loss. The VGG19 weights stay frozen; only the image changes.

# Create the target image
target_image = tf.Variable(content_image)

# Use the Adam optimizer
optimizer = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

@tf.function()
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)

    grad = tape.gradient(loss, image)
    optimizer.apply_gradients([(grad, image)])
    image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))

# Compute the style and content targets
style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']

# Number of optimization steps
epochs = 10
steps_per_epoch = 100

for n in range(epochs):
    for m in range(steps_per_epoch):
        train_step(target_image)
    print("Epoch:", n + 1, "Total steps:", (n + 1) * steps_per_epoch)
    imshow(target_image.read_value())
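The structure of train_step (compute a gradient with respect to the pixels, take a step, clip back into [0, 1]) can be seen in a toy setting. The sketch below makes two loud substitutions: plain gradient descent replaces Adam, and a simple squared distance to a fixed target replaces the VGG19-based loss. The name clipped_descent and all numbers are illustrative only.

```python
import numpy as np

def clipped_descent(start, target, learning_rate=0.1, steps=100):
    """Minimize sum((image - target)**2) while keeping pixels in [0, 1]."""
    image = start.copy()
    losses = []
    for _ in range(steps):
        diff = image - target
        losses.append(float(np.sum(diff ** 2)))
        grad = 2.0 * diff                        # gradient of the squared distance
        image = image - learning_rate * grad     # gradient step on the pixels
        image = np.clip(image, 0.0, 1.0)         # same role as tf.clip_by_value
    return image, losses

start = np.full(3, 0.5)
target = np.array([1.5, -0.5, 0.3])  # two targets lie outside the valid pixel range
final, losses = clipped_descent(start, target)
print(final, losses[0], losses[-1])
```

Pixels whose unconstrained optimum lies outside the valid range end up pinned at 0 or 1, which is what the clipping in train_step does to keep the result displayable.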

Conclusion

Neural Style Transfer is a powerful machine learning technique that creates new artwork by combining the content of one image with the style of another. With TensorFlow, the whole pipeline fits in a short script: a pretrained VGG19 feature extractor, a content and style loss, and an optimization loop over the image's pixels. Once you understand these steps, you can tune the layers, loss weights, and optimizer settings to get the results you want.

References

Here are some useful references for learning more about Neural Style Transfer and TensorFlow:

TensorFlow Official Documentation: the official TensorFlow documentation.
TensorFlow Tutorials: detailed guides and examples for TensorFlow.
Deep Learning with Python by François Chollet: a book on deep learning with Python, including TensorFlow.
Neural Style Transfer Paper by Gatys et al.: the original paper on Neural Style Transfer.

We hope this article has given you both an overview and the practical details of implementing Neural Style Transfer with TensorFlow. Good luck exploring and applying the technique!
