Tensorflow Variational AutoEncoder: Concept and Code Example (VAE, MNIST Dataset)



상어군 2022. 11. 1. 10:38

1. VAE Concept

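Unlike a standard AutoEncoder, a VAE compresses the input data (X) into a latent representation (Z) and uses it to generate new data that resembles the original data without being a copy of it.

(A standard AutoEncoder, by contrast, only aims to reconstruct the input data (X) as faithfully as possible.)

To make this possible, the latent representation is modeled as a probability distribution rather than a single point: the encoder outputs the mean and variance of a Gaussian over Z, and a latent vector sampled from that Gaussian is passed to the decoder.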

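The generative process assumed by a VAE: a latent vector \(z^{(i)}\) is first drawn from the prior, and a data point \(x^{(i)}\) is then generated from the conditional distribution.

\(z^{(i)} \sim p_{\theta^{*}}(z) \ \ \rightarrow \ \ x^{(i)} \sim p_{\theta^{*}}(x \mid z^{(i)})\)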

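\(p(z)\): the probability density function of the latent vector Z (the prior), assumed to be a standard Gaussian \(\mathcal{N}(0, I)\)

\(p(x \mid z)\): the conditional probability density of generating X given Z

\(\theta\): the model parameters (weights); \(\theta^{*}\) denotes their true values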

Source: 한땀한땀 딥러닝 컴퓨터 비전 백과사전 (Deep Learning Computer Vision Encyclopedia), https://wikidocs.net/152474
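For reference, the objective optimized in section 5.2 can be written as the standard VAE evidence lower bound (ELBO). This is a summary of the usual formulation, not something taken from the cited source; \(\phi\) (the encoder's weights) and \(\mathcal{N}(\mu, \mathrm{diag}(\sigma^{2}))\) for the encoder's Gaussian are notation introduced here. The training code minimizes the negative of this bound, with the KL term in closed form:

\[
\log p_{\theta}(x) \;\ge\; \mathbb{E}_{q_{\phi}(z \mid x)}\big[\log p_{\theta}(x \mid z)\big] \;-\; D_{KL}\big(q_{\phi}(z \mid x) \,\|\, p(z)\big)
\]

\[
D_{KL}\big(\mathcal{N}(\mu, \mathrm{diag}(\sigma^{2})) \,\|\, \mathcal{N}(0, I)\big) = \frac{1}{2}\sum_{j}\big(\sigma_j^{2} + \mu_j^{2} - 1 - \log \sigma_j^{2}\big)
\]

With logvar \(= \log \sigma^{2}\), the second line is exactly the kl expression in train_step, and the binary cross-entropy reconstruction term plays the role of \(-\log p_{\theta}(x \mid z)\) under a Bernoulli (per-pixel) decoder.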

2. Basic Setup

2.1 Libraries

import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds
from tensorflow.keras import Model, layers

2.2 GPU Setup

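# Expose only GPU 0 to TensorFlow (this must run before TensorFlow first initializes the GPU)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Allocate GPU memory on demand instead of reserving all of it up front
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError as e:
        # Raised if the GPU was already initialized before memory growth was set
        print(e)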

3. Data Load

The MNIST dataset used in this post consists of images of handwritten digits from 0 to 9.

It can be downloaded with tensorflow_datasets.load() (imported here as tfds.load()).

It contains 60,000 training images and 10,000 test images.

Each image is 28×28 pixels, and each pixel stores a grayscale intensity as a number from 0 to 255.

dataset = tfds.load('mnist', split='train')
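Each element yielded by tfds.load('mnist', split='train') is a dictionary, and the preprocessing step reads only its 'image' entry. The sketch below mocks one such element with NumPy so its structure can be inspected without downloading anything. The random pixels and the label 7 are made up; the uint8 image of shape (28, 28, 1) plus an integer label is assumed here to mirror the MNIST feature spec.

```python
import numpy as np

# Mock of one element from tfds.load('mnist', split='train'); pixel values and label are fake
rng = np.random.default_rng(0)
example = {
    "image": rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8),  # grayscale, 0..255
    "label": np.int64(7),                                              # digit class 0..9
}

image = example["image"]
print(sorted(example.keys()))    # ['image', 'label']
print(image.shape, image.dtype)  # (28, 28, 1) uint8
print(image.size)                # 784 = 28 * 28 pixels per image
```

The 784 values per image are what the encoder's Flatten layer produces and what the decoder's final Dense(784) layer has to reproduce.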

4. Data Preprocessing

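batch_size = 1024
# Cast the uint8 pixels to float32 and scale them from [0, 255] to [0, 1] (labels are dropped).
# The [0, 1] range matches the decoder's sigmoid output and the binary cross-entropy loss.
train_data = dataset.map(lambda data: tf.cast(data['image'], tf.float32) / 255.).batch(batch_size)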
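A quick check of what this preprocessing does, with NumPy standing in for TensorFlow: dividing by 255 maps pixel values into [0, 1], and batch(1024) splits the 60,000 training images into 58 full batches plus one smaller final batch. The three pixel values are a made-up example.

```python
import math
import numpy as np

# Same scaling as the map() above, applied to three made-up pixel values
raw = np.array([0, 128, 255], dtype=np.uint8)
scaled = raw.astype(np.float32) / 255.0
print(scaled.min(), scaled.max())  # 0.0 1.0

# How .batch(1024) splits the 60,000 training images
n_train, batch_size = 60_000, 1024
n_batches = math.ceil(n_train / batch_size)
last_batch = n_train - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 59 608
```

The uneven last batch is why the training step divides by the actual number of samples in each batch rather than by a fixed 1024.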

5. Building and Training the VAE Model

5.1 Implementing the VAE Encoder/Decoder

# Encoder definition: maps an image to the parameters (mu, logvar) of q(z|x)
class Vanila_Encoder(Model):
    def __init__(self, latent_dim):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(512, activation='relu'),
            layers.Dense(256, activation='relu'),
            # No activation: the first half is mu, the second half is log(sigma^2)
            layers.Dense(latent_dim * 2)
        ])

    # Override call(), not __call__(), so Keras' own __call__ (building, tracking) still runs
    def call(self, x):
        # tf.split divides the (batch, 2 * latent_dim) output into two (batch, latent_dim) tensors
        mu, logvar = tf.split(self.encoder(x), 2, axis=1)
        return mu, logvar

# Decoder definition: maps a latent vector z back to a 28x28x1 image
class Vanila_Decoder(Model):
    def __init__(self, latent_dim):
        super().__init__()

        self.latent_dim = latent_dim
        self.decoder = tf.keras.Sequential([
            layers.Dense(256, activation='relu'),
            layers.Dense(512, activation='relu'),
            # Sigmoid keeps every pixel in [0, 1], as binary cross-entropy expects
            layers.Dense(784, activation='sigmoid'),
            layers.Reshape((28, 28, 1))
        ])

    def call(self, z):
        return self.decoder(z)
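To see how the layer sizes above fit together, this NumPy sketch counts the weights and biases of each Dense stack for latent_dim = 2 and pushes a fake batch through an untrained, bias-free copy of the encoder. The helper dense_param_count and the random weights are illustrative assumptions, not part of the original code; the counts should match the parameter totals Keras reports once the models are built.

```python
import numpy as np

def dense_param_count(layer_sizes):
    """Count weights + biases in a chain of Dense layers given their input/output widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

latent_dim = 2
# Flatten(784) -> Dense(512) -> Dense(256) -> Dense(2 * latent_dim); Flatten has no parameters
encoder_sizes = [28 * 28, 512, 256, latent_dim * 2]
# Dense(256) -> Dense(512) -> Dense(784); Reshape has no parameters
decoder_sizes = [latent_dim, 256, 512, 28 * 28]

print(dense_param_count(encoder_sizes))  # 534276
print(dense_param_count(decoder_sizes))  # 534544

# Shape walk-through with random, untrained weights (biases omitted for brevity)
rng = np.random.default_rng(0)
x = rng.random((8, 28, 28, 1)).astype(np.float32)   # a fake batch of 8 images
h = x.reshape(len(x), -1)                           # Flatten -> (8, 784)
for n_in, n_out in zip(encoder_sizes[:-2], encoder_sizes[1:-1]):
    h = np.maximum(h @ rng.normal(0.0, 0.01, (n_in, n_out)), 0.0)  # ReLU hidden layers
out = h @ rng.normal(0.0, 0.01, (encoder_sizes[-2], encoder_sizes[-1]))  # linear output layer
mu, logvar = np.split(out, 2, axis=1)               # same role as tf.split(..., 2, axis=1)
print(out.shape, mu.shape, logvar.shape)            # (8, 4) (8, 2) (8, 2)
```

Almost all of the roughly one million parameters sit in the two 784-by-512 layers; the latent bottleneck itself is tiny.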

5.2 Implementing Sampling and train_step

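# Reparameterization trick: z = mu + sigma * epsilon, with epsilon ~ N(0, I)
def sample(mu, logvar):
    # tf.shape (rather than mu.shape) also works when the batch size is unknown, e.g. in tf.function
    epsilon = tf.random.normal(tf.shape(mu))
    # logvar = log(sigma^2), so sigma = exp(0.5 * logvar)
    sigma = tf.exp(0.5 * logvar)
    return epsilon * sigma + mu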
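The reparameterization trick moves all randomness into epsilon, so z remains a differentiable function of mu and logvar and gradients can flow back into the encoder. The NumPy port below (sample_np is a hypothetical name) draws many samples for one fixed mu and logvar to check that z really follows N(mu, sigma^2); the chosen values are arbitrary.

```python
import numpy as np

def sample_np(mu, logvar, rng):
    """NumPy version of sample(): z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * logvar)  # logvar = log(sigma^2), so sigma = exp(logvar / 2)
    return mu + sigma * eps

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0])
logvar = np.log(np.array([0.25, 4.0]))  # variances 0.25 and 4.0 -> sigmas 0.5 and 2.0

n = 100_000
z = sample_np(np.tile(mu, (n, 1)), np.tile(logvar, (n, 1)), rng)
print(z.mean(axis=0))  # close to [1.0, -2.0]
print(z.std(axis=0))   # close to [0.5, 2.0]
```

Predicting logvar instead of sigma directly lets the encoder's last Dense layer output any real number while sigma = exp(0.5 * logvar) stays positive.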

# One training step of the VAE model
def train_step(inputs):
    # GradientTape records the forward-pass operations needed to compute gradients
    with tf.GradientTape() as tape:
        # The encoder gives the parameters of q(z|x)
        mu, logvar = encoder(inputs)
        # Sample z with the reparameterization trick
        z = sample(mu, logvar)
        # The decoder reconstructs x from z: p(x|z)
        x_recon = decoder(z)
        # Reconstruction loss: -E_q(z|x)[log p(x|z)], estimated with one sample of z.
        # binary_crossentropy averages over the last axis (size 1 here), so the sum is
        # the total per-pixel BCE over the batch
        reconstruction_error = tf.reduce_sum(tf.losses.binary_crossentropy(inputs, x_recon))
        # Regularization loss: KL(q(z|x) || p(z)) with p(z) = N(0, I), in closed form.
        # It penalizes encoder distributions that drift away from the prior
        kl = 0.5 * tf.reduce_sum(tf.exp(logvar) + tf.square(mu) - 1. - logvar)
        # Average over the samples in this batch (the last batch may be smaller)
        n_samples = tf.cast(tf.shape(inputs)[0], tf.float32)
        loss = (kl + reconstruction_error) / n_samples

    # Compute gradients outside the tape context so the gradient ops are not recorded
    vars_ = encoder.trainable_variables + decoder.trainable_variables
    grads_ = tape.gradient(loss, vars_)
    # Apply one optimizer update to the encoder and decoder weights
    optimizer.apply_gradients(zip(grads_, vars_))

    return loss, reconstruction_error, kl
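As for what the KL term means: it measures how far the encoder's Gaussian q(z|x) is from the prior N(0, I), and it is zero only when mu = 0 and sigma = 1. The NumPy sketch below compares the closed-form expression used in train_step with a Monte Carlo estimate of E_q[log q(z|x) - log p(z)]; the helper names and the example mu and logvar are illustrative assumptions.

```python
import numpy as np

def kl_closed_form(mu, logvar):
    # Same expression as in train_step: 0.5 * sum(exp(logvar) + mu^2 - 1 - logvar)
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def kl_monte_carlo(mu, logvar, n=200_000, seed=0):
    # Estimate E_q[log q(z) - log p(z)] with z ~ q = N(mu, sigma^2) and p = N(0, I)
    rng = np.random.default_rng(seed)
    sigma = np.exp(0.5 * logvar)
    z = mu + sigma * rng.standard_normal((n, mu.size))
    log_q = -0.5 * (np.log(2 * np.pi) + logvar + ((z - mu) / sigma) ** 2)
    log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
    return np.mean(np.sum(log_q - log_p, axis=1))

mu = np.array([0.5, -1.0])
logvar = np.array([0.0, np.log(0.5)])
print(kl_closed_form(mu, logvar), kl_monte_carlo(mu, logvar))  # the two should agree closely
print(kl_closed_form(np.zeros(2), np.zeros(2)))                # 0.0 when q equals the prior
```

Without this term the encoder could spread codes arbitrarily far apart; with it, codes stay near N(0, I), which is what makes sampling z from the prior and decoding it produce plausible new digits.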

5.3 Building and Training the Model

# Set hyperparameters
n_epochs = 50
latent_dim = 2
learning_rate = 1e-3
log_interval = 10

encoder = Vanila_Encoder(latent_dim)
decoder = Vanila_Decoder(latent_dim)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

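for epoch in range(1, n_epochs + 1):
    total_loss, total_recon, total_kl = 0.0, 0.0, 0.0
    for x in train_data:
        loss, recon, kl = train_step(x)
        # loss is a per-sample mean, so multiply by the batch size to get the batch sum
        total_loss += loss * x.shape[0]
        # recon and kl are already summed over the batch
        total_recon += recon
        total_kl += kl

    if epoch % log_interval == 0:
        # Divide the epoch totals by the number of training images for per-sample averages.
        # The loss is the negative ELBO, so lower is better
        print(
            f'{epoch:3d} epoch: -ELBO {float(total_loss) / len(dataset):.2f}, '
            f'Recon {float(total_recon) / len(dataset):.2f}, '
            f'KL {float(total_kl) / len(dataset):.2f}'
        )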
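Why is loss multiplied by x.shape[0] while recon and kl are not? loss is already a per-sample mean, whereas recon and kl are batch sums, so multiplying loss back by the batch size puts all three totals on the same footing before dividing by len(dataset). Because the final batch has only 608 images, simply averaging the batch means would over-weight it slightly. The sketch below uses made-up loss values, not results from this model, to show the difference.

```python
# Hypothetical per-batch mean losses for one epoch: 58 full batches plus the 608-image last batch
batch_sizes = [1024] * 58 + [608]
batch_means = [150.0] * 58 + [180.0]

# What the training loop does: weight each batch mean by its size, then divide by the dataset size
weighted = sum(m * b for m, b in zip(batch_means, batch_sizes)) / sum(batch_sizes)
# Naive alternative: average the batch means, which over-weights the small last batch
naive = sum(batch_means) / len(batch_means)
print(round(weighted, 3), round(naive, 3))  # 150.304 150.508
```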

References

https://wikidocs.net/152474

Attribution