
Transfer Learning (universal-sentence-encoder-multilingual)

Dan-k 2020. 8. 10. 17:12

Because NLP tasks typically have only a limited amount of labeled training data, transfer learning is widely used in both research and production to achieve high-performance NLP.

(This applies especially to embeddings such as word2vec and GloVe.)

 

More recent work (2017) showed that sentence-level embeddings outperform word-level embeddings.

 

That work proposed two models for sentence embedding, and both have demonstrated strong performance.

 

Accordingly, this post adds a transfer-learning layer built from the universal-sentence-encoder-multilingual model, which supports 16 languages, and then writes a custom downstream classification model on top of it.
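Before wiring the encoder into Keras, here is a minimal standalone sketch of what it does: it maps sentences from different languages into a shared 512-dimensional space, so translations land close together. The sentences and similarity computation are my illustrative additions, not from the original post (and the tensorflow_text package, installed in the next step, must be importable for this to run).

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the custom ops the encoder needs

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

# Semantically similar sentences end up close together, even across languages.
sentences = ["나는 영화를 좋아한다", "I like movies", "오늘은 날씨가 춥다"]
vectors = embed(sentences)  # shape: (3, 512)

# Cosine similarity via normalized dot products; the first two sentences
# should score much higher with each other than with the third.
normed = tf.nn.l2_normalize(vectors, axis=1)
print(tf.matmul(normed, normed, transpose_b=True).numpy())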

 

 

To use the universal sentence encoder, tensorflow_text must be installed:

!pip install tensorflow_text
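One caveat (my note, not in the original post): tensorflow_text releases track TensorFlow versions, so pinning the version that matches your installed TensorFlow avoids import errors. For the TF 2.3.0 used below, that would be:

!pip install tensorflow_text==2.3.0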
 
import urllib
import pandas as pd

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # not used directly; registers the SentencePiece ops USE multilingual needs
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import optimizers

from sklearn.model_selection import train_test_split
tf.__version__
'2.3.0'
## Load the NSMC (Naver movie review) dataset
urllib.request.urlretrieve("https://raw.githubusercontent.com/e9t/nsmc/master/ratings_train.txt", filename="ratings_train.txt")
urllib.request.urlretrieve("https://raw.githubusercontent.com/e9t/nsmc/master/ratings_test.txt", filename="ratings_test.txt")

train_data = pd.read_table('ratings_train.txt')
test_data = pd.read_table('ratings_test.txt')

train_data['document'] = train_data['document'].apply(str)  # guard against missing (NaN) reviews

# use only the first 2,000 reviews to keep the demo fast
X = train_data.iloc[:2000, :].document
y = train_data.iloc[:2000, :].label
X.head()
0                                  아 더빙.. 진짜 짜증나네요 목소리
1                    흠...포스터보고 초딩영화줄....오버연기조차 가볍지 않구나
2                                    너무재밓었다그래서보는것을추천한다
3                        교도소 이야기구먼 ..솔직히 재미는 없다..평점 조정
4    사이몬페그의 익살스런 연기가 돋보였던 영화!스파이더맨에서 늙어보이기만 했던 커스틴 ...
Name: document, dtype: object
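For reference, the NSMC files are tab-separated with three columns, id, document, and label, where label is 0 for a negative review and 1 for a positive one. A quick sanity check (my addition):

print(train_data.columns.tolist())         # ['id', 'document', 'label']
print(train_data['label'].value_counts())  # the two classes are roughly balanced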
def encode_labels(sources):
    # one-hot encode the 0/1 labels: 0 -> [1, 0], 1 -> [0, 1]
    return to_categorical(list(sources))
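For example:

encode_labels([0, 1, 1])
# array([[1., 0.],
#        [0., 1.],
#        [0., 1.]], dtype=float32)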

X_train, X_valid, y_train, y_valid = train_test_split(X, encode_labels(y), test_size=0.1, random_state=42)
n_classes = 2
optimizer = optimizers.Adam(learning_rate=0.01)  # note: unused; compile() below builds its own optimizer
use_url = "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3"
use_url_large = "https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3"
loaded_obj = hub.load(use_url)  # pre-loads the TF Hub module (see the note after the model definition)
model = Sequential(
    [   
      hub.KerasLayer(use_url, input_shape=[], dtype=tf.string, trainable=False, name='USE-multilingual'),
      tf.keras.layers.Dense(2, activation="relu", name="layer1"),
      tf.keras.layers.Dense(3, activation="relu", name="layer2"),
      tf.keras.layers.Dense(2, activation="softmax", name="layer3")
    ]
)
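Two notes on the model above. First, trainable=False freezes the encoder, which is why the summary below reports only 1,043 trainable parameters out of roughly 69M. Second, hub.KerasLayer also accepts the module object loaded earlier (loaded_obj), which avoids resolving the URL a second time; a sketch:

use_layer = hub.KerasLayer(loaded_obj, input_shape=[], dtype=tf.string,
                           trainable=False, name='USE-multilingual')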

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
USE-multilingual (KerasLayer (None, 512)               68927232  
_________________________________________________________________
layer1 (Dense)               (None, 2)                 1026      
_________________________________________________________________
layer2 (Dense)               (None, 3)                 9         
_________________________________________________________________
layer3 (Dense)               (None, 2)                 8         
=================================================================
Total params: 68,928,275
Trainable params: 1,043
Non-trainable params: 68,927,232
_________________________________________________________________
model.fit(X_train.values, y_train, epochs=2)
Epoch 1/2
57/57 [==============================] - 5s 93ms/step - loss: 0.6933 - accuracy: 0.4939
Epoch 2/2
57/57 [==============================] - 5s 93ms/step - loss: 0.6905 - accuracy: 0.5711
<tensorflow.python.keras.callbacks.History at 0x7f5f11bbdd68>
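The held-out split created earlier is never used in the original post; a natural next step (my addition) is to evaluate on it:

val_loss, val_acc = model.evaluate(X_valid.values, y_valid)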
tf.keras.utils.plot_model(model, show_shapes=True, dpi=90)
[plot_model output: architecture diagram of the four layers shown in the summary above]
MODEL_EXPORT_PATH = './use_model/'
tf.saved_model.save(model, MODEL_EXPORT_PATH)
INFO:tensorflow:Assets written to: ./use_model/assets
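Aside (my note): since this is a Keras model, the more common idiom writes the same SavedModel format in TF 2.x, and the embedded Keras metadata is what lets tf.keras.models.load_model restore it below, as the matching predictions confirm.

model.save(MODEL_EXPORT_PATH)  # equivalent Keras idiom; also SavedModel format in TF 2.x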
loaded_model = tf.keras.models.load_model(MODEL_EXPORT_PATH)
test_sample = '나는 밥을 아주 맛있게 먹었습니다.'
## compare the reloaded model against the original
loaded_model.predict([test_sample])
array([[0.47928882, 0.52071124]], dtype=float32)
model.predict([test_sample])
array([[0.47928882, 0.52071124]], dtype=float32)
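To turn these softmax probabilities into a class label (my addition; in NSMC, 0 = negative, 1 = positive):

import numpy as np
print(np.argmax(loaded_model.predict([test_sample]), axis=1))  # [1] -> positive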
 

 

Even after saving and reloading, the transfer-learning layer is stored with the model, as the identical predictions above confirm.
