[TF 2.2] How to Run Predictions with a Model That Includes a Text Vectorization Layer (feat. GCP)

데이터과학 삼학년

Dan-k 2020. 7. 27. 11:55

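Running predictions with a model that uses a TextVectorization layer

Background and current status

Starting with the 2.x releases, tf.keras provides preprocessing layers under the experimental namespace.

A text-to-vector layer is available as of TF 2.2: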
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

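The advantage is that when a model contains a TextVectorization layer, its input can be passed in as raw strings.
Putting Text Vectorization inside the model's layers turned out to cause problems, so this report covers the problems, their causes, and a workaround.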
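To make the "raw strings in, integers out" idea concrete, here is a minimal pure-Python sketch of what a text-to-vector step does conceptually. It is a toy stand-in, not the TextVectorization implementation: `build_vocab` and `vectorize` are hypothetical helpers, and the only conventions borrowed from the layer's integer output mode are that id 0 is reserved for padding and id 1 for out-of-vocabulary tokens.

```python
from collections import Counter

PAD_ID = 0  # reserved for padding
OOV_ID = 1  # reserved for out-of-vocabulary tokens


def build_vocab(texts):
    """Map each whitespace token to an integer id, most frequent first."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Sort by descending count, then by token, so the ids are deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: i + 2 for i, tok in enumerate(ordered)}  # ids 0 and 1 are reserved


def vectorize(text, vocab, sequence_length):
    """Turn one raw string into a fixed-length list of token ids."""
    ids = [vocab.get(tok, OOV_ID) for tok in text.lower().split()]
    ids = ids[:sequence_length]  # truncate long inputs
    return ids + [PAD_ID] * (sequence_length - len(ids))  # pad short inputs


vocab = build_vocab(["진짜 좋네요 목소리", "목소리 좋네요"])
vector = vectorize("진짜 좋네요 노래", vocab, sequence_length=5)
print(vector)  # [4, 3, 1, 0, 0]
```

When this step lives inside the model, the serving side receives the string itself, which is why the prediction input in the rest of this post is made up of strings.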

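Problems

Callbacks cannot be wrapped and used during training

This was confirmed on the Keras GitHub repository as a bug in the framework itself (internal error).

Many similar callback-related errors have been reported, and the developers said those cases would be fixed in TF 2.2, but it remains to be seen when the TextVectorization layer error will be resolved.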

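Problems occur when running the trained model at the prediction stage

Prediction can be broadly divided into online prediction and batch prediction.

Online prediction: runs OK
Batch prediction: internal error / seemingly endless prediction time

Comparison of the two on a 100-line input file built from the following data: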

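# Note: the text cites 100 lines, while this snippet as posted writes 10.
with open("test_string_input.json", "w") as file:
    for i in range(10):
        # One JSON instance per line (newline-delimited JSON)
        print('{"data": ["진짜 좋네요 목소리"], "pid":"7BB9946C26C044B5B1EDEF5DA97E8A84", "logkey":"F1D826C90DE94A568B0ACC88D00322E4"}', file=file)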
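A slightly more defensive way to build the same file is to serialize each instance with `json.dumps`, so every line is guaranteed to be valid JSON, and to read it back as a check. This is a sketch under assumptions: `write_instances` and `read_instances` are hypothetical helper names, the count of 100 follows the text, and the file is written to a temporary directory purely for the demonstration.

```python
import json
import os
import tempfile


def write_instances(path, instance, count):
    """Write `count` copies of `instance`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(count):
            # ensure_ascii=False keeps the Korean text readable in the file.
            f.write(json.dumps(instance, ensure_ascii=False) + "\n")


def read_instances(path):
    """Parse a newline-delimited JSON file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


instance = {
    "data": ["진짜 좋네요 목소리"],
    "pid": "7BB9946C26C044B5B1EDEF5DA97E8A84",
    "logkey": "F1D826C90DE94A568B0ACC88D00322E4",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_string_input.json")
    write_instances(path, instance, count=100)
    rows = read_instances(path)

print(len(rows))  # 100
```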

Online prediction

Elapsed time: about 1 second
Result: success

Batch prediction

Elapsed time: 19 minutes 4 seconds

Result: failure

Internal error

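Online vs. Batch

Online prediction: the prediction process is optimized
Batch prediction: prediction runs on limited resources (mls1-c1-m2, the default, with 2 GB of memory)
Given this difference between the two modes, batch prediction may fail to run properly because the prediction data consists entirely of strings.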
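The two modes also take their input in slightly different shapes: a batch job reads a file with one JSON instance per line, while the online REST endpoint expects a single JSON body whose `instances` field holds the list. The sketch below converts the first shape into the second and splits the instances into smaller requests. `to_online_request` and `split_batches` are hypothetical helpers, and the batch size of 4 is an arbitrary value for illustration, not a recommended setting.

```python
import json


def to_online_request(jsonl_text):
    """Wrap newline-delimited JSON instances in an online request body."""
    instances = [
        json.loads(line) for line in jsonl_text.splitlines() if line.strip()
    ]
    return {"instances": instances}


def split_batches(instances, batch_size):
    """Yield successive slices so each online request stays small."""
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


line = json.dumps({"data": ["진짜 좋네요 목소리"]}, ensure_ascii=False)
body = to_online_request("\n".join([line] * 10))
batches = list(split_batches(body["instances"], batch_size=4))
print(len(body["instances"]), [len(b) for b in batches])  # 10 [4, 4, 2]
```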

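Workaround

Prediction with BQ ML (online prediction)

With BQ ML, prediction is applied in the online style.

Prediction time (for 146,182 rows): about 30 seconds
An added advantage is that running this step performs the prediction and the load into BQ at the same time.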
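As a rough sketch of what this step could look like, the helpers below build two BigQuery statements: one that imports a TensorFlow SavedModel as a BQ ML model, and one that runs `ML.PREDICT` and writes the result straight into a table, which is how prediction and loading can happen in one query. Every dataset, table, model, and bucket name here is a hypothetical placeholder, the post does not show its actual queries, and the input columns of the source table are assumed to match the model's input names.

```python
def create_model_sql(model, model_path):
    """Statement that imports a TensorFlow SavedModel into BigQuery ML."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'TENSORFLOW', model_path = '{model_path}')"
    )


def predict_into_table_sql(model, source_table, dest_table):
    """Statement that predicts and stores the result in one query."""
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{source_table}`))"
    )


# Hypothetical names, for illustration only.
create_sql = create_model_sql(
    "my_dataset.text_model", "gs://my-bucket/saved_model/*"
)
predict_sql = predict_into_table_sql(
    "my_dataset.text_model", "my_dataset.predict_input", "my_dataset.predict_result"
)
print(create_sql)
print(predict_sql)
```

The statements are plain strings, so they can be pasted into the BigQuery console or submitted through whichever client the pipeline already uses.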

Pipeline

Check preprocessed data → [ Train model → Create BQ model ] → BQ predict (online) & load prediction result → Report

Summary

Comparison by version

Pipeline	V1	V2	V3
Steps	7	6	5
Total elapsed time	35 min	-	29 min
Readability of results	Fair	Not measurable	Good

Pipeline versions

V1

check preprocess → [train model → register model] → preprocessing for predict file → predict → load result in BQ → report

V2

check preprocess → [train model → register model] → predict → load result in BQ → report

V3

check preprocess → [train model → create BQ ML model] → BQ predict (online) & load predict result → report
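The step counts in the comparison table can be checked against the three pipelines by writing each one out as a list of stages. This is only an illustrative sketch: the `PIPELINES` mapping is a hypothetical representation, and the stage names are copied from the pipeline descriptions in this section.

```python
# Each pipeline version as an ordered list of stages.
PIPELINES = {
    "V1": [
        "check preprocess",
        "train model",
        "register model",
        "preprocessing for predict file",
        "predict",
        "load result in BQ",
        "report",
    ],
    "V2": [
        "check preprocess",
        "train model",
        "register model",
        "predict",
        "load result in BQ",
        "report",
    ],
    "V3": [
        "check preprocess",
        "train model",
        "create BQ ML model",
        "BQ predict (online) & load predict result",
        "report",
    ],
}

step_counts = {name: len(stages) for name, stages in PIPELINES.items()}
print(step_counts)  # {'V1': 7, 'V2': 6, 'V3': 5}

# Stages that V3 no longer needs compared with V1.
removed = [s for s in PIPELINES["V1"] if s not in PIPELINES["V3"]]
print(removed)
```

V3 drops the separate prediction-file preprocessing, model registration, and BQ load stages, folding prediction and loading into a single BQ ML step.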
