Natural Language Processing
Text classification using CloudML (Jupyter notebook with tf.keras)
Dan-k
2020. 6. 29. 14:14
PROJECT_ID = "project"
BUCKET_NAME = "text"
REGION = "us-central1"
! gsutil ls -al gs://$BUCKET_NAME
!gcloud config set project $PROJECT_ID
!gcloud config set compute/region $REGION
MODEL_NAME = 'text_practice_model'
VERSION_NAME = 'v1'
Train
!gcloud ai-platform jobs submit training text_practice_model_20200629 \
--job-dir gs://text/model/text_practice_model_20200629 \
--module-name trainer.task \
--package-path ./trainer \
--region us-central1 \
--python-version 3.7 \
--runtime-version 2.1 \
--stream-logs \
-- \
--model-name='RNN' \
--optimizer='Adam' \
--learning-rate=0.001 \
--embed-dim=32 \
--n-classes=2 \
--train-files=gs://text/movie_train.csv \
--pred-files=gs://text/movie_predict.csv \
--pred-sequence-files=gs://text/preprocess/movie_predict_sequence.json \
--num-epoch=5 \
--batch-size=128
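Everything after the bare `--` in the command above is forwarded to `trainer.task` as command-line flags, and AI Platform Training also passes `--job-dir` through to the trainer. The trainer package itself is not shown in this post, so the following is only a minimal sketch of how `trainer/task.py` might parse those flags. The function name `parse_args` and the default values are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the user flags that the training command forwards after `--`.

    Hypothetical sketch: the real trainer/task.py is not shown in the post.
    """
    parser = argparse.ArgumentParser()
    # AI Platform Training forwards --job-dir to the trainer as well.
    parser.add_argument("--job-dir", default=None)
    parser.add_argument("--model-name", default="RNN")
    parser.add_argument("--optimizer", default="Adam")
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--embed-dim", type=int, default=32)
    parser.add_argument("--n-classes", type=int, default=2)
    parser.add_argument("--train-files", required=True)
    parser.add_argument("--pred-files", default=None)
    parser.add_argument("--pred-sequence-files", default=None)
    parser.add_argument("--num-epoch", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=128)
    # parse_known_args ignores any extra flags instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

argparse converts dashes to underscores, so `--learning-rate` is read as `args.learning_rate` and `--job-dir` as `args.job_dir`.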
Predict
Online Predict
Register the model version
!gcloud ai-platform models create text_practice_model \
--regions us-central1
# Create model version based on that SavedModel directory
!gcloud beta ai-platform versions create v1 \
--model text_practice_model \
--runtime-version 2.1 \
--python-version 3.7 \
--framework tensorflow \
--origin gs://text/model/text_practice_model_20200629/keras_export
Data for Online Predict
import json
import pandas as pd
dat = 'gs://text/preprocess/movie_predict_sequence.csv'
pred_df = pd.read_csv(dat)
print(pred_df.head())
# Sample 20 rows and write them as newline-delimited JSON, one instance per line
prediction_input = pred_df.sample(20)
with open('prediction_input.json', 'w') as json_file:
    for row in prediction_input.values.tolist():
        json.dump(row, json_file)
        json_file.write('\n')
## Online predict (request sent from the local notebook)
!gcloud ai-platform predict \
--model text_practice_model \
--version v1 \
--json-instances prediction_input.json
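The `--json-instances` file holds one JSON value per line. When the online prediction REST endpoint is called directly instead of through gcloud, the same rows go inside an `instances` list in the request body. The sketch below shows that wrapping only. `build_request_body` is a hypothetical helper name, and the sample rows are made up for illustration.

```python
import json


def build_request_body(lines):
    """Wrap newline-delimited JSON instances in an {"instances": [...]} body."""
    # Skip blank lines so a trailing newline in the file does no harm.
    instances = [json.loads(line) for line in lines if line.strip()]
    return {"instances": instances}


# Made-up sample rows standing in for the lines of prediction_input.json.
sample_lines = ["[1, 2, 3]\n", "[4, 5, 6]\n"]
body = build_request_body(sample_lines)
print(json.dumps(body))
```

An open file object can be passed in place of `sample_lines`, since iterating over a file yields its lines.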
Batch Predict
- The Keras model cannot read the CSV file properly.
- Convert the input to JSON format, then run the prediction.
!gcloud ai-platform jobs submit prediction predict_text_model_pactice_20200629 \
--model-dir 'gs://text/model/text_practice_model_20200629/keras_export' \
--runtime-version 2.1 \
--data-format text \
--region us-central1 \
--input-paths 'gs://text/preprocess/movie_predict_sequence.json' \
--output-path 'gs://text/predict/practice_output'
### Wait for the prediction job to finish
!gcloud ai-platform jobs stream-logs predict_text_model_pactice_20200629
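- Online prediction returns a result even when the input dimensions do not match.
- Batch prediction requires the input dimensions to match exactly.
- A custom Keras model cannot take a CSV file as prediction input; after converting the file to JSON, prediction works.
  Related: tf.data.TextLineDataset(file_paths)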
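Because batch prediction needs every instance to have exactly the model's input length, one option is to pad or truncate each sequence before writing the JSON file. This is a minimal sketch under assumptions: the helper names, `max_len`, and the padding value 0 are hypothetical and must match whatever the trainer used, which this post does not show. The sketch pads at the end, while Keras `pad_sequences` pads and truncates at the front by default, so pick the side that matches training.

```python
import json


def pad_or_truncate(seq, max_len, pad_value=0):
    """Return seq cut or right-padded to exactly max_len items."""
    seq = list(seq)[:max_len]
    return seq + [pad_value] * (max_len - len(seq))


def to_json_lines(sequences, max_len):
    """Serialize sequences as newline-delimited JSON, one fixed-length list per line."""
    return "".join(
        json.dumps(pad_or_truncate(s, max_len)) + "\n" for s in sequences
    )
```

The string returned by `to_json_lines` can be written to a file and uploaded to the path given in `--input-paths`.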
Load prediction results into BigQuery
!bq load --project_id=$PROJECT_ID \
--autodetect \
--replace=false \
--source_format=NEWLINE_DELIMITED_JSON \
dataset.text_clf_predict_result_20200626 \
gs://text/predict/practice_output/prediction.results-*
Check the results
%load_ext google.cloud.bigquery
%%bigquery result
SELECT dense[offset(0)] as negat, dense[offset(1)] as posit
FROM `project.dataset.text_clf_predict_result_20200626` LIMIT 1000
result
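Judging from the query, each line of the batch output is a JSON object whose `dense` field holds two scores, index 0 for negative and index 1 for positive. The same lines can also be inspected without BigQuery. The sketch below assumes that layout; `parse_result_line` is a hypothetical helper, and the sample line is made up, not real model output.

```python
import json


def parse_result_line(line):
    """Turn one result line into (negative_score, positive_score, label)."""
    scores = json.loads(line)["dense"]
    negat, posit = scores[0], scores[1]
    # Pick whichever class has the higher score.
    label = "positive" if posit > negat else "negative"
    return negat, posit, label


# Made-up sample line, for illustration only (not real model output).
sample = '{"dense": [0.25, 0.75]}'
print(parse_result_line(sample))
```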