PROJECT_ID = "project"
BUCKET_NAME = "text" 
REGION = "us-central1"

! gsutil ls -al gs://$BUCKET_NAME

!gcloud config set project $PROJECT_ID
!gcloud config set compute/region $REGION

MODEL_NAME = 'text_practice_model'
VERSION_NAME = 'v1'

Train

!gcloud ai-platform jobs submit training text_practice_model_20200629 \
        --job-dir gs://text/model/text_practice_model_20200629 \
        --module-name trainer.task \
        --package-path ./trainer \
        --region us-central1 \
        --python-version 3.7 \
        --runtime-version 2.1 \
        --stream-logs \
        -- \
        --model-name='RNN' \
        --optimizer='Adam' \
        --learning-rate=0.001 \
        --embed-dim=32 \
        --n-classes=2 \
        --train-files=gs://text/movie_train.csv \
        --pred-files=gs://text/movie_predict.csv \
        --pred-sequence-files=gs://text/preprocess/movie_predict_sequence.json \
        --num-epoch=5 \
        --batch-size=128
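The trainer package (./trainer, module trainer.task) is not shown in this notebook. Everything after the bare -- is forwarded to it as command-line arguments, and AI Platform Training also passes the --job-dir value to the application. The sketch below shows how trainer.task might parse these arguments with argparse. The parse_args helper and its defaults are assumptions, not the author's code.

```python
import argparse


def parse_args(argv=None):
    """Parse the user arguments passed after `--` in the training command.

    Hypothetical sketch: the real trainer/task.py is not shown, so the
    defaults below simply mirror the values used in the notebook command.
    """
    parser = argparse.ArgumentParser(description="Text classification trainer (sketch)")
    # AI Platform forwards --job-dir to the training application as well
    parser.add_argument("--job-dir", default=None)
    parser.add_argument("--model-name", default="RNN")
    parser.add_argument("--optimizer", default="Adam")
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--embed-dim", type=int, default=32)
    parser.add_argument("--n-classes", type=int, default=2)
    parser.add_argument("--train-files", required=True)
    parser.add_argument("--pred-files", default=None)
    parser.add_argument("--pred-sequence-files", default=None)
    parser.add_argument("--num-epoch", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=128)
    # argparse turns dashes into underscores: --learning-rate -> args.learning_rate
    return parser.parse_args(argv)


example = parse_args(["--train-files=gs://text/movie_train.csv", "--model-name=RNN"])
print(example.model_name, example.learning_rate, example.batch_size)
```

Passing an explicit list to parse_args makes the parser easy to try in a notebook. Inside the real job, calling parse_args() with no arguments would read sys.argv instead.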

Predict

Online Predict

Register the model version

!gcloud ai-platform models create text_practice_model \
  --regions us-central1

# Create a model version from the SavedModel exported by the training job
!gcloud beta ai-platform versions create v1 \
    --model text_practice_model \
    --runtime-version 2.1 \
    --python-version 3.7 \
    --framework tensorflow \
    --origin gs://text/model/text_practice_model_20200629/keras_export

Creating version (this might take a few minutes)......done.
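MODEL_NAME and VERSION_NAME are defined at the top of the notebook, but the commands above hard-code the same values. The sketch below builds the version-creation command from those variables as an argument list, so it could be run with subprocess.run. The helper name version_create_cmd is hypothetical. The flags are copied from the command above.

```python
import shlex

MODEL_NAME = "text_practice_model"
VERSION_NAME = "v1"


def version_create_cmd(model_name, version_name, origin,
                       runtime_version="2.1", python_version="3.7"):
    """Build the `gcloud beta ai-platform versions create` call as an argv list."""
    return [
        "gcloud", "beta", "ai-platform", "versions", "create", version_name,
        "--model", model_name,
        "--runtime-version", runtime_version,
        "--python-version", python_version,
        "--framework", "tensorflow",
        "--origin", origin,
    ]


cmd = version_create_cmd(
    MODEL_NAME,
    VERSION_NAME,
    "gs://text/model/text_practice_model_20200629/keras_export",
)
print(shlex.join(cmd))  # the shell-quoted command, for inspection
# subprocess.run(cmd, check=True)  # uncomment to actually create the version
```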

DATA for Online Predict

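import json
import pandas as pd

# Reading a gs:// path with pandas requires the gcsfs package.
# The CSV has no header row: without header=None, pandas turns the first sequence
# into column names (67, 425, 19, ... in the output below) and drops it from the data.
dat = 'gs://text/preprocess/movie_predict_sequence.csv'
pred_df = pd.read_csv(dat, header=None)
print(pred_df.head())

# Sample 20 rows to use as online prediction instances
prediction_input = pred_df.sample(20)

# Write one JSON list per line (newline-delimited JSON), the format --json-instances expects
with open('prediction_input.json', 'w') as json_file:
    for row in prediction_input.values.tolist():
        json.dump(row, json_file)
        json_file.write('\n')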

      67    425     19    167  1240  639      0    0.1   0.2   0.3  ...  0.54  \
0   1196    419     24    570     1  293   2138     18  1346  5860  ...     0   
1      8  44850    497     53   332  211    145      0     0     0  ...     0   
2   8096    117   7755    209   139    9      5     41    25  5676  ...     0   
3  44851  20091   2025     18     6  780    986      1  2910   142  ...     0   
4    903  44854  31522  44855  1480  513  44856  44857     1     7  ...     0   

   0.55  0.56  0.57  0.58  0.59  0.60  0.61  0.62  0.63  
0     0     0     0     0     0     0     0     0     0  
1     0     0     0     0     0     0     0     0     0  
2     0     0     0     0     0     0     0     0     0  
3     0     0     0     0     0     0     0     0     0  
4     0     0     0     0     0     0     0     0     0  

[5 rows x 70 columns]
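The column labels in the output above (67, 425, 19, 0, 0.1, 0.2, ...) are token IDs, not real header names. pandas read the first data row as the header and renamed the repeated 0 values to 0.1, 0.2, and so on. The small in-memory sketch below reproduces the effect with toy values and shows that header=None keeps every row.

```python
import io

import pandas as pd

# Toy headerless CSV with two padded sequences (values are illustrative only)
raw = "5,3,0,0\n7,2,9,0\n"

default_df = pd.read_csv(io.StringIO(raw))             # first row becomes the header
fixed_df = pd.read_csv(io.StringIO(raw), header=None)  # every row is kept as data

print(len(default_df), list(default_df.columns))  # duplicate "0" is renamed "0.1"
print(len(fixed_df), list(fixed_df.columns))
```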

## Online prediction: send the instances to the deployed model version
!gcloud ai-platform predict \
  --model text_practice_model \
  --version v1 \
  --json-instances prediction_input.json

DENSE
[0.7098474502563477, 0.29015251994132996]
[0.0017769546248018742, 0.9982230067253113]
[0.4356653392314911, 0.5643346905708313]
[0.010444060899317265, 0.9895558953285217]
[0.9958200454711914, 0.004179933108389378]
[0.005581081844866276, 0.9944189786911011]
[0.9964893460273743, 0.0035106223076581955]
[0.9983691573143005, 0.0016308064805343747]
[0.006141127087175846, 0.9938588738441467]
[0.003378230147063732, 0.9966217279434204]
[0.9973077774047852, 0.0026921695098280907]
[0.9920097589492798, 0.007990231737494469]
[0.025658944621682167, 0.974341094493866]
[0.9853862524032593, 0.014613732695579529]
[0.976745069026947, 0.02325497567653656]
[0.9999063014984131, 9.363394929096103e-05]
[0.9136345982551575, 0.08636544644832611]
[0.9642234444618225, 0.03577658161520958]
[0.0012327401200309396, 0.9987672567367554]
[0.011342661455273628, 0.9886572957038879]
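Each line under DENSE is a [p0, p1] probability pair for one instance. The BigQuery query later in the notebook aliases index 0 as negat and index 1 as posit. The sketch below assumes that class order and turns each pair into a label. to_labels is a hypothetical helper.

```python
# Assumed class order, taken from the BigQuery query: index 0 = negative, index 1 = positive
LABELS = ("negative", "positive")


def to_labels(predictions):
    """Map each probability pair to (most likely label, its probability)."""
    results = []
    for probs in predictions:
        best = max(range(len(probs)), key=lambda i: probs[i])
        results.append((LABELS[best], probs[best]))
    return results


# First three rows of the online prediction output above
online_output = [
    [0.7098474502563477, 0.29015251994132996],
    [0.0017769546248018742, 0.9982230067253113],
    [0.4356653392314911, 0.5643346905708313],
]
labels = to_labels(online_output)
for label, prob in labels:
    print(f"{label}\t{prob:.3f}")
```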

Batch Predict

The Keras model could not read the CSV file correctly.
The input was converted to JSON format before running prediction.
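The notebook does not show how movie_predict_sequence.json was produced from the CSV. One approach streams the rows with the csv module and writes each one as a JSON list on its own line, the same layout as prediction_input.json above. This sketch assumes every value is an integer token ID and that the CSV has no header row. The helper csv_to_jsonl is hypothetical. Files on GCS would first need to be copied locally, for example with gsutil cp.

```python
import csv
import io
import json


def csv_to_jsonl(src, dst):
    """Copy a headerless CSV of integer token IDs from src to dst as JSON lines.

    src and dst are text streams; returns the number of instances written.
    """
    count = 0
    for row in csv.reader(src):
        if not row:
            continue  # csv.reader yields [] for blank lines
        dst.write(json.dumps([int(value) for value in row]) + "\n")
        count += 1
    return count


# In-memory demo with toy values. For real files, something like:
# with open("movie_predict_sequence.csv", newline="") as src, \
#         open("movie_predict_sequence.json", "w") as dst:
#     csv_to_jsonl(src, dst)
demo_src = io.StringIO("1,2,0\n\n3,4,5\n")
demo_dst = io.StringIO()
n_rows = csv_to_jsonl(demo_src, demo_dst)
print(n_rows)
print(demo_dst.getvalue(), end="")
```

Streaming row by row keeps memory flat even for a large prediction file, unlike loading everything into a DataFrame first.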

!gcloud ai-platform jobs submit prediction predict_text_model_pactice_20200629 \
                 --model-dir 'gs://text/model/text_practice_model_20200629/keras_export' \
                 --runtime-version 2.1 \
                 --data-format text \
                 --region us-central1 \
                 --input-paths 'gs://text/preprocess/movie_predict_sequence.json' \
                 --output-path 'gs://text/predict/practice_output'


### Wait for the prediction job to finish
!gcloud ai-platform jobs stream-logs predict_text_model_pactice_20200629

Job [predict_text_model_pactice_20200629] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe predict_text_model_pactice_20200629

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs predict_text_model_pactice_20200629
jobId: predict_text_model_pactice_20200629
state: QUEUED
INFO	2020-06-29 03:15:29 +0000	service		Validating job requirements...
INFO	2020-06-29 03:15:30 +0000	service		Job creation request has been successfully validated.
INFO	2020-06-29 03:15:30 +0000	service		Job predict_text_model_pactice_20200629 is queued.
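When the job finishes, the results are written under the --output-path directory in files matching prediction.results-*. That is the same pattern the bq load command below reads. The sketch below checks those files locally before loading them, assuming they were first copied with gsutil cp gs://text/predict/practice_output/prediction.results-* . It also assumes each line is a JSON object keyed by the output layer name dense, which is inferred from the dense[offset(0)] expression in the BigQuery query. read_batch_predictions and the demo file name are hypothetical.

```python
import glob
import json
import os
import tempfile


def read_batch_predictions(directory, output_key="dense"):
    """Yield the probability list from every line of prediction.results-* files."""
    pattern = os.path.join(directory, "prediction.results-*")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)[output_key]


# Demo with a hypothetical results file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "prediction.results-00000-of-00001"), "w") as f:
        f.write('{"dense": [0.25, 0.75]}\n{"dense": [0.9, 0.1]}\n')
    rows = list(read_batch_predictions(tmp))
print(rows)
```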

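In online prediction, a result is returned even when the input dimensions do not match.
In batch prediction, the input dimensions must match exactly.
When predicting with a custom Keras model, CSV input is not accepted. Converting the input to JSON makes prediction work (see tf.data.TextLineDataset(file_paths)).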
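Batch prediction needs every instance to have exactly the dimension the model expects. The sketch below forces each token-ID sequence to a fixed length. It pads with zeros at the end, matching the printed sample rows, which end in runs of 0, and it cuts off extra tokens at the end. MAXLEN is a hypothetical placeholder because the notebook does not state the model's input length.

```python
MAXLEN = 5  # hypothetical; use the sequence length the model was trained with


def pad_or_truncate(sequence, maxlen, pad_value=0):
    """Return exactly maxlen token IDs: keep the first maxlen, pad the rest at the end."""
    sequence = list(sequence[:maxlen])
    return sequence + [pad_value] * (maxlen - len(sequence))


rows = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
fixed_rows = [pad_or_truncate(r, MAXLEN) for r in rows]
print(fixed_rows)
```

The end-padding and end-truncation here are a choice, not a requirement. Keras's own pad_sequences defaults to padding and truncating at the front ('pre'), so whichever convention the training data used must be matched at prediction time.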

Load the prediction results into BigQuery (BQ)

!bq load --project_id=pro \
    --autodetect \
    --replace=false \
    --source_format=NEWLINE_DELIMITED_JSON \
    pan.text_clf_predict_result_20200626 \
    gs://text/predict/practice_output/prediction.results-*

Waiting on bqjob_r616546182e7c76b2_00000172fe22bc60_1 ... (2s) Current status: DONE

Check the results

%load_ext google.cloud.bigquery

The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery

%%bigquery result
SELECT dense[offset(0)] as negat, dense[offset(1)] as posit  
FROM `project.dataset.text_clf_predict_result_20200626` LIMIT 1000

result

	negat	posit
0	0.282532	0.717468
1	0.010798	0.989202
2	0.004679	0.995321
3	0.004679	0.995321
4	0.849121	0.150879
...	...	...
995	0.305789	0.694211
996	0.000689	0.999311
997	0.988043	0.011957
998	0.988043	0.011957
999	0.926032	0.073968
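The result DataFrame returned by the %%bigquery magic has one negat/posit pair per review. The pandas sketch below adds a label column using a 0.5 threshold on posit and counts each class. The summarize helper and the threshold are assumptions. The demo rows are copied from rows 0, 1 and 4 of the output above.

```python
import pandas as pd


def summarize(result, threshold=0.5):
    """Label each row by thresholding posit, and count rows per label."""
    is_positive = result["posit"] >= threshold
    labeled = result.assign(label=is_positive.map({True: "positive", False: "negative"}))
    return labeled, labeled["label"].value_counts()


demo = pd.DataFrame({
    "negat": [0.282532, 0.010798, 0.849121],
    "posit": [0.717468, 0.989202, 0.150879],
})
labeled, counts = summarize(demo)
print(labeled)
print(counts)
```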

데이터과학 삼학년

Text classification using CloudML (jupyter notebook with tf.keras)