데이터과학 삼학년

Text classification using CloudML (jupyter notebook with tf.keras)

Natural Language Processing

Dan-k 2020. 6. 29. 14:14
 
PROJECT_ID = "project"
BUCKET_NAME = "text" 
REGION = "us-central1"
 
! gsutil ls -al gs://$BUCKET_NAME
 
 
!gcloud config set project $PROJECT_ID
!gcloud config set compute/region $REGION
 
MODEL_NAME = 'text_practice_model'
VERSION_NAME = 'v1'
 

Train

!gcloud ai-platform jobs submit training text_practice_model_20200629 \
        --job-dir gs://text/model/text_practice_model_20200629 \
        --module-name trainer.task \
        --package-path ./trainer \
        --region us-central1 \
        --python-version 3.7 \
        --runtime-version 2.1 \
        --stream-logs \
        -- \
        --model-name='RNN' \
        --optimizer='Adam' \
        --learning-rate=0.001 \
        --embed-dim=32 \
        --n-classes=2 \
        --train-files=gs://text/movie_train.csv \
        --pred-files=gs://text/movie_predict.csv \
        --pred-sequence-files=gs://text/preprocess/movie_predict_sequence.json \
        --num-epoch=5 \
        --batch-size=128 
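
The flags after the bare `--` are handed to `trainer.task` as ordinary command-line arguments. The post does not show the trainer code, so the following is only a minimal sketch of how such a module might parse them. The function name `parse_args` and all defaults are assumptions for illustration; only the flag names come from the command above.

```python
import argparse


def parse_args(argv=None):
    """Parse the user arguments that follow the bare `--` in the gcloud command."""
    parser = argparse.ArgumentParser()
    # AI Platform also forwards --job-dir to the training module.
    parser.add_argument("--job-dir", default=None)
    parser.add_argument("--model-name", default="RNN")
    parser.add_argument("--optimizer", default="Adam")
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--embed-dim", type=int, default=32)
    parser.add_argument("--n-classes", type=int, default=2)
    parser.add_argument("--train-files", required=True)
    parser.add_argument("--pred-files", default=None)
    parser.add_argument("--pred-sequence-files", default=None)
    parser.add_argument("--num-epoch", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=128)
    # parse_known_args ignores any extra flags instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example: the same values as in the gcloud command (quotes are removed by the shell).
args = parse_args([
    "--model-name=RNN",
    "--learning-rate=0.001",
    "--train-files=gs://text/movie_train.csv",
    "--batch-size=128",
])
print(args.model_name, args.learning_rate, args.batch_size)
```

Dashes in flag names become underscores on the parsed object, so `--learning-rate` is read as `args.learning_rate`.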
 
 
 

Predict

Online Predict

Register the model version

!gcloud ai-platform models create text_practice_model \
  --regions us-central1
# Create model version based on that SavedModel directory
!gcloud beta ai-platform versions create v1 \
    --model text_practice_model \
    --runtime-version 2.1 \
    --python-version 3.7 \
    --framework tensorflow \
    --origin gs://text/model/text_practice_model_20200629/keras_export
 
Creating version (this might take a few minutes)......done.                    
 

DATA for Online Predict

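import json
import pandas as pd

# Preprocessed prediction sequences (reading a gs:// path with pandas needs gcsfs).
dat = 'gs://text/preprocess/movie_predict_sequence.csv'
# Note: with the default header handling, the first row is consumed as column names.
pred_df = pd.read_csv(dat)
print(pred_df.head())
# Sample 20 rows to send as online-prediction instances.
prediction_input = pred_df.sample(20)

# Write one JSON array per line, the format --json-instances expects.
with open('prediction_input.json', 'w') as json_file:
    for row in prediction_input.values.tolist():
        json.dump(row, json_file)
        json_file.write('\n')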
 
      67    425     19    167  1240  639      0    0.1   0.2   0.3  ...  0.54  \
0   1196    419     24    570     1  293   2138     18  1346  5860  ...     0   
1      8  44850    497     53   332  211    145      0     0     0  ...     0   
2   8096    117   7755    209   139    9      5     41    25  5676  ...     0   
3  44851  20091   2025     18     6  780    986      1  2910   142  ...     0   
4    903  44854  31522  44855  1480  513  44856  44857     1     7  ...     0   

   0.55  0.56  0.57  0.58  0.59  0.60  0.61  0.62  0.63  
0     0     0     0     0     0     0     0     0     0  
1     0     0     0     0     0     0     0     0     0  
2     0     0     0     0     0     0     0     0     0  
3     0     0     0     0     0     0     0     0     0  
4     0     0     0     0     0     0     0     0     0  

[5 rows x 70 columns]
## Online predict (local)
!gcloud ai-platform predict \
  --model text_practice_model \
  --version v1 \
  --json-instances prediction_input.json
 
DENSE
[0.7098474502563477, 0.29015251994132996]
[0.0017769546248018742, 0.9982230067253113]
[0.4356653392314911, 0.5643346905708313]
[0.010444060899317265, 0.9895558953285217]
[0.9958200454711914, 0.004179933108389378]
[0.005581081844866276, 0.9944189786911011]
[0.9964893460273743, 0.0035106223076581955]
[0.9983691573143005, 0.0016308064805343747]
[0.006141127087175846, 0.9938588738441467]
[0.003378230147063732, 0.9966217279434204]
[0.9973077774047852, 0.0026921695098280907]
[0.9920097589492798, 0.007990231737494469]
[0.025658944621682167, 0.974341094493866]
[0.9853862524032593, 0.014613732695579529]
[0.976745069026947, 0.02325497567653656]
[0.9999063014984131, 9.363394929096103e-05]
[0.9136345982551575, 0.08636544644832611]
[0.9642234444618225, 0.03577658161520958]
[0.0012327401200309396, 0.9987672567367554]
[0.011342661455273628, 0.9886572957038879]
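
Each output row is a pair of class probabilities. A minimal sketch of turning such rows into labels follows. It assumes index 0 is the negative class and index 1 the positive class, which matches the column names used in the BigQuery query later in this post; the helper name `to_label` is hypothetical.

```python
def to_label(probs, class_names=("negative", "positive")):
    """Return (label, probability) for the most likely class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# First three rows of the online prediction output above.
rows = [
    [0.7098474502563477, 0.29015251994132996],
    [0.0017769546248018742, 0.9982230067253113],
    [0.4356653392314911, 0.5643346905708313],
]
labels = [to_label(row) for row in rows]
for label, prob in labels:
    print(label, round(prob, 4))
```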
 

Batch Predict

  • The Keras model cannot read a CSV file properly.
  • Convert the input to JSON format, then run the prediction.
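
The CSV-to-JSON conversion mentioned above can be sketched as follows. This is only an illustration: it assumes the CSV has no header row and that every cell is an integer token id, and the function name `csv_to_json_lines` is hypothetical. The output has one JSON array per line, the same shape as the `prediction_input.json` file written for online prediction.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Turn each CSV row of token ids into one JSON array per line."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        lines.append(json.dumps([int(cell) for cell in row]))
    return "".join(line + "\n" for line in lines)


# Two short, made-up rows stand in for the real sequence file.
sample_csv = "67,425,19,0\n1196,419,24,570\n"
json_lines = csv_to_json_lines(sample_csv)
print(json_lines, end="")
```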
!gcloud ai-platform jobs submit prediction predict_text_model_pactice_20200629 \
                 --model-dir 'gs://text/model/text_practice_model_20200629/keras_export' \
                 --runtime-version 2.1 \
                 --data-format text \
                 --region us-central1 \
                 --input-paths 'gs://text/preprocess/movie_predict_sequence.json' \
                 --output-path 'gs://text/predict/practice_output'


### Wait for the prediction job to finish
!gcloud ai-platform jobs stream-logs predict_text_model_pactice_20200629
 
Job [predict_text_model_pactice_20200629] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe predict_text_model_pactice_20200629

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs predict_text_model_pactice_20200629
jobId: predict_text_model_pactice_20200629
state: QUEUED
INFO	2020-06-29 03:15:29 +0000	service		Validating job requirements...
INFO	2020-06-29 03:15:30 +0000	service		Job creation request has been successfully validated.
INFO	2020-06-29 03:15:30 +0000	service		Job predict_text_model_pactice_20200629 is queued.
 
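  1. Online prediction returns results even when the input dimensions do not match.
  2. Batch prediction requires the input dimensions to match exactly.
  3. A custom Keras model cannot take a CSV file at prediction time; after converting the input to a JSON file, prediction works. (tf.data.TextLineDataset(file_paths))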
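
Because batch prediction needs every instance to have exactly the expected length, it may help to check the JSON file before submitting the job. The sketch below is an assumption-laden example, not part of the original workflow: `check_sequence_lengths` and `expected_len` are hypothetical names. The frame printed earlier has 70 columns, so 70 is presumably the length this model expects, but that is an inference.

```python
import json


def check_sequence_lengths(json_lines, expected_len):
    """Return (line_number, actual_length) for every instance of the wrong length."""
    mismatches = []
    for line_no, line in enumerate(json_lines.splitlines(), start=1):
        if not line.strip():
            continue  # ignore empty lines
        instance = json.loads(line)
        if len(instance) != expected_len:
            mismatches.append((line_no, len(instance)))
    return mismatches


# Made-up sample: the second instance is one element short.
sample = "[1, 2, 3, 0]\n[4, 5, 0]\n[6, 7, 8, 9]\n"
bad = check_sequence_lengths(sample, expected_len=4)
print(bad)
```

An empty list means every instance has the expected length.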
 

Load prediction results into BigQuery

!bq load --project_id=project \
    --autodetect \
    --replace=false \
    --source_format=NEWLINE_DELIMITED_JSON \
    dataset.text_clf_predict_result_20200626 \
    gs://text/predict/practice_output/prediction.results-*
 
Waiting on bqjob_r616546182e7c76b2_00000172fe22bc60_1 ... (2s) Current status: DONE   
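
The result files are newline-delimited JSON, and the query in the next section reads a `dense` array from each row. Assuming each line therefore looks like `{"dense": [p0, p1]}`, a quick local check of a result file could be sketched like this. The function name `parse_prediction_results` is hypothetical, and the sample lines are illustrative values, not real file contents.

```python
import json


def parse_prediction_results(text, key="dense"):
    """Parse newline-delimited JSON predictions into negat/posit records."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        probs = json.loads(line)[key]
        records.append({"negat": probs[0], "posit": probs[1]})
    return records


# Illustrative lines in the assumed output format.
sample = '{"dense": [0.282532, 0.717468]}\n{"dense": [0.849121, 0.150879]}\n'
records = parse_prediction_results(sample)
print(records)
```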
 

Check the result

%load_ext google.cloud.bigquery
 
The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery
%%bigquery result
SELECT dense[offset(0)] as negat, dense[offset(1)] as posit  
FROM `project.dataset.text_clf_predict_result_20200626` LIMIT 1000
result
        negat     posit
0    0.282532  0.717468
1    0.010798  0.989202
2    0.004679  0.995321
3    0.004679  0.995321
4    0.849121  0.150879
..        ...       ...
995  0.305789  0.694211
996  0.000689  0.999311
997  0.988043  0.011957
998  0.988043  0.011957
999  0.926032  0.073968

1000 rows × 2 columns

 
