TensorRT for Inference



Dan-k 2023. 9. 4. 11:00

TensorRT

- Converting a TensorFlow model to TensorRT is one way to lower its inference latency.

- TensorRT is a platform for optimizing inference and minimizing latency.

- It runs on NVIDIA GPUs.

- Models from all major frameworks can be run with TensorRT.

Converting TensorFlow → TensorRT

- Conversion has to be done in a GPU environment, because TensorRT only runs on GPUs.

- To optimize for CPU instead, use OpenVINO or ONNX.
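Because the conversion depends on a GPU being present, it can help to check what TensorFlow sees before converting. The following is a minimal sketch, not part of the original post; the helper name `gpu_available` is hypothetical, and it only assumes the standard `tf.config.list_physical_devices` call.

```python
def gpu_available():
    """Return True if TensorFlow can see at least one GPU, else False."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to convert with.
        return False
    return len(tf.config.list_physical_devices("GPU")) > 0


if not gpu_available():
    # Assumption: without a visible GPU the TensorRT conversion is not expected to work.
    print("No GPU visible to TensorFlow; consider a CPU-oriented tool instead.")
```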

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths: replace with your own SavedModel directories
input_saved_model_dir = "path/to/saved_model"
output_saved_model_dir = "path/to/trt_saved_model"

# Conversion parameters: choose FP32 or FP16 precision
conversion_params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP16)  # or trt.TrtPrecisionMode.FP32

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)

# Partition the graph and optimize the TensorRT-compatible segments
converter.convert()

# Save the converted model to disk
converter.save(output_saved_model_dir)

Inference with TensorRT

- Load the converted model the same way as a regular TensorFlow model and run inference through its signatures.

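import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

# Load the model using TensorFlow
# (input_model_dir: directory of the SavedModel to load)
model = tf.saved_model.load(input_model_dir, tags=[tag_constants.SERVING])
# List the signatures available on the model
signature_keys = list(model.signatures.keys())
# Build a graph function for prediction from the first signature
graph_func = model.signatures[signature_keys[0]]
# Run the prediction (input_tensor: a tf.Tensor matching the model input)
model_score = graph_func(input_tensor)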
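Since FP16 has lower precision than FP32, it may be worth comparing the converted model's outputs against the original model's on the same input. The sketch below is an illustration only: `max_abs_diff`, the sample scores, and the tolerance are all hypothetical, and the values are made up to show the arithmetic.

```python
def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Hypothetical flattened scores from the original and the converted model.
original_scores = [0.10, 0.70, 0.20]
converted_scores = [0.1002, 0.6997, 0.2001]

diff = max_abs_diff(original_scores, converted_scores)
# The tolerance is an assumption; an acceptable value depends on the model.
within_tolerance = diff < 1e-2
```

With real models, the values returned by a signature call are a dict of tensors, so each output would first be converted to a flat list, for example with `.numpy().ravel().tolist()`.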

- TensorRT is reported to make inference roughly 3x faster in typical cases.
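A speedup figure like this is easiest to trust when measured on your own model and hardware. The following is a rough timing sketch under stated assumptions: `measure_latency` and `speedup` are hypothetical helpers, and `predict_fn` stands for any callable that blocks until its result is ready (for a TensorFlow signature, converting the output to NumPy inside the callable is one way to force that).

```python
import statistics
import time


def measure_latency(predict_fn, example_input, warmup=10, runs=100):
    """Return the median wall-clock latency of predict_fn in milliseconds."""
    # Warm-up calls are excluded, since the first calls often include one-time setup cost.
    for _ in range(warmup):
        predict_fn(example_input)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(example_input)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings_ms)


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


# Stand-in workload so the sketch runs without a model; swap in real predict functions.
demo_ms = measure_latency(lambda xs: sum(xs), list(range(1000)), warmup=2, runs=5)
```

Running `measure_latency` once with the original model and once with the converted one, then passing both medians to `speedup`, gives a figure comparable to the one quoted above.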

References

Improve Inference time for TensorFlow Models using TensorRT (medium.com)
