Run MNIST w/ Python
What we are going to do
In this tutorial, we will run the MNIST classification from the Quickstart on your local PC instead of inside the Docker image we provided. We will optimize and compile the TFLite model, then run the compiled model on your local PC using a single thread.
Requirements
We assume that you have installed Optimium and Optimium Runtime on your PC. If you have not, we highly recommend visiting Optimium Setup.
The process is divided into two steps: 1. Optimize, and 2. Deploy.
1. Optimize
Before you start
1. Download model and image files
You need the TFLite model and image files that were used in the Quickstart. You can download them here and copy them into your workspace directory.
tiny_mnist_example.tflite: A simple MNIST model to compare Optimium with TFLite
mnist_sample.jpg: A sample image to test Optimium
verify_output.py: A Python script to compare performance and verify the output of Optimium against TFLite
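If you like, you can quickly confirm that all three files ended up in your workspace directory. The snippet below is an optional sanity check of our own (not part of the tutorial scripts) and assumes you run it from the workspace directory itself:
# Optional check that the downloaded files are in the workspace directory.
import os

workspace = os.getcwd()  # assumes you run this from your workspace directory
for filename in ["tiny_mnist_example.tflite", "mnist_sample.jpg", "verify_output.py"]:
    path = os.path.join(workspace, filename)
    print(f"{filename}: {'found' if os.path.exists(path) else 'missing'}")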
2. Set environment variable
The environment variable WORKING_DIR must be set before importing Optimium. WORKING_DIR is the path where Optimium-related logs and outputs are created.
cd <any workspace you may want>
export WORKING_DIR=$PWD
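If you want to verify from Python that the variable is visible before importing Optimium, here is a small optional check (not required by the tutorial):
# Optional check: WORKING_DIR must be visible to the Python process
# before `import optimium` is executed.
import os

working_dir = os.environ.get("WORKING_DIR")
if working_dir is None:
    raise RuntimeError("WORKING_DIR is not set; export it before importing optimium")
print("Optimium logs and outputs will be created under:", working_dir)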
3. Create a user argument file template
In order to fully optimize your model for your target hardware, you need to provide hardware information.
Don't worry; you can simply follow the prompted steps (see the detailed steps here).
First, run Python and enter the following commands:
import optimium
optimium.create_args_template()
Once you have followed all prompted instructions, user_arguments.json will be created in your WORKING_DIR.
Note: To keep this guide easy to follow, please select the following options:
- When prompted with "Is your target device remotely connected?", select "no".
- When prompted with "Select your framework", enter "2".
- When prompted with "Enable hardware-specific auto-tuning?", select "yes".
Below is an example of the user_arguments.json file that is created (for X86_64, in this case):
{
  "license_key": null,
  "device_name": "MyDevice",
  "model": {
    "input_shapes": [
      [0, 0, 0, 0]
    ],
    "framework": "tflite",
    "tflite": {
      "fp16": false,
      "model_path": "YOUR_MODEL.tflite"
    }
  },
  "target_devices": {
    "host": {
      "arch": "X86_64",
      "os": "LINUX",
      "mattr": "auto"
    },
    "CPU": {
      "arch": "X86_64",
      "platforms": [
        "NATIVE"
      ]
    }
  },
  "runtime": {
    "num_threads": 1
  },
  "remote": {
    "address": "localhost"
  },
  "optimization": {
    "opt_log_key": "MyOptKey",
    "enable_tuning": true
  },
  "out_dirname": "MyOutputDir"
}
4. Modify model information
Next, you need to update your model information as guided in the details here. You should change the "model_path" and "input_shapes" fields.
- "model_path": The path to where you saved tiny_mnist_example.tflite, relative to your workspace directory. (More precisely, it is relative to WORKING_DIR, which was set to your workspace directory in the previous step.)
- "input_shapes": This model has one input with the shape [1, 28, 28, 1].
Modify the model information ("input_shapes" and "model_path") in user_arguments.json as shown below:
{
  ...
  "model": {
    "input_shapes": [
      [1, 28, 28, 1]
    ],
    "framework": "tflite",
    "tflite": {
      "fp16": false,
      "model_path": "[relative path to tiny_mnist_example.tflite]"
    }
  },
  ...
}
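If you prefer to edit the file programmatically rather than by hand, here is a minimal sketch. It assumes user_arguments.json sits in your WORKING_DIR and that tiny_mnist_example.tflite was copied into that same directory:
# Optional: patch "input_shapes" and "model_path" in user_arguments.json from Python
# instead of editing the file manually. Assumes both the JSON file and the model
# live directly inside WORKING_DIR.
import json
import os

args_path = os.path.join(os.environ["WORKING_DIR"], "user_arguments.json")
with open(args_path) as f:
    args = json.load(f)

args["model"]["input_shapes"] = [[1, 28, 28, 1]]
args["model"]["tflite"]["model_path"] = "tiny_mnist_example.tflite"

with open(args_path, "w") as f:
    json.dump(args, f, indent=2)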
5. Set your license key information
Optimium requires a license. If you have not received a license, please check your email or contact us. To set the license, save your license key in user_arguments.json under the "license_key" field.
# user_arguments.json
{
  "license_key": "AAAAA-BBBBB-CCCCC-DDDDD",
  "device_name": "MyDevice",
  "model": {
    ...
  },
  ...
}
Run Optimium
In your workspace, run python3 and execute the following lines. This step optimizes and compiles the provided MNIST classification model.
import optimium
optimium.optimize_tflite_model(max_trials=64)
Optimium dynamically searches for and tunes the inference configuration that best fits your target hardware. This takes between 30 and 50 minutes, depending on your machine's performance. (We are working on making this step faster!)
Check output
The above step saves the result in $WORKING_DIR/outputs/. The nested directory name depends on "device_name", "opt_log_key", and "out_dirname" in user_arguments.json.
You should find two files in that nested output directory.
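If you are unsure where the files landed, here is a quick optional listing helper of our own (not part of the tutorial scripts):
# List everything Optimium produced under $WORKING_DIR/outputs/.
# The nested directory name depends on "device_name", "opt_log_key",
# and "out_dirname" in user_arguments.json.
import os

outputs_root = os.path.join(os.environ["WORKING_DIR"], "outputs")
for root, dirs, files in os.walk(outputs_root):
    for name in files:
        print(os.path.join(root, name))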
2. Deploy and test performance
We will verify and compare the optimized model with TFLite. Run python3 and execute the following lines of code.
1. Import packages
import numpy as np
import time
import optimium.runtime as rt
from PIL import Image
import tensorflow as tf

warmup = 50  # warm-up iterations excluded from the timing
repeat = 50  # timed iterations used for the median latency
2. Load sample image
Please change the location of the downloaded mnist_sample.jpg as desired.
# Load the saved image and scale pixel values to [0, 1]
image_loaded = Image.open('mnist_sample.jpg')
np_img = np.array(image_loaded).astype(np.float32) / 255

# Add channel and batch dimensions to match the model input shape [1, 28, 28, 1]
np_img = np.expand_dims(
    np.expand_dims(
        np_img, axis=-1,
    ), axis=0,
)
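As a quick optional check of our own, you can confirm the array now matches the "input_shapes" entry you set earlier:
# Optional sanity check: the preprocessed image should match the model's
# input shape [1, 28, 28, 1] declared in user_arguments.json.
assert np_img.shape == (1, 28, 28, 1), f"unexpected input shape: {np_img.shape}"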
3. Prepare to run Optimium
In the ctx.load_model call below, replace /path/to/your/optimium/output/ with the path to your output directory.
# Load runtime model
ctx = rt.Context()
model = ctx.load_model("/path/to/your/optimium/output/")
# (example) model = ctx.load_model("/workspace/outputs/MyDevice-num_thread_1-MyOptKey/MyOutputDir")
req = model.create_request()
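If you are not sure which input name the runtime expects, you can print the tensor info exposed by the loaded model. This optional snippet uses only the input_tensors_info attribute that also appears in the next step:
# Optional: inspect the input tensor names exposed by the loaded model.
# The next step keys the input dictionary by model.input_tensors_info[0].name.
for tensor_info in model.input_tensors_info:
    print(tensor_info.name)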
4. Run Optimium model
The classified label is saved in output_label, and the median latency in microseconds is stored in optimium_infer_time.
# Test inference
# Prepare input image
inputs = {
    model.input_tensors_info[0].name: np_img
}
req.set_inputs(inputs)

records = []
for _ in range(warmup + repeat):
    start_time = time.time()
    req.infer()
    req.wait()
    end_time = time.time()
    records.append((end_time - start_time) * 1000000)

optimium_infer_time = np.median(records[warmup:])
optimium_output = req.get_outputs()
output_label = np.argmax(optimium_output)
5. Run the model using the TensorFlow Lite runtime for comparison
Modify the model_path argument of tf.lite.Interpreter below to point to where you saved the TFLite file.
# Load the TFLite model
interpreter = tf.lite.Interpreter(model_path="tiny_mnist_example.tflite")
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
input_data = np_img

# Measure inference time
interpreter.set_tensor(input_details[0]['index'], input_data)
records = []
for _ in range(warmup + repeat):
    start_time = time.time()
    interpreter.invoke()
    end_time = time.time()
    records.append((end_time - start_time) * 1000000)

tflite_infer_time = np.median(records[warmup:])
tflite_output = interpreter.get_tensor(output_details[0]['index'])
6. Compare the results
# Compare and verify output with TFLite
are_outputs_close = np.allclose(tflite_output, optimium_output, atol=1e-5)
if are_outputs_close:
    print("\n\nOutput tensor values match!")
    print("Output Label: ", output_label)
    print(f"TFLite Inference time: {round(tflite_infer_time, 3)} us")
    print(f"Optimium Inference time: {round(optimium_infer_time, 3)} us ({round(tflite_infer_time/optimium_infer_time, 3)}x)")
else:
    print("\n\nOutput tensor values mismatch!")
In this tutorial, we've shown how to optimize a model using Optimium and deploy the optimized model on your PC.

However, it is quite common to have a different serving device from the development device. For example, you may develop your AI model on your local machine and serve it on AWS EC2. In this case, you should optimize your model on AWS EC2, not on your local machine.
For this scenario, we provide a remote setup. Please follow the next tutorial, Set up device.
In addition, we have shown how to serve your compiled model using the Optimium Runtime Python API. Optimium Runtime supports not only Python but also C++ and Kotlin. For more details, please read this document or the following tutorials: Run pose detection in RPi5 with C++, Run face detection in Android with Kotlin.