Run MNIST w/ Python
What we are going to do
In this tutorial, we will run the MNIST classification from the Quickstart on your local PC instead of inside the Docker image we provided. We will optimize and compile the TFLite model, then run the compiled model on your local PC using a single thread.
Requirements
We assume that you have installed Optimium and Optimium Runtime on your PC. If you have not, we highly recommend visiting Optimium Setup.
The process is divided into two steps: 1. Optimize, and 2. Deploy.
1. Optimize
Before you start
1. Download model and image files
You need the TFLite model and image files that were used in the Quickstart. You can download them here and copy them into your workspace directory.
tiny_mnist_example.tflite: A simple MNIST model to compare Optimium with TFLite
mnist_sample.jpg: A sample image to test Optimium
verify_output.py: A Python script to compare performance and verify the output of Optimium against TFLite
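If you like, you can quickly confirm that all three files ended up in your workspace directory. The snippet below is an optional sanity check of our own (not part of the tutorial scripts) and assumes you run it from the workspace directory itself:
# Optional check that the downloaded files are in the workspace directory.
import os

workspace = os.getcwd()  # assumes you run this from your workspace directory
for filename in ["tiny_mnist_example.tflite", "mnist_sample.jpg", "verify_output.py"]:
    path = os.path.join(workspace, filename)
    print(f"{filename}: {'found' if os.path.exists(path) else 'missing'}")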
2. Set environment variable
The environment variable WORKING_DIR must be set before importing Optimium. WORKING_DIR is the path where Optimium-related logs and outputs are created.
cd <any workspace you may want>
export WORKING_DIR=$PWD
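If you want to verify from Python that the variable is visible before importing Optimium, here is a small optional check (not required by the tutorial):
# Optional check: WORKING_DIR must be visible to the Python process
# before `import optimium` is executed.
import os

working_dir = os.environ.get("WORKING_DIR")
if working_dir is None:
    raise RuntimeError("WORKING_DIR is not set; export it before importing optimium")
print("Optimium logs and outputs will be created under:", working_dir)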
3. Create a user argument file template
In order to fully optimize your model for your target hardware, you need to provide hardware information.
Don't worry; you can simply follow the prompted steps (see the detailed steps here).
First, run Python and enter the following commands:
import optimium
optimium.create_args_template()
Once you have followed all prompted instructions, user_arguments.json will be created in your WORKING_DIR.
Note: To keep this guide easy to follow, please select the following options:
- When prompted with "Is your target device remotely connected?", select "no".
- When prompted with "Select your framework", enter "2".
- When prompted with "Enable hardware-specific auto-tuning?", select "yes".
Below is an example of the user_arguments.json file that is created (for X86_64, in this case):
{
  "license_key": null,
  "device_name": "MyDevice",
  "model": {
    "input_shapes": [
      [0, 0, 0, 0]
    ],
    "framework": "tflite",
    "tflite": {
      "fp16": false,
      "model_path": "YOUR_MODEL.tflite"
    }
  },
  "target_devices": {
    "host": {
      "arch": "X86_64",
      "os": "LINUX",
      "mattr": "auto"
    },
    "CPU": {
      "arch": "X86_64",
      "platforms": [
        "NATIVE"
      ]
    }
  },
  "runtime": {
    "num_threads": 1
  },
  "remote": {
    "address": "localhost"
  },
  "optimization": {
    "opt_log_key": "MyOptKey",
    "enable_tuning": true
  },
  "out_dirname": "MyOutputDir"
}
4. Modify model information
Next, you need to update your model information as guided in the details here. You should change the "model_path" and "input_shapes" fields.
- "model_path": The path to where you saved tiny_mnist_example.tflite, relative to your workspace directory. (More precisely, it is relative to WORKING_DIR, which was set to your workspace directory in the previous step.)
- "input_shapes": This model has one input with the shape [1, 28, 28, 1].
Modify the model information ("input_shapes" and "model_path") in user_arguments.json as shown below:
{
  ...
  "model": {
    "input_shapes": [
      [1, 28, 28, 1]
    ],
    "framework": "tflite",
    "tflite": {
      "fp16": false,
      "model_path": "[relative path to tiny_mnist_example.tflite]"
    }
  },
  ...
}
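If you prefer to edit the file programmatically rather than by hand, here is a minimal sketch. It assumes user_arguments.json sits in your WORKING_DIR and that tiny_mnist_example.tflite was copied into that same directory:
# Optional: patch "input_shapes" and "model_path" in user_arguments.json from Python
# instead of editing the file manually. Assumes both the JSON file and the model
# live directly inside WORKING_DIR.
import json
import os

args_path = os.path.join(os.environ["WORKING_DIR"], "user_arguments.json")
with open(args_path) as f:
    args = json.load(f)

args["model"]["input_shapes"] = [[1, 28, 28, 1]]
args["model"]["tflite"]["model_path"] = "tiny_mnist_example.tflite"

with open(args_path, "w") as f:
    json.dump(args, f, indent=2)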
5. Set your license key information
Optimium requires a license. If you have not received a license, please check your email or contact us. To set the license, save your license key in user_arguments.json under the "license_key" field.
# user_arguments.json
{
  "license_key": "AAAAA-BBBBB-CCCCC-DDDDD",
  "device_name": "MyDevice",
  "model": {
    ...
  },
  ...
}
Run Optimium
In your workspace, run python3 and execute the following lines. This step optimizes and compiles the provided MNIST classification model.
import optimium
optimium.optimize_tflite_model(max_trials=64)
Optimium dynamically searches for and tunes the inference configuration that best fits your target hardware. This takes between 30 and 50 minutes, depending on your machine's performance. (We are working on making this step faster!)
Check output
The above step saves the result in $WORKING_DIR/outputs/. The nested directory name depends on "device_name", "opt_log_key", and "out_dirname" in user_arguments.json.
You should find two files in that nested output directory.
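If you are unsure where the files landed, here is a quick optional listing helper of our own (not part of the tutorial scripts):
# List everything Optimium produced under $WORKING_DIR/outputs/.
# The nested directory name depends on "device_name", "opt_log_key",
# and "out_dirname" in user_arguments.json.
import os

outputs_root = os.path.join(os.environ["WORKING_DIR"], "outputs")
for root, dirs, files in os.walk(outputs_root):
    for name in files:
        print(os.path.join(root, name))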
2. Deploy and test performance
We will verify and compare the optimized model with TFLite. Run python3 and execute the following lines of code.
1. Import packages
import numpy as np
import time
import optimium.runtime as rt
from PIL import Image
import tensorflow as tf

warmup = 50  # warm-up iterations excluded from the timing
repeat = 50  # timed iterations used for the median latency
2. Load sample image
Please change the location of the downloaded mnist_sample.jpg as desired.
# Load the saved image and scale pixel values to [0, 1]
image_loaded = Image.open('mnist_sample.jpg')
np_img = np.array(image_loaded).astype(np.float32) / 255

# Add channel and batch dimensions to match the model input shape [1, 28, 28, 1]
np_img = np.expand_dims(
    np.expand_dims(
        np_img, axis=-1,
    ), axis=0,
)
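As a quick optional check of our own, you can confirm the array now matches the "input_shapes" entry you set earlier:
# Optional sanity check: the preprocessed image should match the model's
# input shape [1, 28, 28, 1] declared in user_arguments.json.
assert np_img.shape == (1, 28, 28, 1), f"unexpected input shape: {np_img.shape}"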
3. Prepare to run Optimium
In the ctx.load_model call below, replace /path/to/your/optimium/output/ with the path to your output directory.
# Load runtime model
ctx = rt.Context()
model = ctx.load_model("/path/to/your/optimium/output/")
# (example) model = ctx.load_model("/workspace/outputs/MyDevice-num_thread_1-MyOptKey/MyOutputDir")
req = model.create_request()
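If you are not sure which input name the runtime expects, you can print the tensor info exposed by the loaded model. This optional snippet uses only the input_tensors_info attribute that also appears in the next step:
# Optional: inspect the input tensor names exposed by the loaded model.
# The next step keys the input dictionary by model.input_tensors_info[0].name.
for tensor_info in model.input_tensors_info:
    print(tensor_info.name)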
4. Run Optimium model
The classified label is saved in output_label, and the median latency in microseconds is stored in optimium_infer_time.
# Test inference
# Prepare input image
inputs = {
    model.input_tensors_info[0].name: np_img
}
req.set_inputs(inputs)

records = []
for _ in range(warmup + repeat):
    start_time = time.time()
    req.infer()
    req.wait()
    end_time = time.time()
    records.append((end_time - start_time) * 1000000)

optimium_infer_time = np.median(records[warmup:])
optimium_output = req.get_outputs()
output_label = np.argmax(optimium_output)
5. Run the model using the TensorFlow Lite runtime for comparison
Modify the model_path argument of tf.lite.Interpreter below to point to where you saved the TFLite file.
# Load the TFLite model
interpreter = tf.lite.Interpreter(model_path="tiny_mnist_example.tflite")
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
input_data = np_img

# Measure inference time
interpreter.set_tensor(input_details[0]['index'], input_data)
records = []
for _ in range(warmup + repeat):
    start_time = time.time()
    interpreter.invoke()
    end_time = time.time()
    records.append((end_time - start_time) * 1000000)

tflite_infer_time = np.median(records[warmup:])
tflite_output = interpreter.get_tensor(output_details[0]['index'])
6. Compare the results
# Compare and verify output with TFLite
are_outputs_close = np.allclose(tflite_output, optimium_output, atol=1e-5)
if are_outputs_close:
    print("\n\nOutput tensor values match!")
    print("Output Label: ", output_label)
    print(f"TFLite Inference time: {round(tflite_infer_time, 3)} us")
    print(f"Optimium Inference time: {round(optimium_infer_time, 3)} us ({round(tflite_infer_time/optimium_infer_time, 3)}x)")
else:
    print("\n\nOutput tensor values mismatch!")
In this tutorial, we've shown how to optimize a model using Optimium and deploy the optimized model on your PC.

However, it is quite common to have a different serving device from the development device. For example, you may develop your AI model on your local machine and serve it on AWS EC2. In this case, you should optimize your model on AWS EC2, not on your local machine.
For this scenario, we provide a remote setup. Please follow the next tutorial, Set up device.
In addition, we have shown how to serve your compiled model using the Optimium Runtime Python API. Optimium Runtime supports not only Python but also C++ and Kotlin. For more details, please read this document or the following tutorials: Run pose detection in RPi5 with C++, Run face detection in Android with Kotlin.