Tutorial
To run the examples below, you need CMake 3.21 or above (if you decide to use CMake) and a C++ compiler that supports C++17 (tested with GCC 8 and Clang 10 or above).
C++
1. Install prerequisites.
To run this example app, you need CMake 3.21 or above (if you decide to use CMake) and a C++ compiler that supports C++17 (tested with GCC 8 and Clang 10 or above).
# debian-based distros
sudo apt-get install build-essential cmake ninja-build
2. Install Optimium Runtime. Please click here to install the runtime.
3. Add Optimium Runtime as a dependency.
find_package(Optimium-Runtime REQUIRED)
target_link_libraries(MyExecutable PRIVATE Optimium::Runtime)
# Use C++17
set(CMAKE_CXX_STANDARD 17)
pkg-config is supported for non-CMake users.
You can get compiler options via `pkg-config --libs --cflags optimium-runtime`.
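For example, a non-CMake build command might look like the following. This is a minimal sketch: the compiler, source file, and output name are placeholders, and it assumes the optimium-runtime .pc file is visible on your PKG_CONFIG_PATH.
# hypothetical manual build using the flags reported by pkg-config
g++ -std=c++17 main.cpp $(pkg-config --libs --cflags optimium-runtime) -o my_app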
Optimium Runtime requires C++17 to compile correctly. Add
set(CMAKE_CXX_STANDARD 17)
to set the C++ language version globally, or
set_target_properties(<TARGET> PROPERTIES CXX_STANDARD 17)
to use C++17 only for your CMake target.
IMPORTANT! If you're using Android, please refer to the code below.
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE BOTH)
find_package(Optimium-Runtime REQUIRED)
target_link_libraries(MyExecutable PRIVATE Optimium::Runtime)
Additionally, if you're using Android, you must add android:extractNativeLibs="true" to your AndroidManifest.xml file.
<application ...
android:extractNativeLibs="true"
...>
4. Create a context.
Context is a manager object responsible for managing devices, remote connections, loading models, and so on.
Before loading a model, you should create a context.
You can modify the verbosity level and output path of the logger.
To apply these changes, use the LogSettings class before calling the Context::create() function.
#include <Optimium/Runtime.h>
#include <Optimium/Runtime/Utils/StreamHelper.h> // for logging purpose.
#include <Optimium/Runtime/Logging/LogSettings.h>
int main(...) {
// change verbosity to debug level
rt::LogSettings::setLogLevel(rt::LogLevel::Debug);
// add console log writer. this is no-op on Android devices.
rt::LogSettings::addWriter(rt::WriterOption::ConsoleWriter());
// add file log writer to "output.log" file.
rt::LogSettings::addWriter(rt::WriterOption::FileWriter("output.log"));
// add Android Logcat log writer. this is no-op on non-Android devices.
rt::LogSettings::addWriter(rt::WriterOption::AndroidWriter());
rt::Result<rt::Context> MaybeContext = rt::Context::create();
if (!MaybeContext.ok()) {
std::cout << "error: " << MaybeContext.error()
<< std::endl;
return 1;
}
rt::Context Context = MaybeContext.value();
// ...
}
If the return type is Result<...>, the value returned by Optimium Runtime needs to be checked. Error checks are omitted in these docs for simplicity, but users must perform them.
It is recommended to check available devices before loading a model.
DeviceNotFoundError
is a common error when you load a model without checking whether or not the required device is present.
You can get the list of available devices from the Context.getAvailableDevices() function.
int main(...) {
// ...
// Iterate over every device to check whether the required device exists.
bool Found = false;
for (const rt::Device &Dev : Context.getAvailableDevices()) {
if (Dev.getPlatform() == rt::PlatformKind::Native) {
Found = true;
break;
}
}
if (!Found) {
std::cout << "error: cannot find needed device."
<< std::endl;
}
// ...or you can try device functions to test whether the device exists.
rt::Result<rt::Device> MaybeDevice = rt::Device::native(Context);
if (!MaybeDevice.ok()) {
std::cout << "error: cannot find needed device: "
<< MaybeDevice.error() << std::endl;
}
}
5. (Optional) Connect to remote host
Optimium supports running inference on a remote host. However, before connecting to the remote host, you should install and run the optimium-remote-server CLI application first.
Please click here to install the remote server.
After installing and launching optimium-remote-server, follow this code to connect to the remote server.
RemoteContext represents a context of the remote host. Its role may look like Context
, but RemoteContext
only supports enumerating and creating devices.
int main(...) {
// ...
// connect to the remote host.
// host address can be IP address or domain.
// note that direct connection is the only supported connection method.
rt::RemoteContext RemoteContext =
Context.connectRemote("your-remote-address",
rt::ConnectionMethod::Direct).value();
// ...or try this if you changed the port configuration.
rt::RemoteContext RemoteContext =
Context.connectRemote("your-remote-address",
rt::ConnectionMethod::Direct,
YOUR_PORT).value();
// enumerating devices is identical to Context.
for (const rt::Device &Dev : RemoteContext.getAvailableDevices()) {
std::cout << Dev << std::endl;
}
}
You can run inference using both local and remote devices simultaneously.
However, communicating between devices over the network is a heavy operation. Therefore, it is not recommended for production applications to run a model across local and remote devices, or across remote devices on different hosts.
6. Load a model
Model represents an ML model as you know. You can load a model via Context.loadModel()
function.
Optimium Runtime automatically searches for the devices described in the model, but this is limited to local devices.
To choose the device that runs the model, or to use a remote device, you should specify the device manually.
You can configure various threading-related settings (thread count, nice value, ...) through the ModelOptions object.
(These settings apply only when running inference with the specified model.)
Unlike other AI inference engines such as TensorFlow Lite, Optimium's model format is a folder. Since the folder itself is the model, you should pass the path to that folder when loading it with Optimium Runtime. Always copy the model together with its entire folder.
int main(...) {
// ...
// load a model with auto-detected devices.
rt::Model Model = Context.loadModel("path/to/model").value();
// load a model with manually specified devices.
rt::Device Devices[2] = {
// Use the XNNPACK device
rt::Device::xnnpack(Context).value(),
// Use the CPU device on the remote host
rt::Device::native(RemoteContext).value()
};
rt::ModelOptions Opt;
std::vector<uint32_t> Cores {0, 1, 2, 3};
// set the number of threads to be used for running the model.
Opt.ThreadsCount = 4;
// set which cores the threads used for inference will be assigned to.
Opt.Cores.assign(Cores.begin(), Cores.end());
// set the priority for threads running the model.
Opt.Nice = -19;
rt::Model Model =
Context.loadModel("path/to/model",
rt::ArrayRef(Devices), Opt).value();
// loadModel() also accepts single device
rt::Model Model =
Context.loadModel("path/to/model",
rt::Device::native(Context).value(), Opt).value();
// ..
}
7. Listing model information
You can find information about the model using some informative functions.
To get a list of input or output tensors, use Model.getInputTensorsInfo()
function for input and Model.getOutputTensorsInfo()
function for output.
Functions are also provided if you want to get information about a single input or output tensor by its index or name.
int main(...) {
// ...
// get list of input tensor info
std::cout << "input tensors" << std::endl;
for (const rt::TensorInfo &Info : Model.getInputTensorsInfo())
std::cout << Info << std::endl;
// get list of output tensor info
std::cout << "output tensors" << std::endl;
for (const rt::TensorInfo &Info : Model.getOutputTensorsInfo())
std::cout << Info << std::endl;
// printing input tensor info
std::cout << Model.getInputTensorInfo(0).value()
<< std::endl;
std::cout << Model.getInputTensorInfo("input_0").value()
<< std::endl;
// printing output tensor info
std::cout << Model.getOutputTensorInfo(0).value()
<< std::endl;
std::cout << Model.getOutputTensorInfo("output").value()
<< std::endl;
}
TensorInfo, the return value of the functions above, represents information about each tensor.
It contains the tensor's name, type, shape, and alignment.
The quantization scheme is also included if the tensor is quantized.
You can access member variables to get details of the tensor.
int main(...) {
// ...
rt::TensorInfo Info = Model.getInputTensorInfo("input_0").value();
std::cout << "name of tensor: "
<< Info.TensorName << std::endl;
std::cout << "shape of tensor: "
<< Info.TensorShape << std::endl;
std::cout << "alignment of tensor: "
<< Info.Alignment << std::endl;
std::cout << "type of tensor: "
<< Info.TensorType << std::endl;
std::cout << "name of tensor: "
<< Info.TensorName << std::endl;
if (Info.Scheme)
std::cout << "scheme of tensor: "
<< *(Info.Scheme) << std::endl;
// ...
}
8. Creating a request
InferRequest represents a single inference that the model runs. Users can create multiple InferRequest
s and execute the same model without interfering with other requests.
Additionally, users can achieve target throughput by queueing multiple requests efficiently.
Running multiple requests simultaneously is currently not available. It will be supported in the near future.
Creating a request is done by calling the Model.createRequest() function.
int main(...) {
// ...
rt::InferRequest Request = Model.createRequest().value();
// ...
}
9. Getting a tensor
You can get a tensor from the getInputTensor() and getOutputTensor() functions, and you can use the copyFrom() and copyTo() functions to copy data between the tensor and your data buffer.
int main(...) {
// ...
constexpr size_t kInput0Size = ...;
constexpr size_t kOutputSize = ...;
// Assume that input and output buffers are prepared.
float* InputBuffer = ...;
float* OutputBuffer = ...;
// Get input tensor and put data
rt::Tensor Input0 = Request.getInputTensor("input_0").value();
Input0.copyFrom(InputBuffer, kInput0Size );
// Get output tensor and get data
rt::Tensor Output = Request.getOutputTensor("output").value();
Output.copyTo(OutputBuffer, kOutputSize);
}
Optimium Runtime checks that the type of the input data the user provides is compatible with the type of the tensor. If they do not match, it rejects the data with Status::TypeMismatch.
Tensor types are defined for element types not natively supported by C++, in files located at Optimium/Runtime/Types.
Tensor element types and their wrapped C++ types are listed below.
| Element Type | C++ Type |
| --- | --- |
| ElementType.F16 | rt::float16 |
| ElementType.BF16 | rt::bfloat16 |
| ElementType.TF32 | rt::tfloat32 |
| ElementType.QS8 | rt::qsfloat8 |
| ElementType.QU8 | rt::qufloat8 |
| ElementType.QS16 | rt::qsfloat16 |
| ElementType.QU16 | rt::qufloat16 |
When checking the data type for a tensor, Optimium Runtime only recognizes these C++ types for the corresponding element types. Other data types are not recognized and result in a compilation error.
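For example, to fill an F16 input you would stage the data in a buffer of rt::float16 rather than float. The snippet below is a minimal sketch under the assumption that the model has an input tensor named "input_0" with element type F16; the tensor name and kInput0Size are placeholders carried over from the example above, not part of the API.
int main(...) {
// ...
// Hypothetical F16 input: stage data in the wrapped C++ type, not in float.
// kInput0Size is the element count of "input_0" (assumed, as in the example above).
std::vector<rt::float16> HalfInput(kInput0Size);
rt::Tensor Input0 = Request.getInputTensor("input_0").value();
// copyFrom() expects the C++ type matching the tensor's element type;
// a buffer of a mismatched type is rejected with Status::TypeMismatch.
Input0.copyFrom(HalfInput.data(), HalfInput.size());
// ...
}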
If you want to avoid the copy cost, you can directly access the device memory that backs the tensor.
int main(...) {
// ...
rt::Tensor Tensor = Request.getInputTensor("input_0").value();
{
rt::BufferHolder Buffer = Tensor.getRawBuffer();
void *Ptr = Buffer.data();
// do some direct write
}
// ...
}
This backing buffer does not perform any checks (type checks, range checks, etc.). Use this feature at your own risk.
BufferHolder, the Buffer variable in the code above, is a RAII (Resource Acquisition Is Initialization) object that holds the memory only within its scope. Therefore, referencing this buffer from a different scope should be avoided.
10. Running an inference
Running an inference needs just two lines of code: infer
and wait
.
The Request.infer() function asks the runtime to start the inference and returns immediately, and the Request.wait() function waits for the previously requested inference to finish (regardless of failure).
The Request.wait() function takes an argument, timeout, that represents the number of milliseconds to wait. If the method returns false, the inference finished before the timeout was reached; it returns true when the inference had not finished before the timeout was reached.
Note that the Request.wait() method should always be called to check for errors that happened during inference. Starting an inference while the request is in a fault state might cause undefined behavior.
To check the state of the request, use Request.getStatus()
function.
int main(...) {
// ...
// waits until inference is finished.
Request.infer();
Request.wait();
// waits 500 milliseconds to finish inference
using namespace std::chrono_literals;
Request.infer();
if (Request.wait(500ms).value())
std::cout << "inference not finished after 500ms"
<< std::endl;
else
std::cout << "inference was finished within 500ms"
<< std::endl;
// check the state of the request
std::cout << "status of request: "
<< Request.getStatus()
<< std::endl;
}
Python
Do not put any Optimium Runtime related objects in the global scope or create circular references to them.
This can lead to memory leaks or undefined behavior due to differences in the memory management models of C++ and Python.
1. Install Optimium Runtime. Please click here to install the runtime.
2. Create a context
Context is a manager object responsible for managing devices, remote connections, loading models, and so on.
Before loading a model, you should create a context.
You can modify the verbosity level and output path of the logger.
import optimium.runtime as rt

def main():
    context = rt.Context(
        # change verbosity to debug level.
        verbosity=rt.LogLevel.Debug,
        # change log output from stdout to "output.log" file.
        # default is writing log at stdout.
        log_path="output.log"
    )
It is recommended to check available devices before loading a model.
DeviceNotFoundError
is a common error when you load a model without checking whether or not the required device is present.
You can get a list of available devices from the context.available_devices property.
def main():
    # ...
    # Iterate over every device to check whether the required device exists.
    found = False
    for dev in context.available_devices:
        if dev.platform == rt.PlatformKind.Native:
            found = True
    if not found:
        print("cannot find needed device")
    # ...or you can try device functions to test whether the device exists.
    try:
        rt.Device.native(context)
    except rt.DeviceNotFoundError as ex:
        print(f"cannot find needed device: {ex}")
3. (Optional) Connect to remote host
Optimium supports running inference on a remote host. However, before connecting to the remote host, you should install and run the optimium-remote-server CLI application first.
Please refer here to install the remote server.
After installing and launching optimium-remote-server, follow this code to connect to the remote server.
RemoteContext represents a context of the remote host. Its role may look like Context
, but RemoteContext
only supports enumerating and creating devices.
def main():
    # ...
    # connect to the remote host.
    # host address can be IP address or domain.
    remote_context = context.connect_remote("your-remote-address")
    # ...or try this if you changed port configuration
    remote_context = context.connect_remote("your-remote-address", port=YOUR_PORT)
    # enumerating devices is identical to Context. Same property, same device function.
    for dev in remote_context.available_devices:
        print(dev)
    remote_native = rt.Device.native(remote_context)
You can run inference using both local and remote devices simultaneously.
However, communicating between devices over a network incurs costs. Therefore, it is not recommended for production applications to run models across local and remote devices, or across remote devices on different hosts.
4. Load a model
Model represents an ML model as you know. You can load a model via context.load_model()
method.
Optimium Runtime automatically finds and uses the devices described in the model, but this is limited to local devices. To choose the device that runs the model, or to use a remote device, you should specify the device manually.
You can configure threading-related settings by passing them as parameters to the context.load_model
function.
(These settings are applied only when inferring the specified model.)
Unlike other AI inference engines such as TensorFlow Lite, Optimium's model format is a folder. Since the folder itself is the model, you should pass the path to that folder when loading it with Optimium Runtime. Always copy the model together with its entire folder.
def main():
    # ...
    # load a model with auto-detected devices.
    model = context.load_model("path/to/model")
    # load a model with manually specified devices.
    devices = [
        # Use the XNNPACK device
        rt.Device.xnnpack(context),
        # Use the CPU device on the remote host
        rt.Device.native(remote_context),
    ]
    model = context.load_model(
        "path/to/model", devices,
        # set the number of threads to be used for running the model.
        threads_count=4,
        # set which cores the threads used for inference will be assigned to.
        cores=[0, 1, 2, 3],
        # set the priority for threads running the model.
        nice=-19)
5. Listing model information
You can find information about the model using some informative methods.
To get a list of input or output tensors, use model.input_tensors_info
property for input and model.output_tensors_info
property for output.
Methods are also provided if you want to get information about a single input or output tensor by its index or name.
def main():
    # ...
    print("input tensors")
    for info in model.input_tensors_info:
        print(info)
    print("output tensors")
    for info in model.output_tensors_info:
        print(info)
    # printing input tensor info
    print(model.get_input_tensor_info(0))
    print(model.get_input_tensor_info("input_0"))
    # printing output tensor info
    print(model.get_output_tensor_info(0))
    print(model.get_output_tensor_info("output"))
TensorInfo, the return value of the properties and methods above, represents information about each tensor. It contains the tensor's name, type, shape, and alignment. The quantization scheme is also included if the tensor is quantized.
You can access properties to get details of the tensor.
def main():
    # ...
    info = model.get_input_tensor_info("input_0")
    print(f"name of tensor: {info.name}")
    print(f"shape of tensor: {info.shape}")
    print(f"alignment of tensor: {info.alignment}")
    print(f"type of tensor: {info.type}")
    if info.scheme:
        print(f"quantization scheme of tensor: {info.scheme}")
6. Creating a request
InferRequest represents a single inference that the model runs. Users can create multiple InferRequest
s and execute the same model without interfering with other requests.
Additionally, users can achieve target throughput by queueing multiple requests efficiently.
Running multiple requests simultaneously is currently not available. It will be supported in the near future.
Creating a request is done by calling the model.create_request() method.
def main():
    # ...
    request = model.create_request()
7. Getting a tensor
Optimium Runtime for Python supports NumPy's np.ndarray
natively. You can read and write tensor data as NumPy's ndarray.
request.set_inputs() accepts a single np.ndarray, a sequence of np.ndarray, or a dictionary mapping string keys to np.ndarray, and request.get_outputs() returns a list of np.ndarray.
# ...
import numpy as np

def main():
    # ...
    input_0 = np.ones((1, 2, 3), dtype=np.float32)
    input_1 = np.ones((1, 2, 3), dtype=np.float32)
    # this form is allowed when the model has single input.
    request.set_inputs(input_0)
    # use list or tuple to put multiple data at once.
    request.set_inputs([input_0, input_1])
    # dictionary is also supported.
    request.set_inputs({
        "input_0": input_0,
        "input_1": input_1
    })
    # to get outputs, use get_outputs()
    outputs = request.get_outputs()
You can also access the request's Tensor objects directly to read and write tensor data.
def main():
    # ...
    # get tensor by name
    input_0 = request.get_input_tensor("input_0")
    # get tensor by index
    input_1 = request.get_input_tensor(1)
    # get data from the tensor
    input_0_data = input_0.to_numpy()
    # put data into the tensor
    input_1.copy_from(np.ones((1, 2, 3), dtype=input_1.type.to_dtype()))
8. Running an inference
Running an inference is just two lines of code: infer
and wait
.
The request.infer() method asks the runtime to start the inference and returns immediately, and the request.wait() method waits for the previously requested inference to finish (regardless of failure).
The request.wait() method takes an argument, timeout, that represents the number of milliseconds to wait. If the method returns False, the inference finished before the timeout was reached; it returns True when the inference had not finished before the timeout was reached.
Note that the request.wait() method should always be called to check for errors that happened during inference. Starting an inference while the request is in a fault state might cause undefined behavior.
To check the state of the request, use request.status
property.
def main():
    # ...
    # waits until inference is finished.
    request.infer()
    request.wait()
    # waits 500 milliseconds to finish inference
    request.infer()
    if request.wait(500):
        print("inference not finished after 500ms")
    else:
        print("inference was finished within 500ms")
    # check status of the inference.
    print(f"current state of request: {request.status}")
    # make sure the inference has actually finished before using its outputs.
    request.wait()
Kotlin
1. Import Optimium Runtime to your project.
Click here to see how to import Optimium Runtime into your project.
If you're using Android, you must add android:extractNativeLibs="true" to your AndroidManifest.xml file.
<application ...
android:extractNativeLibs="true"
...>
2. Create a context
Context is a manager object responsible for managing devices, remote connections, loading models, and so on.
Before loading a model, you should create a context.
A context is created through ContextFactory. You can change the verbosity level and output path of the logger.
// ...
import com.enerzai.optimium.runtime.Context
import com.enerzai.optimium.runtime.ContextFactory
fun main(...) {
val factory = ContextFactory()
// change verbosity to debug level.
factory.verbosity(LogLevel.DEBUG)
// add file log writer to "output.log" file.
factory.enableFileLog(File("output.log"))
// add console log writer. note that this is no-op on Android devices.
factory.enableConsole()
// add Android logcat log writer. note that this is no-op on non-Android devices.
factory.enableLogcat()
factory.create().use { context ->
// ...
}
// ...or use traditional way.
val context = factory.create()
// do something important
// you should close after use.
context.close()
}
Users should call the close() method on Context, Model, and InferRequest objects after use. You can also use Java's try-with-resources statement or Kotlin's use() function.
It is recommended to check available devices before loading a model.
DeviceNotFoundException
is a common error when you load a model without checking whether or not the required device is present.
You can get the list of available devices from the context.availableDevices property.
// ...
import com.enerzai.optimium.runtime.Devices.native
import com.enerzai.optimium.runtime.PlatformKind
import com.enerzai.optimium.runtime.exceptions.DeviceNotFoundException
fun main(...) {
// ...
// Iterate over every device to check whether the required device exists.
val dev = context.availableDevices.find {
it.platform == PlatformKind.NATIVE
}
if (dev == null) {
println("cannot find needed device")
}
// ...or you can try device functions to test whether the device exists.
val dev = try {
native(context)
} catch (ex: DeviceNotFoundException) {
println("cannot find needed device: $ex")
null
}
}
3. Connect to remote host (Optional)
Optimium supports running inference on a remote host. However, before connecting to the remote host, you should install and run the optimium-remote-server CLI application first.
Please refer here to install the remote server.
After installing and launching optimium-remote-server, follow this code to connect to the remote server.
RemoteContext represents a context of the remote host. Its role may look like Context
, but RemoteContext
only supports enumerating and creating devices.
// ...
import com.enerzai.optimium.runtime.RemoteContext
fun main(...) {
// ...
// connect to the remote host.
// host address can be IP address or domain.
val remoteContext =
context.connectRemote("your-remote-address")
// ...or try this if you changed port configuration.
val remoteContext =
context.connectRemote("your-remote-address",
port = YOUR_PORT)
// enumerating devices is identical to Context. Same property, same device function.
remoteContext.availableDevices.forEach { println("$it") }
val remoteNative = native(remoteContext)
}
You can run inference using local and remote devices simultaneously.
However, communicating between devices over the network is a heavy operation. Therefore, it is not recommended for production applications to run a model across local and remote devices, or across remote devices on different hosts.
4. Load a model
Model represents an ML model as you know. You can load a model via context.loadModel()
method.
Optimium Runtime automatically finds and uses the devices described in the model, but this is limited to local devices. To choose the device that runs the model, or to use a remote device, you should specify the device manually.
You can configure threading-related settings by passing them as parameters to the context.loadModel()
function. (These settings are applied only when inferring the specified model.)
Unlike other AI inference engines such as TensorFlow Lite, Optimium's model format is a folder. Since the folder itself is the model, you should pass the path to that folder when loading it with Optimium Runtime. Always copy the model together with its entire folder.
// ...
import com.enerzai.optimium.runtime.Model
import com.enerzai.optimium.runtime.Devices.xnnpack
import java.io.File
fun main(...) {
// ...
// load a model with auto-detected devices.
context.loadModel(File("path/to/model")).use { model ->
// do something useful
}
// load a model with manually specified devices.
val devices = listOf(
// Use device for XNNPACK
xnnpack(context),
// Use cpu device at remote host
native(remoteContext)
)
context.loadModel(File("path/to/model"), devices,
// set the number of threads to be used for running the model.
threadsCount = 4,
// set which cores the threads used for inference will be assigned to.
cores = listOf(0, 1, 2, 3),
// set the priority for threads running the model.
nice = -19
).use { model ->
// do something useful
}
}
5. Listing model information
You can find information about the model using some informative methods.
To get a list of input or output tensors, use model.inputTensorsInfo
property for input and model.outputTensorsInfo
property for output.
Methods are also provided if you want to get information about a single input or output tensor by its index or name.
// ...
import com.enerzai.optimium.runtime.TensorInfo
fun main(...) {
// ...
println("input tensors")
model.inputTensorsInfo.forEach {
println("$it")
}
println("output tensors")
model.outputTensorsInfo.forEach {
println("$it")
}
// printing input tensor info
println("${model.getInputTensorInfo(0)}")
println("${model.getInputTensorInfo("input_0")}")
// printing output tensor info
println("${model.getOutputTensorInfo(0)}")
println("${model.getOutputTensorInfo("input_0")}")
}
TensorInfo, the return value of the properties and methods above, represents information about each tensor. It contains the tensor's name, type, shape, and alignment. The quantization scheme is also included if the tensor is quantized.
You can access properties to get details of the tensor.
fun main(...) {
// ...
val info = model.getInputTensorInfo("input_0")
with(info) {
println("name of tensor: $name")
println("shape of tensor: $shape")
println("alignment of tensor: $alignment")
println("type of tensor: $type")
if (scheme != null) {
println("scheme of tensor: $scheme")
}
}
}
6. Creating a request
InferRequest represents a single inference that the model runs. Users can create multiple InferRequest
s and execute the same model without interfering with other requests.
Additionally, users can achieve target throughput by queueing multiple requests efficiently.
Running multiple requests simultaneously is currently not available. It will be supported in the near future.
Creating a request is done by calling the model.createRequest() method.
// ...
import com.enerzai.optimium.runtime.InferRequest
fun main(...) {
// ...
model.createRequest().use { request ->
// do something useful
}
}
7. Getting a tensor
Optimium Runtime for Kotlin supports primitive arrays and Buffer classes for reading and writing tensor data.
// ...
import java.nio.FloatBuffer
fun main(...) {
// ...
val buffer = FloatBuffer.allocate(...)
request.setInput("input_0", buffer)
// indexes are also supported
request.setInput(0, buffer)
val array = FloatArray(...)
// get output data
request.getOutput("output", array)
}
You can also access the request's Tensor objects directly to read and write tensor data.
// ...
import com.enerzai.optimium.runtime.Tensor
fun main(...) {
// ...
val input0 = request.getInputTensor("input_0")
val array = FloatArray(...)
// Put data into the tensor
input0.copyFrom(array, offset = ...)
// Get data from the tensor
input0.copyTo(array, offset = ...)
}
Due to limitations of Java, types that cannot be expressed in Java are mapped to other types of the same size.
The accepted Java types for each tensor element type are listed below:
| Java Types | ElementType |
| --- | --- |
| Byte (ByteBuffer) | Allowed for any types |
| Short (ShortBuffer) | I16, U16, F16, QS16, QU16 |
| Int (IntBuffer) | I32, U32 |
| Long (LongBuffer) | I64, U64 |
| Float (FloatBuffer) | F32 |
| Double (DoubleBuffer) | F64 |
| Boolean | BOOL |
Please be aware that the Java type and the actual tensor type can be different. Check your tensor type before using the values in Java.
A Buffer for ElementType.BOOL is not supported: Java does not provide an equivalent one.
8. Running an inference
Running an inference needs just two lines of code: infer and waitForFinish.
The request.infer() method asks the runtime to start the inference and returns immediately, and the request.waitForFinish() method waits for the previously requested inference to finish (regardless of failure).
The request.waitForFinish() method takes an argument, timeout, that represents the number of milliseconds to wait. If the method returns false, the inference finished before the timeout was reached; it returns true when the inference had not finished before the timeout was reached.
Note that the request.waitForFinish() method should always be called to check whether an error occurred during inference. Starting an inference while the request is in a fault state might cause undefined behavior.
To check the state of the request, use request.status
property.
fun main(...) {
// ...
// waits until inference is finished.
request.infer()
request.waitForFinish()
// waits 500 milliseconds to finish inference
request.infer()
if (request.waitForFinish(500)) {
println("inference not finished after 500ms")
} else {
println("inference was finished within 500ms")
}
// check status of the request
println("current state of request: ${request.status}")
}