The number nobody talks about enough
Every vision-based safety system will happily quote you a detection accuracy. Far fewer are upfront about latency. And yet in industrial environments, where a forklift can cover a metre in less than the blink of an eye, accuracy without speed is not safety. It is reporting.
The practical threshold we hold ourselves to is 150 milliseconds, measured end-to-end from the frame arriving at the camera sensor to the action being physically triggered at the safety output. Below that number, a system can realistically prevent incidents. Above it, the system can only help you understand what happened after the fact.
Where the 150 ms budget actually goes
It is worth breaking the number down, because a casual reader might assume the whole budget is spent on inference. It is not. A realistic end-to-end budget looks something like this:
- Capture and transport: frame acquisition plus the time to get pixels to the compute unit.
- Pre-processing: decoding, resizing, and preparing the frame for inference.
- Inference: the detection model itself producing bounding boxes and confidence scores.
- Rule evaluation: zone intersection, persistence, and policy checks.
- Action: the command being issued to the safety output (relay, gate, warning, light).
- Electromechanical response: the output actually changing state on site.
Every stage spends some of the budget. Cloud round-trips spend a lot of it. A system that pipes frames to a remote service for inference has effectively written off the budget before the model has even run.
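To make the accounting concrete, here is a minimal sketch of the budget arithmetic. All of the per-stage figures below are hypothetical placeholders, not measured values from any particular system; the point is only how quickly a cloud round-trip consumes the window.

```python
# Illustrative end-to-end latency budget. Every figure here is a
# hypothetical placeholder, chosen only to show the arithmetic.
EDGE_BUDGET_MS = {
    "capture_and_transport": 25,   # sensor -> compute unit
    "pre_processing": 10,          # decode, resize, prepare
    "inference": 40,               # detection model
    "rule_evaluation": 5,          # zones, persistence, policy
    "action_command": 5,           # command issued to safety output
    "electromechanical": 40,       # relay/gate actually changing state
}

THRESHOLD_MS = 150
CLOUD_OVERHEAD_MS = 200  # typical network + queuing round-trip, per the text

edge_total = sum(EDGE_BUDGET_MS.values())
cloud_total = edge_total + CLOUD_OVERHEAD_MS  # same stages, plus the round-trip

print(f"edge:  {edge_total} ms  (within budget: {edge_total <= THRESHOLD_MS})")
print(f"cloud: {cloud_total} ms (within budget: {cloud_total <= THRESHOLD_MS})")
```

With these placeholder numbers the edge pipeline lands around 125 ms, inside the window, while the cloud variant has blown past 150 ms on network overhead alone.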
Why edge compute is not optional
This is the practical reason active safety has to run at the edge. Not because edge compute is fashionable, but because the numbers do not add up any other way. Cloud-based inference almost always means 200 ms or more just in network and queuing overhead, and that assumes the connection is healthy.
Edge compute, done properly, can consistently hit the budget. That means hardware chosen for deterministic inference performance, models optimised for the platform rather than ported from a research notebook, and a rules engine that is not waiting on the internet before it can make a decision.
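To illustrate why the rule evaluation stage can be both fast and offline, here is a minimal sketch of a zone-intersection check with a persistence requirement. The `Box` and `ZoneRule` names and the consecutive-frame persistence logic are illustrative assumptions, not SAiFI's actual implementation; the point is that this stage is pure local computation with no network dependency.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box (detection or exclusion zone)."""
    x1: float
    y1: float
    x2: float
    y2: float

def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2

class ZoneRule:
    """Fire only after a detection persists in the zone for
    `persistence` consecutive frames, filtering one-frame flickers."""

    def __init__(self, zone: Box, persistence: int = 3):
        self.zone = zone
        self.persistence = persistence
        self.streak = 0

    def evaluate(self, detections: list[Box]) -> bool:
        if any(intersects(d, self.zone) for d in detections):
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.persistence
```

A rule like this evaluates in microseconds on modest hardware, which is why the rule-evaluation slice of the budget is small, and why it keeps working when the internet connection does not.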
Active versus advisory: a different category
When a system reliably meets a sub-150 ms budget, it stops being a monitoring tool and starts being a safety system. The distinction matters for three reasons:
- Risk assessment: an active system can be designed into the safety case as a genuine risk-reduction measure.
- Behaviour on site: operators and drivers adjust to a system that actually stops things happening.
- Evidence: time-stamped actions, not just time-stamped observations, demonstrate the system prevented an event.
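The evidence point is worth making concrete. A time-stamped action record pairs the detection timestamp with the output timestamp, so the detection-to-output interval can be read straight off the record. The field names and timestamps below are hypothetical, purely to show the shape of such a record.

```python
from datetime import datetime, timezone

# Hypothetical action record: a time-stamped action, not just an
# observation. Field names and values are illustrative only.
record = {
    "detected_at":  datetime(2024, 5, 2, 9, 14, 3, 120000, tzinfo=timezone.utc),
    "output_fired": datetime(2024, 5, 2, 9, 14, 3, 252000, tzinfo=timezone.utc),
    "zone": "loading-bay-2",
    "action": "relay_open",
}

elapsed = record["output_fired"] - record["detected_at"]
detection_to_output_ms = elapsed.total_seconds() * 1000
print(f"detection -> output: {detection_to_output_ms:.0f} ms")
```

A record in this shape answers the vendor question directly: it shows not just that something was seen, but when the output actually changed state relative to the detection.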
What to ask your vendor
If you are evaluating a vision-based safety system, these are the questions worth pushing on:
- What is the end-to-end latency from camera sensor to safety output, measured on site rather than in a lab?
- Where does the inference actually run, and what happens to latency if the internet connection is lost?
- How is the latency budget spent across capture, inference, rules, and action?
- Can you show me the action record for a real incident, and the time between detection and output?
If a vendor cannot answer those clearly, you are probably looking at an advisory system, not an active one. That may still be valuable. It is just not the same category of product.
Where we land on this
SAiFI is designed around the 150 ms threshold from the ground up: edge inference on purpose-chosen vision hardware, a deterministic rules engine, direct integration with safety outputs, and offline-first operation. The result is a system that acts inside the window where action actually changes the outcome.
If you would like to see what that looks like on your site, our team would be glad to run you through a deployment or a live demonstration.