Radically more efficient deep learning to enable inference on low-power hardware.
We are working on a future in which an air conditioner turns off when we leave the room. A warehouse directs us to the missing box. And a home lets us know if our elderly need help. It’s a future that is safer and more thoughtful thanks to intelligent sensors inside all the things around us.
Yesterday, you had to sit at your PC.
Today, you take your smartphone with you.
Tomorrow, intelligent sensors turn computing ambient.
Making computing ambient requires a large number of intelligent chips, so they have to be small, cheap and low-power. Our technology works on $1 chips that consume less than 10 milliwatts. These chips are so efficient that they can be powered by a coin battery or a small solar cell.
Processing data on the device is inherently more reliable than a connection with the cloud. Intelligence shouldn’t have to depend on weak WiFi.
Sending data from the sensor to the cloud, processing the data, and sending it back again takes time. Sometimes whole seconds. This latency is problematic for products that need to respond to sensor input in real-time.
Running AI in the cloud comes with significant recurring compute costs. Executing the AI on the device instead avoids these costs, saving several dollars per month.
Sending sensor data such as audio and video to the cloud increases privacy and security risks. To reduce abuse and give people confidence to let intelligent sensors into their lives, the data should not leave the device.
Ubiquitous connected sensors would overwhelm the network. Plumerai software only uses the network when it has something to report. This keeps bandwidth and mobile data costs low.
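As a rough illustration of reporting only when there is something to report, the sketch below sends a message only when the detected state changes between frames. The function name `events_to_report` and the per-frame boolean input are assumptions made for this example; they are not part of Plumerai's software.

```python
def events_to_report(detections):
    """Yield (frame_index, state) only when the per-frame state changes."""
    previous = None
    for index, state in enumerate(detections):
        if state != previous:
            yield index, state
            previous = state

# Hypothetical per-frame results: True means a person was detected.
frames = [False, False, True, True, True, False, False]
messages = list(events_to_report(frames))

print(f"{len(messages)} messages sent for {len(frames)} frames")
```

In this example the device sends three messages for seven frames. The longer a scene stays unchanged, the larger the saving.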
The farther we move data, the more energy we use. Sending data to the cloud uses a lot of energy. Processing data on-chip is more efficient by orders of magnitude. If a device needs a battery life of months or years, data needs to be processed locally.
Plumerai has developed a complete software solution for camera-based people detection. Trained with over 30 million images, our software detects people with very high accuracy under a wide variety of conditions. These AI models are so small that they even run on Arm Cortex-M microcontrollers. On Arm Cortex-A CPUs, processor load is minimal, leaving plenty of compute available for additional applications running on the same device.
Plumerai’s inference engine is the fastest and most memory-efficient in the world, confirmed by MLPerf. It accelerates any neural network. So whether you're developing speech recognition for your microwave, breaking glass detection for your alarm, or activity recognition with an IMU sensor, our inference engine speeds it up. It gives an average speedup of 1.7x, a RAM reduction of 2.0x and a code size reduction of 2.2x without changing the accuracy.
Deep learning models can have millions of parameters, and these parameters are encoded in bits. Where others require 32, 16 or 8 bits, Binarized Neural Networks (BNNs) use only a single bit for each parameter. This property makes it possible to perfectly calibrate the model to get the maximum performance out of every bit.
A BNN needs drastically less memory to store its weights and activations than an 8-bit deep learning model. This saves energy by reducing the need to access off-chip memory, and makes it feasible to deploy deep learning on more affordable, memory-constrained devices.
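The memory saving for weights follows from simple arithmetic. The sketch below assumes a hypothetical model with one million parameters and assumes that every parameter is stored at the stated bit-width. Real models may keep some layers at higher precision, so actual savings can be smaller.

```python
def weight_bytes(num_params, bits_per_param):
    """Bytes needed to store the weights, rounded up to a whole byte."""
    return (num_params * bits_per_param + 7) // 8

NUM_PARAMS = 1_000_000  # hypothetical model size, chosen for illustration

sizes = {bits: weight_bytes(NUM_PARAMS, bits) for bits in (32, 8, 1)}
for bits, size in sizes.items():
    print(f"{bits:>2}-bit weights: {size:>9,} bytes")
```

Under these assumptions the 1-bit weights take 125,000 bytes, which is 8 times less than the 8-bit encoding and 32 times less than the 32-bit encoding.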
In addition, BNNs are radically more efficient computationally. Convolutions are an essential building block of deep learning models and consist of multiplications and additions. Multiplications are expensive because the complexity of a multiplier is proportional to the square of the bit-width. In a BNN, where the operands are only 1 bit wide, the multiplications and additions can be replaced by the simple XNOR and POPCOUNT operations.
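The XNOR and POPCOUNT replacement can be sketched for a single dot product, the inner operation of a convolution. This is a minimal illustration in plain Python, not Plumerai's implementation. It assumes values of +1 and -1 encoded as bits 1 and 0, and the helper names `pack_bits` and `binary_dot` are made up for this example.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] is +1."""
    word = 0
    for i, value in enumerate(values):
        if value == 1:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors of +1/-1, given their packed bits."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")     # POPCOUNT: number of agreeing positions
    # Each agreement contributes +1 and each disagreement -1 to the sum.
    return 2 * matches - n

a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))   # ordinary multiply-accumulate
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(reference, result)
```

Both computations give the same value. On real hardware, one XNOR and one POPCOUNT instruction process a whole machine word of weights at once, which is where the efficiency gain comes from.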
Information is lost when the weights and activations are encoded using 1 bit instead of 8 bits. This affects the accuracy of the model.
Furthermore, the activation functions inside BNNs do not have a meaningful derivative. This is a problem when deep learning models are trained using gradient descent.
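To make the derivative problem concrete, the sketch below compares the exact gradient of the sign function with the straight-through estimator, a workaround commonly used in the BNN literature. It is shown only as an illustration and is not necessarily the method Plumerai uses. The function names and the clipping threshold of 1.0 are assumptions for this example.

```python
def sign(x):
    """Binarize to +1.0 or -1.0 (forward pass)."""
    return 1.0 if x >= 0 else -1.0

def true_sign_grad(x, upstream_grad):
    """Exact derivative of sign: zero everywhere it is defined."""
    return 0.0

def ste_sign_grad(x, upstream_grad, clip=1.0):
    """Straight-through estimate: pass the gradient through where |x| <= clip."""
    return upstream_grad if abs(x) <= clip else 0.0

pre_activations = [-2.0, -0.5, 0.3, 1.7]
outputs = [sign(x) for x in pre_activations]
true_grads = [true_sign_grad(x, 1.0) for x in pre_activations]
ste_grads = [ste_sign_grad(x, 1.0) for x in pre_activations]

print(outputs)
print(true_grads)
print(ste_grads)
```

With the exact derivative, every gradient is zero and gradient descent cannot update anything upstream of the activation. The estimator lets a gradient through for inputs near zero, so training can proceed.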
We solved these issues through our research and technology:
Helwegen et al., Latent weights do not exist: Rethinking binarized neural network optimization, NeurIPS (2019)
Bannink et al., Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks, MLSys (2021)
Larq: Plumerai’s ecosystem of open-source Python packages for Binarized Neural Networks
We combine our optimized inference engine with our collection of tiny AI models to provide turnkey software solutions. These are highly accurate and so efficient that they run on nearly every off-the-shelf chip. And where applicable, we also provide our IP-core for FPGAs.
We develop our neural networks from scratch. We optimize for small embedded devices with customized model architectures and training strategies, based on our world-class research on model quantization. This results in tiny but highly accurate AI models.
Our inference software runs AI very efficiently on microcontrollers. Our inference engines are optimized for Arm Cortex-M, Arm Cortex-A, and RISC-V processors.
We collect, label and build our own datasets. Our data pipeline identifies failure cases to ensure that our models are highly reliable and accurate.
For customers that use FPGAs and require the most energy-efficient solution, we provide a custom IP-core that is highly optimized to run any AI model.
Plumerai’s full software solution for people detection and other AI tasks is available to our customers, along with our inference engine and our IP-core for FPGAs. Let us know if you would like to evaluate our technology or discuss how Plumerai can address your application.
We’re partnering with industry experts