AI inference hardware startup Untether AI has secured $125 million in new funding to bring its novel architecture to its first commercial customers in edge and data center environments.
Intel Capital has been a major investor in Untether AI since its inception in 2018. When we took an in-depth look at the architecture with the company's CEO in October 2020, the Toronto-based startup had already raised $27 million and tested its runAI200 devices. The team, made up of several former FPGA hardware engineers, was optimistic about the potential of custom ASICs for ultra-low-power inference, and so are its investors, apparently.
This latest round of funding, led by Tracker Capital and Intel Capital, has also attracted a new investor: the Canada Pension Plan Investment Board (CPP Investments), which invests on behalf of the roughly 20 million participants in the country's pension plan and has more than $492 billion under management.
These are still early days for the inference startup, but it has managed to secure systems integrator Colfax to ship its tsunAImi accelerator cards for edge servers along with its imAIgine SDK. Each card carries four of the runAI200 devices and, according to Untether, delivers 2 petaops of peak compute. In its own benchmarks, the company says this translates to 80,000 frames per second for ResNet-50 (batch size 1) and 12,000 queries per second for BERT.
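To put those vendor-quoted figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the 2 petaops figure applies to a whole four-device card and that the peak is split evenly across devices. The function names are ours, purely for illustration. Note that the reciprocal of throughput is only the average spacing between completed results, not a measured per-request latency.

```python
# Back-of-the-envelope figures derived from the vendor-quoted card numbers.
# Assumption: 2 petaops is the peak for one card holding four devices.
CARD_PEAK_OPS = 2e15        # 2 petaops per card (vendor figure)
DEVICES_PER_CARD = 4
RESNET50_FPS = 80_000       # ResNet-50, batch size 1 (vendor benchmark)
BERT_QPS = 12_000           # BERT queries per second (vendor benchmark)


def per_device_peak_tops(card_ops=CARD_PEAK_OPS, devices=DEVICES_PER_CARD):
    """Peak tera-operations per second per device, assuming an even split."""
    return card_ops / devices / 1e12


def mean_interval_us(rate_per_second):
    """Average time between completed results, in microseconds.

    This is the reciprocal of throughput, not a latency measurement.
    """
    return 1e6 / rate_per_second


if __name__ == "__main__":
    print(f"Peak per device: {per_device_peak_tops():.0f} TOPS")
    print(f"ResNet-50: one frame every {mean_interval_us(RESNET50_FPS):.1f} us on average")
    print(f"BERT: one query every {mean_interval_us(BERT_QPS):.1f} us on average")
```

Under these assumptions each device would peak at roughly 500 TOPS, and a card would finish a ResNet-50 frame about every 12.5 microseconds on average.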
The startup focuses on INT8, low-latency, server-based inference, with an eye only on small batch sizes (batch 1 was at the heart of the design process). The company's CEO, Arun Iyengar (you may know his name from leadership positions at Xilinx, AMD, and Altera), says they are targeting NLP, recommendation engines, and vision systems as applications, with fintech at the top of their market list, although he was quick to point out that this is less about high-frequency trading and more about broader portfolio balancing (asset management, risk allocation, etc.), as AI has real pull there.
The heart of the unique at-memory computing architecture is the memory bank: 385 KB of SRAM paired with a 2D array of 512 processing elements. With 511 banks per chip, each device offers about 200 MB of memory, enough to run many networks on a single chip. And with the multi-chip partitioning function of the imAIgine software development kit, larger networks can be split to run across multiple devices or even across multiple tsunAImi accelerator cards.
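The capacity arithmetic, and the general idea of splitting a network that does not fit on one device, can be sketched as follows. This is a minimal illustration only: `partition_layers` is a hypothetical helper of ours, not part of the imAIgine SDK, and the simple in-order greedy packing shown here is an assumption, not Untether's actual partitioning method. It also counts only weight storage and ignores activations and any other on-chip overhead.

```python
# Per-chip totals from the figures quoted in the text.
SRAM_PER_BANK_KB = 385
PES_PER_BANK = 512
BANKS_PER_CHIP = 511

CHIP_SRAM_KB = SRAM_PER_BANK_KB * BANKS_PER_CHIP   # 196,735 KB, roughly 200 MB
CHIP_PES = PES_PER_BANK * BANKS_PER_CHIP           # 261,632 processing elements


def partition_layers(layer_sizes_kb, capacity_kb=CHIP_SRAM_KB):
    """Greedily pack layers, in order, onto devices of fixed capacity.

    Hypothetical illustration: a new device is started whenever the next
    layer would overflow the current one. Returns a list of per-device
    lists of layer sizes.
    """
    groups, used = [], 0
    for size in layer_sizes_kb:
        if size > capacity_kb:
            raise ValueError(f"layer of {size} KB exceeds device capacity")
        if not groups or used + size > capacity_kb:
            groups.append([])   # start filling a new device
            used = 0
        groups[-1].append(size)
        used += size
    return groups


if __name__ == "__main__":
    # Made-up layer sizes in KB, purely for illustration.
    layers = [100_000, 90_000, 50_000, 150_000]
    placement = partition_layers(layers)
    print(f"{CHIP_SRAM_KB} KB and {CHIP_PES} PEs per chip")
    print(f"{len(placement)} devices needed: {placement}")
```

With these made-up layer sizes the sketch needs three devices, which would still fit on a single four-device card.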
Iyengar also says the low-power approach would be well suited to local centers that do large-scale video aggregation (e.g. smart cities, retail establishments). He readily admits they are starting with these use cases rather than boldly aiming for a place among the hallowed hyperscalers, but says there is enough of a market for low-power, high-performance devices for the company to find its niches.
Although it has no public customers for its early silicon yet, the company has appeal beyond its funding and the uniqueness of its architecture. It has some seasoned people on the engineering side, including Alex Grbic, who leads software development and is known for a long career at Altera. On the hardware engineering side, Untether's Alex Michael, also from Altera, brings decades of experience in IC design, product, and manufacturing.
While the company says there are explosive opportunities for custom inference devices in the data center and at the edge, it remains to be seen who the winners and losers will be in the inference startup game. In our view, the edge opportunity has more wiggle room than the big data centers we focus on here at TNP, and it is going to be a long, uphill battle to pry these high-quality (high-margin) customers away from their CPU/GPU positions.