Blaize Lights up AI Processing with GSP

By Junko Yoshida

The startup's fully programmable Graph Streaming Processor will go into volume production in Q2 2020.

TOKYO — AI processor designer Blaize, formerly known as ThinCI (pronounced “Think-Eye”), revealed its fully programmable Graph Streaming Processor (GSP) will go into volume production in the second quarter of 2020.

While the six-year-old startup is mum on its product specifications — such as power level and benchmarking results — its test chip, taped out in mid-2018 and housed in a Linux-based box, has been engaged in 16 pilot programs worldwide for a year, claimed Blaize co-founder and CEO Dinakar Munagala.

Blaize describes its GSP as capable of performing “direct graph processing, on-chip task-graph management and execution, and task parallelism.” In short, Blaize designed the GSP to fulfill AI processing needs that have been previously unmet by GPU, CPU or DSP.

To many industry analysts covering AI processors, this is a pitch they’ve heard before.

Kevin Krewell, principal analyst at Tirias Research, said, “I know a bit about ThinCI, but never got the architecture pitch. I’m glad they changed the name though.”

The dearth of technical details on the GSP architecture in its slide presentation is feeding frustration and skepticism in the tech analyst community. Munagala, however, promises an information release in the first quarter of 2020.

High-level block diagram of the GSP architecture (Source: Blaize)

The GSP architecture consists of an array of graph streaming processors, dedicated math processors, hardware control and various types of data cache. The company claims that the GSP can offer: “True task-level parallelism, minimal use of off-chip memory, depth-first hardware graph scheduling, fully programmable architecture.”

Getting on a qualified vendor list
The good news for Blaize in Munagala’s mind is a crowd of early customers already using its GSP. For a year, Blaize has been shipping a desktop unit with GSP. It can be simply plugged into a power socket and connected to Ethernet. Data scientists, software and hardware developers are already evaluating system-level functions enabled by GSP, Munagala said.

Blaize, with $87 million in funding, is backed by early investors and partners including Denso (a Japanese tier-one supplier), Daimler and Magna. “We’ve been also making revenue from the automotive segment since a couple of years ago,” said Munagala.

With a taped-out chip in hand, many startups face a “What do we do now?” dilemma. Richard Terrill, vice president of strategic business development at Blaize, told EE Times, “We already passed that stage a year ago.”

Blaize has turned its focus to building out its infrastructure by beefing up an engineering team (now as big as 325 people) that stretches to California, India and the U.K. It is moving to new facilities and starting to hire field application engineers in Japan and EMEA. “We are keeping our momentum going,” said Munagala.

For Blaize, the GSP business is no longer about competing with rival startups on the specs in PowerPoint presentations. It’s about figuring out which applications customers will use its GSP for, and how much power it consumes “on a system level” in specific uses.

Blaize has been busy nailing down its logistics, getting its products automotive-qualified, and making sure the internal process and documentation are certified. “We’ve already gone through an auditing process and we are on an approved and qualified vendor list” of one automotive client, said Munagala. This was a much-needed process enforced by carmakers and tier ones, who prefer to avoid startups that might not last long enough to deliver products.

Blaize hired some 30 engineers in the U.K. (in Kings Langley and Leeds), assigned to work on automotive product development. They are a tightly knit team of engineers set loose when Imagination divested MIPS. “These are a bunch of highly qualified individuals who worked together at MIPS to get MIPS-based ASICs automotive-qualified for Mobileye,” explained Munagala.

Graph computing
Although AI comes in many different types of neural networks, “all neural networks are graph-based,” explained Munagala. In theory, this allows developers to leverage the graph-native structure to build multiple neural networks and entire workflows on a single architecture. Hence the company’s new marketing pitch for its GSP is “100 percent graph-native.”

However, Blaize isn’t exactly a unicorn in the graph-computing universe. Graphcore, Mythic and the now-failing Wave Computing have all talked about “optimization and compilation of dataflow graphs” in AI processing.

Terrill said, “Of course, graph computing has more than 60 years of history.”

Blaize GSP claims a distinction from other graph-based data flow processors in three areas, said Munagala. First, “Our GSP is fully programmable,” capable of performing “a wide range of tasks,” he said.

Second, it is “dynamically reprogrammable … on a single clock cycle.”

Third, “We offer the integration of streaming,” which makes it possible to minimize latency. The massive efficiency multiplier is delivered via “a data streaming mechanism,” where non-computational data movement is minimized or eliminated, he explained.

Sequential execution processing (Source: Blaize)

The graph-native nature of the GSP architecture can minimize data movement back and forth to external DRAM. Only the first input and the final output need to go off chip, while everything in between is temporary, intermediate data. This greatly reduces memory bandwidth and power consumption.
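
The memory-traffic argument can be illustrated with a toy model. The sketch below is not Blaize's scheduler, whose details the company has not disclosed. It simply assumes a chain of element-wise layers (so tiles have no cross-tile dependencies) and counts hypothetical "DRAM words" moved when the graph runs layer by layer versus when each tile is streamed depth-first through all layers. The function names and the traffic accounting are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Tile = List[float]
Layer = Callable[[Tile], Tile]


def run_layer_by_layer(layers: List[Layer], tiles: List[Tile]) -> Tuple[List[Tile], int]:
    """Sequential execution: finish one layer over all tiles before the next.

    Assumption: every intermediate tensor is spilled to external memory
    and read back by the following layer.
    """
    dram_words = sum(len(t) for t in tiles)  # read the input once
    current = tiles
    for index, layer in enumerate(layers):
        current = [layer(t) for t in current]
        size = sum(len(t) for t in current)
        if index < len(layers) - 1:
            dram_words += 2 * size  # write the intermediate, then read it back
        else:
            dram_words += size  # write the final output
    return current, dram_words


def run_streaming(layers: List[Layer], tiles: List[Tile]) -> Tuple[List[Tile], int]:
    """Depth-first streaming: push each tile through every layer in turn.

    Assumption: intermediates for a single tile fit on chip, so only the
    first input and the final output touch external memory.
    """
    dram_words = 0
    outputs = []
    for tile in tiles:
        dram_words += len(tile)  # read one input tile
        data = tile
        for layer in layers:
            data = layer(data)  # intermediate stays on chip
        dram_words += len(data)  # write one output tile
        outputs.append(data)
    return outputs, dram_words


# Hypothetical three-layer graph: scale, shift, ReLU.
layers: List[Layer] = [
    lambda t: [2.0 * x for x in t],
    lambda t: [x - 10.0 for x in t],
    lambda t: [max(0.0, x) for x in t],
]
tiles = [[float(i + 8 * n) for i in range(8)] for n in range(4)]  # 4 tiles of 8 words

seq_out, seq_words = run_layer_by_layer(layers, tiles)
stream_out, stream_words = run_streaming(layers, tiles)
print(seq_words, stream_words)  # 192 64
```

Both schedules produce the same output. In this toy model the streamed version moves one-third of the external traffic, and the gap widens as the graph gets deeper. Real networks with convolutions or other cross-tile dependencies would need overlapping tiles or on-chip buffering, which this sketch ignores.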

Graph streaming execution processing (Source: Blaize)

The stated goals for Blaize systems are “the lowest possible latency, reduction in memory requirements and energy demand at the chip, board and system levels.”

Asked if Blaize’s graph-computing design will be patent-defensible, Munagala said, “We feel confident about our patent portfolio. We have multiple patents, some already granted and others applied, but we’ve been doing this for multiple years.”

Lessons learned from pilot programs
By rolling out its GSP-embedded desktop units early, Munagala said, “We got entry tickets to get into the real customers and their workload needs.”

Through these pilots, Blaize was pleasantly surprised to learn of a lot more applications and market segments for its GSP-based platform. The company sees its GSP addressing markets ranging from automotive and smart vision (surveillance) to enterprise computing.

Without disclosing performance-per-watt specs, Munagala claimed that the Blaize GSP can meet the tight energy constraints imposed everywhere from cloud data centers to the edge, where data is collected.

Driving momentum for Blaize’s GSP, according to Munagala, are rapidly changing deep-learning technologies that include new topologies, neural networks and algorithms. Such fast-paced technology advancements often prompt clients to tell Blaize, “I wish your chip could do XYZ,” explained Munagala. That’s where fully programmable GSP architecture comes into play.

In the automotive segment, for example, the GSP will be applied to intelligent telematics, ADAS, driver monitoring and occupant assessment, explained Munagala. The key is a single GSP architecture for many automotive applications.

A single chip, for example, can perform ISP processing, semantic segmentation and sensor fusion combined with lidar sensor input and point-cloud data. This is accomplished by running two data-flow graphs concurrently: video object detection and lidar ranging.

Since Blaize’s key market is automotive (and automotive customers always ask for determinism), the GSP architecture is “deterministic,” said Munagala.

Because of its engagements with many pilot programs, Blaize ended up developing a tool called “Blaize NetDeploy.” Early on, Terrill said, “We realized that many customers have been spending weeks” in the arduous tasks of optimizing a trained AI model on GPU and converting it for deployment on inference engines. What’s needed, said Terrill, is “a software tool that can accelerate the process of quantization, pruning and compressing” the neural network.

Blaize NetDeploy, now a part of the company’s software development kit (SDK), automated that process and “bridged efficiency and usability gap between ML/AI training and inference,” according to Blaize. Blaize proved that the process that once took two to three weeks can be done “in minutes,” said Terrill. “You can do this in a single path.” He added that as soon as system designers found out about this tool, they told Blaize, “I have to have this.”
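
For readers unfamiliar with the steps Terrill names, the sketch below shows generic magnitude pruning followed by symmetric 8-bit quantization of a small weight list. It is not NetDeploy's API or algorithm, which Blaize has not detailed. The function names, the 25 percent sparsity target and the sample weights are all illustrative assumptions.

```python
from typing import List, Tuple


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction of the weights."""
    count = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:count]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate floating-point weights."""
    return [q * scale for q in quantized]


# Hypothetical trained weights for one small layer.
weights = [0.02, -1.27, 0.5, -0.01, 0.9, 0.03, -0.4, 1.0]

pruned = prune_by_magnitude(weights, sparsity=0.25)  # drops the two smallest weights
quantized, scale = quantize_int8(pruned)
restored = dequantize(quantized, scale)
print(quantized)  # [0, -127, 50, 0, 90, 3, -40, 100]
```

Each restored weight differs from its pruned original by at most half a quantization step. A production tool would also have to check how much accuracy the model loses at each step, which is where the weeks of manual tuning described above tend to go.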

Denso connection
Blaize’s partnership with Denso is well known and goes back a few years. Denso’s semiconductor subsidiary, NSITEXE, was founded in Tokyo in 2017. With a goal of delivering chip solutions for advanced automated driving systems, NSITEXE is developing a data flow processor (DFP) by leveraging Blaize’s GSP architecture.

The Japanese company says the DFP can “rapidly perform multiple complex calculations by instantaneously optimizing calculating areas according to the amount and content of information.” The DFP does so while “minimizing power consumption and heat generation,” the company explained. NSITEXE is aiming to commercialize the DFP in the first half of the 2020s.
