Free release of a report from the top chip conference Hot Chips: NVIDIA A100 explained in one article (English), 18 pages

Baidu Kunlun: An AI Processor for Diversified Workloads
Jian Ouyang 1 (ouyangjian@baidu.com), Mijung Noh 2, Yong Wang 1, Wei Qi 1, Yin Ma 1, Canghai Gu 1, SoonGon Kim 2, Ki-il Hong 2, Wang-Keun Bae 2, Zhibiao Zhao 1, Jing Wang 1, Peng Wu 1, Xiaozhang Gong 1, Jiaxin Shi 1, Hefei Zhu 1, Xueliang Du 1
1 Baidu, Inc.  2 Foundry Business, Samsung Electronics

The diversified AI applications
• Speech: recognition, generation, ...
• Vision: classification, detection, segmentation, ...
• NLP: QnA, recommendation, ...

The diversified AI scenarios
• Cloud data center
• HPC
• Smart industry
• Smart city

Designing AI chip products from an industry perspective
• Target the mainstream market
• Capture as much market volume as possible
• Support as many AI applications and scenarios as possible

But, the challenge
• Large variety of computing and memory-access patterns
  – Up to a thousand operators in mainstream frameworks
  – A mix of tensor, vector and scalar operations
  – Both sequential and random memory access
• Rapid change in algorithms and applications
• A high adoption threshold for developers facing new hardware

Baidu Kunlun's product vision
• Large variety of computing and memory-access patterns → generality
• Rapid change in algorithms and applications → flexibility
• High developer threshold for new hardware → usability and programmability
• High performance

The history of Baidu Kunlun
• 2010: kickoff of the SDA project
• 2014: SDA (Hot Chips 2014)
• 2016: SDA-II (Hot Chips 2016)
• 2017: XPU (Hot Chips 2017)
• 2019: Baidu Kunlun tape-out
• 2020: deployment
• Performance progression across the generations: 300 Gflops → 1 Tops → 2 Tops → 4 Tops → 256 Tops
• Moved from FPGA to ASIC
• Evolved from full customization to full programmability
• SDA: Software-Defined Accelerator
• XPU: the X Processor Unit for diversified workloads
• Baidu Kunlun: the name of Baidu's first AI chip; Kunlun is a famous mountain range in China

The overview of Baidu Kunlun
• Samsung Foundry 14 nm, 2.5D packaging
• 2 x HBM, 512 GB/s
• PCIe 4.0 x8
• 150 W, 256 Tops

The overview of the Baidu Kunlun board
• Model: Baidu Kunlun K200
• Architecture: XPU
• Precision: INT4/8, INT/FP16, FP32
• Computing capability: INT8 256 TOPS; INT/FP16 64 TOPS; INT/FP32 16 TOPS
• HBM memory size: 16 GB
• HBM bandwidth: 512 GB/s
• Host interface: PCIe Gen4.0 x8
• Process: 14 nm
• Thermal cooling: passive
• Package: 2.5D
• TDP: 150 W

The overview of the Baidu Kunlun architecture
• Two compute units (compute unit 0 and compute unit 1); each has four XPU-Clusters and four XPU-SDNN engines around an on-chip shared memory and its own HBM interface (HBM0 / HBM1); the host connection is PCIe Gen4 (x8)
• XPU v1, FPGA based (Hot Chips 2017): customized logic for tensor and vector operations, many tiny cores for scalar work, DMA, a multi-port memory controller and DDR4
• XPU v2: built with the same design methodology, more powerful than the FPGA version
• SDNN: software-defined neural network engine
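To put the K200 numbers above in context, the short Python sketch below runs a back-of-the-envelope roofline calculation from the two figures the slides give (256 TOPS of INT8 peak compute and 512 GB/s of HBM bandwidth). Everything else in the snippet, including the example GEMM shape and the ideal-reuse traffic model, is an assumption made here for illustration and does not come from the report.

```python
# Back-of-the-envelope roofline check using the K200 figures quoted in the slides.
# Only the two peak numbers come from the deck; the rest is an illustrative assumption.
peak_int8_tops = 256            # INT8 peak compute, TOPS (from the spec slide)
hbm_bandwidth_gbs = 512         # 2 x HBM, GB/s (from the spec slide)

# Arithmetic intensity (INT8 ops per byte moved from HBM) needed to become compute-bound.
ridge_point = (peak_int8_tops * 1e12) / (hbm_bandwidth_gbs * 1e9)
print(f"Ridge point: {ridge_point:.0f} INT8 ops per HBM byte")   # -> 500

# Hypothetical example: a square INT8 GEMM with ideal operand reuse
# (read A and B once as 1-byte elements, write C once as INT32).
M = N = K = 4096
ops = 2 * M * N * K                                   # multiply-adds counted as 2 ops
bytes_moved = (M * K + K * N) * 1 + (M * N) * 4
print(f"GEMM arithmetic intensity: {ops / bytes_moved:.0f} ops/byte")   # ~1365
```

Under these assumptions the ridge point works out to roughly 500 INT8 operations per HBM byte, so only operators with substantial on-chip data reuse can approach the quoted 256 TOPS; bandwidth-bound operators are limited by the 512 GB/s figure instead.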

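The architecture overview describes two kinds of execution resources per compute unit: XPU-SDNN engines with customized logic for tensor and vector work, and XPU-Clusters of many tiny cores for scalar and irregular work. The sketch below only illustrates that partitioning idea; the Operator class, the schedule function and the operator sets are invented here for the example and are not part of any Baidu toolchain.

```python
# Illustrative sketch of the compute split described on the architecture slide:
# tensor/vector operators map to the software-defined NN engines (XPU-SDNN),
# scalar / control-heavy operators map to the clusters of tiny cores (XPU-Cluster).
# All names below (Operator, schedule, TENSOR_OPS, SCALAR_OPS) are hypothetical.
from dataclasses import dataclass
from typing import Dict, List

TENSOR_OPS = {"conv2d", "matmul", "batch_norm"}   # assumed tensor/vector-heavy ops
SCALAR_OPS = {"topk", "nms", "control_flow"}      # assumed scalar / irregular ops

@dataclass
class Operator:
    name: str
    kind: str        # e.g. "conv2d", "nms", ...

def schedule(graph: List[Operator]) -> Dict[str, str]:
    """Partition a toy operator graph between SDNN engines and tiny-core clusters."""
    placement: Dict[str, str] = {}
    for op in graph:
        if op.kind in TENSOR_OPS:
            placement[op.name] = "XPU-SDNN"       # customized logic for tensor/vector
        else:                                     # scalar/irregular ops (see SCALAR_OPS)
            placement[op.name] = "XPU-Cluster"    # many tiny cores for scalar work
    return placement

if __name__ == "__main__":
    toy_graph = [Operator("c1", "conv2d"), Operator("post", "nms")]
    print(schedule(toy_graph))   # {'c1': 'XPU-SDNN', 'post': 'XPU-Cluster'}
```

In practice such placement would be handled by the vendor's compiler and runtime; the point of the sketch is only that a single model graph mixes tensor, vector and scalar operators, which is the workload diversity the slides cite as the motivation for pairing the two engine types.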
Information Technology · 2020-11-28 · Baidu · 18 pages · 1.56 MB
