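Falcon 4xxx Overview & Advantages
At a competition I saw the GPU chassis used by Zhejiang University, and afterwards I dug up some material on it. I prefer to call this whole class of device a PCIe switch, so that is the name used below.
This post uses the Falcon 4210 as the example. The other products of the same generation are mostly cut-down or storage-oriented variants with the same core design.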
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F93cb017c-55b6-46e4-a878-b4857a24ac79%2FUntitled.png?table=block&id=14d517d2-c2ef-4542-b455-6a446731f396&t=14d517d2-c2ef-4542-b455-6a446731f396&width=676&cache=v2)
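Basic specifications table
As the table shows, the chassis has a BMC. The vendor's BMC web page also looks quite good and exposes many configuration options.
The switch chip is Broadcom's PEX 88096, a PCIe 4.0 part with five PCIe 4.0 x16 links (48 DMA channels, one associated with PCIe x2 port), so each group is configured as 4 GPUs + 1 NIC.
For power, each slot gets 75W from the PCIe slot plus 375W from the 8-pin connector. The upper limit is 450W, but drawing power through the PCIe slot is not recommended: I have heard of cases where AMD graphics cards pulling power from the PCIe slot damaged lower-spec motherboards.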
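The power arithmetic above can be sketched as a small budget check. This is only an illustration built on the 75W slot and 375W 8-pin figures quoted above. The helper names `power_budget` and `fits` are hypothetical, and the card power values in the usage lines are placeholders, not measurements.

```python
SLOT_W = 75   # power available from the PCIe slot (figure quoted above)
AUX_W = 375   # power available from the 8-pin connector (figure quoted above)


def power_budget(use_slot: bool = True) -> int:
    """Per-slot power budget in watts, with or without the PCIe slot rail."""
    return AUX_W + (SLOT_W if use_slot else 0)


def fits(card_power_w: float, use_slot: bool = True) -> bool:
    """True if a card with the given board power stays within the budget."""
    return card_power_w <= power_budget(use_slot)


if __name__ == "__main__":
    print(power_budget(True))    # slot + 8-pin
    print(power_budget(False))   # 8-pin only
    print(fits(300, use_slot=False))  # placeholder 300W card on 8-pin only
```

Budgeting against the 8-pin figure alone leaves the slot rail as headroom rather than as something the card depends on.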
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Fdbc64773-554d-4292-82a1-2265c50f4545%2FUntitled.png?table=block&id=2e957e20-d19f-4b00-b970-d404a6874fa4&t=2e957e20-d19f-4b00-b970-d404a6874fa4&width=846&cache=v2)
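The host side appears to be an adapter card developed in-house. Each card uses four SFF-8644 cables, each carrying a PCIe x4 link, and is built around the lower-spec PEX 80032 PCIe switch chip.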
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Fc6ce6486-58ad-4500-8cf6-fbeadf3a6670%2FUntitled.png?table=block&id=0ee80b14-6362-4da8-8381-30eac0202921&t=0ee80b14-6362-4da8-8381-30eac0202921&width=862&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F5b3371ee-c137-40e8-a3ca-e58e7425a64d%2FUntitled.png?table=block&id=2d998c29-564e-49fe-83b6-51afca28ed8c&t=2d998c29-564e-49fe-83b6-51afca28ed8c&width=725&cache=v2)
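As a rough check on the host link described above, four x4 cables add up to one x16 link. The sketch below computes the theoretical one-direction bandwidth from the per-lane signalling rate and 128b/130b encoding, and it ignores protocol overhead. The function name `link_bandwidth_gbs` is my own, and the result is a ceiling, not a measured number.

```python
# Per-lane signalling rate in GT/s for PCIe Gen3 to Gen5.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by Gen3 through Gen5


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8


cables = 4            # SFF-8644 cables per host card
lanes_per_cable = 4   # each cable carries a PCIe x4 link
total_lanes = cables * lanes_per_cable

if __name__ == "__main__":
    print(total_lanes)                                   # 16
    print(round(link_bandwidth_gbs(4, total_lanes), 1))  # 31.5
```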
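Topology
Overall, since two hosts can each get x16 speed, each group presumably uses the fig. 3 topology shown on Broadcom's website.
The two half-height host interfaces sit on the two middle layers. The two groups apparently can only talk to each other through the CPU.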
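The layout described above can be modelled in a few lines: two groups, each with 4 GPUs and 1 NIC under its own switch, where traffic inside a group stays on the switch and traffic between groups goes through the host CPU. This is a toy model of my reading of the topology, not vendor software. The device names and the `Device`, `build_chassis` and `path` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    name: str
    group: int  # index of the PCIe switch group this device hangs under


def build_chassis(groups: int = 2, gpus_per_group: int = 4) -> List[Device]:
    """Build the assumed layout: each group holds N GPUs plus one NIC."""
    devices = []
    for g in range(groups):
        for i in range(gpus_per_group):
            devices.append(Device(f"gpu{g * gpus_per_group + i}", g))
        devices.append(Device(f"nic{g}", g))
    return devices


def path(a: Device, b: Device) -> str:
    """Classify the route between two devices under the assumed topology."""
    return "same switch" if a.group == b.group else "host CPU"


if __name__ == "__main__":
    by_name = {d.name: d for d in build_chassis()}
    print(path(by_name["gpu0"], by_name["gpu3"]))  # same switch
    print(path(by_name["gpu0"], by_name["gpu4"]))  # host CPU
```

A job scheduler could use the same grouping to keep communication-heavy GPU pairs inside one group.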
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F00747290-8817-4fc2-a78e-6abd4547af9f%2FUntitled.png?table=block&id=1662d282-172e-4f97-aa92-b04469c60c92&t=1662d282-172e-4f97-aa92-b04469c60c92&width=62&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F3abb5d52-9738-4fd5-8482-4e3cc43130d8%2FUntitled.png?table=block&id=6e15d2c8-9249-4c0d-867d-3ad26d4195d8&t=6e15d2c8-9249-4c0d-867d-3ad26d4195d8&width=847&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Fced87e25-55eb-4064-a71e-0453ce059756%2FUntitled.png?table=block&id=dff2c7aa-f499-4afb-9783-9afa754c05c8&t=dff2c7aa-f499-4afb-9783-9afa754c05c8&width=1175&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F15cf6a7c-f6d1-4db3-a1e7-2e77f088924d%2FUntitled.png?table=block&id=e2ce4ba1-4f9b-43a2-8fce-b19da0c8e3ea&t=e2ce4ba1-4f9b-43a2-8fce-b19da0c8e3ea&width=690&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Fa74fd6b2-c99e-48ee-a322-5cbadc2479a4%2FUntitled.png?table=block&id=298f9a2a-1f5a-4bf0-8e2e-f90455a49cf0&t=298f9a2a-1f5a-4bf0-8e2e-f90455a49cf0&width=722&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Fb1290388-f611-4a4e-a435-ee0901862f17%2FUntitled.png?table=block&id=eb03eba3-0044-428a-9f2f-9cea6ec73bfd&t=eb03eba3-0044-428a-9f2f-9cea6ec73bfd&width=698&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Faee50940-7f0d-4a8f-bb15-a4ed4c142d0b%2FUntitled.png?table=block&id=e3bde099-2064-4c50-9f1e-f8a8b474d1b5&t=e3bde099-2064-4c50-9f1e-f8a8b474d1b5&width=716&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F3889cbe3-08fb-408b-a4e9-d3df3a90b8f7%2FUntitled.png?table=block&id=863e5d17-5123-48da-bb67-ac8bdb99ba78&t=863e5d17-5123-48da-bb67-ac8bdb99ba78&width=882&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Fee8480a6-c1c9-4feb-9ccd-ff9248fefb69%2FUntitled.png?table=block&id=3eb3a683-2242-456d-b26a-a437f7226e24&t=3eb3a683-2242-456d-b26a-a437f7226e24&width=887&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F7b1b1f51-8ac5-48cd-8df2-544ca207ee54%2FUntitled.png?table=block&id=74033130-472b-4e92-b3be-ea0c4cf35a69&t=74033130-472b-4e92-b3be-ea0c4cf35a69&width=878&cache=v2)
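Compatibility
The main constraints are the maximum memory size and the upper limit on the number of PCIe bus numbers.
Both depend on the BIOS and the CPU.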
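To make the bus-number limit concrete: the PCIe bus number field is 8 bits wide, so one segment (domain) has at most 256 bus numbers, and every switch port behind the chassis consumes some of them. The sketch below counts the distinct buses seen in a list of `domain:bus:device.function` addresses, the format printed by `lspci -D` on Linux. The sample addresses are made up, and the count covers only buses that have a listed device, not ranges reserved by bridges, so treat it as a lower bound.

```python
MAX_BUSES_PER_SEGMENT = 256  # the PCIe bus number field is 8 bits wide


def buses_in_use(bdfs):
    """Collect distinct (domain, bus) pairs from 'dddd:bb:dd.f' strings."""
    used = set()
    for bdf in bdfs:
        domain, bus, _ = bdf.split(":")
        used.add((int(domain, 16), int(bus, 16)))
    return used


def remaining_buses(bdfs, domain=0):
    """Bus numbers in one domain that have no listed device on them."""
    used = {bus for d, bus in buses_in_use(bdfs) if d == domain}
    return MAX_BUSES_PER_SEGMENT - len(used)


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    sample = ["0000:00:00.0", "0000:17:00.0", "0000:17:00.1", "0000:65:00.0"]
    print(len(buses_in_use(sample)))  # 3
    print(remaining_buses(sample))    # 253
```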
Measured Performance
The vendor's documentation includes tests with detailed results, but the coverage is incomplete.
2xA100 within one Host
The two A100s sit under the same switch, so communication between them is fast.
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F81343798-19fb-4241-9881-2068ba8afab6%2FUntitled.png?table=block&id=c088b478-a293-45a7-b687-99ca2535625c&t=c088b478-a293-45a7-b687-99ca2535625c&width=682&cache=v2)
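What is missing here is whether performance stays balanced when several cards run HtoD/DtoH transfers at the same time. That matters because it determines the multi-GPU bottleneck in AI training.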
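If such a concurrent test were run, one simple way to summarize it is to compare each card's bandwidth with an even share of the uplink and to report the min/max ratio across cards. The sketch below assumes all GPUs in a group share a single host uplink. The function names are my own and the numbers in the usage lines are invented for illustration, not results from this chassis.

```python
def fair_share(uplink_gbs: float, n_gpus: int) -> float:
    """Even split of one shared host uplink across n GPUs, in GB/s."""
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    return uplink_gbs / n_gpus


def balance_ratio(bandwidths_gbs) -> float:
    """min/max of per-GPU bandwidth; 1.0 means perfectly balanced."""
    if not bandwidths_gbs:
        raise ValueError("need at least one measurement")
    return min(bandwidths_gbs) / max(bandwidths_gbs)


if __name__ == "__main__":
    # Invented per-GPU HtoD numbers for four cards sharing a 32 GB/s uplink.
    measured = [8.0, 8.0, 6.0, 8.0]
    print(fair_share(32.0, 4))      # 8.0
    print(balance_ratio(measured))  # 0.75
```

A ratio well below 1.0 would mean that one card is being starved, and in synchronous training the slowest card sets the pace for the others.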
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F7247883e-589f-4266-b8bc-7c169ed94c2c%2FUntitled.png?table=block&id=50c83490-b154-4544-b14a-ddf559fa0c65&t=50c83490-b154-4544-b14a-ddf559fa0c65&width=670&cache=v2)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F49675199-678a-4953-bf05-0544a0dadc86%2FUntitled.png?table=block&id=b08a7a0f-74f4-4af5-b412-603bd473edcd&t=b08a7a0f-74f4-4af5-b412-603bd473edcd&width=819&cache=v2)
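Drawbacks & Improvements (Falcon 4000 only, personal opinion)
- Very loud, on par with a GPU server
- The 8-GPU version is two groups of 4 GPUs. Inter-group traffic has to go through the CPU, so programs need some optimization
Extras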
Fun Play
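With InfiniBand NICs attached, it seems possible to use a (2 GPU + 2 NIC) x 2 layout and get true peer-to-peer communication between GPUs. It is too expensive and rather odd, although it is acceptable for cross-chassis connections.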
CXL Memory Pooling Solution
CXL 2.0 is built on PCIe 5.0, with a theoretical bandwidth ceiling of 64 GB/s (x16 link, per direction).
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F06f49519-6571-4738-9132-ce32ae2bd9fc%2FUntitled.png?table=block&id=3975c8e3-e65b-4f5d-9234-e3a530f2020a&t=3975c8e3-e65b-4f5d-9234-e3a530f2020a&width=785&cache=v2)
Memory chassis (memory expansion enclosure)
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2Ff0229a24-9fc1-43a7-9627-de29eae6b62b%2FUntitled.png?table=block&id=b2eff70d-8d67-432a-8563-013030c8c128&t=b2eff70d-8d67-432a-8563-013030c8c128&width=856&cache=v2)
The product already looks fairly complete.
![notion image](https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fe11fbe68-040d-41a6-8b37-0405a354bcd8%2F3aa87b2f-c2ba-420a-8753-9e197e452942%2FUntitled.png?table=block&id=213718eb-6690-49ba-9954-3555932544c5&t=213718eb-6690-49ba-9954-3555932544c5&width=1438&cache=v2)
References
- Author: NotionNext
- URL: https://tangly1024.com/article/Falcon%204000%20%E8%AF%A6%E8%A7%A3
- Copyright: All articles in this blog, except for special statements, adopt the BY-NC-SA agreement. Please indicate the source!