What Is "Disaggregated Computing"? (5)

Author: 佐藤友昭 (Tomoaki Sato)

※ The full list of articles in the "What Is Disaggregated Computing?" series is available here.


Introduction

I have recently created a GitHub project (GitHub - vaj/CDI-Info) that presents YouTube talks related to "disaggregated computing" in chronological order. Its aims are to collect and analyze information on disaggregated computing (hereafter abbreviated CDI), to raise awareness of the new software-development paradigm that CDI is bringing about, and to spark concrete software-development opportunities.

In this installment, I use RAG (Retrieval Augmented Generation) to feed the information in the CDI Info project mentioned above to a local LLM (llama2:13b) as knowledge, and put a series of questions about CDI to it as of June 2024 (254 talks), asking from a vendor-neutral standpoint what "disaggregated computing" is.
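The setup described above can be sketched roughly as follows: retrieve the transcript snippets most relevant to a question and prepend them to the prompt sent to the local model. This is only a minimal illustration, not the actual recipe used for this article; the naive term-overlap retrieval, the placeholder snippets, and the Ollama endpoint mentioned in the comment are my assumptions.

```python
# Minimal RAG sketch: retrieve transcript snippets relevant to a question,
# then build a context-grounded prompt for a local LLM such as llama2:13b.
# The snippets below are placeholders, not actual CDI-Info transcripts.

def score(question: str, doc: str) -> int:
    """Count how many question terms appear in the document (naive retrieval)."""
    terms = {w.lower().strip("?.,()'\"") for w in question.split()}
    words = {w.lower().strip("?.,()'\"") for w in doc.split()}
    return len(terms & words)

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k snippets with the highest term overlap with the question."""
    return sorted(docs, key=lambda d: score(question, d), reverse=True)[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Ground the model in retrieved context, as RAG frameworks typically do."""
    context = "\n---\n".join(retrieve(question, docs))
    return (
        "Answer based only on the following context.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

snippets = [
    "CXL (Compute Express Link) is an open interconnect standard built on PCIe.",
    "DDR5 doubles the number of banks and bank groups over DDR4.",
]

prompt = build_prompt("What is CXL (Compute Express Link)?", snippets)
# The prompt would then be sent to the local model, e.g. via Ollama's REST API:
#   POST http://localhost:11434/api/generate
#   {"model": "llama2:13b", "prompt": prompt}
print(prompt.splitlines()[0])
```

In a real pipeline the term-overlap scoring would be replaced by embedding similarity over a vector store, but the overall shape (retrieve, then ask with context) is the same.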

RAG Query Experiments with CDI Info (1)

First, let us ask about the ten most frequently occurring acronyms, introduced in the previous article.

Term/Acronym Definition Talks
CXL Compute Express Link 145
CPU Central Processing Unit 120
PCIe Peripheral Component Interconnect Express 98
AI Artificial Intelligence 86
DRAM Dynamic Random Access Memory 82
PCI Peripheral Component Interconnect 72
GPU Graphics Processing Unit 67
OCP Open Compute Project 61
DDR Double Data Rate 51
NUMA Non-Uniform Memory Access 50
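Counts like those in the table above can in principle be tallied with a simple scan over talk transcripts, counting how many talks mention each acronym at least once. The sketch below is a hypothetical illustration; the transcripts and the acronym list are placeholders, not the actual CDI-Info data or method.

```python
# Hypothetical sketch of tallying per-talk acronym frequencies as in the
# table above; transcripts and the acronym list are placeholders.
import re
from collections import Counter

ACRONYMS = ["CXL", "CPU", "PCIe", "DRAM", "GPU", "NUMA"]

def talks_mentioning(transcripts: list[str]) -> Counter:
    """Count, for each acronym, how many talks mention it at least once."""
    counts = Counter()
    for text in transcripts:
        for acro in ACRONYMS:
            # Word-boundary match so e.g. "PCI" would not also match "PCIe".
            if re.search(rf"\b{re.escape(acro)}\b", text):
                counts[acro] += 1
    return counts

transcripts = [
    "CXL builds on PCIe to pool DRAM across hosts.",
    "NUMA effects dominate when the CPU touches remote DRAM.",
]
print(talks_mentioning(transcripts))
```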

What is CXL (Compute Express Link)?

Based on the given context, here is the answer to the question "What is CXL (Compute Express Link)?"

CXL (Compute Express Link) is a new industry standard for interconnects that provides high bandwidth and low latency coherent connectivity between devices such as PCIe, GPGPU, SmartNIC, FPGA, computational storage, and other smart devices. It is designed to address the challenges in the evolving compute landscape by making both heterogeneous computing and different types of memory efficient. CXL is built on top of PCI infrastructure and leverages PCIe completely, overlaying caching and memory protocols on top of existing PCI Express protocols. The latest revision of the specification is 3.1.

What is a CPU (Central Processing Unit)?

Based on the context provided, I cannot answer the question "What is a CPU (Central Processing Unit)?" as it is not mentioned in the given pieces of context.

What is PCIe (Peripheral Component Interconnect Express)?

Based on the provided context, PCIe (Peripheral Component Interconnect Express) is a high-speed serial communication interface standard introduced in 2003 to enable fast and efficient data transfer between the CPU and its peripheral components. It has become the primary motherboard expansion bus standard and a popular communication method for many other onboard applications. PCIe is used for resource pooling in data centers, solving the memory bandwidth challenge as the number of DDR channels becomes increasingly expensive.

What is AI (Artificial Intelligence)?

Based on the context provided, I can answer the question about AI as follows:

AI stands for Artificial Intelligence, which refers to the ability of machines or computers to perform tasks that typically require human intelligence, such as learning, problem-solving, and decision-making. AI systems use algorithms and data to make predictions, classify objects, and generate new information.

The context also mentions that AI is enabled by hardware, particularly GPUs (Graphics Processing Units), and that there are three ingredients that make AI work: deep neural networks, convolutional neural networks, and training with backpropagation. These algorithms have been around since the 1980s, but recent advances in hardware and data availability have made it possible to apply them to a wide range of tasks and industries.

Overall, AI is a rapidly evolving field that has the potential to transform many aspects of our lives, from healthcare and transportation to education and entertainment.

What is DRAM (Dynamic Random Access Memory)?

DRAM (Dynamic Random Access Memory) is a type of computer memory that is widely used in modern computers. It is called "dynamic" because it needs to be constantly refreshed with power to maintain its contents, unlike static RAM which can retain its contents without power. DRAM stores data in a capacitor within each memory cell, and the capacitor is either fully charged or fully discharged, representing a 1 or a 0 bit.

The context you provided suggests that DRAM is being compared to other types of memory, such as storage-class memory, which are approaching DRAM-like latency and bandwidth but with non-volatile characteristics. This implies that DRAM is a performance-oriented memory solution that is not cost-oriented, and it exists today.

What is a GPU (Graphics Processing Unit)?

Based on the context provided, a GPU (Graphics Processing Unit) can be thought of in two ways:

  1. As a CXL type 2 accelerator with device-attached memory that participates in coherency flows with the host using backward invalidate flows or bias-based coherency.
  2. As part of a CXL fabric, communicating with other peer GPUs, the network interface card, and most likely at some level of granularity, such as our Grand Teton GPU architecture.

However, it's important to note that not everyone has the same scale or scope as companies like Meta and Google, and there may be interest in building higher performance fabric architectures around GPUs for specific applications like video processing.

What is OCP (Open Compute Project)?

Based on the given context, Open Compute Project (OCP) appears to be an initiative that focuses on developing and integrating technologies such as CXL, PCIe, JEDEC, and NVMe into systems. The project provides a community for realizing these technologies and pushing them to production. Additionally, there is a focus on near-memory compute and keeping an eye on the academy's work to provide feedback and understand their efforts.

In summary, OCP is an open hardware initiative that aims to bring together technology innovators, manufacturers, and users to develop and deploy efficient computing solutions for data center infrastructure.

What is DDR (Double Data Rate)?

DDR (Double Data Rate) is a type of memory technology used in computer systems. It is called "double data rate" because it transfers data on both the rising and falling edges of the clock signal, effectively doubling the amount of data that can be transferred in a given time period. This results in higher bandwidth and faster performance compared to other types of memory technologies.

In the context of the given pieces of text, DDR5 is mentioned as a type of memory technology that has undergone changes to improve its performance. These changes include twice as many banks, bank groups, and a higher bus rate, which together result in a significant improvement in system bandwidth.

What is NUMA (Non-Uniform Memory Access)?

NUMA (Non-Uniform Memory Access) is a multiprocessor model in which each processor is connected to a dedicated memory, but can also access the memory that is connected to other processors in the system. This allows for the expansion of memory capacity and improved performance of applications running across NUMA domains. The Linux community is working on making changes and improvements for improving the performance of applications running across NUMA domains.

The model's refusal to answer what a CPU is suggests that RAG is indeed at work. In answer to the GPU question, terms such as "CXL type 2 accelerator" and "Grand Teton GPU architecture" appeared. I plan to introduce the recipe for the RAG environment used this time on another occasion (in connection with CDI use cases). Readers are encouraged to dig deeper into whatever catches their interest in an actual RAG environment.

RAG Query Experiments with CDI Info (2)

Now, let us finally ask about the theme of this series.

Q1: What is disaggregated computing?

What is 'disaggregated computing'? 

Based on the context provided, "disaggregated computing" refers to the ability to compose and manage computing resources at a finer grain level than traditional systems, allowing for more flexibility and customization in building and deploying distributed systems. This includes the ability to purchase disaggregated technologies or network technologies from multiple vendors and manage them through a single API or management platform. The concept of composable disaggregated infrastructure is similar to the idea of creating storage pools and volumes in traditional systems, but applied to a wider range of resources and with more flexibility and granularity.

Q2: What is the relationship between disaggregated computing and CDI?

What is the relationship between disaggregated computing and CDI?

Based on the context provided, there is a close relationship between disaggregated computing and Composable Disaggregated Infrastructures (CDI). Disaggregated computing refers to the practice of separating computer resources, such as processors, memory, and storage, from the traditional monolithic system architecture. CDI takes this concept further by physically disaggregating these resources over high-speed fabrics and dynamically composing them as needed into a computer system. In other words, CDI is a form of disaggregated computing that allows for more efficient and flexible use of computing resources.

Q3: What problems in today's IT infrastructure have created the need for CDI?

List the problems facing the current IT infrastructure that have led to the need for CDI.

Based on the context provided, the following are some of the problems facing the current IT infrastructure that have led to the need for Cloud-based Disaggregated Infrastructure (CDI):

  1. Over-provisioning of processors, memory, and storage: CDI helps reduce the need for over-provisioning by providing resources on demand.
  2. Reduction of electricity costs: CDI reduces the number of idling devices, which leads to lower electricity costs.
  3. Limited scalability: Traditional IT infrastructure can be limited in terms of scalability, while CDI allows for the creation of custom-made servers on demand.
  4. Inefficient use of resources: CDI helps optimize the use of resources by providing the right resources to the right applications at the right times.
  5. High total cost of ownership (TCO): CDI can help reduce the TCO by minimizing unused resources and reducing electricity costs.
  6. Limited flexibility: Traditional IT infrastructure can be inflexible, while CDI allows for the creation of server clusters and custom-made servers on demand.

Q4: What are the causes of, and solutions to, the memory wall problem?

What are the causes and solutions to memory wall problems?

Based on the given context, here are the causes and potential solutions to memory wall problems:

Causes:

  1. Increasing demand for memory-intensive applications: As applications become more sophisticated and data-hungry, they require more memory, leading to memory wall challenges.
  2. Limited bandwidth and latency: The amount of memory bandwidth per core has not increased over the last ten years, and even decreased, causing a bottleneck between compute and memory.
  3. Data locality: Applications require data to be stored near the processing unit to minimize memory access latency, which can lead to memory wall issues.

Potential Solutions:

  1. New memory technologies: Emerging memory technologies like 3D XPoint, ReRAM, and others can potentially overcome memory wall challenges by providing higher bandwidth, lower latency, and more local storage for data.
  2. Data-centric computing: Instead of moving data back and forth between compute and memory, putting compute where the data resides can help reduce the memory wall problem.
  3. Memory pooling: Sharing memory on a temporal basis between CPUs or executing specific applications on a memory pool can help alleviate memory wall issues by providing more memory resources.
  4. Latency optimization: Minimizing latency between compute and memory can help improve memory access and reduce the impact of the memory wall challenge.
  5. Bandwidth optimization: Increasing bandwidth between compute and memory can also help mitigate memory wall challenges by providing more data transfer rates.

Q5: What are the main component technologies of CDI?

List the main elemental technologies of CDI.

Based on the provided context, the main elemental technologies of Composable Disaggregated Infrastructure (CDI) are:

  1. Resource pool: A pool of resources such as CPUs, GPUs, memory, and storage that can be composed and reconfigured as needed.
  2. PCIe or CXL or photonics switches: These switches allow for the connection of components in the resource pool to create custom-made servers.
  3. CDI management software: This software controls the switches in the resource pool based on user demand to create composed bare metal.
  4. Software definition: CDI enables the composition of custom-made servers through software definition, allowing for flexibility and adaptability in resource utilization.
  5. Disaggregation: CDI disaggregates existing servers into separate components in the resource pool, enabling more efficient use of resources and reducing waste.
  6. Composability: CDI enables the composition of bare metal resources to create custom-made servers that can be scaled up or down as needed.

There is no end to this, so I will stop here for this installment; to close, let us ask what the software challenges are in realizing disaggregated computing (CDI).

Q6: What are the software challenges in realizing disaggregated computing (CDI)?

List the software challenges to achieving disaggregated computing or CDI.

Based on the provided context, here are the software challenges to achieving disaggregated computing or CDI:

  1. Dynamic composition and allocation of resources: Developing software that can dynamically compose and allocate resources from a pool of disaggregated devices is a significant challenge.
  2. Resource management: Managing the allocation and deallocation of resources in real-time, ensuring that the right resources are allocated to the right applications at the right time, is a complex task.
  3. Interoperability: Ensuring that different types of devices from different vendors can work together seamlessly within a disaggregated infrastructure is a software challenge.
  4. Security: Ensuring the security of a disaggregated infrastructure and protecting against unauthorized access or data breaches is a critical software challenge.
  5. Scalability: Developing software that can scale to accommodate growing demands and increasing amounts of data is essential for a successful CDI implementation.
  6. Flexibility: Providing the flexibility to compose different types of devices together and allocate resources as needed is a software challenge that requires careful consideration of hardware and software compatibility.

So this is what they call RAG. I am glad to have one more handy tool that compensates for a failing memory.