diff --git "a/_wiki/\351\233\206\347\276\244\344\275\277\347\224\250/cluster_usage.md" "b/_wiki/\351\233\206\347\276\244\344\275\277\347\224\250/cluster_usage.md"
index 9c75b2d1..8a6fde7c 100755
--- "a/_wiki/\351\233\206\347\276\244\344\275\277\347\224\250/cluster_usage.md"
+++ "b/_wiki/\351\233\206\347\276\244\344\275\277\347\224\250/cluster_usage.md"
@@ -1,10 +1,28 @@
 ---
 title: Compute Cluster Usage Guide
+authors: Yunpei Liu, Yongbin Zhuang
 priority: 1.01
 ---
 
 # Compute Cluster Usage Guide
 
+## Basic Cluster Concepts
+
+### CPU/Core (核)
+`CPU` is short for Central Processing Unit; the abbreviation is far better known than the full name. When we buy a computer, we all check how many CPUs it has. A CPU computes numbers, executes your code, and so on. In scientific computing we prefer the term core (`Core`), which is essentially just another name for the same thing.
+
+### Memory (内存)
+`Memory` is where data is stored. Unlike data kept on disk, data in `memory` can be read by a `core` directly: it is similar to what you store on your hard disk, except that the `core` reads it much faster. When a program runs, some data is first loaded into `memory` and then used in the computation. The larger the memory, the more data can be loaded and processed at once, and the less time your code takes to run.
+
+### Node (节点)
+A `node` is, in everyday terms, a `computer`, such as a desktop or a laptop. It consists of a number of `cores` and one `memory`, so you can simply think of a `node` as an ordinary computer (host).
+
+### HPC (cluster / supercomputer)
+`HPC` is short for High Performance Cluster, also known as a supercomputer or high-performance cluster. It consists of a number of `nodes`. In practice these `nodes` play different roles, typically including `login nodes`, a `management node`, and `compute nodes`. A `login node`, as the name suggests, is the `node` users log in to from their own computers. `Compute nodes` are the nodes that do the computing; their sole mission is to compute. The `management node` is special: it manages the `compute nodes`, for example assigning a given job to particular `compute nodes`.
+
+### Message Passing Interface (MPI) and Parallel Computing
+`Parallel computing` means several `nodes` carrying out a computation together. From the concept of a `node`, the `memory` and `cores` of a single `node` are clearly limited. Say a `node` has 24 `cores` and 32 GB of `memory`, and we want to run a computation on 48 `cores`: we naturally need two `nodes`. The question is how the 24 `cores` of the second `node` can read the data in the first `node`'s `memory`. This is where `MPI`/`parallel computing` comes in. `MPI` is short for Message Passing Interface: code that tells `nodes` how to read `memory` across nodes. In other words, it is part of the program itself, and the software we commonly use, such as `vasp` and `cp2k`, already has it built in, so you can simply use it as-is.
+
+## Our Group's Clusters
 Our group uses the Zeus compute cluster to submit and run simulation jobs. The Zeus cluster consists of two login nodes, one management node, and three compute clusters; each compute cluster contains multiple compute nodes (including one GPU node and one large-memory fat node).
 
 A Metal GPU cluster is also provided for the time being, offering 40 GPU accelerator cards for deep-learning and similar jobs (it will be merged into the Zeus cluster in due course).
diff --git "a/_wiki/\351\233\206\347\276\244\344\275\277\347\224\250/hpc.md" "b/_wiki/\351\233\206\347\276\244\344\275\277\347\224\250/hpc.md"
deleted file mode 100644
index 475c8c22..00000000
--- "a/_wiki/\351\233\206\347\276\244\344\275\277\347\224\250/hpc.md"
+++ /dev/null
@@ -1,36 +0,0 @@
----
-title: High Performance Cluster (HPC) Topics
-authors: Yongbin Zhuang
-
----
-
-# High Performance Cluster (HPC) Topics
-
-## Basic Concepts
-
-### CPU/Core
-
-CPU is the acronym of Central Processing Unit. You may already be familiar with this term. A CPU computes numbers, executes your code, and so on. We prefer the word `core` when we gather a number of CPUs together to form a High Performance Cluster (HPC); `core` is just an alias for CPU in an HPC.
-
-### Memory/内存
-
-The Chinese word is given here to connect what you have learned before with the terminology used here, such as `Memory`. Memory is the place where data is stored. It is much like your `hard disk`, except for the speed at which it exchanges data with the `core`: the `core` fetches data directly from `memory` when it executes a process. Therefore, if your memory is big enough to hold everything, your programs take less time to run.
-
-### Node/节点
-
-This term may seem redundant the first time you hear it: it should be enough to count the number of `cores` in an HPC, as in "we have an HPC with 1024 `cores`". However, not all `cores` can fetch data from the same `memory`. Memory has limited space and cannot connect directly to an unlimited number of `cores`. Therefore a limited number of `cores` share one `memory`, and together they constitute what we call a `node`. For instance, on `51cluster` (one of the HPCs in our group), you can type the command `bhosts` to display how many nodes we have.
-On the screen you will see 32 nodes plus 1 login node. Every `node` has *24* `cores` and *64* GB of `memory`. Check this on `52cluster` for practice.
-
-### Message Passing Interface (MPI)
-
-Now we have another question: if `cores` cannot all fetch data from one `memory`, can data be fetched between `nodes`? The answer is *YES*. There is a technique called the Message Passing Interface (`MPI`). `MPI` runs on your `node` as a process, just like your normal `code`. Its job is to exchange messages between different `nodes`; of course, this is slower than reading directly from `memory`. For example, suppose we have computed a value `A` on `node1`. Using `MPI`, `node2` can get the value `A` from `node1`. What is this good for? In a parallel computation, you do not have to write the data from `node1` back to your `hard disk` and then transfer it to `node2`, which would be far slower.
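
Both MPI passages in this diff describe the same pattern: a process on one node passes a computed value to a process on another node instead of routing it through the hard disk. As a concrete illustration, here is a minimal sketch using the `mpi4py` Python bindings (an assumption for illustration only; neither wiki page mentions `mpi4py`, and production codes such as `vasp` and `cp2k` carry their own built-in MPI machinery):

```python
# send_value.py -- the "value A" example from the MPI sections above:
# rank 0 (think: a core on node1) computes A; rank 1 (think: a core on
# node2) receives it over the interconnect, never touching the disk.
from mpi4py import MPI

comm = MPI.COMM_WORLD      # all processes launched by mpirun
rank = comm.Get_rank()     # this process's ID: 0, 1, ...

if rank == 0:
    A = 42.0                          # pretend this was computed on node1
    comm.send(A, dest=1, tag=0)       # hand it to rank 1
    print(f"rank 0 sent A = {A}")
elif rank == 1:
    A = comm.recv(source=0, tag=0)    # rank 1 cannot read rank 0's memory directly
    print(f"rank 1 received A = {A}")
```

Run with something like `mpirun -np 2 python send_value.py`; on a cluster the scheduler may place the two ranks on different nodes, and the same code works unchanged.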