# How to Set Up Virtual Clusters

## What is Hived Scheduler and How to Configure it

OpenPAI supports two schedulers: the Kubernetes default scheduler and hivedscheduler. Hivedscheduler is a Kubernetes scheduler for deep learning. It supports virtual cluster division, topology-aware resource guarantees, and optimized gang scheduling, none of which are supported by the Kubernetes default scheduler. If you didn't specify `enable_hived_scheduler: false` during installation, hivedscheduler is enabled by default. Please note that only hivedscheduler supports virtual cluster setup; the Kubernetes default scheduler does not.
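The scheduler choice is made at installation time. For reference, a minimal sketch of the relevant setting, assuming the `config.yaml` layout from the installation guide:

```yaml
# config.yaml used by the installation scripts (not services-configuration.yaml).
# Leaving this true (the default) keeps hivedscheduler enabled; setting it to false
# switches to the Kubernetes default scheduler, which cannot use virtual clusters.
enable_hived_scheduler: true
```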
## Set Up Virtual Clusters

### Configuration for GPU Virtual Cluster

Before we start, please read the hived scheduler document to learn how to write hived scheduler configuration.

In `services-configuration.yaml`, there is a section for hivedscheduler, for example:
```yaml
# services-configuration.yaml
...
hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        DT:
          gpu: 1
          cpu: 5
          memory: 56334Mi
      cellTypes:
        DT-NODE:
          childCellType: DT
          childCellNumber: 4
          isNodeLevel: true
        DT-NODE-POOL:
          childCellType: DT-NODE
          childCellNumber: 3
      physicalCells:
        - cellType: DT-NODE-POOL
          cellChildren:
            - cellAddress: worker1
            - cellAddress: worker2
            - cellAddress: worker3
    virtualClusters:
      default:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 3
...
```
If you have followed the installation guide, you will find a similar setting in your `services-configuration.yaml`. The detailed explanation of these fields is in the hived scheduler document. You can update the configuration to set up virtual clusters. For example, in the above settings, we have 3 nodes: `worker1`, `worker2`, and `worker3`. They are all in the `default` virtual cluster. If we want to create two VCs, one called `default` with 2 nodes and the other called `new` with 1 node, we can first modify `services-configuration.yaml`:
```yaml
# services-configuration.yaml
...
hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        DT:
          gpu: 1
          cpu: 5
          memory: 56334Mi
      cellTypes:
        DT-NODE:
          childCellType: DT
          childCellNumber: 4
          isNodeLevel: true
        DT-NODE-POOL:
          childCellType: DT-NODE
          childCellNumber: 3
      physicalCells:
        - cellType: DT-NODE-POOL
          cellChildren:
            - cellAddress: worker1
            - cellAddress: worker2
            - cellAddress: worker3
    virtualClusters:
      default:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 2
      new:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 1
...
```
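Once this change is applied (see "Apply Configuration in Cluster" below), jobs can target the `new` VC through the `defaults.virtualCluster` field of the job protocol. A minimal sketch, modeled on the full job example at the end of this page (the job name and Docker image are placeholders):

```yaml
protocolVersion: 2
name: test-new-vc
type: job
prerequisites:
  - type: dockerimage
    uri: 'openpai/standard:python_3.6-pytorch_1.2.0-gpu'
    name: docker_image_0
taskRoles:
  myworker:
    instances: 1
    dockerImage: docker_image_0
    resourcePerInstance:
      # Fits within one DT sku from the configuration above.
      gpu: 1
      cpu: 5
      memoryMB: 8192
    commands:
      - nvidia-smi
defaults:
  # Schedule this job in the "new" VC instead of "default".
  virtualCluster: new
```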
### Configuration for CPU-only Virtual Cluster

Currently, we recommend setting up a pure-CPU virtual cluster rather than mixing CPU nodes and GPU nodes in one virtual cluster. Please omit the `gpu` field or use `gpu: 0` in the `skuTypes` for such a VC. Here is an example:
```yaml
hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        DT:
          gpu: 1
          cpu: 5
          memory: 56334Mi
        CPU:
          cpu: 1
          memory: 10240Mi
      cellTypes:
        DT-NODE:
          childCellType: DT
          childCellNumber: 4
          isNodeLevel: true
        DT-NODE-POOL:
          childCellType: DT-NODE
          childCellNumber: 3
        CPU-NODE:
          childCellType: CPU
          childCellNumber: 8
          isNodeLevel: true
        CPU-NODE-POOL:
          childCellType: CPU-NODE
          childCellNumber: 1
      physicalCells:
        - cellType: DT-NODE-POOL
          cellChildren:
            - cellAddress: worker1
            - cellAddress: worker2
            - cellAddress: worker3
        - cellType: CPU-NODE-POOL
          cellChildren:
            - cellAddress: cpu-worker1
    virtualClusters:
      default:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 3
      cpu:
        virtualCells:
          - cellType: CPU-NODE-POOL.CPU-NODE
            cellNumber: 1
```
Explanation of the above example: suppose we have a node named `cpu-worker1` in Kubernetes. It has 80GiB memory and 8 allocatable CPUs (please use `kubectl describe node cpu-worker1` to confirm the allocatable resources). Then, in `skuTypes`, we can set a `CPU` sku, which has 1 CPU and 10240MiB (80GiB / 8) memory. You can reserve some memory or CPUs if you want. `CPU-NODE` and `CPU-NODE-POOL` are set correspondingly in the `cellTypes`. Finally, the setting results in one `default` VC and one `cpu` VC. The `cpu` VC contains one CPU node.
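Before writing such a configuration, confirm the node's allocatable resources, since the sku numbers must fit within them. A sketch of the check (the output shown is illustrative):

```bash
kubectl describe node cpu-worker1
# In the output, look at the "Allocatable" section, e.g.:
#
#   Allocatable:
#     cpu:     8
#     memory:  82000000Ki
#
# Divide the allocatable memory by the number of CPU skus per node
# (8 in this example) to get the per-sku memory; rounding down leaves
# some headroom for system components.
```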
### Apply Configuration in Cluster

After modifying the configuration, use the following commands to apply the settings:

```bash
./paictl.py service stop -n rest-server hivedscheduler
./paictl.py config push -p <config-folder> -m service
./paictl.py service start -n hivedscheduler rest-server
```

You can now test the new VCs with any admin account in OpenPAI. The next section introduces how to grant VC access to non-admin users.
## How to Grant VC to Users

Admin users have access to all VCs. Thus, if you set up a new VC, you can use an admin account to test it. For non-admin users, administrators should grant VC access manually. The specific way depends on the authentication mode of your cluster.

### In Basic Authentication Mode

In basic authentication mode, administrators can grant VC access to users on the `User Management` page. First, click `Edit` on the page. Then you can configure VC access permission for each user.

Please note that you cannot revoke the `default` VC from users.
### In AAD Mode

In AAD mode, users are grouped by the AAD service. You should assign VC access to each group.

First, find the following section in your `services-configuration.yaml`:
```yaml
# services-configuration.yaml
...
group-manager:
  ...
  grouplist:
    - groupname: group1
      externalName: sg1
      extension:
        acls:
          admin: false
          virtualClusters: ["vc1"]
          storageConfigs: ["azure-file-storage"]
    - groupname: group2
      externalName: sg2
      extension:
        acls:
          admin: false
          virtualClusters: ["vc1", "vc2"]
          storageConfigs: ["nfs-storage"]
...
```
This should be self-explanatory. The `virtualClusters` field is used to manage VC access for different groups. Use the following commands to apply your configuration change:

```bash
./paictl.py service stop -n rest-server
./paictl.py config push -p <config-folder> -m service
./paictl.py service start -n rest-server
```
## Different Hardware in Worker Nodes

We recommend that each VC contain only one kind of hardware, which corresponds to one `skuType` per VC in the hived scheduler setting. If you have different types of worker nodes (e.g., different GPU models on different nodes), please configure them in different VCs. Here is an example with 2 kinds of nodes:
```yaml
hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        K80:
          gpu: 1
          cpu: 5
          memory: 56334Mi
        V100:
          gpu: 1
          cpu: 8
          memory: 80000Mi
      cellTypes:
        K80-NODE:
          childCellType: K80
          childCellNumber: 4
          isNodeLevel: true
        K80-NODE-POOL:
          childCellType: K80-NODE
          childCellNumber: 2
        V100-NODE:
          childCellType: V100
          childCellNumber: 4
          isNodeLevel: true
        V100-NODE-POOL:
          childCellType: V100-NODE
          childCellNumber: 3
      physicalCells:
        - cellType: K80-NODE-POOL
          cellChildren:
            - cellAddress: k80-worker1
            - cellAddress: k80-worker2
        - cellType: V100-NODE-POOL
          cellChildren:
            - cellAddress: v100-worker1
            - cellAddress: v100-worker2
            - cellAddress: v100-worker3
    virtualClusters:
      default:
        virtualCells:
          - cellType: K80-NODE-POOL.K80-NODE
            cellNumber: 2
      V100:
        virtualCells:
          - cellType: V100-NODE-POOL.V100-NODE
            cellNumber: 3
```
In the above example, we set up 2 VCs: `default` and `V100`. The `default` VC has 2 K80 nodes, and the `V100` VC has 3 V100 nodes. Every K80 node has 4 K80 GPUs, and every V100 node has 4 V100 GPUs.
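A job then picks the hardware it needs by choosing the matching VC. If your OpenPAI version also exposes the hived scheduler's `skuType` field in the job extras (check the job protocol reference for your version), a task role can name the sku explicitly. A sketch of the relevant job protocol fields under that assumption:

```yaml
defaults:
  # Run on the V100 nodes by targeting their VC.
  virtualCluster: V100
extras:
  hivedScheduler:
    taskRoles:
      myworker:
        # Assumed field: pin this task role to V100 skus.
        skuType: V100
```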
## Configure CPU and GPU SKU on the Same Node

If you want to configure both CPU and GPU sku types on the same node, you can use the same `cellAddress` for different `cellTypes`. Here is an example:
```yaml
hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        GPU:
          gpu: 1
          cpu: 4
          memory: 40960Mi
        CPU:
          gpu: 0
          cpu: 1
          memory: 10240Mi
      cellTypes:
        GPU-NODE:
          childCellType: GPU
          childCellNumber: 4
          isNodeLevel: true
        GPU-NODE-POOL:
          childCellType: GPU-NODE
          childCellNumber: 2
        CPU-NODE:
          childCellType: CPU
          childCellNumber: 12
          isNodeLevel: true
        CPU-NODE-POOL:
          childCellType: CPU-NODE
          childCellNumber: 2
      physicalCells:
        - cellType: GPU-NODE-POOL
          cellChildren:
            - cellAddress: node1
            - cellAddress: node2
        - cellType: CPU-NODE-POOL
          cellChildren:
            - cellAddress: node1
            - cellAddress: node2
    virtualClusters:
      default:
        virtualCells:
          - cellType: GPU-NODE-POOL.GPU-NODE
            cellNumber: 2
      cpu:
        virtualCells:
          - cellType: CPU-NODE-POOL.CPU-NODE
            cellNumber: 2
```
Currently, we only support mixing CPU and GPU sku types on one NVIDIA GPU node or one AMD GPU node; rare cases such as NVIDIA cards and AMD cards on the same node are not supported.
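To run on the CPU portion of such a node, submit the job to the `cpu` VC and request zero GPUs. A minimal sketch of the relevant job protocol fields (the image is a placeholder; any CPU-only image works):

```yaml
protocolVersion: 2
name: cpu-only-job
type: job
prerequisites:
  - type: dockerimage
    uri: 'python:3.8'
    name: docker_image_0
taskRoles:
  myworker:
    instances: 1
    dockerImage: docker_image_0
    resourcePerInstance:
      # One CPU sku from the example above: 1 CPU, 10240MiB memory, no GPU.
      gpu: 0
      cpu: 1
      memoryMB: 10240
    commands:
      - python --version
defaults:
  virtualCluster: cpu
```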
## Use Pinned Cell to Reserve Certain Node in a Virtual Cluster

In some cases, you might want to reserve a certain node in a virtual cluster and submit jobs to this node explicitly for debugging or quick testing. OpenPAI provides a way to "pin" a node to a virtual cluster.

For example, assume you have three worker nodes, `worker1`, `worker2`, and `worker3`, and 2 virtual clusters, `default` and `new`. The `default` VC has 2 workers, and the `new` VC has only one worker. The following is an example of the configuration:
```yaml
# services-configuration.yaml
...
hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        DT:
          gpu: 1
          cpu: 5
          memory: 56334Mi
      cellTypes:
        DT-NODE:
          childCellType: DT
          childCellNumber: 4
          isNodeLevel: true
        DT-NODE-POOL:
          childCellType: DT-NODE
          childCellNumber: 3
      physicalCells:
        - cellType: DT-NODE-POOL
          cellChildren:
            - cellAddress: worker1
            - cellAddress: worker2
            - cellAddress: worker3
    virtualClusters:
      default:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 2
      new:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 1
...
```
Now, if you want to "pin" the node `worker2` to the `default` VC, you can edit the configuration as follows:
```yaml
# services-configuration.yaml
...
hivedscheduler:
  config: |
    physicalCluster:
      skuTypes:
        DT:
          gpu: 1
          cpu: 5
          memory: 56334Mi
      cellTypes:
        DT-NODE:
          childCellType: DT
          childCellNumber: 4
          isNodeLevel: true
        DT-NODE-POOL:
          childCellType: DT-NODE
          childCellNumber: 3
      physicalCells:
        - cellType: DT-NODE-POOL
          cellChildren:
            - cellAddress: worker1
            - cellAddress: worker2
              pinnedCellId: node-worker2
            - cellAddress: worker3
    virtualClusters:
      default:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 1
        pinnedCells:
          - pinnedCellId: node-worker2
      new:
        virtualCells:
          - cellType: DT-NODE-POOL.DT-NODE
            cellNumber: 1
...
```
As you can see in the configuration, one virtual cluster can contain both virtual cells and pinned cells. Now the `default` VC has one virtual cell and one pinned cell. To include a pinned cell in a virtual cluster, you should give it a `pinnedCellId`.

This configuration reserves `worker2` in the `default` VC. You can also submit jobs to `worker2` explicitly by specifying the corresponding pinned cell id in `extras.hivedScheduler.taskRoles.<task-role-name>.pinnedCellId`. Here is an example:
```yaml
protocolVersion: 2
name: job-with-pinned-cell
type: job
jobRetryCount: 0
prerequisites:
  - type: dockerimage
    uri: 'openpai/standard:python_3.6-pytorch_1.2.0-gpu'
    name: docker_image_0
taskRoles:
  myworker:
    instances: 1
    completion:
      minFailedInstances: 1
    taskRetryCount: 0
    dockerImage: docker_image_0
    resourcePerInstance:
      gpu: 1
      cpu: 4
      memoryMB: 8192
    commands:
      - sleep 100s
defaults:
  virtualCluster: default
extras:
  com.microsoft.pai.runtimeplugin:
    - plugin: ssh
      parameters:
        jobssh: true
  hivedScheduler:
    taskRoles:
      myworker:
        pinnedCellId: node-worker2
```