How to Set Up Storage

This document describes how to use Kubernetes Persistent Volumes (PV) as storage on PAI. To set up existing storage (NFS, Samba, Azure Blob, etc.), you need to:

  1. Create a PV and PVC as PAI storage on Kubernetes.
  2. Confirm that the worker nodes have the proper packages to mount the PVC. For example, an NFS PVC requires the nfs-common package on Ubuntu.
  3. Assign the PVC to specific user groups.

Users can mount these PVs/PVCs into their jobs after you set up the storage properly. The PVC name is used to onboard the storage on PAI.

Create PV/PVC on Kubernetes

There are many approaches to creating PVs/PVCs; refer to the Kubernetes docs if you are not familiar with them yet. The following are some commonly used PV/PVC examples.

NFS

# NFS Persistent Volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-storage-pv
  labels:
    name: nfs-storage
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - nfsvers=4.1
  nfs:
    path: /data
    server: 10.0.0.1
---
# NFS Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-storage
# labels:
#   share: "false"      # to mount sub path on PAI
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi    # no more than PV capacity
  selector:
    matchLabels:
      name: nfs-storage # corresponding to PV label

Save the above file as nfs-storage.yaml and run kubectl apply -f nfs-storage.yaml to create a PV named nfs-storage-pv and a PVC named nfs-storage for the NFS server nfs://10.0.0.1:/data. The PVC is bound to the PV through the label selector, using the label name: nfs-storage.
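To confirm that the claim was bound, you can check the PV and PVC status (a quick sanity check; the names below are the ones from the example manifests above):

```shell
# both should report STATUS "Bound" once the PVC is matched to the PV
kubectl get pv nfs-storage-pv
kubectl get pvc nfs-storage
```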

Users can then use the PVC name nfs-storage as the storage name to mount this NFS storage in their jobs.

If you want to configure the above NFS as personal storage, so that each user can only access their own directory on PAI (like a Linux home directory), add a share: "false" label to the PVC. For example, Alice can only mount /data/Alice while Bob can only mount /data/Bob. In this case, PAI uses ${PAI_USER_NAME} as the subpath when mounting the storage into job containers.
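As a quick illustration of the subpath rule (Alice and the /data path are just the example values from above):

```shell
# with share: "false", PAI appends the username as a subpath under the NFS path
PAI_USER_NAME=Alice
echo "/data/${PAI_USER_NAME}"
# prints /data/Alice
```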

Samba

Please refer to this document to install cifs/smb FlexVolume driver and create PV/PVC for Samba.

Azure Blob

Please refer to this document to install blobfuse FlexVolume driver and create PV/PVC for Azure Blob.

Tips

If you cannot mount a blobfuse PVC into containers and the corresponding job in OpenPAI is stuck in WAITING status, please double-check the following requirements:

Requirement 1. Every worker node should have blobfuse installed. Run the following commands to install it:

# change 16.04 to a different release if your system is not Ubuntu 16.04
wget https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo apt-get update
sudo apt-get install --assume-yes blobfuse fuse
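To verify the installation on a node, you can query the package status afterwards (one simple check among several possible):

```shell
# an installed package reports "Status: install ok installed"
dpkg -s blobfuse | grep -i '^Status'
```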

Requirement 2. The blobfuse FlexVolume driver has been installed:

curl -s https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/deployment/blobfuse-flexvol-installer-1.9.yaml \
  | sed "s#path: /etc/kubernetes/volumeplugins/#path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/#g" \
  | kubectl apply -f -

NOTE: There is a known issue #4637 when mounting the same PV multiple times on the same node. Please either:

  * use the patched blobfuse flexvolume installer instead, or
  * use the earlier blobfuse version 1.1.1 instead.

Azure File

First, create a Kubernetes secret to access the Azure file share.

kubectl create secret generic azure-secret --from-literal=azurestorageaccountname=$AKS_PERS_STORAGE_ACCOUNT_NAME --from-literal=azurestorageaccountkey=$STORAGE_KEY
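You can confirm the secret exists before referencing it from the PV (the name azure-secret must match the secretName used in the manifest below):

```shell
# a secret created with "kubectl create secret generic" has type Opaque
kubectl get secret azure-secret
```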

Then create a PV/PVC for the Azure File share.

# Azure File Persistent Volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-file-storage-pv
  labels:
    name: azure-file-storage
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  azureFile:
    secretName: azure-secret
    shareName: aksshare
    readOnly: false
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - uid=1000
    - gid=1000
    - mfsymlinks
    - nobrl
---
# Azure File Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-file-storage
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: azure-file-storage

More details on the Azure File volume can be found in this document.

Read-only Storage

If not specified, storage in OpenPAI is readable and writable by users. If you want the storage to be read-only, set the corresponding PV's attribute PersistentVolume.Spec.<PersistentVolumeSource>.ReadOnly to true.

For example, you can set a read-only NFS PV by specifying the spec.nfs.readOnly field in its definition:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-storage-pv
  labels:
    name: nfs-storage
spec:
  ......
  nfs:
    readOnly: true
    .......

Here is another example for Azure Blob:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-file-storage-pv
  labels:
    name: azure-file-storage
spec:
  ......
  flexVolume:
    readOnly: true
    .......

Please note that PersistentVolume.Spec.AccessModes and PersistentVolumeClaim.Spec.AccessModes don't affect whether a storage is writable in PAI. They only take effect at binding time between the PV and PVC.

Confirm Environment on Worker Nodes

As the notice in the Kubernetes documentation mentions, a helper program may be required to consume a certain type of PersistentVolume. For example, all worker nodes should have nfs-common installed if you want to use an NFS PV. You can install it with sudo apt install nfs-common on every worker node.

Since different PVs have different requirements, you should check the environment according to the document of the PV.
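For the NFS case, a loop like the following can help you check the prerequisite across nodes (worker1 and worker2 are placeholder hostnames; adjust them to your cluster and make sure you have SSH access):

```shell
# report whether nfs-common is present on each worker node
for node in worker1 worker2; do
  ssh "$node" 'dpkg -s nfs-common >/dev/null 2>&1' \
    && echo "$node: nfs-common installed" \
    || echo "$node: nfs-common MISSING"
done
```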

Assign Storage to PAI Groups

The PVC name is used as the storage name in OpenPAI, and access to different storage is managed by user groups. After you have set up the PV/PVC and checked the environment, you need to assign the storage to users: use the RESTful API to assign storage to the groups the user belongs to.

Before calling the API, you should get an access token. Go to your profile page and copy one.

In OpenPAI, storage is bound to groups, so you use the Group API to assign storage: get a group first, and then update its extension.

For example, if you want to assign the nfs-storage PVC to the default group, first GET http(s)://<pai-master-ip>/rest-server/api/v2/groups/default. It will return:

{
  "groupname": "default",
  "description": "group for default vc",
  "externalName": "",
  "extension": {
    "acls": {
      "storageConfigs": [],
      "admin": false,
      "virtualClusters": ["default"]
    }
  }
}

The GET request must use the header Authorization: Bearer <token> for authorization. This is the same for all API calls. You may notice the storageConfigs field in the response body; it controls which storage a group can use. To add nfs-storage to it, PUT http(s)://<pai-master-ip>/rest-server/api/v2/groups with the request body:

{
  "data": {
    "groupname": "default",
    "extension": {
      "acls": {
        "storageConfigs": ["nfs-storage"],
        "admin": false,
        "virtualClusters": ["default"]
      }
    }
  },
  "patch": true
}

Do not omit any fields in extension, or it will change the virtualClusters setting unexpectedly.
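Put together, the two calls above can be issued with curl like this (a sketch; replace <pai-master-ip> and <token> with your own values):

```shell
# read the current group extension
curl -H "Authorization: Bearer <token>" \
  "http://<pai-master-ip>/rest-server/api/v2/groups/default"

# grant the group access to the nfs-storage PVC ("patch": true keeps other fields intact)
curl -X PUT \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"data":{"groupname":"default","extension":{"acls":{"storageConfigs":["nfs-storage"],"admin":false,"virtualClusters":["default"]}}},"patch":true}' \
  "http://<pai-master-ip>/rest-server/api/v2/groups"
```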

Example: Use Storage Manager to Create an NFS + SAMBA Server

To help you set up storage, OpenPAI provides a storage manager, which can set up an NFS + SAMBA server for you. Inside the cluster, the NFS storage can be accessed in OpenPAI containers. Outside the cluster, users can mount the storage on a Unix-like system, or access it in File Explorer on Windows.

Please read the document about service management and paictl first, and start a dev box container. Then, in the dev box container, pull the configuration by:

./paictl config pull -o /cluster-configuration

To use the storage manager, you should first choose a machine in the PAI system to be the storage server. The machine must be one of the PAI workers, not the PAI master. Open /cluster-configuration/layout.yaml, choose a worker machine, then add a pai-storage: "true" field to it. Here is an example of the edited layout.yaml:

......

- hostname: worker1
  nodename: worker1
  hostip: 10.0.0.1
  machine-type: GENERIC-WORKER
  pai-worker: "true"
  pai-storage: "true"  # this line is newly added

......

In this tutorial, we assume you choose the machine with IP 10.0.0.1 as the storage server. Then, in /cluster-configuration/services-configuration.yaml, find the storage-manager section:

# storage-manager:
#   localpath: /share
#   security-type: AUTO
#   workgroup: WORKGROUP
#   smbuser: smbuser
#   smbpwd: smbpwd

Uncomment it like this:

storage-manager:
  localpath: /share
#  security-type: AUTO
#  workgroup: WORKGROUP
  smbuser: smbuser
  smbpwd: smbpwd

The localpath field determines the root data directory for NFS on the storage server. The smbuser and smbpwd fields determine the username and password used to access the storage in File Explorer on Windows.

Run the following commands to start the storage manager:

./paictl.py service stop -n cluster-configuration storage-manager
./paictl.py config push -p /cluster-configuration -m service
./paictl.py service start -n cluster-configuration storage-manager

If the storage manager starts successfully, you will find the folders /share/data and /share/users on the storage server. On an Ubuntu machine, you can use the following commands to test whether the NFS server is set up correctly:

# replace 10.0.0.1 with your storage server IP
sudo apt update
sudo apt install nfs-common
sudo mkdir -p /mnt/data
sudo mount -t nfs --options nfsvers=4.1 10.0.0.1:/data/ /mnt/data

To make the NFS storage available in PAI, we should create a PV and PVC for it. Create the following nfs-storage.yaml file in the dev box container first:

# replace 10.0.0.1 with your storage server IP
# NFS Persistent Volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-storage-pv
  labels:
    name: nfs-storage
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - nfsvers=4.1
  nfs:
    path: /data
    server: 10.0.0.1
---
# NFS Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-storage
# labels:
#   share: "false"      # to mount sub path on PAI
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi    # no more than PV capacity
  selector:
    matchLabels:
      name: nfs-storage # corresponding to PV label

Use kubectl create -f nfs-storage.yaml to create the PV and PVC.

Since a Kubernetes PV requires the node using it to have the corresponding driver, we should run apt install nfs-common to install the nfs-common package on every worker node.

Finally, assign the storage to PAI groups through the rest-server API. Then users can mount it into job containers.

How do you upload data to the storage server? On Windows, open File Explorer, type in \\10.0.0.1 (replace 10.0.0.1 with your storage server IP), and press ENTER. File Explorer will ask for authorization; use smbuser and smbpwd as the username and password to log in. On a Unix-like system, you can mount the NFS folder into the file system. For example, on Ubuntu, use the following commands to mount it:

# replace 10.0.0.1 with your storage server IP
sudo apt update
sudo apt install nfs-common
sudo mkdir -p /mnt/data
sudo mount -t nfs --options nfsvers=4.1 10.0.0.1:/data/ /mnt/data

The above steps only set up a basic SAMBA server, so all users share the same username and password to access it on Windows. If your cluster is in AAD mode and you want to integrate the SAMBA server with the AAD system, please refer to the following configuration for the storage manager:

storage-manager:
  workgroup: # workgroup
  security-type: ADS
  default_realm: # default realm
  krb5_realms: # realms
    XXX1: # realm name
      kdc: # kdc
      default_domain: # default domain
    XXX2: # realm name
      kdc: # kdc
      default_domain: # default domain
  domain_realm: # domain realm
    kdc: # kdc
    default_domain: # default domain
  domainuser: # domain user
  domainpwd: # password of domain user
  idmap: # idmap
  - "idmap config XXX1"
  - "idmap config XXX2"
  - "idmap config XXX3"
  - "idmap config XXX4"