Setting Up Infiniband on Proxmox VE
This article covers how to configure an Infiniband card on Proxmox VE (tested on version 9.x, but it should also apply to newer versions).
We will use the following terminology:
- IB: Infiniband
- MLX: Mellanox
- SM: Subnet Manager
If you wish to pass the IB card through to VMs, you will need to enable SR-IOV; otherwise you may skip all SR-IOV/IOMMU related instructions.
Another important consideration is that we will need to enable the systemd opensm (Open Subnet Manager) service if we wish to use IPoIB (IP-over-Infiniband). This service must run on only ONE node in the Infiniband network, and is only required if the switch itself does not provide SM support.
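If a node in your setup does need to act as the Subnet Manager, the opensm package has to be installed on it. A minimal sketch, assuming the standard Debian opensm package; disabling the packaged unit is optional and only makes sense if you let the init script created later in this article start it after the VF GUIDs are set up:
# Install OpenSM on the ONE node that will act as Subnet Manager
apt install -y opensm
# Optional: keep the packaged unit disabled and let the VF init script start it
systemctl disable opensm.service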
Requirements
First, to set up SR-IOV we must enable IOMMU; for that, refer to the official Proxmox VE documentation.
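As a quick reference, the usual approach is to add the IOMMU parameters to the kernel command line. A minimal sketch, assuming an Intel CPU and a system booting through GRUB (adjust for AMD CPUs or systemd-boot as described in the Proxmox documentation):
# Edit /etc/default/grub and add the IOMMU flags to the default command line, e.g.:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
update-grub
# Reboot, then verify that IOMMU is active
dmesg | grep -e DMAR -e IOMMU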
Once you’ve ensured IOMMU is enabled, we must install the following required packages and dependencies:
- infiniband-diags
- ibutils
- rdma-core
- rdmacm-utils
- mstflint
apt update -y
apt install -y infiniband-diags ibutils rdma-core rdmacm-utils mstflint
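Once installed, you can do a quick sanity check that the Mellanox driver is loaded and the card is visible to the RDMA stack:
# The mlx5 driver should be loaded
lsmod | grep mlx5
# The card should show up as a channel adapter
ibstat --list_of_cas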
Manually Enabling SR-IOV
In this article we will use 4 GUIDs / Virtual Functions (VFs) for passing through the Mellanox card, as it is a reasonable performance-oriented count; if you require more density, you can adjust all the scripts to a higher count (e.g. 8).
Getting the Identifiers
To enable SR-IOV we must first identify the Mellanox card's PCI bus address.
lspci | grep -i mellanox
You will get the following output (or something similar):
01:00.0 Infiniband Controller: Mellanox Technologies MT28908 Family [ConnectX-6]
This tells us the PCI bus address is 01:00.0, so we will set it as a variable for easy reference in the following commands.
# Get the MLX device name and PCI bus address
## Method 1: CA name via ibstat, bus address via lspci
MLX_DEV=$(ibstat --list_of_cas | head -n 1)
MLX_BUS=$(lspci | grep -i mellanox | grep -iv "virtual function" | awk '{print $1}')
## Method 2: bus address via ethtool (uses MLX_DEV from Method 1), device name via sysfs
MLX_BUS=$(ethtool -i "${MLX_DEV}" | grep "bus-info" | awk -F ": " '{print $NF}')
MLX_DEV=$(ls "/sys/bus/pci/devices/${MLX_BUS}/net/" | head -n 1)
Check that the variables are correct:
cat << EOL
Mellanox Device: ${MLX_DEV}
Mellanox PCI Bus: ${MLX_BUS}
EOL
You should get something like:
# THIS
Mellanox Device: ibp1s0
Mellanox PCI Bus: 01:00.0
# OR THIS
Mellanox Device: ibp1s0
Mellanox PCI Bus: 0000:01:00.0
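Note that lspci prints the short form (01:00.0), while sysfs paths expect the domain-qualified form (0000:01:00.0). If your MLX_BUS ended up in the short form and you want to use the sysfs path from Method 2, a small sketch to normalize it:
# Prepend the PCI domain if MLX_BUS is in the short lspci form
[[ "${MLX_BUS}" != *:*:* ]] && MLX_BUS="0000:${MLX_BUS}"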
Setting up the Automatic GUIDs
We need to set up some scripts and services so that the IB card creates the VF GUIDs automatically on system start.
First we can query the card details:
# Command
root@pve:~# mstflint -d "$MLX_BUS" q
# Output
Image type: FS4
FW Version: 20.31.1014
FW Release Date: 30.6.2021
Product Version: 20.31.1014
Rom Info: type=UEFI version=14.24.13 cpu=AMD64,AARCH64
type=PXE version=3.6.403 cpu=AMD64
Description: UID GuidsNumber
Base GUID: e8ebd30300a022fe 0
Base MAC: e8ebd3a022fe 0
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000222
Security Attributes: N/A
As we can see, no VF GUIDs are allocated yet. We will enable them (along with SR-IOV) using the following command:
# Command
root@pve:~# mstconfig -d "$MLX_BUS" set SRIOV_EN=1 NUM_OF_VFS=4
# Output
Device #1:
----------
Device type: ConnectX6
Name: MCX653105A-ECA_Ax
Description: ConnectX-6 VPI adapter card; 100Gb/s (HDR100; EDR IB and 100GbE); single-port QSFP56; PCIe3.0 x16; tall bracket; ROHS R6
Device: 01:00.0
Configurations: Next Boot New
SRIOV_EN True(1) True(1)
NUM_OF_VFS 4 4
Apply new Configuration? (y/n) [n] : y
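You can confirm that the new values were written to the firmware configuration (they only take effect after the next reboot):
mstconfig -d "$MLX_BUS" query | grep -E "SRIOV_EN|NUM_OF_VFS"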
Now we need to automatically initialize the GUIDs on startup; for that we can use the scripts kindly provided and open-sourced by jose-d in his repository.
Create the following script at /usr/local/bin/init-ib-guids.sh with the content below:
#!/bin/bash

# Determine the first IB channel adapter and derive its node/port GUIDs
first_dev=$(ibstat --list_of_cas | head -n 1)
node_guid=$(ibstat "${first_dev}" | grep "Node GUID" | cut -d ':' -f 2 | xargs | cut -d 'x' -f 2)
port_guid=$(ibstat "${first_dev}" | grep "Port GUID" | cut -d ':' -f 2 | xargs | cut -d 'x' -f 2)

echo "first dev: $first_dev"
echo "node guid: $node_guid"
echo "port_guid: $port_guid"

# Assign a deterministic GUID to each VF, derived from the physical port GUID
if ip link show "${first_dev}" &> /dev/null ; then
    for vf in {0..3}; do
        vf_guid=$(echo "${port_guid::-5}cafe$((vf+1))" | sed 's/..\B/&:/g')
        echo "vf_guid for vf $vf is $vf_guid"
        ip link set dev "${first_dev}" vf "$vf" port_guid "${vf_guid}"
        ip link set dev "${first_dev}" vf "$vf" node_guid "${vf_guid}"
        ip link set dev "${first_dev}" vf "$vf" state auto
    done
fi

## Section below was added to start opensm after GUID startup
enable_opensm=false
for arg in "$@"; do
    case $arg in
        --enable-opensm)
            enable_opensm=true
            shift
            ;;
    esac
done

if $enable_opensm; then
    echo "OpenSM is enabled, starting service."
    systemctl start opensm.service
fi
Don't forget to make the script executable and owned by root.
chown root:root /usr/local/bin/init-ib-guids.sh
chmod +x /usr/local/bin/init-ib-guids.sh
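Before wiring the script into systemd, you can test it manually. A quick sketch, assuming MLX_DEV is still set from the earlier steps (the VFs must exist before the script can assign GUIDs to them):
# Create the VFs, then assign their GUIDs with the script
echo 4 > /sys/class/infiniband/${MLX_DEV}/device/sriov_numvfs
/usr/local/bin/init-ib-guids.sh
# The VFs and their GUIDs should now appear on the physical interface
ip link show "${MLX_DEV}"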
After that we can create the systemd service for it:
With OpenSM Startup
cat > /etc/systemd/system/mellanox-initvf.service << EOF
[Unit]
After=network.target
[Service]
Type=oneshot
# note: change according to your hardware:
ExecStart=/bin/bash -c "/usr/bin/echo 4 > /sys/class/infiniband/${MLX_DEV}/device/sriov_numvfs"
ExecStart=/usr/local/bin/init-ib-guids.sh --enable-opensm
StandardOutput=journal
TimeoutStartSec=60
RestartSec=60
[Install]
WantedBy=multi-user.target
EOF
Without OpenSM Startup
cat > /etc/systemd/system/mellanox-initvf.service << EOF
[Unit]
After=network.target
[Service]
Type=oneshot
# note: change according to your hardware:
ExecStart=/bin/bash -c "/usr/bin/echo 4 > /sys/class/infiniband/${MLX_DEV}/device/sriov_numvfs"
ExecStart=/usr/local/bin/init-ib-guids.sh
StandardOutput=journal
TimeoutStartSec=60
RestartSec=60
[Install]
WantedBy=multi-user.target
EOF
Then enable it:
# Enable the service on boot
systemctl enable mellanox-initvf.service
# Enabling with the --now flag also starts it immediately
systemctl enable --now mellanox-initvf.service
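If you enabled the unit with --now, you can check that it ran cleanly before rebooting:
systemctl status mellanox-initvf.service
journalctl -u mellanox-initvf.service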
Now reboot the physical node.
On bootup you should see something akin to this:
root@pve:~# ibstat --list_of_cas
ibp1s0
mlx5_1
mlx5_2
mlx5_3
root@pve:~# ibstat
CA 'ibp1s0'
CA type: MT4123
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0xe8ebd30300a022fe
System image GUID: 0xe8ebd30300a022fe
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x2651e84a
Port GUID: 0xe8ebd30300a022fe
Link layer: InfiniBand
CA 'mlx5_1'
CA type: MT4124
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0x0000000000000000
System image GUID: 0xe8ebd30300a022fe
Port 1:
State: Down
Physical state: LinkUp
Rate: 100
Base lid: 65535
LMC: 0
SM lid: 1
Capability mask: 0x2651ec48
Port GUID: 0x0000000000000000
Link layer: InfiniBand
CA 'mlx5_2'
CA type: MT4124
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0x0000000000000000
System image GUID: 0xe8ebd30300a022fe
Port 1:
State: Down
Physical state: LinkUp
Rate: 100
Base lid: 65535
LMC: 0
SM lid: 1
Capability mask: 0x2651ec48
Port GUID: 0x0000000000000000
Link layer: InfiniBand
CA 'mlx5_3'
CA type: MT4124
Number of ports: 1
Firmware version: 20.31.1014
Hardware version: 0
Node GUID: 0x0000000000000000
System image GUID: 0xe8ebd30300a022fe
Port 1:
State: Down
Physical state: LinkUp
Rate: 100
Base lid: 65535
LMC: 0
SM lid: 1
Capability mask: 0x2651ec48
Port GUID: 0x0000000000000000
Link layer: InfiniBand
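The virtual functions should also show up as separate PCI devices:
lspci | grep -i "virtual function"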
You might also want to check that your opensm.service is running:
root@pve:~# systemctl status opensm.service
● opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers
Loaded: loaded (/usr/lib/systemd/system/opensm.service; disabled; preset: enabled)
Active: active (exited) since Wed 2025-10-22 12:34:51 -03; 57min ago
Invocation: 10e467fed7744e15978b29d595e599de
Docs: man:opensm(8)
Process: 495736 ExecCondition=/bin/sh -c if test "$PORTS" = NONE; then echo "opensm is disabled via PORTS=NONE."; exit 1; fi (code=exited, status=0/SUCCESS)
Process: 495737 ExecStart=/bin/sh -c if test "$PORTS" = ALL; then PORTS=$(/usr/sbin/ibstat -p); if test -z "$PORTS"; then echo "No InfiniBand ports found."; exit 0>
Main PID: 495737 (code=exited, status=0/SUCCESS)
Mem peak: 3M
CPU: 34ms
Oct 22 12:34:51 prox4 systemd[1]: Starting opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers...
Oct 22 12:34:51 prox4 sh[495737]: Starting opensm on following ports: 0xe8ebd30300a022fe
Oct 22 12:34:51 prox4 systemd[1]: Finished opensm.service - Starts the OpenSM InfiniBand fabric Subnet Managers.
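To confirm that a Subnet Manager is actually active on the fabric, you can also query it with sminfo (from infiniband-diags):
sminfo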
You should now be able to set an IP address on the physical Infiniband interface on the Proxmox VE host, and pass its virtual functions through to your guests.
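For example, a minimal IPoIB stanza in /etc/network/interfaces (ifupdown2); the interface name and address below are placeholders for your own values, and connected mode with a large MTU is a common but optional choice:
auto ibp1s0
iface ibp1s0 inet static
    address 10.10.10.1/24
    pre-up echo connected > /sys/class/net/ibp1s0/mode
    mtu 65520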
Resource Mapping
Now you can also do resource mapping for your Infiniband VFIOs by going to
Datacenter -> Resource Mappings -> PCI Devices -> Add
Then add each Virtual Function (don't add the parent physical PCI-E device, or you'll lose it on the physical host).
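Afterwards you can attach a mapped VF to a guest, either from the VM's Hardware tab or via the CLI. A sketch, assuming a VM with ID 100 and a resource mapping named ib-vf (both placeholders):
# Attach via the resource mapping created above
qm set 100 --hostpci0 mapping=ib-vf,pcie=1
# Or attach a specific VF directly by its PCI address (adjust to one of your VFs)
qm set 100 --hostpci0 0000:01:00.1,pcie=1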