Let's dive into machine learning

You might have heard: there’s this thing out there called artificial intelligence. Though in many cases, I think machine learning is still a better description. While giant models and commercial services get most of the attention, it’s become easier than ever to run models on your own private infrastructure.

One of my goals when I upgraded my Proxmox virtualization environment was to have some capacity for machine learning. So I bought a roughly $400 graphics card (circa 2022) with an Nvidia GPU and 12 GB of VRAM.

This article will show how to build a Debian virtual machine under Proxmox, pass through the GPU on the video card, and set up oobabooga. This is still a fast-moving space, so while these instructions should work as of January 2024, they might not in February 2024.

Because things move fast and break in AI/ML, I think it makes a great use case for virtualization. Once you have a working environment, snapshot and back it up. If a future update breaks everything, just roll back.

Assumptions

You will need experience with Linux and Proxmox. This is not a tutorial for beginners.

We will use a graphics card with a single Nvidia GPU. We’re not going to try to use AMD GPUs. We’re not using multiple cards, but that should be possible. We’re not building a research cluster, just experimenting at home.

We’re not sharing the GPU or using any abstractions like VirtIO-GPU. It might be possible to share a GPU across multiple virtual machines; it might not.

I am not using the graphics card for video. I use the integrated video on the motherboard. It’s possible that if you are using the graphics card as a, well, graphics card, you might run into issues sharing the PCIe device. Maybe not.

You can probably skip the Proxmox setup and apply most of this to physical hardware running Debian, but that’s left up to the reader.

My Proxmox server has an AMD CPU. This has implications for the PCIe pass-through. It’s slightly different on Intel CPUs, and not all motherboards will properly support PCIe pass-through with Proxmox.

These instructions have not been double-checked; there may be even more mistakes than usual.

Proxmox

By default, Proxmox will not allow PCIe pass-through. Your motherboard probably won’t either.

First, make sure the I/O Memory Management Unit (IOMMU) and interrupt remapping are enabled in your BIOS/UEFI configuration (look for IOMMU, VT-d, or AMD-Vi). Your hardware may vary, so you’re on your own here.

Follow the instructions in the Proxmox wiki: https://pve.proxmox.com/wiki/PCI(e)_Passthrough

For my AMD system:

Edit /etc/default/grub and change the default Linux kernel command-line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"

Intel systems will use this option instead: intel_iommu=on

Then add the following lines to /etc/modules:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Run update-grub and update-initramfs -u -k all.

You will need to reboot your Proxmox server. Again, consult the Proxmox wiki. Follow their instructions if they differ from mine.
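
After the reboot, it’s worth a quick sanity check that the kernel actually enabled the IOMMU and loaded the vfio modules before going any further. The exact messages vary by hardware; on AMD you’re looking for “AMD-Vi”, on Intel “DMAR”:

```shell
# Look for IOMMU initialization messages in the kernel log
# (AMD prints "AMD-Vi", Intel prints "DMAR")
dmesg | grep -i -e iommu -e amd-vi -e dmar

# Confirm the vfio modules listed in /etc/modules actually loaded
lsmod | grep vfio
```

If the first command comes back empty, the BIOS/UEFI setting or the kernel command line is likely wrong.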

Locate your graphics card on the PCIe bus

Once you’ve rebooted, you will need to determine the PCIe bus address for your graphics card.

For example:

root@pve1:~# lspci | grep "VGA"
04:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)

Make note of the first part of the output, “04:00.0”. Newer versions of Proxmox will also list out the names of the PCIe devices when assigning to a virtual machine, but keep this information handy, just in case.
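
If you want to grab that address programmatically (for notes, or to script against later), the bus address is just the first whitespace-separated field of the lspci line. A quick sketch using the example output above:

```shell
# Extract the PCIe bus address (first field) from an lspci output line.
line='04:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)'
addr=$(printf '%s\n' "$line" | awk '{print $1}')
echo "$addr"   # → 04:00.0
```

On a live system you could shorten this to `lspci | awk '/VGA/ {print $1}'`.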

Debian

Create a new Debian 12 virtual machine. I won’t go into any detail on how to do this. However, keep in mind:

  1. You will need a lot of disk space for models.
  2. We will switch Debian to a serial console, just to make extra sure we don’t use the video card as a video card.

I recommend installing Debian to a relatively small disk (64 GB is plenty). Once Debian is running, we’ll add a second larger disk for the models. A couple advantages here:

  • You can use different storage for your models. Maybe NFS. They can live on a larger and possibly slower storage system.
  • You can skip backups on the disk for models. Backing up a terabyte of model data every day is a great way to exhaust even the largest backup storage system. Since you’re likely downloading them from public sources, you probably don’t need good backups.
  • Flexibility. Move, grow, shrink, etc. without having to mess with your boot disk.

With that in mind, create a new virtual machine. Just create a single disk, default graphics card, etc:

Create: Virtual Machine

I’ve found you really need at least 16 GB of RAM and likely more, but you could start with 1-2 GB just to install.

Start the virtual machine and install Debian. The exact installation options shouldn’t matter much, however, I recommend:

  • Partition Disks
    • Choose: “Guided - use entire disk and setup LVM” (gives you flexibility later).
    • Partitioning scheme: “All files in one partition” (should be fine for this, if you like separate partitions, that’s fine too).
    • Use something other than EXT4 (Debian’s default file system). I’m going to use XFS, but it shouldn’t really matter, and I won’t cover how to switch from EXT4 here.
  • Software selection
    • Don’t install a desktop environment. We will switch to a serial console after installation, so it would just add bloat, and none of the tools I’ve used thus far would benefit from it.
    • Do install an SSH server. You should just tick the “SSH server” and “standard system utilities” check boxes.

Finish the install and continue with this guide when the virtual machine has booted and you’re at a root shell.

Serial Console

We’ll switch Debian and Proxmox to use a serial console. This probably isn’t necessary. However, I don’t want to deal with any conflicts or weird issues if the virtual machine suddenly decides to use the graphics card for video. Plus, it’s retro cool.

Setup GRUB to use the serial console at boot:

Log in as root and add this to /etc/default/grub:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"
GRUB_TERMINAL=serial
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"

It should look like this; I also removed quiet from the default command line:

/etc/default/grub

Then run update-grub.

Tell systemd to use a serial terminal by running:

systemctl start serial-getty@ttyS0.service

Shut down the virtual machine: halt -p

In Proxmox, change the display from “Default” to “Serial terminal (serial0)”:

Serial Console

You also need to add a serial device to the virtual machine:

Click “Add”, select “Serial Port” from the drop down, then add port number 0:

Serial Port

Start the virtual machine. If it works, GRUB will give you that authentic ASCII art retro look:

ASCII art GRUB

If Linux switched to the serial console, you should see “ttyS0” near the login prompt:

Serial Console Success

While you can continue to use the noVNC console, consider switching to xterm.js. Cut and paste will work. Alternatively, just SSH into the box. :-/

Add more devices

We’re getting closer, but we need to add a disk for models and connect our video card. Shut down the virtual machine and go back to Proxmox.

Add a second disk (see the Proxmox documentation for details). For my instance, I’ll add a 512 GB local disk.

You might also want to detach the Debian install media. If you delete the install ISO or move to a Proxmox server that doesn’t have it, the virtual machine will fail to start.

Last, map your graphics card. Click “Add” and select “PCI Device” from the drop down.

Select the “Raw Device” radio button and click the “Device:” drop down:

Add PCI Device

Double check that the device ID is the same as the one you located when you ran lspci earlier. You can ignore the leading 0000.

Not pictured above, but there is a check box under “Raw Device” that says “All Functions”. Check that. I’m not sure if it matters, but it works for me.

So now, your virtual machine’s hardware should look like:

Finalized Hardware

Start up the virtual machine, we’re making progress!

A few more things

Did you notice that it takes a while to start the virtual machine now? On my system, once you connect a PCIe device, it seems to take about 30 seconds for the virtual machine to start. I’m not sure why the PCIe mapping is so slow, but we’ll ignore this.

Let’s get things ready by setting up that disk for models, installing some utilities, and the Nvidia CUDA runtime.

Install your preferred utilities beyond what Debian installed, for me:

apt install git python3-pip vim ncdu curl tmux sudo

You will definitely need python3-pip, as most AI/ML things are written in Python and use the pip package manager. We will use Git to grab oobabooga later.

You will absolutely need the Nvidia CUDA toolkit. You might need the bleeding-edge drivers from Nvidia; however, I’ve found the ones from the Debian non-free repository work fine.

As always, consult the upstream documentation at https://wiki.debian.org/NvidiaGraphicsDrivers#Debian_12_.22Bookworm.22 but this should work:

  • Add contrib, non-free, and non-free-firmware to /etc/apt/sources.list.
  • apt update
  • apt install nvidia-driver firmware-misc-nonfree nvidia-cuda-dev nvidia-cuda-toolkit

You will probably see a message that there’s a conflicting driver (nouveau) with the recommendation to reboot to fix. That’s fine and expected.

Once that installs, reboot, then let’s do a super quick test:

root@ml2024:~# nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Sun Jan 21 16:32:52 2024
Driver Version                            : 525.147.05
CUDA Version                              : 12.0

Attached GPUs                             : 1
GPU 00000000:00:10.0
    Product Name                          : NVIDIA GeForce RTX 3060
    Product Brand                         : GeForce
    Product Architecture                  : Ampere
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled

nvidia-smi will show a ton of information. As long as it identifies your card, you’re good.

One last thing before we install any AI/ML applications: create and mount your models disk.

Again, this is not a tutorial on Linux, Debian, or Proxmox, so you’re looking at something like this:

cfdisk /dev/vdb   # Create a partition
mkfs.xfs /dev/vdb1
mkdir /mnt/models
echo "/dev/vdb1 /mnt/models xfs defaults 0 0" >> /etc/fstab
systemctl daemon-reload
mount /mnt/models

You should make your user account the owner of that mount point: chown user:user /mnt/models (I named the normal user account user during the Debian installation).
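
If you’d rather script the fstab change, one cautious pattern is to build the entry first so you can eyeball it before appending it. A minimal sketch, assuming the second disk showed up as /dev/vdb as above (fstab_entry is just a hypothetical helper, not a standard tool):

```shell
# Build an fstab entry for a device and mount point so it can be
# reviewed before being appended to /etc/fstab.
fstab_entry() {
  printf '%s %s xfs defaults 0 0\n' "$1" "$2"
}

fstab_entry /dev/vdb1 /mnt/models
# → /dev/vdb1 /mnt/models xfs defaults 0 0
```

Once it looks right, append it with `fstab_entry /dev/vdb1 /mnt/models >> /etc/fstab`, then run systemctl daemon-reload and mount /mnt/models as above.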

Last, optional but recommended: shut down the virtual machine and take a snapshot in Proxmox. I’d probably go ahead and make a backup as well. This will give you a clean snapshot to roll back to.

Let’s install something

At this point, you should have a decent environment capable of running most things. But we’re going to install oobabooga. This will give you the ability to run large language models, and more or less have your own private chatbot.

As previously warned, AI/ML is a fast moving space. The following instructions worked in January 2024, but they might not today. Read the upstream documentation.

Start your virtual machine, if you stopped it to take a clean backup and/or snapshot.

Log in as a normal user (I named this account user during Debian setup).

Clone the oobabooga repository from GitHub. I usually create a src directory:

mkdir src
cd src
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui

Oobabooga will store your models alongside the Git repository, but we don’t want that. So let’s create a directory on /mnt/models, move the existing files over, and symlink:

cd ~/src/text-generation-webui
mkdir /mnt/models/oobabooga
# ^^ If you get an error, you probably didn't change the permissions on /mnt/models
cp models/* /mnt/models/oobabooga/
rm -rf models
ln -s /mnt/models/oobabooga models
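
The same copy-remove-symlink dance, sketched as a reusable function in case you want to relocate other directories later. This is my own wrapper, not anything oobabooga ships; the `&&` chaining means the original is only removed if the copy succeeded:

```shell
# Move a directory's contents to a new home and leave a symlink behind.
# Copies first; the original is only removed after the copy succeeds.
relocate_dir() {
  src="$1"    # e.g. ~/src/text-generation-webui/models
  dest="$2"   # e.g. /mnt/models/oobabooga
  mkdir -p "$dest" &&
  cp -a "$src/." "$dest/" &&
  rm -rf "$src" &&
  ln -s "$dest" "$src"
}
```

Usage would be `relocate_dir ~/src/text-generation-webui/models /mnt/models/oobabooga`.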

Run ./start_linux.sh.

This will download some things and setup your environment. After a while…


What is your GPU?

A) NVIDIA
B) AMD (Linux/MacOS only. Requires ROCm SDK 5.6 on Linux)
C) Apple M Series
D) Intel Arc (IPEX)
N) None (I want to run models in CPU mode)

Input>

Select A.

You will also be prompted with this. Since I have an RTX series card, I’ll choose N as instructed:

Do you want to use CUDA 11.8 instead of 12.1? Only choose this option if your GPU is .
For RTX and GTX series GPUs, say "N". If unsure, say "N".

Input (Y/N)> N

It will install PyTorch and probably a bunch of other things.

If everything succeeds:


*******************************************************************
* WARNING: You haven't downloaded any model yet.
* Once the web UI launches, head over to the "Model" tab and download one.
*******************************************************************


17:06:01-813503 INFO     Starting Text generation web UI                        
17:06:01-816099 INFO     Loading the extension "gallery"                        
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Which is great, but you won’t be able to use it, because it binds to localhost. To access it from outside the virtual machine, you need to make it listen on other interfaces. Another, safer alternative is an SSH tunnel.

Option one requires editing CMD_FLAGS.txt. Add a line with --listen (or uncomment the existing line, if any):

user@ml2024:~/src/text-generation-webui$ cat CMD_FLAGS.txt 
# Only used by the one-click installer.
# Example:
# --listen --api
--listen --load-in-8bit

--load-in-8bit tells the application to load the model at reduced (8-bit) precision. I quickly learned that 12 GB of VRAM is nothing, and a lot of models will only run if you reduce the precision. You can also do this in the web interface, so it’s optional.

Restart, and you should be able to connect from another box on the same network:

user@ml2024:~/src/text-generation-webui$ ./start_linux.sh 


*******************************************************************
* WARNING: You haven't downloaded any model yet.
* Once the web UI launches, head over to the "Model" tab and download one.
*******************************************************************


17:15:32-192971 INFO     Starting Text generation web UI                        
17:15:32-195150 WARNING                                                         
                         You are potentially exposing the web UI to the entire  
                         internet without any access password.                  
                         You can create one with the "--gradio-auth" flag like  
                         this:                                                  
                                                                                
                         --gradio-auth username:password                        
                                                                                
                         Make sure to replace username:password with your own.  
17:15:32-196469 INFO     Loading the extension "gallery"                        
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Heed the warning. We’ve just exposed something to the local area network and possibly beyond. By default, there’s no authentication. Don’t do this on a network you don’t fully control (and maybe not even then).

A safer option would be NOT to set --listen and use an SSH tunnel, but I won’t go into that here.
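
If you do want the tunnel route, the basic shape is a single ssh command that forwards a local port to the UI inside the virtual machine. With no --listen set, the UI binds to 127.0.0.1:7860 in the VM, so (replacing user@ml2024 with your own account and the VM’s address):

```shell
# Forward local port 7860 to 127.0.0.1:7860 inside the VM, then browse
# to http://127.0.0.1:7860 on your local machine.
# -N means "no remote command, just forward"; Ctrl-C to stop.
ssh -N -L 7860:127.0.0.1:7860 user@ml2024
```

Nothing is exposed to the network this way, and the traffic is encrypted for free.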

You have been warned, twice!

Head over to a box with a GUI that can reach the virtual machine’s network. Run ip addr at the shell if you don’t know the IP address of the virtual machine. If you enabled the QEMU guest agent when you set up the virtual machine in Proxmox, Debian should have installed the guest tools, and the IP should show up in the virtual machine summary page.

We’re almost there, and from a web browser, it should look like:

Almost there

Models!

Our last warning from oobabooga was that we don’t have any models downloaded.

Oobabooga can download models directly from Hugging Face.

You’re looking for “Natural Language Processing” models, and we’ll just start with the “Conversational” category. This link might work: https://huggingface.co/models?pipeline_tag=conversational

You will want to start with smaller models. I’ve found that on a card with only 12 GB of VRAM, in 8-bit precision, I can generally run models with 3 billion parameters or fewer. There are tricks to get around this, and progress is being made, so this is getting better every day.
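
The rule of thumb behind that: the weights alone take roughly parameters × bytes per parameter, and 8-bit quantization is about one byte per parameter (fp16 would be two). A back-of-the-envelope check, with illustrative numbers; real usage adds activations and cache overhead on top:

```shell
# Rough VRAM needed for weights alone: parameters * bytes per parameter.
params=3000000000      # a 3B-parameter model
bytes_per_param=1      # 8-bit quantization; use 2 for fp16
echo $(( params * bytes_per_param / 1024 / 1024 / 1024 ))   # prints 2 (GiB, floored)
```

So a 3B model wants roughly 3 GB for weights, which fits comfortably on a 12 GB card; a 13B model at 8-bit (~13 GB) already doesn’t.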

At the moment, the microsoft/DialoGPT-medium model is trending and is pretty small, so we’ll try that:

Download a model

You still need to load the model into memory. As of January 2024, you need to:

Download a model

  1. Press the “Refresh” button.
  2. Select your model (microsoft/DialoGPT-medium) from the drop down.
  3. Press the “Load” button.

If it worked, you’ll see a success message (indicated by the red arrow).

Success!

As noted by Oobabooga when you load the model, you need to use the “instruct” or “chat-instruct” mode when chatting. You’ll get less than interesting results if you don’t use the prompts or templates the model expects.

Switch to the chat tab and scroll way down, this should reveal a few options, where you can change the mode to “chat-instruct”:

Change the mode

Now, you should be able to get some meaningful responses from this model (or not):

Success?

Final Thoughts

I didn’t leave you with anything especially useful, but there are plenty of models out there and some are quite interesting. Oobabooga also has the ability to add your own training data, so you can extend a model without starting from scratch.

At this point, I would recommend shutting down the virtual machine and taking another snapshot and/or backup.

You will quickly exhaust all your disk space (both in /mnt/models and your Proxmox backups), so keep that in mind. You might want to exclude your models disk from backups in Proxmox. Alternatively, you might just want to disable automatic backups for the virtual machine and just make manual backups as needed.

And last, this is all very portable. You can move your virtual machine to another Proxmox server. Perhaps a friend with better/more GPUs and VRAM. All you should have to do is migrate/restore the virtual machine, fix up the PCIe pass-through, and go.