AlphaFold3
This page discusses how to use AlphaFold v3.0.
Source code and documentation for AlphaFold3 can be found at their GitHub page. Any publication that discloses findings arising from use of this source code or the model parameters should cite the AlphaFold3 paper.
Available versions¶
AlphaFold3 is available on our clusters as prebuilt Python packages (wheels). You can list available versions with avail_wheels.
AlphaFold2 is still available. Documentation is here.
Creating a requirements file for AlphaFold3¶
- Load AlphaFold3 dependencies.
- Download run script. Choose the appropriate version:
  - For version 3.0.1:
  - For version 3.0.0:
- Create and activate a Python virtual environment.
- Install a specific version of AlphaFold3 and its Python dependencies. Here X.Y.Z stands for the exact desired version, for instance 3.0.0. You can omit the version to install the latest one available from the wheelhouse.
- Build data. This will create data files inside your virtual environment.
- Validate it.
- Freeze the environment and requirements set.
- Deactivate the environment.
- Clean up and remove the virtual environment. The virtual environment will be created in your job instead.
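The steps above can be sketched as a single script. This is a minimal sketch under stated assumptions, not the official procedure: the wheel name `alphafold3`, the environment path `~/alphafold3_env` and the `step` helper are illustrative choices, while the module versions and the `~/alphafold3-requirements.txt` file name are taken from the job scripts on this page. The run-script download is left out because its location depends on the version you choose. With `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
#!/bin/bash
# Sketch of the requirements-file workflow (assumed names, see the text above).
AF3_VERSION="${AF3_VERSION:-3.0.1}"          # version to install (X.Y.Z)
ENV_DIR="${ENV_DIR:-$HOME/alphafold3_env}"   # hypothetical temporary environment
REQ_FILE="${REQ_FILE:-$HOME/alphafold3-requirements.txt}"
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print only, 0 = execute

# Print a command line, then execute it in the current shell unless DRY_RUN=1.
step() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        eval "$*"
    fi
}

step "module load StdEnv/2023 hmmer-alphafold3/3.4 rdkit/2024.03.5 python/3.12"
step "virtualenv --no-download $ENV_DIR"
step "source $ENV_DIR/bin/activate"
step "pip install --no-index --upgrade pip"
step "pip install --no-index alphafold3==$AF3_VERSION"   # assumed wheel name
step "build_data"                       # writes data files into the environment
step "python run_alphafold.py --help"   # validation; needs the run script in the current directory
step "pip freeze > $REQ_FILE"
step "deactivate"
step "rm -r $ENV_DIR"
```

Run it once with the default `DRY_RUN=1` to review the commands, then with `DRY_RUN=0` on a login node to actually produce the requirements file.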
Model¶
You can obtain the model by requesting it from Google. They aim to respond to requests within 2-3 business days. Please see Obtaining Model Parameters.
Databases¶
Note that AlphaFold3 requires a set of databases.
Important: The databases must live in the $SCRATCH directory.
- Download the fetch script.
- Download the databases.
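Because the databases must live in $SCRATCH, a small guard before downloading can catch a misplaced path. The sketch below is illustrative only: `is_under_scratch` is a hypothetical helper, and the script name `fetch_databases.sh` in the usage comment is an assumption about the fetch script mentioned above. The `DOWNLOAD_DIR` value matches the one used in the job scripts on this page.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given path is $SCRATCH itself
# or lies beneath it. Fails when $SCRATCH is unset or empty.
is_under_scratch() {
    local path="$1"
    [ -n "${SCRATCH:-}" ] || return 1
    case "$path" in
        "$SCRATCH"|"$SCRATCH"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumed script name, adjust to the file you downloaded):
#   DOWNLOAD_DIR=$SCRATCH/alphafold/dbs
#   is_under_scratch "$DOWNLOAD_DIR" && bash fetch_databases.sh "$DOWNLOAD_DIR"
```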
Running AlphaFold3 in stages¶
AlphaFold3 must be run in stages, that is:
1. Splitting the CPU-only data pipeline from model inference (which requires a GPU), to optimise cost and resource usage.
2. Caching the results of the MSA/template search, then reusing the augmented JSON for multiple different inferences across seeds or across variations of other features (e.g. a ligand).
For reference on AlphaFold3:
- see inputs
- see outputs
- see performance
1. Data pipeline (CPU)¶
Edit the following submission script according to your needs.
#!/bin/bash
#SBATCH --job-name=alphafold3-data
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00 # adjust this to match the walltime of your job
#SBATCH --cpus-per-task=8 # use at MOST 8 cores; AlphaFold3 does not benefit from more
#SBATCH --mem=64G # adjust this according to the memory you need
# Load module dependencies.
module load StdEnv/2023 hmmer-alphafold3/3.4 rdkit/2024.03.5 python/3.12
DOWNLOAD_DIR=$SCRATCH/alphafold/dbs # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input # set the appropriate path to your input data
OUTPUT_DIR=$SLURM_TMPDIR/alphafold/output # set the appropriate path to your output data
# Generate your virtual environment in $SLURM_TMPDIR.
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
# Install AlphaFold and its dependencies.
pip install --no-index --upgrade pip
pip install --no-index --requirement ~/alphafold3-requirements.txt
# build data in $VIRTUAL_ENV
build_data
# https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#compilation-time-workaround-with-xla-flags
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
# Edit with the proper arguments and run your commands.
# run_alphafold.py --help
python run_alphafold.py \
--db_dir=$DOWNLOAD_DIR \
--input_dir=$INPUT_DIR \
--output_dir=$OUTPUT_DIR \
--jax_compilation_cache_dir=$HOME/.cache \
--nhmmer_n_cpu=$SLURM_CPUS_PER_TASK \
--jackhmmer_n_cpu=$SLURM_CPUS_PER_TASK \
--norun_inference # Run data stage
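# Copy the results back to $SCRATCH.
# $OUTPUT_DIR is copied as a directory, so the files end up in $SCRATCH/alphafold/output/output.
mkdir -p $SCRATCH/alphafold/output
cp -vr $OUTPUT_DIR $SCRATCH/alphafold/output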
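The inference stage reads the augmented JSON produced by the data stage. As a rough sketch of how those files could be gathered into the inference input directory, the hypothetical helper below searches a data-stage output tree and copies every match. It assumes the augmented files end in `_data.json`, which is our reading of the AlphaFold3 output documentation; verify this against your own output before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: copy the augmented JSON files written by the data stage
# (assumed to be named <job_name>_data.json) into the inference input directory.
collect_augmented_json() {
    local data_output_dir="$1"
    local inference_input_dir="$2"
    mkdir -p "$inference_input_dir"
    # -print0 with read -d '' keeps file names containing spaces intact.
    find "$data_output_dir" -type f -name '*_data.json' -print0 |
        while IFS= read -r -d '' json; do
            cp -v "$json" "$inference_input_dir/"
        done
}

# Usage with the paths from the job scripts on this page:
#   collect_augmented_json "$SCRATCH/alphafold/output" "$SCRATCH/alphafold/input"
```

Searching recursively means the helper finds the files regardless of how deeply the copy-back step nested the output directory.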
2. Model inference¶
Edit the following submission script according to your needs.
Compatibility
AlphaFold3 only supports GPUs with compute capability 8.0 or greater, that is, A100 or newer.
#!/bin/bash
#SBATCH --job-name=alphafold3-inference
#SBATCH --account=def-someprof # adjust this to match the accounting group you are using to submit jobs
#SBATCH --time=08:00:00 # adjust this to match the walltime of your job
#SBATCH --cpus-per-task=1 # the inference stage does not benefit from more cores
#SBATCH --gpus=a100:1 # AlphaFold3 inference only runs on ONE A100 or greater.
#SBATCH --mem=20G # adjust this according to the memory you need
# Load module dependencies.
module load StdEnv/2023 hmmer-alphafold3/3.4 rdkit/2024.03.5 python/3.12 cuda/12.2 cudnn/9.2
DOWNLOAD_DIR=$SCRATCH/alphafold/dbs # set the appropriate path to your downloaded data
INPUT_DIR=$SCRATCH/alphafold/input # set the appropriate path to your input data, following the data stage.
OUTPUT_DIR=$SCRATCH/alphafold/output # set the appropriate path to your output data
# Generate your virtual environment in $SLURM_TMPDIR.
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
# Install AlphaFold and its dependencies.
pip install --no-index --upgrade pip
pip install --no-index --requirement ~/alphafold3-requirements.txt
# build data in $VIRTUAL_ENV
build_data
# https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#compilation-time-workaround-with-xla-flags
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
# https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#gpu-memory
export XLA_PYTHON_CLIENT_PREALLOCATE=true
export XLA_CLIENT_MEM_FRACTION=0.95
# Edit with the proper arguments and run your commands.
# run_alphafold.py --help
python run_alphafold.py \
--db_dir=$DOWNLOAD_DIR \
--input_dir=$INPUT_DIR \
--output_dir=$OUTPUT_DIR \
--jax_compilation_cache_dir=$HOME/.cache \
--norun_data_pipeline # Run inference stage
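The compute capability requirement stated before the script could be checked at the start of the job, before any time is spent on setup. The following is only a sketch: `meets_min_compute_cap` is a hypothetical helper, and the usage comment assumes a driver recent enough for `nvidia-smi` to support the `compute_cap` query field.

```shell
#!/bin/bash
# Hypothetical helper: succeed when a compute capability such as "8.0" or "9.0"
# meets the 8.0 minimum. Only the major number is checked, because every
# capability with major >= 8 is at least 8.0.
meets_min_compute_cap() {
    local cap="$1"
    local major="${cap%%.*}"
    case "$major" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$major" -ge 8 ]
}

# Usage inside the inference job (assumes nvidia-smi supports compute_cap):
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   meets_min_compute_cap "$cap" || { echo "GPU too old: $cap" >&2; exit 1; }
```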
3. Job submission¶
Then, submit the jobs to the scheduler.
Independent jobs¶
Submit the data pipeline job with sbatch alphafold3-data.sh. Wait until it completes, then submit the second stage with sbatch alphafold3-inference.sh.
Dependent jobs¶
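# --parsable makes sbatch print only the job ID, so it can be used in --dependency
jid1=$(sbatch --parsable alphafold3-data.sh)
jid2=$(sbatch --parsable --dependency=afterok:$jid1 alphafold3-inference.sh)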
sq
If the first stage fails, you will have to manually cancel the second stage, for example with scancel $jid2.
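The manual cancellation can also be scripted. The sketch below is one possible approach, not a site-provided tool: `cancel_if_failed` is a hypothetical helper that asks `sacct` for the final state of the first job and calls `scancel` on the second one when that state indicates a failure. The list of states treated as failures is an assumption and may need adjusting.

```shell
#!/bin/bash
# Hypothetical helper: cancel the second job when the first one did not succeed.
cancel_if_failed() {
    local first_job="$1" second_job="$2" state
    # -X: allocation only, -n: no header, -P: parsable output, -o State: state column
    state=$(sacct -j "$first_job" -X -n -P -o State | head -n 1)
    case "$state" in
        FAILED*|CANCELLED*|TIMEOUT*|OUT_OF_MEMORY*|NODE_FAIL*)
            scancel "$second_job"
            echo "Stage 1 ended as $state; cancelled job $second_job"
            ;;
        *)
            echo "Stage 1 state: ${state:-unknown}; nothing to do"
            ;;
    esac
}

# Usage, once the first job has left the queue:
#   cancel_if_failed "$jid1" "$jid2"
```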
Troubleshooting¶
Out of memory (GPU)¶
If you would like to run AlphaFold3 on inputs larger than 5,120 tokens, or on a GPU with less memory (an A100 with 40 GB of memory, for instance), you can enable unified memory.
In your submission script for the inference stage, replace the XLA_PYTHON_CLIENT_PREALLOCATE and XLA_CLIENT_MEM_FRACTION settings with these environment variables:
export XLA_PYTHON_CLIENT_PREALLOCATE=false
export TF_FORCE_UNIFIED_MEMORY=true
export XLA_CLIENT_MEM_FRACTION=2.0 # 2 x 40GB = 80 GB
Then adjust the amount of memory allocated to your job accordingly, for instance:
#SBATCH --mem=80G
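The arithmetic behind the example (2 x 40 GB = 80 GB) can be reproduced for other GPUs or fractions. The helper below is a sketch that simply follows this page's example of requesting the GPU memory multiplied by XLA_CLIENT_MEM_FRACTION; `unified_mem_gb` is a hypothetical name, and the rule itself is an assumption generalised from that single example.

```shell
#!/bin/bash
# Hypothetical helper: memory (in GB, rounded up) to request with --mem when
# XLA_CLIENT_MEM_FRACTION is set to <fraction> on a GPU with <gpu_gb> GB.
unified_mem_gb() {
    local gpu_gb="$1" fraction="$2"
    awk -v g="$gpu_gb" -v f="$fraction" 'BEGIN {
        total = g * f
        rounded = int(total)
        if (rounded < total) rounded++   # round up to a whole GB
        print rounded
    }'
}

echo "#SBATCH --mem=$(unified_mem_gb 40 2.0)G"   # prints: #SBATCH --mem=80G
```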