MetaPhlAn
MetaPhlAn is a "computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level. With StrainPhlAn, it is possible to perform accurate strain-level microbial profiling", according to its GitHub repository. While the software stack on our clusters does contain modules for a couple of older versions (2.2.0 and 2.8) of this software, we now expect users to install recent versions using a Python virtual environment.
For more information on how to use MetaPhlAn, see the MetaPhlAn wiki.
Available wheels
You can list available wheels using the avail_wheels command:
name       version    python    arch
---------  ---------  --------  -------
MetaPhlAn  4.0.3      py3       generic
MetaPhlAn  3.0.7      py3       generic
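A listing like the one above can be produced with a query along the following lines. This is a minimal sketch: the lowercase package name passed to `avail_wheels` is an assumption, and the guard only exists so that the snippet fails gracefully on a machine where the cluster tooling is not installed.

```shell
# Package name to look up (assumed spelling; adjust if needed).
PACKAGE="metaphlan"

# avail_wheels is only present on the clusters, so check for it first.
if command -v avail_wheels >/dev/null 2>&1; then
    avail_wheels "$PACKAGE"
else
    echo "avail_wheels not found; run this on a cluster login node" >&2
fi
```

See `avail_wheels --help` for the options that control which versions are listed.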
Downloading databases
Note
MetaPhlAn requires a set of databases, which must be downloaded into $SCRATCH.
Important
The databases must reside in $SCRATCH.
Databases can be downloaded from Segatalab FTP.
- From a login node, create the data folder:
- Download the data:
parallel wget ::: http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103.tar http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt.bz2 http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJan21_CHOCOPhlAnSGB_202103_species.txt.bz2
Note
This step cannot be done from a compute node but must be done from a login node.
- Extract the downloaded data, for example using an interactive job:
Untar and unzip the databases:
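The folder creation and extraction steps can be sketched as follows. The folder name `metaphlan_databases` is a hypothetical choice, not one prescribed by this page, and the fallback to a temporary directory is only there so the snippet also runs where `$SCRATCH` is not defined. On a cluster, the extraction part would typically be run inside an interactive job (for example, one started with `salloc`).

```shell
# Assumption: the databases go into a folder named metaphlan_databases.
# $SCRATCH is set on the clusters; elsewhere fall back to a temporary directory.
DB_DIR="${SCRATCH:-$(mktemp -d)}/metaphlan_databases"
mkdir -p "$DB_DIR"
cd "$DB_DIR"

# (Download the three database files into $DB_DIR from a login node first.)

# Unpack the tar archive, if it has been downloaded.
for archive in mpa_vJan21_CHOCOPhlAnSGB_202103*.tar; do
    if [ -e "$archive" ]; then
        tar -xf "$archive"
    fi
done

# Decompress the bzip2-compressed text files, if they have been downloaded.
for compressed in mpa_vJan21_CHOCOPhlAnSGB_202103*.bz2; do
    if [ -e "$compressed" ]; then
        bunzip2 "$compressed"
    fi
done
```

The existence checks make the snippet safe to re-run: files that were already extracted, or not yet downloaded, are simply skipped.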
Running MetaPhlAn
Once the database files have been downloaded and extracted, you can submit a job. You may edit the following job submission script according to your needs:
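As a starting point, the snippet below writes one possible job script to `metaphlan-job.sh`. It is a sketch under several assumptions, none of which come from this page: the account name, resource requests, module list, database folder (`metaphlan_databases`), and input file name (`metagenome.fastq`) are all placeholders to adapt. The MetaPhlAn version and database index name are taken from the wheel listing and download step on this page.

```shell
# Write the job script; the quoted heredoc keeps every variable unexpanded.
cat > metaphlan-job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=def-someuser   # placeholder: replace with your own account
#SBATCH --time=01:00:00          # assumed resource requests: adjust to your data
#SBATCH --cpus-per-task=4
#SBATCH --mem=15G

# Assumption: these module names exist on your cluster (check with `module spider`).
module load gcc blast samtools bedtools bowtie2 python/3.10

# Assumption: the databases were extracted into this folder under $SCRATCH.
DB_DIR="$SCRATCH/metaphlan_databases"
cd "$SCRATCH"

# Create the virtual environment on node-local storage and install the wheel.
virtualenv --no-download "$SLURM_TMPDIR/env"
source "$SLURM_TMPDIR/env/bin/activate"
pip install --no-index --upgrade pip
pip install --no-index metaphlan==4.0.3

# metagenome.fastq is a placeholder input; --nproc reuses the cores given to the job.
metaphlan metagenome.fastq --input_type fastq \
    --bowtie2db "$DB_DIR" --index mpa_vJan21_CHOCOPhlAnSGB_202103 \
    --nproc "$SLURM_CPUS_PER_TASK" \
    --bowtie2out metagenome.bowtie2.bz2 -o profiled_metagenome.txt
EOF
```

Passing `--bowtie2db` and `--index` explicitly points MetaPhlAn at the local copy of the database, so the job should not need to download anything from the compute node.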
Then submit the job to the scheduler:
```bash
sbatch metaphlan-job.sh
```