École de printemps Calcul Québec 2019
Data Analysis with Python - Objectives
Introduction to the Supercomputer
- Go to the Compute Canada wiki and search for "École 2019".
- Upload the submission script provided on the wiki to the cluster: ecole2019.calculquebec.cloud
- Create a directory on the cluster to contain the submission script, and move the script into it.
- Increase the task duration to 2 minutes using nano.
- Modify the account using nano to use: def-sponsor00
- Submit the task.
- Retrieve the result files to your computer.
Introduction to OpenRefine
- Import the OpenRefine project generated by the task (projet.json).
- Undo the operation that eliminated rows with missing extermination data.
- Replace missing extermination count data with 0.
- Add a "quadrimester" (four-month period) column:
- January to April = 1
- May to August = 2
- September to December = 3
- Save the resulting dataset.
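
The quadrimester rule listed above amounts to integer arithmetic on the month number. As a minimal sketch in Python (outside OpenRefine, and with a hypothetical function name `quadrimester` that does not come from the course material), it could be checked like this:

```python
def quadrimester(month: int) -> int:
    """Return the four-month period (1, 2 or 3) for a month number from 1 to 12."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    # Months 1-4 map to 1, months 5-8 map to 2, months 9-12 map to 3.
    return (month - 1) // 4 + 1


# Quick check over a whole year.
print([quadrimester(m) for m in range(1, 13)])
# → [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
```

The same arithmetic should carry over to an OpenRefine column transform, provided the month can be extracted from the date column as a number.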
Introduction to Pandas / Python
- Upload the result to Jupyter.
- Open the resulting dataset with Pandas.
- Calculate the average number of exterminations per district.
- Calculate the sum of exterminations by quadrimester (four-month period) per year.
- Plot a stacked histogram:
- of the total number of exterminations by quadrimester (four-month period), per year.
- of the total number of exterminations by year, by quadrimester (four-month period).
- Discuss the differences between the two graphs.
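
The Pandas objectives above can be sketched as follows. This is only an illustration under stated assumptions: the column names (`district`, `year`, `quadrimester`, `exterminations`) and the small in-memory sample are hypothetical stand-ins for the dataset exported from OpenRefine, and `plot_stacked` is a helper name invented here. The objectives say "stacked histogram"; the sketch draws stacked bar charts, which is one plausible reading.

```python
import pandas as pd

# Hypothetical sample standing in for the dataset saved from OpenRefine.
# The column names are assumptions, not those of the real file; with the
# real export you would load it with pd.read_csv(...) instead.
df = pd.DataFrame({
    "district": ["A", "A", "B", "B", "A", "B"],
    "year": [2017, 2017, 2017, 2018, 2018, 2018],
    "quadrimester": [1, 2, 1, 1, 3, 3],
    "exterminations": [2, 4, 1, 3, 5, 0],
})

# Average number of exterminations per district.
mean_per_district = df.groupby("district")["exterminations"].mean()

# Sum of exterminations by quadrimester, per year:
# one row per year, one column per quadrimester, 0 where no data exists.
sum_by_year = (
    df.groupby(["year", "quadrimester"])["exterminations"]
    .sum()
    .unstack(fill_value=0)
)


def plot_stacked(table):
    """Draw both stacked bar charts from a year x quadrimester table."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, (ax_year, ax_quad) = plt.subplots(1, 2, figsize=(10, 4))
    # One bar per year, stacked by quadrimester.
    table.plot(kind="bar", stacked=True, ax=ax_year)
    # One bar per quadrimester, stacked by year (transposed table).
    table.T.plot(kind="bar", stacked=True, ax=ax_quad)
    fig.tight_layout()
    return fig
```

Calling `plot_stacked(sum_by_year)` in a Jupyter cell should display the two charts side by side, which makes the comparison asked for in the last objective easier: the totals are identical, only the grouping on the x-axis changes.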
Introduction to the Supercomputer
Submission Script
```bash title="job.sh"
#!/bin/bash
#SBATCH --time=00:00:30
#SBATCH --account=def-xyz
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=512M
/project/def-sponsor00/projet/bin/simulation_punaises.py 5000