
Calcul Québec Spring School 2019

Data Analysis with Python - Objectives

Introduction to the Supercomputer

  • Go to the Compute Canada wiki and search for "École 2019".
  • Upload the submission script provided on the wiki to the cluster ecole2019.calculquebec.cloud.
  • Create a directory on the cluster for the submission script and move the script into it.
  • Using nano, increase the job's time limit to 2 minutes.
  • Using nano, change the account to def-sponsor00.
  • Submit the job.
  • Copy the result files back to your computer.
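The two nano edits only touch the `#SBATCH --time` and `#SBATCH --account` directives. As a non-interactive alternative, here is a minimal Python sketch that rewrites one directive in the script's text. `set_sbatch_option` is a hypothetical helper, and it assumes each directive is written on its own line in the `#SBATCH --option=value` form used by job.sh.

```python
import re


def set_sbatch_option(script: str, option: str, value: str) -> str:
    """Return a copy of `script` with `#SBATCH --<option>=...` set to `value`.

    Hypothetical helper: it assumes each directive is written on its own
    line as `#SBATCH --option=value`, as in job.sh.
    """
    pattern = re.compile(rf"^#SBATCH --{re.escape(option)}=.*$", re.MULTILINE)
    line = f"#SBATCH --{option}={value}"
    # A function replacement keeps backslashes in `line` from being interpreted.
    updated, count = pattern.subn(lambda _match: line, script)
    if count == 0:
        raise ValueError(f"no '#SBATCH --{option}=' line found")
    return updated


# Demonstration on an inline copy of the first directives of job.sh.
sample = "#!/bin/bash\n#SBATCH --time=00:00:30\n#SBATCH --account=def-xyz\n"
updated = set_sbatch_option(sample, "time", "00:02:00")  # 2-minute limit (HH:MM:SS)
updated = set_sbatch_option(updated, "account", "def-sponsor00")
print(updated)
```

On the cluster, the same two calls would be applied to the text read from job.sh before writing it back; the job is then submitted with `sbatch job.sh`.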

Introduction to OpenRefine

  • Import the OpenRefine project generated by the job (projet.json).
  • Undo the operation that removed rows with missing extermination data.
  • Replace missing extermination counts with 0.
  • Add a "quadrimester" (four-month period) column:
    • January to April = 1
    • May to August = 2
    • September to December = 3
  • Save the resulting dataset.
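The three ranges above follow a single rule: quadrimester = (month - 1) // 4 + 1, using integer division. A minimal Python sketch of that rule, handy for spot-checking the new column; it is not OpenRefine's own expression language (GREL), and `quadrimester` is an illustrative name.

```python
def quadrimester(month: int) -> int:
    """Map a month number (1-12) to its four-month period (1, 2 or 3)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    # Integer division groups months 1-4, 5-8 and 9-12.
    return (month - 1) // 4 + 1


# Check the rule against the list above, month by month.
print([quadrimester(m) for m in range(1, 13)])
# → [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
```

In Pandas, the same function could be applied to a column with `df["month"].map(quadrimester)`, assuming a hypothetical integer `month` column.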

Introduction to Pandas / Python

  • Upload the result to Jupyter.
  • Open the resulting dataset with Pandas.
  • Calculate the average number of exterminations per district.
  • Calculate the total number of exterminations per quadrimester for each year.
  • Plot stacked bar charts of:
    • the total number of exterminations per year, broken down by quadrimester;
    • the total number of exterminations per quadrimester, broken down by year.
  • Discuss the differences between the two graphs.
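The Pandas steps map onto `groupby` aggregations. A minimal sketch, assuming the saved dataset has columns named `district`, `year`, `quadrimester` and `exterminations` (hypothetical names; adjust them to the actual export) and that missing counts were already replaced with 0 in OpenRefine. A tiny made-up frame stands in for the real data:

```python
import pandas as pd


def mean_by_district(df: pd.DataFrame) -> pd.Series:
    """Average number of exterminations per district."""
    return df.groupby("district")["exterminations"].mean()


def totals_by_year_and_quadrimester(df: pd.DataFrame) -> pd.DataFrame:
    """Total exterminations: one row per year, one column per quadrimester."""
    return (
        df.groupby(["year", "quadrimester"])["exterminations"]
        .sum()
        .unstack(fill_value=0)  # quadrimesters become columns; gaps become 0
    )


# Made-up rows standing in for the OpenRefine export (hypothetical column names).
sample = pd.DataFrame(
    {
        "district": ["A", "A", "B", "B", "B"],
        "year": [2018, 2018, 2018, 2019, 2019],
        "quadrimester": [1, 2, 1, 3, 3],
        "exterminations": [2, 4, 0, 1, 5],
    }
)

print(mean_by_district(sample))
table = totals_by_year_and_quadrimester(sample)
print(table)
```

With the real data, the file would be loaded with `pd.read_csv(path)` (add `sep="\t"` for a tab-separated export); if it only has a date column, a year can be derived with `pd.to_datetime(df["date"]).dt.year`, `date` being a hypothetical name. `table.plot.bar(stacked=True)` then draws one bar per year with the quadrimesters stacked, while `table.T.plot.bar(stacked=True)` transposes the table to draw one bar per quadrimester with the years stacked. Both plotting calls require Matplotlib.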

Introduction to the Supercomputer

Submission Script

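```bash title="job.sh"
#!/bin/bash
# Maximum run time (HH:MM:SS)
#SBATCH --time=00:00:30
# Allocation account charged for the job
#SBATCH --account=def-xyz
# A single task using one CPU core and 512 MB of memory
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=512M

# Launch the simulation, passing 5000 as its only argument
/project/def-sponsor00/projet/bin/simulation_punaises.py 5000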