Big Data Analytics with Python at AIMS Senegal



Course Background

Introduction

As part of the capacity development pillar of the Big Data for Development project, AIMS-NEI designed a Big Data for Development Short Course Program (BD4D-SCP), taught across the AIMS-NEI network, first in Rwanda and now in Senegal.

The course targets people with a passion for Data Science in general, and Big Data Analytics in particular, who hold at least a four-year undergraduate degree or have a minimum of 2 to 3 years of professional experience in Statistics or another Data Science related field.

A number of short courses are in the pipeline to achieve our BD4D objective: to increase the number of data scientists in Africa and provide a platform for all practitioners to interact. AIMS-NEI will also organize the First Leveraging the Power of Big Data (LPBD) Executive Education workshop. The goal of the workshop is to introduce C-level executives to the era of big data, demonstrating how big data is disrupting traditional enterprises and opening the door to new products and services.

Course Overview

Datasets are becoming increasingly large as the world's population and things become more and more connected. Traditional data processing software and techniques cannot cope with datasets at this scale, so specialized frameworks and tools such as Apache Spark are needed. This course teaches the essentials of processing large-scale datasets using Python. In addition, it teaches you how to perform common data science tasks, such as data wrangling and building machine learning models, in Python. The course emphasizes learning by doing; accordingly, many exercises are built into it to give participants ample time to practice.

Approach

This course takes a practical approach to equip participants with the most essential tools in the shortest possible time. The lessons start with the absolute basics of Python, focusing mainly on data structures, and then quickly move into the core libraries for doing data science in Python. Next, the course turns to Big Data, first introducing brief theoretical concepts and then teaching Apache Spark, a state-of-the-art tool for processing large datasets. Thereafter, it provides introductory lectures on machine learning before showing participants how to build these algorithms in Python. The course emphasizes learning by doing.

Summarized Course Objectives

  1. Understand advanced concepts of the Python language: data structures, functions, classes and more.
  2. Perform common data science tasks using Python: data ingestion, processing, visualization, web scraping, etc.
  3. Handle large-scale datasets (20 GB+) on a personal computer using Apache Spark, and utilize cloud computing platforms.
  4. Be familiar with the theoretical basis of common machine learning algorithms.
  5. Be able to build and evaluate machine learning models using the scikit-learn library.

Outline

Day 1. 
Advanced Concepts in Python: on this first day, the course will focus on the Python language to build a strong foundation for the rest of the course material. Participants will be introduced to intermediate and advanced practical techniques such as writing functions and classes, error handling, packaging Python code and more.

Day 2.
Python for Data Science: during the second day, the focus is on performing common data science tasks using Python. We will go through data ingestion, processing, analysis, visualization, web scraping and more, and along the way introduce essential packages for these tasks (e.g., pandas, geopandas, numpy, matplotlib).

Day 3.
Big Data Processing: on the third day, the course focuses on how to handle large datasets using Python. The following topics will be covered: introduction to big data, multiprocessing in Python, Apache Spark, how to use common cloud platforms and more.

Day 4.
Machine Learning (ML) in Python: on this day, the course will first provide an introductory lecture on machine learning. The rest of the day will focus on how to perform various ML tasks (e.g., data preparation, model building, evaluation and interpretation) using the scikit-learn package in Python.

Day 5.
Putting it All Together: during the final day, we will focus on using the skills gained in this course to solve real-life data science problems by looking at case studies. Potential case studies include: how to process night lights satellite images (geospatial), how to process massive call records from cellphones (mobile data) and how to build ML models to impute missing sensor data (sensor data).

Pre-requisites

  • Programming: ability to write a simple program in Python (basic Python level)
  • Math and Statistics: a background in statistics, data science, or any other quantitative science.

Information for applicants and about course instructor

Practical Information

This training will take place from 1-5 July 2019 in Senegal. Attendance is limited to 40 participants and is free of charge. Lunches and coffee breaks are available on-site on request, made at the time of registration. AIMS-NEI does not provide any financial assistance to successful applicants to attend this short course. AIMS-NEI encourages successful candidates to make their own arrangements to cover all logistical costs related to their participation in this course, including transport.

Instructor Profile

Dr. Dunstan Matekenya is a consummate Data Scientist with over 10 years’ experience in both traditional statistics and modern machine learning methods. Currently, he works as a Data Scientist at the World Bank Group (WBG) Headquarters in Washington DC. Prior to joining the WBG, Dunstan completed his PhD at the University of Tokyo in 2016. His PhD research focused on the use of machine learning methods to extract insights from mobile phone data. Before re-orienting his career towards Data Science, Dunstan worked as a Statistician at the National Statistical Office in Malawi from 2007 up until 2017. In Malawi, he actively contributed to flagship projects such as the 2008 Malawi Population and Housing Census and led the GIS unit. His passions include contributing to the modernization of official statistics in developing countries through the use of alternative data sources such as mobile phone data, as well as improving capacity in Data Science.

Application Selection Process


All candidates interested in applying for the Big Data Analytics with Python short course must use the AIMS-NEI online application portal to complete and submit their application, with all supporting documents, by the deadline indicated. We will contact shortlisted candidates to collect any additional information needed to finalize their applications. Shortlisted candidates will be assessed for Python ability in order to finalize the selection. Within a week of the deadline, we will inform successful applicants.

Deadline for applications: June 7th, 2019 – 11:59 PM (EAT). Any inquiries about this short course should be sent to: aii@nexteinstein.org.