As part of the capacity development pillar of the Big Data for Development project, AIMS-NEI designed the Big Data for Development Short Course Program (BD4D-SCP). Training sessions are delivered across the AIMS-NEI network: in Rwanda, Senegal and now Cameroon.
The course targets people with a passion for Data Science in general, and Big Data Analytics in particular, who hold at least a four-year undergraduate degree or have a minimum of 2 to 3 years of professional experience in Statistics, Information Technology or any other Data Science related field.
A number of short courses are in the pipeline to achieve our BD4D objective of increasing the number of data scientists in Africa and providing a platform for all practitioners to interact.
As people and devices around the world become more and more connected, datasets are growing so large that traditional data processing software and techniques can no longer handle them. Specialized frameworks and tools such as Apache Spark are needed to deal with datasets at this scale. This course teaches the essentials of processing large-scale datasets using Python. It also teaches you how to perform common data science tasks, such as data wrangling and building machine learning models, in Python. The course takes a practical approach to equip participants with the most essential tools in the shortest possible time. It emphasizes learning by doing, so there are many exercises built into the course to give participants ample time to practice.
Summarized Objectives & Outcomes
1. Understand intermediate to advanced concepts of the Python language: data structures, functions, classes and the Python package ecosystem.
2. Perform data science tasks using Python: data ingestion, processing, visualization, web scraping etc.
3. Handle large-scale datasets (20 GB+) using Apache Spark: big data basics, Hadoop ecosystem, cloud computing platforms, big data processing with Apache Spark.
4. Be familiar with essential machine learning (ML) theory: the learning problem, types of learning, loss functions, linear models, deep learning and more.
5. Be able to build and evaluate machine learning models: use scikit-learn and TensorFlow to build and evaluate models using Python.
6. Appreciate real-world ML and big data use cases: object detection on Android devices, and analysis of large-scale GPS data for a human mobility use case.
Day 1: Advanced Concepts in Python: on this first day, the course will focus on the Python language to build a strong foundation for the rest of the course materials. Participants will be introduced to intermediate to advanced practical techniques such as writing functions, classes, error handling, packaging Python code and more.
Day 2: Python for Data Science: during the second day, the focus is on performing common data science tasks using Python. We will go through data ingestion, processing, analysis, visualization, web scraping and more, and along the way introduce essential packages for these tasks (e.g., pandas, geopandas, numpy, matplotlib).
Day 3: Big Data Processing: on the third day, the course focuses on how to handle large datasets using Python. The following topics will be covered: introduction to big data, multiprocessing in Python, Apache Spark, how to use common cloud platforms and more.
Day 4: Machine Learning (ML) in Python: on this day, the course will first provide an introductory lecture on machine learning. The rest of the day will focus on how to perform various ML tasks (e.g., data preparation, model building, evaluation and interpretation) using the scikit-learn package in Python.
Day 5: Putting it All Together: during the final day, we will focus on using the skills gained in this course to solve real-life data science problems by looking at case studies. Potential case studies include: how to process night lights satellite images (geospatial), how to process massive call records from cellphones (mobile data) and how to build ML models to impute missing sensor data (sensor data).
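To give a flavour of the hands-on exercises, here is a minimal sketch (not actual course material) that combines Day 1 techniques (a class, a custom exception, error handling) with the missing sensor data theme from Day 5. The names SensorSeries, EmptySeriesError and impute_mean are hypothetical, the readings are made up for illustration, and mean imputation is only a simple baseline standing in for the ML models the case study refers to.

```python
from dataclasses import dataclass
from typing import List, Optional


class EmptySeriesError(ValueError):
    """Raised when a series has no observed values to impute from."""


@dataclass
class SensorSeries:
    """Hypothetical container for sensor readings; None marks a missing value."""

    name: str
    readings: List[Optional[float]]

    def observed(self) -> List[float]:
        # Keep only the readings that were actually recorded.
        return [r for r in self.readings if r is not None]

    def impute_mean(self) -> List[float]:
        # Baseline imputation: replace each gap with the mean of observed values.
        observed = self.observed()
        if not observed:
            raise EmptySeriesError(f"no observed readings in {self.name!r}")
        mean = sum(observed) / len(observed)
        return [mean if r is None else r for r in self.readings]


# Illustrative data only.
series = SensorSeries("temperature", [20.0, None, 22.0, None, 24.0])
filled = series.impute_mean()
print(filled)
```

A series with no observed values raises EmptySeriesError instead of silently returning bad data, which is the kind of explicit error handling covered on Day 1.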