As part of the capacity development pillar of the Big Data for Development project, AIMS-NEI designed a Big Data for Development Short Course Program (BD4D-SCP), taught across the AIMS-NEI network, first in Rwanda and now in Senegal.
The course targets people with a passion for data science in general, and big data analytics in particular, who hold at least a four-year undergraduate degree or have a minimum of 2 to 3 years of professional work experience in statistics or another data science related field.
A number of short courses are in the pipeline to achieve our BD4D objective of increasing the number of data scientists in Africa and providing a platform for all practitioners to interact. AIMS-NEI will also organize the first Leveraging the Power of Big Data (LPBD) Executive Education workshop. The goal of the workshop is to introduce C-level executives to the era of big data, demonstrating how big data is disrupting traditional enterprises and opening the door to new products and services.
Datasets are becoming increasingly large as the world's population and things become more and more connected. Traditional data processing software and techniques cannot deal with these large-scale datasets, so specialized frameworks and tools such as Apache Spark are needed. This course teaches the essential basics of processing large-scale datasets using Python. In addition, it teaches you how to perform common data science tasks such as data wrangling and building machine learning models in Python. This course takes a practical approach to equip participants with the most essential tools in the shortest possible time. The course emphasizes learning by doing: there are many exercises built into the course to give participants ample time to practice.
The lessons start with the absolute basics of Python, focusing mainly on data structures, and then quickly move on to the core libraries for doing data science in Python. Next, the course moves to Big Data, first introducing brief theoretical concepts and then teaching Apache Spark, a state-of-the-art tool for processing large datasets. Thereafter, it provides introductory lectures in machine learning before showing participants how to build these algorithms in Python.
Summarized Course Objectives
- Understand advanced concepts of the Python language: data structures, functions, classes and more.
- Perform common data science tasks using Python: data ingestion, processing, visualization, web scraping, etc.
- Handle large-scale datasets (20 GB+) on a personal computer using Apache Spark, and utilize cloud computing platforms.
- Be familiar with the theoretical basis of common machine learning algorithms.
- Be able to build and evaluate machine learning models using the scikit-learn library.
Advanced Concepts in Python: on the first day, the course will focus on the Python language to build a strong foundation for the rest of the course material. Participants will be introduced to intermediate and advanced practical techniques such as writing functions and classes, handling errors, packaging Python code and more.
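As a rough illustration of the kind of techniques listed for this day (functions, classes and error handling), here is a minimal sketch. It is not taken from the course material; the names `InvalidReadingError`, `parse_reading` and `ReadingLog` are hypothetical and chosen only for this example.

```python
class InvalidReadingError(ValueError):
    """Hypothetical custom exception for values that cannot be parsed."""


def parse_reading(raw):
    """Convert a raw text value to a float, with a descriptive error on failure."""
    try:
        return float(raw)
    except ValueError as exc:
        # Re-raise as a domain-specific error while keeping the original cause.
        raise InvalidReadingError(f"cannot parse reading: {raw!r}") from exc


class ReadingLog:
    """Collects valid readings and counts the ones that were rejected."""

    def __init__(self):
        self.values = []
        self.rejected = 0

    def add(self, raw):
        try:
            self.values.append(parse_reading(raw))
        except InvalidReadingError:
            self.rejected += 1

    def mean(self):
        if not self.values:
            raise ValueError("no valid readings recorded")
        return sum(self.values) / len(self.values)


log = ReadingLog()
for raw in ["21.5", "22.5", "n/a"]:
    log.add(raw)
print(log.mean(), log.rejected)
```

The custom exception lets calling code distinguish bad input from other failures, which is one common reason to define exception classes rather than catching a bare `ValueError` everywhere.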
Python for Data Science: on the second day, the focus is on performing common data science tasks using Python. We will go through data ingestion, processing, analysis, visualization, web scraping and more, and along the way introduce essential packages for these tasks (e.g., pandas, geopandas, numpy and matplotlib).
Big Data Processing: on the third day, the course focuses on how to handle large datasets using Python. The following topics will be covered: introduction to big data, multiprocessing in Python, Apache Spark, how to use common cloud platforms and more.
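To give a flavour of the multiprocessing topic, the sketch below splits a list into chunks and processes them in parallel with the standard library's `multiprocessing.Pool`. It is an illustrative assumption about what such an exercise might look like, not the course's own code; the helper names `split_into_chunks`, `sum_of_squares` and `parallel_sum_of_squares` are hypothetical.

```python
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """Work done independently by each worker process."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4):
    """Map the chunks over a pool of processes, then combine the partial sums."""
    chunks = split_into_chunks(items, n_workers)
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing this file.
    data = list(range(100_000))
    print(parallel_sum_of_squares(data))
```

The same split, map and combine pattern is what frameworks such as Apache Spark apply across many machines rather than across the cores of one computer.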
Machine Learning (ML) in Python: on the fourth day, the course will first provide an introductory lecture on machine learning. The rest of the day will focus on how to perform various ML tasks (e.g., data preparation, model building, evaluation and interpretation) using the scikit-learn package in Python.
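The workflow named above (data preparation, model building and evaluation) might look roughly like the following with scikit-learn. This is a minimal sketch under stated assumptions: the iris dataset bundled with scikit-learn and a logistic regression model are chosen here purely for illustration and are not stated in the course description.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data preparation: load a small bundled dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model building: scale the features, then fit a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluation: measure accuracy on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling parameters learned from the training split only, which avoids leaking information from the test set.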
Putting it All Together: on the final day, we will focus on using the skills gained in this course to solve real-life data science problems by looking at case studies. Potential case studies include: how to process night lights satellite images (geospatial), how to process massive call records from cellphones (mobile data) and how to build ML models to impute missing sensor data (sensor data).
Prerequisites
- Programming: ability to write a simple program in Python (basic Python level).
- Math and Statistics: a background in statistics, data science, or any other quantitative science.