As part of the Capacity Building component of the Big Data for Development (BD4D) project, AIMS-NEI will run a series of short courses at AIMS Rwanda. The program kicks off on 11-15 March 2019 with our first BD4D short course, “Big Data Analytics with Python”. The course targets people with a passion for Data Science in general and Big Data in particular who hold at least a four-year undergraduate degree or have 2 to 3 years of professional experience in Statistics or another Data Science related field.
Data sets are becoming increasingly large as people and things become more and more connected. Traditional data processing software and techniques cannot deal with these large-scale data sets, so handling them requires specialized frameworks and tools such as Apache Spark. This course teaches the essential basics of processing large-scale data sets using Python. It also teaches how to perform common data science tasks, such as data wrangling and building machine learning models, in Python.
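To give a flavour of the problem, here is a minimal sketch in plain Python (not course material) of the idea behind processing data that is too large to load at once: read one record at a time and keep only small running totals. The function name, column names, and sample values are hypothetical, chosen purely for illustration; frameworks such as Apache Spark apply the same idea across many cores or machines.

```python
import csv
import io
from collections import defaultdict


def mean_by_key(lines, key_field, value_field):
    """Stream rows one at a time, keeping only running totals per key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # csv.DictReader pulls one line at a time from any iterable of lines,
    # so the full data set never has to sit in memory.
    for row in csv.DictReader(lines):
        totals[row[key_field]] += float(row[value_field])
        counts[row[key_field]] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical sample data standing in for a large file; a real file
# would be passed as open(path, newline="") instead.
sample = io.StringIO("district,rainfall_mm\nKigali,10\nHuye,30\nKigali,20\n")
result = mean_by_key(sample, "district", "rainfall_mm")
print(result)
```

Memory use here grows with the number of distinct keys rather than the number of rows, which is why the same pattern scales to files far larger than available RAM.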
This course takes a practical approach to equip participants with the most essential tools in the shortest possible time. The lessons start with the absolute basics of Python, focusing mainly on data structures, and then quickly move on to the core libraries for doing data science in Python. Next, the course turns to Big Data, first covering brief theoretical concepts and then teaching Apache Spark, a state-of-the-art tool for processing large data sets. Thereafter, it provides introductory lectures in machine learning before showing participants how to build these algorithms in Python. The course emphasizes learning by doing, so many exercises are built into it to give participants ample time to practice.
Summarized Learning Objectives
- Understand the basics of the Python language: syntax, data structures and more.
- Perform common data science tasks using Python: data ingestion, processing, visualization, web scraping, etc.
- Handle large-scale data sets (20 GB+) on a personal computer using Apache Spark, and utilize cloud computing platforms.
- Be familiar with the theoretical basis of common machine learning algorithms.
- Be able to build and evaluate machine learning models using the scikit-learn library.
Day 1: Introduction to Python: focuses on the Python language.
Day 2: Doing Common Data Science Tasks in Python: data ingestion, processing, analysis, visualization, web scraping and more.
Day 3: Processing large data sets with Apache Spark: introduction to Big Data, multiprocessing in Python, Apache Spark and more.
Day 4: Machine learning in Python: lecture on machine learning, building and evaluating models with scikit-learn.
Day 5: Putting it all together: review, advanced tasks in Python and extended exercises.