Welcome to the Nanodegree program

The Skills That Set You Apart

The Data Science Process

Learn the data science process, including how to build effective data visualizations and how to communicate with various stakeholders.

Communicating to Stakeholders

Project: Write a Data Science Blog Post

In this project, learners will choose a dataset, identify three questions, and analyze the data to find answers to these questions. They will create a GitHub repository with their project, and write a blog post to communicate their findings to the appropriate audience. This project will help learners reinforce and extend their knowledge of machine learning, data visualization, and communication.

Introduction to Software Engineering

In this lesson, you’ll write production-level code and practice object-oriented programming, which you can integrate into machine learning projects.

Software Engineering Practices Pt I

Software Engineering Practices Pt II

OOP

Portfolio Exercise: Upload a Package to PyPI

Web Development

Portfolio Exercise: Deploy a Data Dashboard

Introduction to Data Engineering

ETL Pipelines

Introduction to NLP

Learn natural language processing (NLP), one of the fields with the most real-world applications of deep learning.

Machine Learning Pipelines

Disaster Response Pipeline

Project: Disaster Response Pipeline

Concepts in Experiment Design

Statistical Considerations in Testing

A/B Testing Case Study

Portfolio Exercise: Starbucks

Introduction to Recommendation Engines

Matrix Factorization for Recommendations

Recommendation Engines

Sentiment Prediction RNN

Convolutional Neural Networks

Transfer Learning

Weight Initialization

Autoencoders

Job Search

Find your dream job with continuous learning and constant effort

Refine Your Entry-Level Resume

Craft Your Cover Letter

Optimize Your GitHub Profile

Develop Your Personal Brand

04. How to Tackle the Exercises

This course assumes you have experience manipulating data with the pandas library, which is covered in the Data Analyst Nanodegree. Some of the transformation exercises are challenging, and the most challenging ones are marked "(challenging)". You will get something out of solving a challenging exercise, but it is not essential for understanding the lesson material or for completing the final project at the end of this data engineering course.

Throughout the exercises, you might have to read the pandas documentation or search outside the classroom to learn how to apply a certain processing technique. That is not only expected but encouraged. As a professional data scientist, you will often have to research how to do something on your own, much as software engineers do. See this Quora answer about how often people use Stack Overflow when working on data science projects.

Use Google and other search engines when you’re not sure how to do something!

What You Will Do in the Next Section

In the next section of the lesson, you’ll learn about the extract step of an ETL pipeline and practice it with a series of exercises. These exercises are relatively brief and focus on extracting, in other words reading in, data from different sources. The goal is to familiarize yourself with different types of files and see how the same data can be formatted in different ways.
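As a minimal sketch of that idea, assuming pandas is installed, the snippet below stores the same three hypothetical records once as CSV text and once as a JSON array, then reads each into a DataFrame with `pd.read_csv` and `pd.read_json`. The column names, city values, and the helper functions `extract_csv` and `extract_json` are illustrative assumptions, not the course’s exercise data; `io.StringIO` stands in for a file path so the example runs without any files on disk.

```python
import io

import pandas as pd

# Hypothetical sample data (illustrative only): three records as CSV text...
CSV_TEXT = """id,city,temperature
1,Springfield,21.5
2,Shelbyville,19.0
3,Ogdenville,23.25
"""

# ...and the same three records as a JSON array with one object per row.
JSON_TEXT = """[
  {"id": 1, "city": "Springfield", "temperature": 21.5},
  {"id": 2, "city": "Shelbyville", "temperature": 19.0},
  {"id": 3, "city": "Ogdenville", "temperature": 23.25}
]"""


def extract_csv(text: str) -> pd.DataFrame:
    """Hypothetical helper: read CSV text into a DataFrame."""
    # io.StringIO makes the string behave like an open file.
    return pd.read_csv(io.StringIO(text))


def extract_json(text: str) -> pd.DataFrame:
    """Hypothetical helper: read a JSON array of records into a DataFrame."""
    # orient="records" means the JSON is a list of {column: value} objects.
    return pd.read_json(io.StringIO(text), orient="records")


if __name__ == "__main__":
    df_csv = extract_csv(CSV_TEXT)
    df_json = extract_json(JSON_TEXT)
    print(df_csv)
    # Select the JSON columns in the CSV's order before comparing,
    # since different formats need not agree on column order.
    print(df_csv.equals(df_json[df_csv.columns]))
```

Reordering the JSON columns before calling `DataFrame.equals` is a defensive choice: the two files hold the same values, but nothing guarantees that two sources list their fields in the same order. In practice you would pass a file path or URL to `read_csv` or `read_json` instead of a string buffer.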

For a review of pandas, click on the “Extracurricular” section of the classroom. Open the Prerequisite: Python for Data Analysis course, and go to Lesson 7: Pandas.