Detailed Course Outline
Introduction
- About This Course
- About Cloudera
- Course Logistics
- Introductions
Data Science Overview
- What Is Data Science?
- The Growing Need for Data Science
- The Role of a Data Scientist
Use Cases
- Finance
- Retail
- Advertising
- Defense and Intelligence
- Telecommunications and Utilities
- Healthcare and Pharmaceuticals
Project Lifecycle
- Steps in the Project Lifecycle
- Lab Scenario Explanation
Data Acquisition
- Where to Source Data
- Acquisition Techniques
Evaluating Input Data
- Data Formats
- Data Quantity
- Data Quality
Data Transformation
- File Format Conversion
- Joining Data Sets
- Anonymization
Data Analysis and Statistical Methods
- Relationship Between Statistics and Probability
- Descriptive Statistics
- Inferential Statistics
- Vectors and Matrices
Fundamentals of Machine Learning
- Overview
- The Three C’s of Machine Learning
- Importance of Data and Algorithms
- Spotlight: Naive Bayes Classifiers
Recommender Overview
- What Is a Recommender System?
- Types of Collaborative Filtering
- Limitations of Recommender Systems
- Fundamental Concepts
Introduction to Apache Spark and MLlib
- What Is Apache Spark?
- Comparison to MapReduce
- Fundamentals of Apache Spark
- Spark’s MLlib Package
Implementing Recommenders with MLlib
- Overview of the ALS Method for Latent Factor Recommenders
- Hyperparameters for ALS Recommenders
- Building a Recommender in MLlib
- Tuning Hyperparameters
- Weighting
Experimentation and Evaluation
- Designing Effective Experiments
- Conducting an Effective Experiment
- User Interfaces for Recommenders
Production Deployment and Beyond
- Deploying to Production
- Tips and Techniques for Working at Scale
- Summarizing and Visualizing Results
- Considerations for Improvement
- Next Steps for Recommenders
Conclusion