Oracle Big Data Fundamentals (D86898) – Outline
Detailed Course Outline
Introduction
- Questions About You
- Course Objectives
- Course Road Map
- Oracle Big Data Lite (BDLite) Virtual Machine (VM) Home Page
- Starting the Oracle BDLite VM and Accessing the Practice Files
- Reviewing the Available Big Data Documentation, Tutorials, and Other Resources
Introducing Oracle Big Data Strategy
- Characteristics of Big Data
- Importance of Big Data
- Big Data Opportunities: Some Examples
- Big Data Challenges
- Big Data Implementation Examples
- Oracle Strategy for Big Data: Combining Big Data Processing Engines (Hadoop, NoSQL, and RDBMS)
Using Oracle Big Data Lite Virtual Machine and Movieplex Application
- Oracle Big Data Lite VM Used in this Course
- Oracle Big Data Lite VM Home Page Sections
- Reviewing the Deployment Guide
- Downloading and Installing Oracle VM VirtualBox and Its Extension Pack
- Downloading and Running 7-Zip Files to Create the VirtualBox Appliance File
- Importing the Appliance File
- Starting the Big Data Lite VM and Starting and Stopping Services
- Introducing the Oracle Movieplex Case Study
Introduction to the Big Data Ecosystem
- Computer Clusters and Distributed Computing
- Apache Hadoop
- Types of Analysis That Use Hadoop
- Types of Data Generated
- Apache Hadoop Core Components: HDFS, MapReduce (MR1), and YARN (MR2)
- Apache Hadoop Ecosystem
- Cloudera’s Distribution Including Apache Hadoop (CDH)
- CDH Architecture and Components
Introduction to the Hadoop Distributed File System
- Hadoop Distributed File System (HDFS) Design Principles, Characteristics, and Key Definitions
- Sample Hadoop High Availability (HA) Cluster
- HDFS Files and Blocks
- Functions of Active and Standby Daemons (Services)
- Functions of DataNode (DN) Daemons
- Writing a File to HDFS: Example
- Interacting With Data Stored in HDFS: Hue, Hadoop Client, WebHDFS, and HttpFS
Acquire Data Using CLI, FUSE, Flume, and Kafka
- Reviewing the Command Line Interface (CLI)
- Viewing File System Contents Using the CLI
- FS Shell Commands
- Loading Data Using the CLI
- Overview of FuseDFS
- What is Flume?
- Kafka Topics
- Additional Resources
Acquire and Access Data Using Oracle NoSQL Database
- What is a NoSQL Database?
- RDBMS Compared to NoSQL
- HDFS Compared to NoSQL
- Define Oracle NoSQL Database
- Oracle NoSQL Models: Key-Value and Table
- Acquiring and Accessing Data in a NoSQL DB
- Accessing the CLIs (Data, Admin, SQL)
- Accessing the KVStore
Introduction to MapReduce and YARN Processing Frameworks
- MapReduce Framework Features, Benefits, and Jobs
- Parallel Processing with MapReduce
- Word Count Examples
- Data Locality Optimization in Hadoop
- Submitting and Monitoring a MapReduce Job
- YARN Architecture, Features, and Daemons
- YARN Application Workflow
- Hadoop Basic Cluster: MapReduce 1 Versus YARN (MR 2)
Resource Management Using YARN
- Job Scheduling in YARN
- First In, First Out (FIFO) Scheduler, Capacity Scheduler, and Fair Scheduler
- Cloudera Manager Resource Management Features
- Static Service Pools
- Working with the Fair Scheduler
- Cloudera Manager Dynamic Resource Management: Example
- Submitting and Monitoring a MapReduce Job Using YARN
- Using the YARN application Command
Overview of Apache Spark
- Benefits of Using Spark
- Spark Architecture
- Spark Application Components: Driver, Master, Cluster Manager, and Executors
- Running a Spark Application on YARN (yarn-cluster Mode)
- Resilient Distributed Dataset (RDD)
- Spark Interactive Shells: spark-shell and pyspark
- Word Count Example by Using Interactive Scala
- Monitoring Spark Jobs Using YARN's ResourceManager Web UI
Overview of Apache Hive
- What is Hive?
- Use Case: Storing Clickstream Data
- Hadoop Architecture
- How is Data Stored in HDFS?
- Organizing and Describing Data With Hive
- Big Data SQL on Top of Hive Data
- Defining Tables Over HDFS
- Hive Queries
Overview of Cloudera Impala
- Overview of Cloudera Impala
- Hadoop: Some Data Access/Processing Options
- Cloudera Impala
- Cloudera Impala: Key Features
- Cloudera Impala: Supported Data Formats
- Cloudera Impala: Programming Interfaces
- How Impala Fits Into the Hadoop Ecosystem
- How Impala Works with Hive
Using Oracle XQuery for Hadoop
- XML Review
- Oracle XQuery for Hadoop (OXH)
- OXH Features
- OXH Data Flow
- Using OXH: Installation, Functions, Adapters, and Configuration Properties
- Running an OXH Query
- XQuery Transformation and Basic Filtering
- Viewing the Completed Query in YARN's ResourceManager
Overview of Solr
- Overview of Solr
- Apache Solr (Cloudera Search)
- Cloudera Search: Key Capabilities
- Cloudera Search: Features
- Cloudera Search Tasks
- Indexing in Cloudera Search
- Types of Indexing
- The solrctl Command
Integrating Your Big Data
- Unifying Data: A Typical Requirement
- Comparing Big Data Processing Engines
- Introducing Data Unification Options
- When to Use These Options?
Batch Loading Options
- Apache Sqoop
- Oracle Loader for Hadoop
- Oracle Copy to Hadoop
Using Oracle SQL Connector for HDFS
- Batch and Dynamic Loading: Oracle SQL Connector for HDFS
- OSCH Architecture
- Using OSCH
- Features
- Parallelism and Performance
- Performance Tuning
- Key Benefits
- Loading: Choosing a Connector
Using Oracle Data Integrator and Oracle GoldenGate for Big Data
- ETL and Synchronization: Oracle Data Integrator
- ODI’s Declarative Design
- ODI Knowledge Modules (KMs): Simpler Physical Design and Shorter Implementation Time
- Using ODI with Big Data: Heterogeneous Integration with Hadoop Environments
- Using ODI Studio
- ODI Studio Components: Overview
- ODI Studio: Big Data Knowledge Modules
- Oracle GoldenGate for Big Data
Using Oracle Big Data SQL
- Barriers to Effective Big Data Adoption
- Overcoming Big Data Barriers
- Oracle Big Data SQL: The Hybrid Solution
- Benefits: Virtualized Data Access Across Oracle Database, Hadoop, and NoSQL Stores
- Using Oracle Big Data SQL
- Query Performance Overview
- Deployment Options
Using Oracle Big Data Spatial and Graph
- Graph and Spatial Analysis: All About Relationships
- What is Oracle Big Data Spatial and Graph (BDSG)?
- Strategy (Supported Platforms, etc.)
- BDSG: Graph Analysis
- Oracle BDSG: Spatial Analysis
- Multimedia Analytics Framework
- Deployment Options for Oracle BDSG
- Additional Resources
Using Oracle Advanced Analytics
- Oracle Advanced Analytics (OAA)
- OAA: Oracle Data Mining
- OAA: Oracle R Enterprise
Oracle Big Data Deployment Options
- Introduction to the Oracle Big Data Appliance
- Running the Oracle BDA Configuration Generation Utility
- Oracle BDA Mammoth Software Deployment Bundle
- Using the Oracle BDA mammoth Utility
- BDA Hardware and Integrated and Optional Software
- Administering and Securing the Oracle BDA
- Introduction to the Oracle Big Data Cloud Service
- Introduction to the Oracle Big Data Cloud Service – Compute Edition