Oracle

Oracle Big Data Fundamentals (D86898) – Outline

Detailed Course Outline

Introduction

Questions About You
Course Objectives
Course Road Map
Oracle Big Data Lite (BDLite) Virtual Machine (VM) Home Page
Starting the Oracle BDLite VM and Accessing the Practice Files
Reviewing the Available Big Data Documentation, Tutorials, and Other Resources

Introducing Oracle Big Data Strategy

Characteristics of Big Data
Importance of Big Data
Big Data Opportunities: Some Examples
Big Data Challenges
Big Data Implementation Examples
Oracle Strategy for Big Data: Combining Big Data Processing Engines (Hadoop, NoSQL, and RDBMS)

Using Oracle Big Data Lite Virtual Machine and Movieplex Application

Oracle Big Data Lite VM Used in this Course
Oracle Big Data Lite VM Home Page Sections
Reviewing the Deployment Guide
Downloading and installing Oracle VM VirtualBox and its Extension Pack
Downloading and Running the 7-Zip Files to Create the VirtualBox Appliance File
Importing the Appliance File
Starting the Big Data Lite VM, and Starting and Stopping Services
Introducing the Oracle Movieplex Case Study

Introduction to the Big Data Ecosystem

Computer Clusters and Distributed Computing
Apache Hadoop
Types of Analysis That Use Hadoop
Types of Data Generated
Apache Hadoop Core Components: HDFS, MapReduce (MR1), and YARN (MR2)
Apache Hadoop Ecosystem
Cloudera’s Distribution Including Apache Hadoop (CDH)
CDH Architecture and Components

Introduction to the Hadoop Distributed File System

Hadoop Distributed File System (HDFS) Design Principles, Characteristics, and Key Definitions
Sample Hadoop High Availability (HA) Cluster
HDFS Files and Blocks
Active and Standby NameNode Daemon (Service) Functions
DataNode (DN) Daemon Functions
Writing a File to HDFS: Example
Interacting With Data Stored in HDFS: Hue, Hadoop Client, WebHDFS, and HttpFS
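To make "HDFS Files and Blocks" concrete, here is a minimal sketch of how a file is divided into fixed-size blocks and how replication multiplies raw storage. It assumes a 128 MB block size and a replication factor of 3 (the usual defaults for dfs.blocksize and dfs.replication in Hadoop 2.x and later); both are configurable per cluster and per file, and the function names are hypothetical.

```python
# Sketch: how HDFS splits a file into blocks and replicates them.
# ASSUMPTION: 128 MB blocks and replication factor 3 (common defaults).

BLOCK_SIZE = 128 * 1024 * 1024   # bytes per block
REPLICATION = 3                  # copies of each block across DataNodes


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file of file_size occupies.

    Every block is full except possibly the last; the last block is not
    padded, so it uses only as much disk as the data it holds.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full
    if remainder:
        blocks.append(remainder)
    return blocks


def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Total bytes stored across all DataNodes once every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    size = 300 * 1024 * 1024  # a 300 MB file
    blocks = split_into_blocks(size)
    print(len(blocks), [b // (1024 * 1024) for b in blocks])   # 3 [128, 128, 44]
    print(raw_storage(size) // (1024 * 1024), "MB of raw storage")  # 900 MB of raw storage
```

The NameNode only tracks which DataNodes hold each block; the block data itself lives on the DataNodes.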

Acquire Data Using the CLI, FUSE, Flume, and Kafka

Reviewing the Command Line Interface (CLI)
Viewing File System Contents Using the CLI
FS Shell Commands
Loading Data Using the CLI
Overview of FuseDFS
What is Flume?
Kafka Topics
Additional Resources
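A Kafka topic is split into partitions, and a producer sends each keyed record to a partition derived from its key, so records with the same key stay in order within one partition. The sketch below illustrates that idea only. Kafka's Java producer hashes keys with murmur2; this sketch substitutes zlib.crc32 as a stand-in, so the partition numbers will not match a real cluster. Keyless records (handled differently by the producer) are out of scope, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to one partition of a topic.

    The same key always maps to the same partition (while the partition
    count is unchanged), which is what preserves per-key ordering.
    NOTE: zlib.crc32 is a stand-in for Kafka's murmur2 hash.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


def assign(records, num_partitions):
    """Append (key, value) records to per-partition logs, in arrival order."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


events = [(b"user1", "login"), (b"user2", "login"), (b"user1", "logout")]
logs = assign(events, num_partitions=4)
```

Ordering is guaranteed only within a partition, which is why choosing a meaningful key (such as a user ID) matters.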

Acquire and Access Data Using Oracle NoSQL Database

What is a NoSQL Database?
RDBMS Compared to NoSQL
HDFS Compared to NoSQL
Defining Oracle NoSQL Database
Oracle NoSQL Database Models: Key-Value and Table
Acquiring and Accessing Data in a NoSQL DB
Accessing the CLIs (Data, Admin, SQL)
Accessing the KVStore
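In the key-value model, as we understand it, an Oracle NoSQL Database key has a major path and a minor path, and records that share a major path are stored in the same partition so they can be read together. The in-memory toy below illustrates only that layout idea; ToyKVStore and its methods are hypothetical and are not the oracle.kv Java API, and crc32 is an arbitrary choice of hash.

```python
import zlib


class ToyKVStore:
    """Hypothetical in-memory store using major/minor key paths.

    Only the major path is hashed to pick a partition, so every minor key
    under one major path is co-located and can be fetched in one call.
    """

    def __init__(self, num_partitions: int = 4):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition(self, major):
        # Hash the major path only; minor paths do not affect placement.
        h = zlib.crc32("/".join(major).encode())
        return self.partitions[h % len(self.partitions)]

    def put(self, major, minor, value):
        self._partition(major)[(tuple(major), tuple(minor))] = value

    def get(self, major, minor):
        return self._partition(major).get((tuple(major), tuple(minor)))

    def multi_get(self, major):
        """Return sorted (minor path, value) pairs stored under one major path."""
        major = tuple(major)
        part = self._partition(major)
        return sorted((k[1], v) for k, v in part.items() if k[0] == major)


store = ToyKVStore()
store.put(["user", "42"], ["name"], "Ann")
store.put(["user", "42"], ["email"], "ann@example.com")
```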

Introduction to MapReduce and YARN Processing Frameworks

MapReduce Framework Features, Benefits, and Jobs
Parallel Processing with MapReduce
Word Count Examples
Data Locality Optimization in Hadoop
Submitting and Monitoring a MapReduce Job
YARN Architecture, Features, and Daemons
YARN Application Workflow
Hadoop Basic Cluster: MapReduce 1 Versus YARN (MR2)
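The classic word count example follows the MapReduce pattern: mappers emit a (word, 1) pair per word, the framework shuffles and sorts the pairs so that all values for one key reach the same reducer, and reducers sum them. The sketch below runs those phases in a single Python process purely to show the data flow; on a real cluster the map tasks run in parallel on input splits, preferably on the nodes that store the data, and jobs are usually written against the Hadoop Java API instead. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle and sort: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "the cat ran"]))
# {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```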

Resource Management Using YARN

Job Scheduling in YARN
First In, First Out (FIFO) Scheduler, Capacity Scheduler, and Fair Scheduler
Cloudera Manager Resource Management Features
Static Service Pools
Working with the Fair Scheduler
Cloudera Manager Dynamic Resource Management: Example
Submitting and Monitoring a MapReduce Job Using YARN
Using the YARN application Command
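Where the FIFO scheduler serves applications strictly in arrival order, the Fair Scheduler aims to give each queue an equal (or weighted) share of cluster resources over time. The sketch below shows the core idea as weighted max-min fair division of a single resource such as memory: queues demanding less than their share get their full demand, and the leftover is redistributed to the rest by weight. It is a simplification under stated assumptions; the real scheduler also handles hierarchical queues, minimum and maximum resources, preemption, and optional multi-resource (DRF) policies. The function name is hypothetical.

```python
def weighted_fair_shares(capacity, demands, weights=None):
    """Weighted max-min fair allocation of one resource (water-filling).

    capacity: total resource available (e.g. GB of memory).
    demands:  {queue: amount requested}.
    weights:  {queue: weight}; defaults to equal weights.
    """
    if weights is None:
        weights = {q: 1.0 for q in demands}
    shares = {q: 0.0 for q in demands}
    active = {q for q, d in demands.items() if d > 0}
    remaining = float(capacity)

    while active and remaining > 1e-9:
        total_weight = sum(weights[q] for q in active)
        # Queues whose whole demand fits inside their proportional slice.
        satisfied = {
            q for q in active
            if remaining * weights[q] / total_weight >= demands[q]
        }
        if not satisfied:
            # Nobody fits: split what is left in proportion to weight.
            for q in active:
                shares[q] += remaining * weights[q] / total_weight
            break
        # Give satisfied queues exactly their demand and redistribute the rest.
        for q in satisfied:
            remaining -= demands[q]
            shares[q] = float(demands[q])
        active -= satisfied
    return shares


print(weighted_fair_shares(100, {"a": 20, "b": 50, "c": 60}))
# {'a': 20.0, 'b': 40.0, 'c': 40.0}
```

Queue "a" needs less than a third of the cluster, so its unused share flows to "b" and "c" instead of sitting idle.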

Overview of Apache Spark

Benefits of Using Spark
Spark Architecture
Spark Application Components: Driver, Master, Cluster Manager, and Executors
Running a Spark Application on YARN (yarn-cluster Mode)
Resilient Distributed Dataset (RDD)
Spark Interactive Shells: spark-shell and pyspark
Word Count Example Using the Interactive Scala Shell
Monitoring Spark Jobs Using YARN's ResourceManager Web UI

Overview of Apache Hive

What is Hive?
Use Case: Storing Clickstream Data
Hadoop Architecture
How is Data Stored in HDFS?
Organizing and Describing Data With Hive
Big Data SQL on Top of Hive Data
Defining Tables Over HDFS
Hive Queries

Overview of Cloudera Impala

Overview of Cloudera Impala
Hadoop: Some Data Access/Processing Options
Cloudera Impala
Cloudera Impala: Key Features
Cloudera Impala: Supported Data Formats
Cloudera Impala: Programming Interfaces
How Impala Fits Into the Hadoop Ecosystem
How Impala Works with Hive

Using Oracle XQuery for Hadoop

XML Review
Oracle XQuery for Hadoop (OXH)
OXH Features
OXH Data Flow
Using OXH: Installation, Functions, Adapters, and Configuration Properties
Running an OXH Query
XQuery Transformation and Basic Filtering
Viewing the Completed Query in YARN's ResourceManager

Overview of Solr

Overview of Solr
Apache Solr (Cloudera Search)
Cloudera Search: Key Capabilities
Cloudera Search: Features
Cloudera Search Tasks
Indexing in Cloudera Search
Types of Indexing
The solrctl Command
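Solr is built on Apache Lucene, whose central data structure is an inverted index: a map from each term to the documents that contain it. The sketch below builds a tiny inverted index and answers an AND query to show what "indexing" produces. It assumes a naive lowercase, alphanumeric tokenizer; real Solr analyzers can also apply stemming, stop-word removal, and other filters, and the function names are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Naive analyzer: lowercase, then split on non-alphanumeric characters."""
    return TOKEN.findall(text.lower())


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map every term to the sorted IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query: str) -> list[int]:
    """AND query: IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Hadoop stores data in HDFS",
    2: "Solr indexes data stored in HDFS",
    3: "Spark runs on YARN",
}
idx = build_index(docs)
print(search_all(idx, "data HDFS"))  # [1, 2]
print(search_all(idx, "stored"))     # [2]
```

Because this tokenizer does no stemming, "stored" does not match "stores"; that gap is exactly what a stemming filter in a real analyzer chain addresses.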

Integrating Your Big Data

Unifying Data: A Typical Requirement
Comparing Big Data Processing Engines
Introducing Data Unification Options
When to Use These Options

Batch Loading Options

Apache Sqoop
Oracle Loader for Hadoop
Oracle Copy to Hadoop

Using Oracle SQL Connector for HDFS

Batch and Dynamic Loading: Oracle SQL Connector for HDFS
OSCH Architecture
Using OSCH
Features
Parallelism and Performance
Performance Tuning
Key Benefits
Loading: Choosing a Connector

Using Oracle Data Integrator and Oracle GoldenGate for Big Data

ETL and Synchronization: Oracle Data Integrator
ODI’s Declarative Design
ODI Knowledge Modules (KMs): Simpler Physical Design and Shorter Implementation Time
Using ODI with Big Data: Heterogeneous Integration with Hadoop Environments
Using ODI Studio
ODI Studio Components: Overview
ODI Studio: Big Data Knowledge Modules
Oracle GoldenGate for Big Data

Using Oracle Big Data SQL

Barriers to Effective Big Data Adoption
Overcoming Big Data Barriers
Oracle Big Data SQL: The Hybrid Solution
Benefits: Virtualizes Data Access Across Oracle Database, Hadoop, and NoSQL Stores
Using Oracle Big Data SQL
Query Performance Overview
Deployment Options

Using Oracle Big Data Spatial and Graph

Graph and Spatial Analysis: All About Relationships
What is Oracle Big Data Spatial and Graph (BDSG)?
Strategy (Supported Platforms, etc.)
BDSG: Graph Analysis
Oracle BDSG: Spatial Analysis
Multimedia Analytics Framework
Deployment Options for Oracle BDSG
Additional Resources

Using Oracle Advanced Analytics

Oracle Advanced Analytics (OAA)
OAA: Oracle Data Mining
OAA: Oracle R Enterprise

Oracle Big Data Deployment Options

Introduction to the Oracle Big Data Appliance
Running the Oracle BDA Configuration Generation Utility
Oracle BDA Mammoth Software Deployment Bundle
Using the Oracle BDA mammoth Utility
BDA Hardware and Integrated and Optional Software
Administering and Securing the Oracle BDA
Introduction to the Oracle Big Data Cloud Service
Introduction to the Oracle Big Data Cloud Service – Compute Edition