Email: it@huph.edu.vn
Data Science Courses
Learning Pentaho
0 students
Last updated: Feb 2024
Course content
Sections: 22 • Activities: 0 • Resources: 71
Section 1
1. Introduction
1. Welcome to the course
2. Course Resources
Section 2
2. Pentaho Data Integration (PDI) Installation and Setup
1. Setting up environment and installing PDI
2. Opening Spoon - The Graphical UI
Section 3
3. A Simple ETL Demonstration
1. The example problem statement
2. Demonstration of a PDI transformation
3. Demonstration of a PDI Job
Section 4
4. Basic concepts - Theory for foundational understanding
1. What is ETL
2. Data Warehouse, Ops Database and Data mart
3. Inmon vs Kimball Architecture
4. ETL vs ELT
Section 5
5. The ETL process: The practical part begins here
1. Data and the ETL process
Section 6
6. Data Extraction: Extracting tabular data
1. Manually entering data into PDI
2. Inputting Data from a TXT (text) file
3. Input from multiple CSV files at the same time
4. Inputting Data from an Excel file
5. Extracting Data from Zipped files
Section 7
7. Data Extraction: Extracting non-tabular data
1. Extracting from XML
2. Extracting from JSON
Section 8
8. Extracting from an SQL table
1. Plan for importing sales Data
2. Installing and setting up PostgreSQL
3. Creating Sales table in SQL
4. Extracting from an SQL table
Section 9
9. Storing and Retrieving Data from Cloud storage
1. Storing Data on AWS S3
2. Reading data from AWS S3
Section 10
10. Merging Data Streams
1. Concepts: Merging Data Streams
2. Sorted Merge Step - Merging customer data
3. Merging product data
4. Append data stream - merging sales data
Section 11
11. Data Cleansing
1. Introduction to Data Cleansing
2. Value Mapper Step
3. Replace in String Step
4. Fuzzy Match concepts
5. Fuzzy Match Step in PDI
6. Fuzzy Match Algorithms
7. Formula Step and changing data format
8. Common Data Cleaning Steps
Section 12
12. Data Validation
1. Introduction to Data validation
2. Data validation 1 - String-to-Int and integer range validations
3. Data validation 2 - Checking Reference Values using stream look-up
4. Data validation 3 - Order date and shipping date using calculator step
5. Common Data Validation steps
Section 13
13. Error Handling
1. Correcting the errors and merging with main stream
2. Writing the errors to the log
3. Writing the errors to a separate file
Section 14
14. Transformation and Analytics steps
1. Concatenating Address Fields
2. Data Aggregation using Group-by
3. Normalization and Denormalization
4. Number Range Step
Section 15
15. PDI SQL Connection
1. Introduction to PDI - SQL connection
2. Reading and filtering data from DB into PDI
3. Updating and Inserting data into DB from PDI
4. Deleting data from SQL DB using PDI
Section 16
16. Conceptual understanding for Loading Data
1. Fact and Dimension tables
2. Surrogate Keys in Dimension tables
3. Type 1 & 2 Slowly Changing Dimensions
4. Schemas
Section 17
17. Loading the data into a Data Mart
1. Creating tables in DB
2. Loading Customer Data using Combination lookup/update step
3. Loading product data using dimension lookup step
4. Loading sales data after database lookup steps
Section 18
18. Running Java and JavaScript
1. Scripting Steps
Section 19
19. PDI Jobs
1. PDI Jobs vs Transformations
2. Controlling the flow of execution
3. Setting variables using set variables step
4. File and Folder Management
5. Sending Email Step
6. Abort Job Step
Section 20
20. Scheduling a job for production environment
1. Running using command prompt and scheduling
Section 21
21. Metadata injection
1. Metadata injection
Section 22
22. Regex Notation
1. Regular Expressions for advanced String Matching
Course modified date: 13 Feb 2024
Enrolled students: There are no students enrolled in this course.