If you want to turn your passion for computers into a career as an IT professional or Data Engineer, this program is for you. It will help you analyse and visualise data with tools such as Power BI and Tableau. The training includes live projects, interview preparation, and career and CV building.
Course Outline
Data engineering is a software engineering practice with a focus on the design, development, and production of data processing systems. It includes all the practical aspects of data acquisition, transfer, transformation, and storage on-prem or in the cloud.
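To make acquisition, transformation, and storage concrete, here is a minimal extract-transform-load sketch that uses only the Python standard library. The CSV text, table name, and column names are hypothetical. An in-memory SQLite database stands in for whatever on-prem or cloud store a real project would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for an acquired CSV file or API payload.
RAW_CSV = """order_id,amount,currency
1,19.99,usd
2,5.00,USD
3,12.50,eur
"""

def acquire(text):
    """Acquisition: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: cast column types and normalize the currency code."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["currency"].upper())
        for r in rows
    ]

def store(records, conn):
    """Storage: load the records into a SQL table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on exit
store(transform(acquire(RAW_CSV)), conn)

total_usd = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE currency = 'USD'"
).fetchone()[0]
print(round(total_usd, 2))  # 24.99
```

Each stage is a separate function, so any one of them can be swapped out independently. For example, `acquire` could read from an HTTP API instead of a string, and the other two stages would stay the same.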
This intensive, hands-on course teaches students how to apply Python to the practical aspects of data engineering. It also introduces the popular Python libraries used in the field, including NumPy, pandas, Matplotlib, scikit-learn, and Apache Spark (through its Python API, PySpark).
Topics
- Data engineering practice
- High-octane introduction to Python
- Technical reviews of NumPy, pandas, and other Python libraries and data processing systems
- Data visualization and exploratory data analysis
- Data repairing and normalization
- Understanding the data needs and requirements of Machine Learning and Data Science projects
- Python in the Cloud
- Python on Hadoop and Spark (PySpark)
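As a small illustration of data repairing and normalization, the sketch below fills gaps and then rescales values into [0, 1]. It uses plain Python rather than pandas so that it runs without extra installs. The readings and the `-999.0` sentinel are hypothetical. Mean imputation and min-max scaling are only one simple choice among several strategies.

```python
from statistics import mean

# Hypothetical sensor readings: None marks a missing value,
# -999.0 is an assumed sentinel for a failed read.
readings = [12.0, None, 15.0, -999.0, 18.0, 21.0]

def repair(values, sentinel=-999.0):
    """Replace missing values and sentinels with the mean of the valid readings."""
    valid = [v for v in values if v is not None and v != sentinel]
    fill = mean(valid)
    return [fill if v is None or v == sentinel else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

repaired = repair(readings)
normalized = min_max_normalize(repaired)
print(repaired)  # [12.0, 16.5, 15.0, 16.5, 18.0, 21.0]
```

Repairing before normalizing matters here. If the sentinel `-999.0` were left in place, it would become the minimum and squash every real reading toward 1.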
Audience
Developers, Software Engineers, Data Scientists, and IT Architects
Prerequisites
Participants should have practical experience coding in one or more modern programming languages. Knowledge of Python is desirable but not required. Students should be able to pick up new material quickly, reinforce each topic through programming exercises (labs), and then apply what they have learned in data engineering mini-projects.
Duration: 2 Weeks
Data Engineering with Python Training
Defining Data Engineering
- Introduction to Power BI and Tableau
- Translating Data into Operational and Business Insights
- What is Data Engineering
- The Data-Related Roles
- The Data Science Skill Sets
- The Data Engineer Role
- Core Skills and Competencies
- An Example of a Data Product
- What is Data Wrangling (Munging)?
- The Data Exchange Interoperability Options
- Summary
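One way to see the trade-offs between data exchange formats is to round-trip the same records through JSON and CSV. The sketch below does this with the standard `json` and `csv` modules. The records themselves are hypothetical. Note that JSON preserves numbers as numbers, while CSV turns every value into text that the consumer must re-type.

```python
import csv
import io
import json

# Hypothetical records exchanged between two systems.
records = [
    {"id": 1, "name": "Ada", "team": "data"},
    {"id": 2, "name": "Linus", "team": "platform"},
]

# JSON: keeps value types and supports nesting.
as_json = json.dumps(records)

# CSV: flat and tabular; every value is written as text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "team"])
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue()

# Read both formats back into Python objects.
from_json = json.loads(as_json)
from_csv = list(csv.DictReader(io.StringIO(as_csv)))

print(from_json[0]["id"], type(from_json[0]["id"]).__name__)  # 1 int
print(from_csv[0]["id"], type(from_csv[0]["id"]).__name__)    # 1 str
```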
Dashboard and Stories
- Dashboard Objects
- Dashboard interactivity using Actions
- Dashboard story points
Quick Introduction to Python for Data Engineers
- What is Python?
- Additional Documentation
- Which version of Python am I running?
- Python Dev Tools and REPLs
- IPython
- Jupyter
- Jupyter Operation Modes
- Jupyter Common Commands
- Anaconda
- Python Variables and Basic Syntax
- Variable Scopes
- PEP8
- The Python Programs
- Getting Help
- Variable Types
- Assigning Multiple Values to Multiple Variables
- Null (None)
- Strings
- Finding Index of a Substring
- String Splitting
- Triple-Delimited String Literals
- Raw String Literals
- String Formatting and Interpolation
- Boolean
- Boolean Operators
- Numbers
- Looking Up the Runtime Type of a Variable
- Divisions
- Assignment-with-Operation
- Dates and Times
- Comments
- Relational Operators
- The if-elif-else Triad
- An if-elif-else Example
- Conditional Expressions (a.k.a. Ternary Operator)
- The while-break-continue Triad
- The for Loop
- try-except-finally
- Lists
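The following sketch combines several of the basics listed above in one runnable script: multiple assignment, `None`, string operations, divisions, the conditional expression, the while and for loops, and try-except-finally. The variable names and values are illustrative only.

```python
# Assigning multiple values to multiple variables
first, second = "data", "engineering"

# Null (None)
missing = None

# Strings: interpolation, substring search, splitting, raw literals
title = f"{first} {second}"        # 'data engineering'
idx = title.find("eng")            # 5; find() returns -1 if the substring is absent
parts = title.split(" ")           # ['data', 'engineering']
path = r"C:\data\raw"              # raw literal: backslashes are kept as-is

# Runtime type lookup, divisions, relational and boolean operators
print(type(idx).__name__)          # int
print(7 / 2, 7 // 2)               # 3.5 3
found = idx >= 0 and missing is None

# Conditional expression (ternary operator)
label = "empty" if missing is None else "present"

# while-break-continue, using assignment-with-operation (+=)
n, evens = 0, []
while True:
    n += 1
    if n > 10:
        break
    if n % 2:          # odd number: skip to the next iteration
        continue
    evens.append(n)

# for loop over a list, with try-except-finally
values, parsed, attempts = ["10", "x", "30"], [], 0
for v in values:
    try:
        parsed.append(int(v))
    except ValueError:
        parsed.append(None)   # record unparseable input instead of crashing
    finally:
        attempts += 1         # runs whether or not an exception was raised

print(evens, parsed, attempts)     # [2, 4, 6, 8, 10] [10, None, 30] 3
```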
Data Processing Phases
- Typical Data Processing Pipeline
- Data Discovery Phase
- Data Harvesting Phase
- Data Priming Phase
- Exploratory Data Analysis
- Model Planning Phase
- Model Building Phase
- Communicating the Results
- Production Roll-out
- Data Logistics and Data Governance
- Data Processing Workflow Engines
- Apache Airflow
- Data Lineage and Provenance
- Apache NiFi
- Summary
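Workflow engines such as Apache Airflow schedule, retry, and monitor pipeline stages. The plain-Python sketch below illustrates only two underlying ideas: stages that run in a fixed order, and a lineage record written after each step. It is not Airflow code. The stage names loosely follow the harvesting, priming, and exploratory analysis phases listed above. The input rows and the lineage fields (`stage`, `finished_at`, `output_size`) are hypothetical.

```python
from datetime import datetime, timezone

def harvest(_):
    """Data harvesting: pull raw records (hard-coded here for illustration)."""
    return [" 42 ", "17", "", "x9", "8 "]

def prime(rows):
    """Data priming: trim whitespace, drop blanks, keep only numeric records."""
    cleaned = (r.strip() for r in rows)
    return [int(r) for r in cleaned if r.isdigit()]

def explore(values):
    """Exploratory analysis: compute simple summary statistics."""
    return {"count": len(values), "min": min(values), "max": max(values)}

def run_pipeline(stages, data=None):
    """Run stages in order, appending one lineage entry per completed step."""
    lineage = []
    for stage in stages:
        data = stage(data)
        lineage.append({
            "stage": stage.__name__,
            "finished_at": datetime.now(timezone.utc).isoformat(),
            "output_size": len(data),
        })
    return data, lineage

summary, lineage = run_pipeline([harvest, prime, explore])
print(summary)  # {'count': 3, 'min': 8, 'max': 42}
for step in lineage:
    print(step["stage"], step["output_size"])
```

The lineage list answers a basic provenance question: which stage produced which output, and when. Dedicated tools such as Apache NiFi track this in far more detail.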