

MODULAR DEVELOPMENT

TRAINING MODULE 1: SUPPORT SYSTEMS FOR DECISION-MAKING AND DATA MANAGEMENT

AIM

Develop computer applications that perform basic data processing with the Python language,
identifying methods for exploring large volumes of data and both relational and non-relational
(NoSQL) data management systems.

DURATION IN ANY MODALITY OF DELIVERY: 100 hours

Teletraining: Duration of face-to-face tutorials: 0 hours

LEARNING OUTCOMES

Knowledge/cognitive and practical abilities

• Characterization of the Python language and its applications (see the sketch after this list)


- Python language
- Running Python programs
- Objects in Python
- Numeric types and dynamic typing
- Handling of text strings, lists, dictionaries, tuples and files
- Python statements: assignments, expressions and printing results
- Variable tests, syntax rules
- For and while loops
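
A minimal sketch illustrating several of the items above (dynamic typing, lists, dictionaries, tuples, files, and for/while loops). The values and file name are invented for illustration only.

    # Minimal Python sketch: dynamic types, collections, loops and files.
    ages = [31, 25, 47]                      # list
    person = {"name": "Ana", "age": 31}      # dictionary
    point = (3.5, 7.2)                       # tuple

    # for loop over a list
    total = 0
    for age in ages:
        total += age
    print("Average age:", total / len(ages))

    # while loop with a simple condition
    count = 0
    while count < 3:
        print("Iteration", count)
        count += 1

    # writing and reading a text file
    with open("ages.txt", "w") as f:
        for age in ages:
            f.write(f"{age}\n")
    with open("ages.txt") as f:
        lines = [line.strip() for line in f]
    print(person["name"], point, lines)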

• Interpretation of the use of API protocols (see the sketch after this list)


- Use of remote APIs
- Integration of applications with remote APIs
- Examples of application of remote APIs in Python language
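
A minimal sketch of calling a remote API from Python with the requests library. The endpoint URL, query parameters and JSON field names are hypothetical and used only for illustration.

    # Sketch of a remote REST API call with requests (hypothetical endpoint).
    import requests

    response = requests.get(
        "https://api.example.com/v1/measurements",   # hypothetical endpoint
        params={"city": "Madrid", "limit": 10},
        timeout=10,
    )
    response.raise_for_status()                      # fail loudly on HTTP errors

    data = response.json()                           # parse the JSON body
    for item in data.get("results", []):             # hypothetical field name
        print(item)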

• Programming a modular algorithm in the Python language (see the sketch after this list)


- Module programming
- Fundamentals of class programming
- Use of APIs and integration with Python applications
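
A small sketch of module and class programming: a class grouped in a module and reused from an application script. The module name and the Sensor class are hypothetical examples.

    # sensors.py -- a small module grouping related functionality (hypothetical name).
    class Sensor:
        """Minimal class: state (name, readings) plus behaviour (add, average)."""

        def __init__(self, name):
            self.name = name
            self.readings = []

        def add_reading(self, value):
            self.readings.append(value)

        def average(self):
            return sum(self.readings) / len(self.readings) if self.readings else 0.0

    # main.py -- reusing the module from an application:
    # from sensors import Sensor
    if __name__ == "__main__":
        s = Sensor("temperature")
        for v in (21.0, 22.5, 23.1):
            s.add_reading(v)
        print(s.name, s.average())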

• Distinction of basic Cloud concepts


- Principles of cloud computing (Cloud Computing)
- Service engineering: Software as a Service, Platform as a Service, Infrastructure as a Service
- Examples of relevant applications in the industry

• Use of NoSQL DBs and new data models (structured and unstructured)
- Fundamentals of the NoSQL paradigm
- Data distribution and parallel processing
- Main data models in the NoSQL world: key-value, document-oriented, property graphs, knowledge graphs

• Knowledge of Big Data storage and massive processing tools


- Applications based on the management and analysis of large volumes of data

- Architectural foundations of distributed systems


- Main reference architectures
- New data models
- Distributed file systems
- Document stores
- Graph databases

• Evaluation of the methodologies and techniques applied in solving problems and justification of the approaches, decisions and proposals made
- Decision-making support systems
- Data analysis: descriptive, predictive and prescriptive analysis
- Use cases: management and analysis of large volumes of data

• Identification of the key factors of a complex problem in the context of an analytics project.

- Context of the data society/economy and the data-oriented applications paradigm
- Fundamentals of relational databases: SQL language.
- Need for a paradigm shift: NoSQL. The 'one size does not fit all' principle.
- Main data models in the NoSQL world: Key-Value, Document-oriented, Property Graphs and
Knowledge Graphs
- Architectural fundamentals: distributed systems, scalability, parallelism.
Main reference architectures (shared nothing, shared disk, shared memory)

• Distinction and application of new data models


- Distributed file systems: concepts and principles (distribution, replication, horizontal vs. vertical partitioning, specialized file formats)
- Knowledge and use of the Hadoop File System (HDFS), Apache Avro and Apache Parquet; key-value stores: Apache HBase
- Document stores: concepts and principles (replication mechanisms, sharding, spatial queries)
- Immersion in MongoDB and the Aggregation Framework (see the sketch after this list)
- Graph databases: property and knowledge graphs. Concepts and principles: graph modeling, regular queries. Introduction to Neo4j and Cypher
- Knowledge graphs. Concepts and principles: the open / linked data paradigm, RDF and
SPARQL. Introduction to GraphDB
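
A minimal sketch of a MongoDB aggregation pipeline run from Python with pymongo. The connection string, database, collection and field names are hypothetical and assume a local MongoDB instance.

    # Sketch of the Aggregation Framework with pymongo (hypothetical names).
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local instance
    collection = client["bootcamp"]["posts"]            # hypothetical db/collection

    pipeline = [
        {"$match": {"lang": "es"}},                          # filter documents
        {"$group": {"_id": "$user", "posts": {"$sum": 1}}},  # count per user
        {"$sort": {"posts": -1}},                            # most active first
        {"$limit": 5},
    ]
    for doc in collection.aggregate(pipeline):
        print(doc["_id"], doc["posts"])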

• Identification and analysis of complex problems in the area of data analysis and the approach to their solution
- Main concepts of data processing flows in large-volume systems
- Main phases of managing large volumes of data and associated challenges
- Data engineer roles in the main phases of data management
- Main limitations of traditional data management models
- New data models

• Planning and execution of a data analysis project with a methodological proposal


- Definition of a set of starting data and a series of business needs that require data aggregation, external data capture, an ETL process, data analysis and a final visualization of the results obtained
- Implementation of a distributed file system
- Using Hadoop to store a set of social network activity data.
Storing a data set in an HDFS environment

- Graph modeling: storing a data set in a document-oriented or graph-oriented database.

• Choosing a suitable repository for the problem data and defining a storage strategy.
- Data life cycle: database design, data flow management, architecture of data extraction, loading and transformation systems, and distributed storage and processing systems
- Data management: limits of the relational model and data distribution

Management, personal and social skills

• Effectiveness in solving complex problems when developing the knowledge needed to design prototypes of software solutions in Python in phases, without losing sight of the complexity of the overall problem.

• Ability to analyze the important elements of the development project for a data management solution.
• Development of critical thinking and reasoning of the various techniques to be applied within the framework of the problem to
be solved, balancing the complexity of the solution and its real functioning.

• Identification of the tools to be applied, their cost, and the needs of the required data cycle.
• Development of a positive attitude towards learning and continuous improvement, with the objective of knowing and reviewing
the suppliers of the tools and the installation and updating methods.

• Demonstration of initiative and autonomy in the presentation of prototypes and discussion of problems and solutions to be
discussed in a group, reviewing requirements and their costs.

TRAINING MODULE 2: DATA MANAGEMENT AND PROCESSING

AIM

Identify data management principles for a project with multiple input sources and apply data
model organization techniques from a logical and physical point of view.

DURATION IN ANY MODALITY OF DELIVERY: 80 hours

Teletraining: Duration of face-to-face tutorials: 0 hours

LEARNING OUTCOMES

Knowledge/cognitive and practical abilities


• Critical evaluation of the methodologies and techniques to be applied in solving problems and
justification of the approaches, decisions and proposals made
- Data management fundamentals for a project with multiple data input sources
- Techniques for organizing data models from a logical and physical point of view

• Identification of data flows and ETL (Extract, Transform, Load) (see the sketch after this list)


- Fundamentals of Data Warehousing and Business Intelligence
- OLAP concepts and information extraction
- ETL process: extraction, transformation and loading of data

- Types of flows and operations


- Data cleaning
- Data quality
- Application examples
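
A minimal ETL sketch with pandas covering the three phases above plus a basic data-cleaning step. The source file, column names and the SQLite target table are invented for illustration; SQLite is used only as a stand-in warehouse.

    # Minimal ETL sketch: extract from CSV, transform/clean, load into SQLite.
    import sqlite3
    import pandas as pd

    # Extract
    df = pd.read_csv("sales_raw.csv")            # hypothetical source file

    # Transform: basic data cleaning and quality checks
    df = df.drop_duplicates()
    df = df.dropna(subset=["customer_id", "amount"])
    df["amount"] = df["amount"].astype(float)
    df["sale_date"] = pd.to_datetime(df["sale_date"], errors="coerce")

    # Load into a relational target
    with sqlite3.connect("warehouse.db") as conn:
        df.to_sql("sales_clean", conn, if_exists="replace", index=False)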

• Design of an ETL process and a multidimensional analysis model (see the sketch after this list).


- Multidimensional modeling
- DFM: Dimensional Fact Model
- Star schema and derivatives
- OLAP operators
- Implementation of cubes and OLAP operators in relational environments
- Multidimensional modeling tools
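
A small sketch of an OLAP-style roll-up over a toy fact table with pandas, as a lightweight stand-in for cube operators in a relational environment. The dimensions (region, year) and the sales measure are invented.

    # Sketch of a fact table and an OLAP-style roll-up with pandas pivot_table.
    import pandas as pd

    facts = pd.DataFrame({
        "region": ["North", "North", "South", "South"],
        "year":   [2023, 2024, 2023, 2024],
        "sales":  [120.0, 150.0, 90.0, 110.0],
    })

    # Aggregate the measure over the dimensions (similar to a ROLLUP with totals)
    cube = facts.pivot_table(values="sales", index="region", columns="year",
                             aggfunc="sum", margins=True, margins_name="Total")
    print(cube)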

• Design of a data load into a NoSQL repository and basic data analysis using Spark (see the streaming sketch after this list)
- Design, implementation and maintenance of Data Lake solutions. Concepts and principles (schema-on-write vs. schema-on-read). Data modeling and governance
- Concepts and principles of distributed data processing (declarative vs. non-declarative
solutions)
- Distributed data processing models: Disk-based and main memory-based

- MapReduce and Apache Spark


- Real-time data processing (streaming). Concepts and principles (models, time windows, time
queries). Stream query languages.
Introduction to streaming tools: Apache Kafka, Apache Spark Streaming
- Big Data architectures: Lambda, Kappa and orchestrators. Workflow management tools: Apache Airflow
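
A minimal sketch of real-time processing with Spark Structured Streaming reading from Kafka and counting events per time window. It assumes PySpark with the Kafka connector available and a local broker; the topic name and checkpoint path are hypothetical.

    # Sketch: Spark Structured Streaming over a Kafka topic (hypothetical names).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "activity")          # hypothetical topic
              .load()
              .selectExpr("CAST(value AS STRING) AS value", "timestamp"))

    # Count events per 1-minute time window
    counts = events.groupBy(window(col("timestamp"), "1 minute")).count()

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .option("checkpointLocation", "/tmp/checkpoint-sketch")
             .start())
    query.awaitTermination()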

• Identification of the key factors of a complex problem in the context of an analytics project (see the Spark sketch after this list).

- ETL design and implementation project with NoSQL tools


- Batch data incorporation process with Apache tools.
- Data analysis and extraction of data for the business model from the data set with Spark
- Data analysis with Apache Spark
- Reading and exporting data
- Data quality review
- Filters and data transformations
- Data processing to obtain summaries and groupings
- Combinations, partitions and reformulation of data.
- Configuration, monitoring and error management of Spark applications
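
A minimal batch-analysis sketch with the Spark DataFrame API: reading data, a simple quality filter, a grouping and an export. The file path and column names are invented for illustration.

    # Sketch of a batch analysis with PySpark (hypothetical file and columns).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("batch-analysis-sketch").getOrCreate()

    df = spark.read.option("header", True).csv("activity.csv")   # hypothetical file

    clean = (df.dropna(subset=["user", "duration"])              # basic data quality
               .withColumn("duration", F.col("duration").cast("double")))

    summary = (clean.groupBy("user")
                    .agg(F.count("*").alias("events"),
                         F.avg("duration").alias("avg_duration")))

    summary.orderBy(F.desc("events")).show(10)
    summary.write.mode("overwrite").parquet("summary.parquet")   # export results
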
Management, personal and social skills

• Demonstration of a critical, strategic-thinking attitude, presenting data processing schemes and allowing discussion with stakeholders inside and outside the company in order to formulate future-oriented actions.
• Development of design and data analysis activities with social responsibility, intellectual honesty and
scientific integrity.
• Awareness of the need for a responsible attitude committed to results and the limitation of available
resources when making decisions in complex professional environments.

• Assessment of the importance of adaptation to cost, availability, development or implementation time constraints in the review of an initial data management design.

TRAINING MODULE 3: MACHINE LEARNING AND VISUALIZATION

AIM

Apply the fundamentals of machine learning and visualization to analyze the results of data
processing.

DURATION IN ANY MODALITY OF DELIVERY: 130 hours

Teletraining: Duration of face-to-face tutorials: 0 hours

LEARNING OUTCOMES

Knowledge/cognitive and practical abilities


• Identification of the fundamentals of data analysis and machine learning (Machine
Learning)
- Typology of tasks and learning algorithms (supervised, unsupervised, semi-
supervised)
- Main learning methods
- Validation and evaluation of results

• Distinction of classification methods (see the sketch after this list).


- Predictive models
- Unsupervised methods. Hierarchical clustering. Partitional clustering (k-means and derivatives). Dimensionality reduction (PCA and others)
- Supervised methods. K-NN. Decision trees. SVM. Neural networks
- Validation and evaluation of results
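
A compact sketch showing one unsupervised method (k-means) and one supervised method (a decision tree) with a simple hold-out validation, using scikit-learn and its built-in iris dataset.

    # Sketch: unsupervised (k-means) and supervised (decision tree) methods
    # with a train/test validation, on the built-in iris dataset.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)

    # Unsupervised: partitional clustering with k-means
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("Cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])

    # Supervised: decision tree with hold-out validation
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, tree.predict(X_test)))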

• Application of machine learning techniques and the integration of various data sources
- Sentiment and polarity analysis on the collected set of tweets.
- Construction of a profile analysis using unsupervised clustering algorithms.
- Implementation of a polarity analysis (sentiment analysis) on the set of
collected messages.
- Implementation of two alternative approaches to compare the performance obtained: a dictionary-based approach, and a vectorization approach (Word2Vec) with a supervised machine learning model (see the sketch after this list).
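
A toy sketch of the vectorization approach: Word2Vec embeddings averaged per message and fed to a supervised classifier. It assumes gensim 4.x and scikit-learn are available; the messages and labels are invented and far too small for a real evaluation.

    # Sketch: Word2Vec vectorization + supervised classifier (toy data).
    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.linear_model import LogisticRegression

    messages = [
        "i love this course",
        "great content and clear examples",
        "i do not like the platform at all",
        "very bad and slow experience",
    ]
    labels = [1, 1, 0, 0]                       # 1 = positive, 0 = negative
    tokens = [m.split() for m in messages]

    w2v = Word2Vec(sentences=tokens, vector_size=25, window=3, min_count=1, seed=0)

    def message_vector(words):
        # average of the word vectors in the message
        return np.mean([w2v.wv[w] for w in words], axis=0)

    X = np.vstack([message_vector(t) for t in tokens])
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(X))                       # toy in-sample check only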

• Design, development and evaluation of machine learning methods.


- Data processing

- Machine Learning Fundamentals


- Typology of tasks and learning algorithms
- Validation and evaluation of results

• Design and development of dashboards (see the sketch after this list).


- Data visualization principles.
- Design of control panels and dashboards to define alarms and transmit results
- Integration of visualization with analysis tools and data queries
- Visual and written documentation of the results of data analytics projects
for non-specialized audiences
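
A minimal sketch of a two-panel results figure with matplotlib, used here as a lightweight stand-in for a dashboard panel with a simple alert threshold. The data values are invented; a production dashboard would typically live in one of the tools listed below.

    # Sketch: simple two-panel results figure with an alert threshold (toy data).
    import matplotlib.pyplot as plt

    days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
    events = [120, 135, 150, 90, 160]
    errors = [3, 2, 8, 1, 2]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))

    ax1.bar(days, events, color="steelblue")
    ax1.set_title("Events per day")

    ax2.plot(days, errors, marker="o", color="firebrick")
    ax2.axhline(5, linestyle="--", color="gray")      # simple alert threshold
    ax2.set_title("Errors per day (alert at 5)")

    fig.tight_layout()
    fig.savefig("dashboard_panel.png")                # exportable for a report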

• Using a data visualization tool to design and upload data to a dashboard

- Data visualization tools: Grafana, MS Power BI, Tableau


- Visualization of business queries and results dashboard in data visualization tools

• Choice, application and quality evaluation of a machine learning algorithm for a given problem and data set (see the sketch after this list).
- Text processing (NLP)
- Dictionary-based polarity analysis
- Analysis based on supervised predictive models
- Feature extraction (Word2Vec)
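
A minimal sketch of the dictionary-based polarity approach: each word contributes a score from a small lexicon and the sign of the total decides the polarity. The lexicon and messages are toy examples.

    # Sketch: dictionary-based polarity analysis (toy lexicon and messages).
    LEXICON = {"good": 1, "great": 2, "love": 2,
               "bad": -1, "terrible": -2, "slow": -1}

    def polarity(message):
        words = message.lower().split()
        score = sum(LEXICON.get(w, 0) for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"

    for msg in ["great course, love it", "terrible and slow platform", "it runs"]:
        print(msg, "->", polarity(msg))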

Management, personal and social skills

• Use of communication skills with stakeholders to show the most relevant aspects of the results obtained in the process and their adaptation to the needs of the project.

• Application of innovative solutions and adaptation to changing environments.
• Capacity for continuous development of projects and communication of results and decisions with visualization techniques and tools
• Coordination and communication with specialists, non-specialists, supervisors and clients
with the use of communication tools for the design of relevant information on the key aspects of the application.

EVALUATION OF LEARNING IN THE TRAINING ACTION

• The evaluation will have a theoretical-practical nature and will be carried out systematically and continuously,
during the development of each module and at the end of the course.

• It may include an initial diagnostic evaluation to detect the starting level of the
student body.

• The evaluation will be carried out using the most appropriate methods and instruments to verify the different learning outcomes and to guarantee their reliability and validity.

• Each evaluation instrument will be accompanied by its corresponding correction and scoring system in which the
measurement criteria to evaluate the results achieved by the students are explained, clearly and unequivocally.

• The final score achieved will be expressed in terms of Pass/No Pass.

