
Course: DECENTRALIZED SYSTEMS FOR BIG DATA MANAGEMENT AND DECISION MAKING

Code: CEID1175

CEID1175 - Spyros Sioutas

Course

The course introduces students to two pillars: (1) the foundations of advanced decentralized computing systems, and (2) a practical overview of non-traditional software systems for big data management (with emphasis on Spark, Python, and PySpark).

In particular, it focuses on the following topics:

  1. Hashing, Bloom Filters, Internet Caching Protocols, Distributed Hash Tables.
  2. Decentralized Data Structures and P2P Systems, DHT-based Decentralized Systems (Chord).
  3. Blockchain and Decentralized Applications (DApps): Hashing Data in the Real World, Storing Transaction Data, Using the Data Store, Protecting the Data Store, Distributing the Data Store Among Peers, Verifying and Adding Transactions, Choosing a Transaction History.
  4. Distributed File Systems (HDFS), Map/Reduce Programming Framework and NoSQL Databases, Cluster Architecture, Data Flow Systems, Spark, RDDs.
  5. Overview of Python for big data management: Introduction to libraries and tools (pandas, NumPy, etc.), Introduction to PySpark, Understanding PySpark's architecture and components.
  6. Big Data Storage and Processing in Decentralized Systems: Batch processing (MapReduce, Spark), Stream processing (Spark Streaming, Flink).
  7. Large-Scale Machine Learning with PySpark
    • Introduction to Machine Learning
    • Large Scale Machine Learning
    • Introduction to MLlib
      • Overview of MLlib, PySpark's machine learning library
      • Algorithms supported by MLlib (regression, classification, clustering, etc.)
    • Distributed machine learning with PySpark
  8. Advanced Topics and Case Studies
    • Future trends in decentralized systems for big data: IoT and Cloud, Containers, Docker, Kubernetes
    • Practical: Mini-project combining concepts from previous lectures
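
To give a flavour of the first topic (hashing and Bloom filters), the following is a minimal sketch in plain Python, not course material. The class name `BloomFilter`, its parameters, and the sample URLs are hypothetical, and deriving the k hash functions by salting SHA-256 is just one simple choice among many.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter with k hash functions derived from SHA-256."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit: wasteful, but keeps the sketch easy to read.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Salt the item with the hash index to simulate k different hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
for url in ["example.org/a", "example.org/b"]:  # hypothetical cached URLs
    bloom.add(url)

print(bloom.might_contain("example.org/a"))  # True: added items are never reported absent
```

The filter can return false positives for items that were never added, but never false negatives, which is why it suits cache lookups where an occasional wasted check is acceptable.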

 

SYLLABUS:

Week #1: Hashing, Bloom Filters, Internet Caching Protocols, Distributed Hash Tables.

Week #2: Decentralized Data Structures and P2P Systems, DHT-based Decentralized Systems (Chord).

Week #3: Blockchain and Decentralized Applications (DApps).

Week #4: HDFS, Map/Reduce Programming Framework and NoSQL Databases, Cluster Architecture, Data Flow Systems, Spark, RDDs.

Week #5: Overview of Python for data management in Decentralized Systems.

(Practical Part: Basic data manipulation with Python and PySpark).

Week #6: Big Data Storage and Processing in Decentralized Systems.

(Practical Part: Batch processing with PySpark).

Week #7: Big Data Storage and Processing in Decentralized Systems (Cont’d).

(Practical Part: Batch processing with PySpark).

Week #8: Large-Scale Machine Learning with PySpark.

(Practical Part: Implementation of a simple machine learning model using Python's scikit-learn).

Week #9: Large-Scale Machine Learning with PySpark (Cont’d).

(Practical Part: Implementation of a simple machine learning model using Python's scikit-learn).

Week #10: Large-Scale Machine Learning with PySpark (Cont’d).

(Practical Part: Implementation of a machine learning model with PySpark's MLlib).

Week #11: Large-Scale Machine Learning with PySpark (Cont’d).

(Practical Part: Implementation of a machine learning model with PySpark's MLlib).

Week #12: Advanced Topics and Case Studies.

(Practical Part: Project combining concepts from previous lectures).

Week #13: Advanced Topics and Case Studies (Cont’d).

(Practical Part: Project combining concepts from previous lectures).
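
The batch-processing sessions (Weeks #6 and #7) build on the map/reduce model listed for Week #4. As a rough illustration only, the sketch below mimics the map, shuffle, and reduce phases of a word count in plain Python on a single machine; the function names and the sample input are hypothetical and are not taken from the course material.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    # Emit one (word, 1) pair per word in the input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all values by key, as the framework would do between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine the values of one key into a single count.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["spark makes big data simple", "big data needs big clusters"]
counts = word_count(lines)
print(counts["big"])  # 3
```

In PySpark the same pattern is typically written with the RDD operations `flatMap`, `map`, and `reduceByKey`, with the shuffle handled by the cluster rather than by an in-memory dictionary.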

STUDENT PERFORMANCE EVALUATION:

Assignments (100%):

  • Research Paper Presentation (30% - 50%)
  • Project Implementation (50% - 70%)

BIBLIOGRAPHY

  • Spark: The Definitive Guide, by Bill Chambers and Matei Zaharia, O'Reilly Media, Inc., February 2018, ISBN: 9781491912218.
  • The Google File System: <https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf>
  • MapReduce: <https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf>
  • Decentralized Frameworks for Future Power Systems, 1st Edition, by Mohsen Parsa Moghaddam, Reza Zamani, Hassan Haes Alhelou, and Pierluigi Siano, May 12, 2022, ISBN: 9780323916981.
