Please ensure Javascript is enabled for purposes of website accessibility

Anteprima

Selected image

ΥΔΑ102: Τεχνικές διαχείρισης και Εξόρυξης για Δεδομένα Μεγάλου Όγκου (Big Data Management and Mining Methods)

(DDCD102) -  Χρήστος Μακρής

Descrizione del Corso

Το μάθημα απευθύνεται σε όσους φοιτητές θέλουν να αποκτήσουν βασικές γνώσεις στην περιοχή των δομών δεδομένων και των αλγοριθμικών τεχνικών που σχετίζονται με τη διαχείριση και επεξεργασία μεγάλου όγκου δεδομένων. Το μάθημα παρουσιάζει μία αρκετά ευρεία (σε σχέση με το πλήθος εφαρμογών και τις χρησιμοποιούμενες τεχνικές) περιοχή στον χώρο της Επιστήμης των Υπολογιστών, και σχετίζεται άμεσα με άλλα προπτυχιακά και μεταπτυχιακά μαθήματα του Προγράμματος Σπουδών όπως: Δομές Δεδομένων, Προχωρημένες Δομές Δεδομένων και Γραφική, Θεωρία Βασικών Δομών Δεδομένων, Αλγόριθμοι, Τεχνολογίες Υλοποίησης Αλγορίθμων. Ψηφιακή Επεξεργασία και Ανάλυση Εικόνας, Ανάκτηση Πληροφορίας, Βάσεις Δεδομένων I, II.

Ουσιαστικά το μάθημα είναι συνέχεια του μ/χ μαθήματος του ΕΤΥ Μέθοδοι και Τεχνολογίες για Διαχείριση Μεγάλου Όγκου Δεδομένων σε συνδυασμό με βασικές έννοιες από το μ/χ μάθημα του ΕΤΥ Θεωρία Βασικών Δομών Δεδομένων, και με επιπλέον διδασκαλία εννοιών από Μηχανική Μάθηση και Εξόρυξη Γνώσης. 

 Τα θέματα που καλύπτονται στο μάθημα είναι τα ακόλουθα:

  •  Μοντέλα Υπολογισμού και μετρικές χρονική και χωρικής πολυπλοκότητας
  • Διαχρονικές Δομές Δεδομένων, Αλγόριθμοι και Δομές  Δεδομένων σε προβλήματα διαχείρισης δενδρικών γραφημάτων
  • Ουρές Προτεραιότητας και Αυτοοργανώμενα Δέντρα
  • Αλγόριθμοι Διάταξης και Δομές Ψαξίματος στο RAM Μοντέλο Υπολογισμού,
  • Μοντέλα Δευτερεύουσας Μνήμης (Μοντέλο δύο επιπέδων, ιεραρχικά μοντέλα, Cache Oblivious μοντέλα)
  • Αλγόριθμοι και Δομές Δεδομένων για μοντέλα δύο επιπέδων (B-Trees, R-trees, Αλγόριθμοι Υπολογιστικής Γεωμετρίας στη Δευτερεύουσα Μνήμη, Αλγόριθμοι Διαχείρισης Συμβολοσειρών στη Δευτερεύουσα Μνήμη)
  • Αλγόριθμοι και Δομές Δεδομένων για Cache Oblivious Μοντέλα (Cache Oblivious B-Tree, Cache Oblivious Ουρές Προτεραιότητας, Γεωμετρικοί Αλγόριθμοι στο Cache Oblivious Μοντέλο Υπολογισμού)
  • Aλγόριθμοι στο Μοντέλο Ροής Δεδομένων
  • Εφαρμογές (Xρονικές Βάσεις Δεδομένων, Χωροχρονικές Βάσεις Δεδομένων, Πολυμεσικές Βάσεις Δεδομένων)
  • Αλγόριθμοι Εξόρυξης Γνώσης (Ομαδοποίηση, Κανόνες Συσχέτισης και Κατηγοριοποίηση) 

Παρατίθεται σχετική βιβλιογραφία που θα χρησιμοποιηθεί στα πλαίσια του μαθήματος:

Creation Date

martedì 29 gennaio 2019

  • Μέθοδοι αξιολόγησης

    Εξετάσεις

    Η εξέταση του μαθήματος συνίσταται σε: (i) προφορική εξέταση πάνω στο περιεχόμενο επιλεγμένων βιβλίων που θα καλύπτουν τα όσα έχουν παρουσιαστεί στο μάθημα, και τα οποία αναφέρονται στη συνέχεια, (ii) σε ωριαία παρουσίαση και γραπτή αναφορά στο περιεχόμενο μίας επιστημονικής δημοσίευσης. H γραπτή αναφορά θα είναι 18-25 σελίδες  με τις βασικές ιδέες του paper, καθώς και ο,τι συμπληρωματικό υλικό στο πρόβλημα που πραγματεύεται το paper έχετε συλλέξει.

    Ο τελικός βαθμός θα είναι το ημιάθροισμα των δύο βαθμολογιών.

    Ύλη Εξέτασης

    Η βασική ύλη εξέτασης είναι το βιβλίο του Vitter, Algorithms and Data Structures for External Memory, Book, (http://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf) KAI χρησιμοποιείται υλικό από τα βιβλία των

    (στη λίστα παρατίθεται και βοηθητικό βιβλιογραφικό υλικό):

    ΠΙΟ ΣΥΓΚΕΚΡΙΜΕΝΑ 

    η ύλη της προφορικής εξέτασης για το μάθημα είναι:

    1.  τα κεφάλαια 1, 2, 3, 4 ,5, 8, 10,11,12, 13, 14, 15. από το βιβλίο J. Vitter, Algorithms and Data Structures for External Memory, Book, (http://www.ittc.ku.edu/~jsv/Papers/Vit.IO_book.pdf) και από αυτά έμφαση στα όσα ειπώθηκαν στις διαλέξεις και υπάρχουν στις διαφάνειες του e_class

    2. επιλεγμένο υλικό από τα βιβλία:

    ΕΝΔΕΙΚΤΙΚΕΣ ΔΗΜΟΣΙΕΥΣΕΙΣ ΠΑΝΩ ΣΤΙΣ ΟΠΟΙΕΣ ΘΑ ΒΑΣΙΣΤΕΙΤΕ ΓΙΑ ΤΗΝ ΕΡΓΑΣΙΑ ΣΑΣ.

    Θα πρέπει να μου στείλετε email από 1.3 μέχρι 10.3 με 3 επιλογές εκ των οποίων μόνο μία θα είναι σε paper κυριας μνήμης,

    External Memory Data Structures

    1. Vitter, J. S. and Shriver, E. 1994a. Algorithms for parallel memory I: Two-level memories. Algorithmica 12, 2-3, 110-147.
    2. Vitter, J. S. and Shriver, E. A.1994b. Algorithms for parallel memory II: Hierarchical multilevel.
    3. S. Lanka, E. Mays, Fully Persistent B+-trees, ACM International Conference on Management of Data, 426-435, 1992.
    4. Varman P. & Verma R. An Efficient Multiversion Access Structure. IEEE Transactions on Knowledge and Data Engineering, 391-409. 1997.
    5. B. Becker, S. Gschwind, T. Ohler, B. Seeger, P. Widmayer, An asymptotically optimal multiversion B-tree, The VLDB Journal, 264-275.
    6. P. Ferragina and R. Grossi. The string B-tree: a new data structure for string search in external memory and its applications. J. ACM, 46(2):236-280, 1999.
    7. Lars Arge, Mikkel Thorup: RAM-Efficient External Memory Sorting. CoRR abs/1312.2018 (2013) 2012
    8. Pankaj K. Agarwal, Lars Arge, Sathish GovindarajanJun YangKe YiEfficient external memory structures for range-aggregate queries. Comput. Geom. 46(3): 358-370 (2013)
    9. Lars Arge, Michael T. GoodrichFreek van WalderveenComputing betweenness centrality in external memory. BigData Conference 2013: 368-375
    10. Lars Arge, Johannes FischerPeter SandersNodari SitchinavaOn (Dynamic) Range Minimum Queries in External Memory. WADS 2013: 37-48R
    11. Rahul ShahCheng ShengSharma V. Thankachan, Jeffrey Scott Vitter, Top-k Document Retrieval in External Memory. ESA 2013: 803-81
    12. Gerth St?lting BrodalKonstantinos TsakalidisSpyros SioutasKostas Tsichlas:
      Fully persistent B-trees. SODA 2012: 602-614
    13. Michael A. BenderMartin Farach-ColtonRob JohnsonSimon MaurasTyler MayerCynthia A. PhillipsHelen Xu:
      Write-Optimized Skip Lists. PODS 201769-78
    14. Michael A. BenderAlex ConwayMartin Farach-ColtonWilliam JannenYizheng JiaoRob JohnsonEric KnorrSara McAllisterNirjhar MukherjeePrashant PandeyDonald E. PorterJun YuanYang Zhan: External-memory Dictionaries in the Affine and PDAM Models. ACM Trans. Parallel Comput. 8(3)15:1-15:20 (2021)
    15. Michael A. Bender, Roozbeh Ebrahimi, Haodong Hu, Bradley C. Kuszmaul:
      B-Trees and Cache-Oblivious B-Trees with Different-Sized Atomic Keys. ACM Trans. Database Syst. 41(3): 19:1-19:33 (2016)
    16. Michael A. BenderJonathan W. BerryRob JohnsonThomas M. KroegerSamuel McCauleyCynthia A. PhillipsBertrand SimonShikha SinghDavid Zage:
      Anti-Persistence on Persistent Storage: History-Independent Sparse Tables and Dictionaries. PODS 2016289-302
    17. Gerth Stølting BrodalRolf FagerbergDavid HammerUlrich MeyerManuel PenschuckHung Tran:
      An Experimental Study of External Memory Algorithms for Connected Components. SEA 202123:1-23:23
    18.  Alex Conway, Abhishek Gupta, Vijay Chidambaran, Martin Farach-Colton§ Rick Spillane¶ Amy Tai, Rob Johnson,
      SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores, 2020 USENIX Annual Technical Conference
    19.  Alex Conway, Martín Farach-Colton , Philip Shilane Optimal Hashing in External Memory, ICALP 2018
    20. Peter Sanders, Connecting MapReduce Computations to Realistic Machine Models, https://arxiv.org/pdf/2002.07553.pdf
    21. Athanasios FevgasLeonidas AkritidisMiltiadis AlamaniotisPanagiota E. TsompanopoulouPanayiotis Bozanis:
      HyR-tree: a spatial index for hybrid flash/3D XPoint storage. Neural Comput. Appl. 35(1): 133-145 (2023)
    22. Athanasios FevgasLeonidas AkritidisPanayiotis BozanisYannis Manolopoulos:
      Indexing in flash storage devices: a survey on challenges, current approaches, and future trends. VLDB J. 29(1): 273-311 (2020)
    23. Chen Luo, Michael J. Carey, LSM-based storage techniques: a survey, The VLDB Journal 19 July 2019

    24. Xin Zhang, Qizhong Mao, Ahmed Eldawy, Vagelis Hristidis, Yihan Sun: Bi-directional Log-Structured Merge Tree. SSDBM 2022: 19:1-19:4

    25. Qizhong Mao, Mohiuddin Abdul Qader, Vagelis Hristidis: Comprehensive Comparison of LSM Architectures for Spatial Data. IEEE BigData 2020: 455-460

    26. S.-W. Jun, S. Xu et al., "Terabyte sort on fpga-accelerated flash storage", Field-Programmable Custom Computing Machines (FCCM) 2017 IEEE 25th Annual International Symposium on, pp. 17-24, 2017.

    27. Wu, Chin-Hsien, and Kuo-Yi Huang. “Data Sorting in Flash Memory.” ACM Transactions on Storage, no. 2, Association for Computing Machinery (ACM), Mar. 2015, pp. 1–25. Crossref, doi:10.1145/2665067.

    28. Jackson, Riley, et al. “Efficient External Sorting for Memory-Constrained Embedded Devices with Flash Memory.” ACM Transactions on Embedded Computing Systems, no. 4, Association for Computing Machinery (ACM), Mar. 2021, pp. 1–21. Crossref, doi:10.1145/3446976.

    29. A. Laga, J. Boukhobza, F. Singhoff and M. Koskas, "Montres: merge on-the-run external sorting algorithm for large data volumes on ssd based storage systems", IEEE Transactions on Computers, vol. 66, no. 10, pp. 1689-1702, 2017.

    30. Yubiao Chen, Jianzhong Li, Hong Gao: FSSort: External Sort For Solid State Drives. CCGRID 2021: 71-80

    31. Nadir Ould-Khessal, Scott Fazackerley, and Ramon Lawrence, An Efficient B-tree Implementation for Memory-Constrained Embedded Systems, https://arxiv.org/abs/2302.07800, 2023
    32. Bender, M.A., Farach-Colton, M., Jannen, W., Johnson, R., Kuszmaul, B.C., Porter, D.E., Yuan, J., Zhan, Y.: An Introduction to B-trees and Write-
      Optimization. Usenix Mag. 40(5) (2015)

    33. Fazackerley, S., Ould-Khessal, N., Lawrence, R.: Efficient flash indexing for time series data on memory-constrained embedded sensor devices. In: Proceedings of
      the 10th International Conference on Sensor Networks, SENSORNETS 2021, pp. 92{99. SCITEPRESS (2021). DOI 10.5220/0010318800920099

    34. Gerth Stølting Brodal, Casper Moldrup Rysgaard, Rolf Svenning: External Memory Fully Persistent Search Trees. STOC 2023: 1410-1423

    35. Gerth Stølting Brodal, Casper Moldrup Rysgaard, Jens Kristian Refsgaard Schou, Rolf Svenning: Space-Efficient Functional Offline-Partially-Persistent Trees with Applications to Planar Point Location. WADS 2023: 644-659

    External Memory Geometric Data Structures

    1. Arge, L. 1995a. The buffer tree: A new technique for optimal I/O-algorithms. In Proceedings of the Workshop on Algorithms and Data Structures, Vol. 955 of Lecture Notes in Computer Science, Springer-Verlag, 334-345, 1995. A complete version appears as BRICS Technical Report RS-96-28, University of Aarhus.
    2. Arge, L. 1997. External-memory algorithms with applications in geographic information systems. In M. van Kreveld, J. Nievergelt, T. Roos, and P. Widmayer, eds, Algorithmic Foundations of GIS, Vol. 1340 of Lecture Notes in Computer Science, Springer-Verlag, 213-254.
    3. Arge, L. and Vahrenhold, J. I/O-efficient dynamic planar point location. In Proceedings of the ACM Symposium on Computational Geometry (June), Vol. 9, 191-200, 2000.
    4. Kanellakis, P. C.,Ramaswamy, S., Vengroff, D. E., and Vitter, J. S. 1996. Indexing for data models with constraints and classes. Journal of Computer and System Sciences 52, 3, 589-612.
    5. Arge, L. and Vitter, J. S. Optimal dynamic interval management in external memory. In Proceedings of the IEEE Symposium on Foundations of Computer Science, 1996.
    6. Arge, L., Samoladas, V., and Vitter, J. S. 1999b Two-dimensional indexability and optimal range search indexing. In Proceedings of the ACMConference Principles of Database Systems (Philadelphia, May-June), Vol. 18, 346-357, 1999
    7. Lars Arge, Gerth St?lting BrodalS. Srinivasa RaoExternal Memory Planar Point Location with Logarithmic Updates. Algorithmica 63(1-2): 457-475 (2012) 
    8. Lars Arge, Vasilis SamoladasKe YiOptimal External Memory Planar Point Enclosure. Algorithmica 54(3): 337-352 (2009)
    9. Peyman Afshani, Lars Arge, Kasper Dalgaard LarsenOrthogonal Range Reporting in Three and Higher Dimensions. FOCS 2009: 149-158
    10. Peyman AfshaniNodari SitchinavaI/O-Efficient Range Minima Queries. SWAT 2014: 1-12
    11. Gerth Stølting BrodalRolf FagerbergDavid HammerUlrich MeyerManuel PenschuckHung Tran:
      An Experimental Study of External Memory Algorithms for Connected Components. SEA 202123:1-23:23

    Foundations of Multidimensional and Metric Data Structures  (ενότητες από το βιβλίο του  Hanan Samet -http://www.cs.umd.edu/~hjs/mkbook/content-page.pdf )

    1. Mutlidimensional Point Data (Quadtrees, k-d trees, Grid Directory, PK-trees) 
    2. Object based and image based image representations .
    3. Intervals and small rectangles
    4. High Dimensional Data 

    BIG DATA MANAGEMENT AND MINING

          1. Michael Nelson, Sridhar Radhakrishnan, Chandra N. Sekharan: Queryable Compression on Time-Evolving Social Networks with Streaming.  IEEE International Conference on Big Data 2018, 146-15
          2. Sungchul Lee, Ju-Yeon Jo, Yoohwan Kim: Key based Deep Data Locality on Hadoop.  IEEE International Conference on Big Data 2018,3889-3898
          3. Jacob D. Moorman, Qinyi Chen, Thomas K. Tu, Zachary M. Boyd, Andrea L. Bertozzi: Filtering Methods for Subgraph Matching on Multiplex Networks. IEEE International Conference on Big Data 2018,3980-3985  
          4. Nils Gruschka, Vasileios Mavroeidis, Kamer Vishi, Meiko Jensen: Privacy Issues and Data Protection in Big Data: A Case Study Analysis under GDPR.  IEEE International Conference on Big Data 2018,5027-5033
          5. Alfredo Cuzzocrea, Fabio Martinelli, Francesco Mercaldo: Improving Machine Learning Tools with Embeddings: Applications to Big Data Security.  IEEE International Conference on Big Data 2018,5086-5092
          6. Xian Wu, Baoxu Shi, Yuxiao Dong, Chao Huang, Nitesh V. Chawla: Neural Tensor Factorization for Temporal Interaction Learning. WSDM 2019, 537-545
          7.Manzil Zaheer, Amr Ahmed, Yuan Wang, Daniel Silva, Marc Najork, Yuchen Wu, Shibani Sanan, Surojit Chatterjee: Uncovering Hidden Structure in Sequence Data via Threading Recurrent Models.  WSDM 2019, 186-194
          8. Branislav Kveton, S. Muthukrishnan, Hoa T. Vu, Yikun Xian: Finding Subcube Heavy Hitters in Analytics Data Streams. 1705-1714, WWW 2018
          9. Xin Luo, Ye Wu, Xin-Shun Xu: Scalable Supervised Discrete Hashing for Large-Scale Search. 1603-1612, WWW 2018
         10. Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff: Weighted Reservoir Sampling from Distributed Streams. 218-235, ACM PODS 2019
         11. Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, Maximilian Schleich: In-Database Learning with Sparse Tensors. 325-340 ACM PODS 2018
         12. Rajesh Jayaram, David P. Woodruff: Data Streams with Bounded Deletions. 341-354, ACM PODS 2018
         13. Jiaqi Gu, Yugo H. Watanabe, William A. Mazza, Alexander Shkapsky, Mohan Yang, Ling Ding, Carlo Zaniolo: RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-aggregate-SQL on Spark. 467-484, ACM SIGMOD 2019
         14. Andrew Carter, Andrew Rodriguez, Yiming Yang, Scott Meyer: Nanosecond Indexing of Graph Data With Hash Maps and VLists. 623-635, ACM SIGMOD 2019
         15. Daniel Kocher, Nikolaus Augsten: A Scalable Index for Top-k Subtree Similarity Queries. 1624-1641 ACM SIGMOD 2019
         16. Hwanjun Song, Jae-Gil Lee: RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm Based on Random Partitioning. 1173-1187, ACM SIGMOD 2018
          17. Jia Zou, R. Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian, Binhang Yuan, Chris Jermaine:
    PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development. 1189-1204 ACM SIGMOD 2018
         18. Maaz Bin Safeer Ahmad, Alvin Cheung: Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications. 1205-1220, ACM SIGMOD 2018

        19. Alex Conway, Abhishek Gupta, Vijay Chidambaran, Martin Farach-Colton§ Rick Spillane¶ Amy Tai, Rob Johnson,
    SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores, 2020 USENIX Annual Technical Conference

         20. Saman Ashkiani, Martin Farach-Colton, John D. Owens, A Dynamic Hash Table for the GPU, 2018,
    https://arxiv.org/pdf/1710.11246.pdf

         21. Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, Better GPU Hash Tables, 2021, https://arxiv.org/pdf/2108.07232.pdf

         22. Peter Sanders, Connecting MapReduce Computations to Realistic Machine Models, https://arxiv.org/pdf/2002.07553.pdf

         23. Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag: Scalable High-Quality Hypergraph Partitioning. ACM Trans. Algorithms 20(1): 9:1-9:54 (2024)

    COMPRESSED DATA STRUCTURES

           1. H?ctor Ferrada, Gonzalo Navarro: Lempel-Ziv compressed structures for document retrieval. Inf. Comput. 265: 1-25 (2019)
           2. Gonzalo Navarro: Document listing on repetitive collections with guaranteed performance. Theor. Comput. Sci. 772: 58-72 (2019)
           3. Alberto Ord??ez Pereira, Gonzalo Navarro, Nieves R. Brisaboa: Grammar compressed sequences with rank/select support. J. Discrete Algorithms 43: 54-71 (2017)
           4. Gonzalo Navarro, Yakov Nekrich: Time-Optimal Top-k Document Retrieval. SIAM J. Comput. 46(1): 80-113 (2017)
           5. Miguel A. Mart?nez-Prieto, Nieves R. Brisaboa, Rodrigo C?novas, Francisco Claude, Gonzalo Navarro: Practical compressed string dictionaries. Inf. Syst. 56: 73-108 (2016)
           6. Cecilia Hern?ndez, Gonzalo Navarro: Compressed representations for web and social graphs. Knowl. Inf. Syst. 40(2): 279-313 (2014)
           7. Gonzalo Navarro, Nora Reyes: New dynamic metric indices for secondary memory. Inf. Syst. 59: 48-78 (2016)   
           8. Francisco Claude, Gonzalo Navarro:  Self-Indexed Grammar-Based Compression. Fundam. Inform. 111(3): 313-337 (2011)

           9, Tejalal Choudhary, Vipul Mishra, Anurag Goswami, Jagannathan Sarangapani, A comprehensive survey on model compression and acceleration
    Artifcial Intelligence Review (2020) 53:5113–5155 https://doi.org/10.1007/s10462-020-09816-7   

          10. Giosuè Cataldo Marinò, Alessandro Petrini, Dario Malchiodi, Marco Frasca Università degli Studi di Milano, Compact representations of convolutional neural networks via weight pruning and quantization arXiv e-prints

          11.  Deep neural networks compression: A comparative survey and choice recommendations. Neurocomputing, 2022.

          12.  Learned Sorted Table Search and Static Indexes in Small-Space Data Models. Data, 2023.

          13. Guillermo de Bernardo, Travis Gagie, Susana Ladra, Gonzalo Navarro, Diego Seco: Faster compressed quadtrees. J. Comput. Syst. Sci. 131: 86-104 (2023)

     

    Cache Oblivious Data Structures

    1. M. Frigo, C.E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th IEEE Symp. on Foundations of Computer Science (FOCS 99), pages 285-297, 1999.
    2. M. A. Bender, E. Demaine, and M. Farach-Colton."Cache Oblivious B-Trees" Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 399-409, 2000.
    3. Gerth Stψlting Brodal and Rolf Fagerberg, Funnel Heap - A Cache Oblivious Priority Queue, In Proc. 13th Annual International Symposium on Algorithms and Computation, volume 2518 of Lecture Notes in Computer Science, pages 219-228. Springer Verlag, Berlin, 2002.
    4. Gianni Franceschini, Roberto Grossi, J. Ian Munro, and Linda Pagli. Implicit B-trees: A New Data Structure for the Dictionary Problem. Journal of Computer and System Sciences, special issue of the 43th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2004.
    5. Gianni Franceschini and Roberto Grossi. Implicit dictionaries supporting searches and amortized updates in O(log n loglog n). In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 670-678. SIAM, 2003.
    6. Gianni Franceschini and Roberto Grossi. Implicit dictionaries supporting searches and amortized updates in O(log n loglog n). In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 670-678. SIAM, 2003. 
    7. Arash Farzan, Paolo Ferragina, Gianni Franceschini, J. Ian Munro:Cache-Oblivious Comparison-Based Algorithms on Multisets. ESA 2005: 305-316
    8.  Gerth St?lting BrodalErik D. DemaineJeremy T. FinemanJohn IaconoStefan LangermanJ. Ian Munro:
      Cache-Oblivious Dynamic Dictionaries with Update/Query Tradeoffs. SODA 2010: 1448-1456
    9. Gerth St?lting Brodal, Casper Kejlberg-RasmussenCache-Oblivious Implicit Predecessor Dictionaries with the Working-Set Property. STACS 2012: 112-123
    10. Lars Arge, Gerth St?lting BrodalJakob TruelsenConstantinos TsirogiannisAn Optimal and Practical Cache-Oblivious Algorithm for Computing Multiresolution Rasters. ESA 2013: 61-72
    11. Peyman AfshaniChris H. HamiltonNorbert Zeh:
      A general approach for cache-oblivious range reporting and approximate range counting. Comput. Geom. 43(8): 700-712 (2010)
    12. Michael A. BenderRezaul Alam ChowdhuryRathish DasRob JohnsonWilliam KuszmaulAndrea LincolnQuanquan C. LiuJayson LynchHelen Xu:
      Closing the Gap Between Cache-oblivious and Cache-adaptive Analysis. SPAA 202063-73
    13. Michael A. BenderMartin Farach-ColtonSándor P. FeketeJeremy T. FinemanSeth Gilbert:
      Cost-Oblivious Storage Reallocation. ACM Trans. Algorithms 13(3)38:1-38:20 (2017)
    14. Gerth Stølting Brodal, Sebastian Wild: Funnelselect: Cache-Oblivious Multiple Selection. ESA 2023: 25:1-25:1

    Data structures in the Cloud

    1. Wang, J., S. Wu, H. Gao, J. Li, and B. C. Ooi (2010). Indexing multi-dimensional data n a cloud system. In Proceedings of the 2010 international conference on Management of data, SIGMOD '10, New York, NY, USA, pp. 591-602. ACM.
    2. 2. Grossman, R. and Y. Gu (2008), Data mining using high performance data clouds: experimental studies using sector and sphere. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '08, NewYork, NY, USA, pp. 920-927. ACM.
    3. Sai Wu, Dawei Jiang, Beng Chin Ooi, KunLung Wu, Efficient Btree Based Indexing for Cloud Data Processing, Proceedings of VLDB Endowment,1207-1218

    Learned Index data Structures

            1. Tim Kraska, Alex BeutelEd H. ChiJeffrey DeanNeoklis Polyzotis: The Case for Learned Index Structures. SIGMOD Conference 2018: 489-504

            2. Christos Anagnostopoulos, Peter Triantafillou:Query-Driven Learning for Predictive Analytics of Data Subspace Cardinality. TKDD 11(4): 47:1-47:46 (2017)

           3.Kapil Vaidya, Eric Knorr, Michael Mitzenmacher, Tim Kraska: Partitioned Learned Bloom Filters. ICLR 2021

           4. Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, Tim Kraska:
    Benchmarking Learned Indexes. Proc. VLDB Endow. 14(1): 1-13 (2020)

            5, Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska: Learning Multi-Dimensional Indexes. SIGMOD Conference 2020: 985-1000

    LSM Trees and SSDs

    Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Zichen Zhu, Manos Athanassoulis: Enabling Timely and Persistent Deletion in LSM-Engines. ACM Trans. Database Syst. 48(3): 8:1-8:40 (2023)

    Subhadeep Sarkar, Niv Dayan, Manos Athanassoulis: The LSM Design Space and its Read Optimizations. ICDE 2023: 3578-3584

    Subhadeep Sarkar, Manos Athanassoulis: Dissecting, Designing, and Optimizing LSM-based Data Stores. SIGMOD Conference 2022: 2489-2497

    Tarikul Islam Papon, Manos Athanassoulis: The Need for a New I/O Model. CIDR 2021

    Zichen Zhu, Ju Hyoung Mun, Aneesh Raman, Manos Athanassoulis: Reducing Bloom Filter CPU Overhead in LSM-Trees on Modern Storage Devices. DaMoN 2021: 1:1-1:10

    Tarikul Islam Papon, Manos Athanassoulis: A Parametric I/O Model for Modern Storage Devices. DaMoN 2021: 2:1-2:11

    Athanasios FevgasLeonidas AkritidisPanayiotis BozanisYannis Manolopoulos: Indexing in flash storage devices: a survey on challenges, current approaches, and future trends. VLDB J. 29(1): 273-311 (2020)

     

    Pretrained transformers for Information Retrieval Ranking

           1. Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto: Faster Learned Sparse Retrieval with Guided Traversal. SIGIR 2022: 1901-1905

           2. Antonio Mallia, Omar Khattab, Torsten Suel, Nicola Tonellotto: Learning Passage Impacts for Inverted Indexes. SIGIR 2021: 1723-1727

           3. Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia, ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction,
    Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

          4. Karan Samel, Cheng Li, Weize Kong,Tao Chen, Mingyang Zhang, Shaleen Gupta, Swaraj Khadanga,  Wensong Xu, Xingyu Wang, Kashyap Kolipaka, Michael Bendersky, Marc Najork,
    End-to-End Query TermWeighting, KDD 2023

    Data Structures for Tree and String Manipulation  Search Trees and Priority Queues/sorting (POINTER MACHINE AND RAM ALGORITHMS)

    1. Richard Cole, Ramesh Hariharan: Tree Pattern Matching to Subset Matching in Linear Time. SIAM J. Comput. 32(4): 1056-1066 (2003)
    2. Erik D. Demaine, Gad M. Landau, Oren Weimann: On Cartesian Trees and Range Minimum Queries. Algorithmica 68(3): 610-625 (2014)
    3. Francisco Claude, Gonzalo Navarro, Alberto Ord??ez Pereira: The wavelet matrix: An efficient wavelet tree for large alphabets. Inf. Syst. 47: 15-32 (2015)
    4. J?r?my Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, Yakov Nekrich: Efficient Fully-Compressed Sequence Representations. Algorithmica 69(1): 232-268 (2014)
    5. Gonzalo Navarro, Yakov Nekrich: Optimal Dynamic Sequence Representations. SIAM J. Comput. 43(5): 1781-1806 (2014) /
    6. Gonzalo Navarro, Sharma V. Thankachan: Encodings for Range Majority Queries. CPM 2014: 262-272
    7. Gonzalo Navarro, Sharma V. Thankachan: Optimal Encodings for Range Majority Queries. CoRR abs/1404.2677 (2014)
    8. Travis Gagie, Juha K?rkk?inen, Gonzalo Navarro, Simon J. Puglisi: Colored range queries and document retrieval. Theor. Comput. Sci. 483: 36-50 (2013)
    9. Roberto Konow, Gonzalo Navarro, Charles L. A. Clarke, Alejandro L?pez-Ortiz: Faster and smaller inverted indices with treaps. SIGIR 2013: 193-202
    10. G. S. Brodal. Finger Search Trees with Constant Insertion Time. In Proc. 9th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 540-549, 1998.
    11. Gerth St?lting Brodal, George Lagogiannis, Christos Makris, Athanasios K. Tsakalidis, Kostas Tsichlas: Optimal finger search trees in the pointer machine. J. Comput. Syst. Sci. 67(2): 381-418 (2003
    12. Daniel Dominic Sleator and Robert Endre Tarjan. Self-adjusting binary search trees. Journal of the ACM , 32(3):652-686, July 1985.
    13. Gerth St?lting Brodal, George Lagogiannis, Robert Endre Tarjan: Strict fibonacci heaps. STOC 2012: 1177-1184
    14. Haim Kaplan, Robert Endre Tarjan, Uri Zwick: Soft Heaps Simplified. SIAM J. Comput. 42(4): 1660-1673 (201
    15. Arne Andersson, Mikkel Thorup: Dynamic ordered sets with exponential search trees. J. ACM 54(3): 13 (2007)
    16. Yijie Han: Deterministic sorting in O(nloglogn) time and linear space. J. Algorithms 50(1): 96-105 (2004)
    17. Mikkel ThorupOr ZamirUri ZwickDynamic Ordered Sets with Approximate Queries, Approximate Heaps and Soft Heaps. ICALP 2019: 95:1-95:13

     

    Βιβλιογραφία

    ΒΙΒΛΙΑ

     ΣΥΝΔΕΣΜΟΙ

    Σύνδεσμοι

     

    Προτεινόμενα συγγράμματα

    To βασικό σύγγραμμα είναι το βιβλίο του Vitter KAI των Jure LeskovecAnand RajaramanJeff Ullman  (στη λίστα παρατίθεται και βοηθητικό βιβλιογραφικό υλικό):