Big Data Deduplication in Data Lake

Hlavačka, Jakub; Bobák, Martin; Hluchý, Ladislav

Author	Hlavačka, Jakub
Author	Bobák, Martin
Author	Hluchý, Ladislav
Date accessioned	2025-08-18T13:34:27Z
Date available	2025-08-18T13:34:27Z
Date issued	2024
ISSN, e-ISSN	1785-8860	hu_HU
Direct link	http://hdl.handle.net/20.500.14044/32350
Abstract	Data lakes are the next generation of technology to process and store big data. As usual, new challenges and problems arise inevitably with new technologies. One of these problems is the occurrence of duplicate data in the storage. Our paper aims to address this challenge during the data ingestion phase that is currently overlooked or addressed insufficiently. The first part discusses the design of a suitable architecture for the data lake and deduplication workflow for processing structured and unstructured data. The proposed solution is evaluated through experiments that deal with the flexible deduplication window, the scalability of the proposed solution, the suitable hash function, and the advantages of an in-memory pointer repository.	hu_HU
dc.format	PDF	hu_HU
Language	en	hu_HU
Title and subtitle	Big Data Deduplication in Data Lake	hu_HU
Access level	Open access	hu_HU
Copyright	Óbudai Egyetem	hu_HU
Place of publication	Budapest	hu_HU
University	Óbudai Egyetem	hu_HU
Field of science	Engineering sciences - computer sciences	hu_HU
Subject	data lake	hu_HU
Subject	deduplication	hu_HU
Subject	big data	hu_HU
Genre	Scientific article	hu_HU
Title of the document containing the article/book chapter	Acta Polytechnica Hungarica	hu_HU
local.tempfieldCollections	Journal articles	hu_HU
Other identifiers [doi]	10.12700/APH.21.11.2024.11.17
Version	Publisher's version	hu_HU
Extent	20 p.	hu_HU
Source journal issue	No. 11	hu_HU
Source journal volume	Vol. 21	hu_HU
Source journal year	2024	hu_HU
Publisher	Óbudai Egyetem	hu_HU

Files in this item

Name: Hlavacka_Bobak_Hluchy_151.pdf
Size: 690.3KB
Format: PDF

This item appears in the following collection(s)

2.01. 2024 Volume 21, Issue No. 11. [17]
