Data lakes are the next generation of technology for processing and storing big data. As
with any new technology, new challenges and problems inevitably arise. One of these
problems is the occurrence of duplicate data in storage. Our paper addresses this
challenge during the data ingestion phase, where it is currently overlooked or
insufficiently addressed. The first part discusses the design of a suitable data lake
architecture and a deduplication workflow for processing structured and unstructured data. The proposed
solution is evaluated through experiments that examine a flexible deduplication window,
the scalability of the solution, the choice of a suitable hash function, and the advantages of an
in-memory pointer repository.
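
To make the components named above concrete, the following is a minimal sketch of hash-based deduplication at ingestion with an in-memory pointer repository and a bounded deduplication window. It assumes a Python ingestion path; the PointerRepository class, the window_size parameter, the ingest function, and the choice of SHA-256 are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the paper's implementation): hash-based deduplication at
# ingestion using an in-memory pointer repository and a bounded deduplication window.

import hashlib
from collections import OrderedDict


class PointerRepository:
    """In-memory map from content hash to a storage pointer, bounded by a
    deduplication window holding the most recently seen items."""

    def __init__(self, window_size: int = 10_000):
        self.window_size = window_size
        self._pointers: "OrderedDict[str, str]" = OrderedDict()

    def lookup(self, digest: str) -> str | None:
        pointer = self._pointers.get(digest)
        if pointer is not None:
            self._pointers.move_to_end(digest)   # refresh position in the window
        return pointer

    def register(self, digest: str, pointer: str) -> None:
        self._pointers[digest] = pointer
        self._pointers.move_to_end(digest)
        if len(self._pointers) > self.window_size:
            self._pointers.popitem(last=False)   # evict the oldest entry


def ingest(payload: bytes, repo: PointerRepository, store: dict) -> str:
    """Store `payload` only if its hash has not been seen inside the window;
    otherwise return the pointer to the already-stored object."""
    digest = hashlib.sha256(payload).hexdigest()   # hash function is configurable
    existing = repo.lookup(digest)
    if existing is not None:
        return existing                            # duplicate: reuse stored object
    pointer = f"object/{digest[:16]}"              # placeholder storage key
    store[pointer] = payload
    repo.register(digest, pointer)
    return pointer


if __name__ == "__main__":
    repo, store = PointerRepository(window_size=3), {}
    for blob in [b"record-A", b"record-B", b"record-A"]:
        print(ingest(blob, repo, store))
    print("objects stored:", len(store))           # 2, not 3
```

The bounded OrderedDict stands in for the flexible deduplication window: shrinking or enlarging window_size trades memory for how far back duplicates can be detected, which is one of the dimensions the experiments above evaluate.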