Humboldt-Universität zu Berlin - Institut für Bibliotheks- und Informationswissenschaft

BBK: Engineering Web Archives

Berliner Bibliothekswissenschaftliches Kolloquium

03.12.2019 | 6 p.m. | Room 123 | Dorotheenstraße 26 | IBI

Engineering Web Archives

Dr. Helge Holzmann, Internet Archive


Creative Commons License


Slides for the talk


Web archives materialize and preserve captures of webpages that may or may not exist anymore. Beyond their great historical value, they are just as significant for data scientists, since they provide the web in a form that can be processed and engineered. Web data engineering refers to the task of transforming this dataset so that it becomes more useful for a given purpose. The Internet Archive, as the provider of the world's largest web archive, recently invested additional resources in its web data engineering efforts to provide added value to its customers and partners. However, web data has some very specific traits that raise new challenges when providing exclusive services based on the information contained in its holdings. Because of the vast data size and its temporal dimension, this often requires going beyond the pure application of existing workflows and spending substantial effort on researching and experimenting with novel methods and tools to accomplish a job more effectively and more efficiently in terms of resources and time.
