Alberto Mendelzon Workshop School 2016

AMW SCHOOL

Panama City-Panama

6-10 June, 2016

ABOUT

Alberto Mendelzon Workshop School (AMWS):

For the past three years, AMW has been preceded by a two-day “Summer School”, with international speakers invited to present three-hour tutorials to a mix of students and other interested attendees.

The high-level goals of the AMW School are two-fold:

To host tutorials targeted at students (advanced undergraduate or postgraduate level) or other early-term researchers interested in the area of Data Management.

To provide a venue where young Latin American researchers can meet, discuss, learn and seek feedback on their topics, thus reinforcing research networks (of the future) in the area.

This year, the 4th AMW School is organized by Juan Sequeda (Capsenta Labs, USA) and Domagoj Vrgoc (Pontificia Universidad Católica de Chile).

Please note that since Computer Science is an area in development in Panama, this year the first day of the School will consist of four introductory classes held in Spanish in order to attract local students. In previous years we have witnessed that younger students are not inclined to attend classes held exclusively in English, and this year we hope to avoid this by holding the first day of the school in Spanish. This will be followed by a day in English, which will cater both to international visitors and to students who were attracted to the workshop by the prospect of learning about database topics in their native language. At the end of each day, we will also have a poster session intended to encourage the exchange of ideas between students and the researchers participating in the school and the workshop.

The two days of the school are arranged so that the introductory courses of the first day prepare the students for the more advanced topics presented the day after. This order is also maintained within the first day, where each lecture builds on the previous one. We have a total of four lectures for the first day of the school and two advanced tutorials for day 2. The broad focus of each day is:

Day 1 (Databases and the Semantic Web): An introduction to foundational aspects of Databases, Description Logics, and the Semantic Web.

Day 2 (Big Data and Ontologies): Talks about entity resolution in Big Data and how to answer queries in the presence of Ontologies.

We have arranged for seven speakers of international repute in the area of Data Management to provide seven lectures at the school.

Tutorial: Data and Algorithmic Bias in the Web and Distributed Web Search

Ricardo Baeza-Yates (Yahoo Research, Spain)

Speaker Bio: Ricardo Baeza-Yates's areas of expertise are information retrieval, web search and data mining, data science and algorithms. He was VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California, from January 2006 to February 2016. He is a part-time Professor at DTIC of the Universitat Pompeu Fabra, in Barcelona, Spain, as well as at DCC of Universidad de Chile in Santiago. Until 2004 he was Professor and founding director of the Center for Web Research at the latter. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 2011 (2nd ed.), which won the ASIST 2012 Book of the Year award. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected to the ACM Council. Since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow, among other awards and distinctions.

The Web is the largest public big data repository that humankind has created. In this overwhelming data ocean we need to be aware of the quality of the data and, in particular, of the biases that exist in it, such as redundancy, spam, etc. These biases affect the algorithms that we design to improve the user experience. This problem is further exacerbated by biases that are added by these algorithms, especially in the context of recommendation systems. We give several examples and their relation to sparsity, novelty, and privacy, stressing the importance of the user context to avoid these biases.

In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of active Web sites continues to grow (190 million at the beginning of 2016) and there are currently hundreds of billions of indexed pages. On the other hand, there are more than two billion Internet users, and billions of queries are issued each day.

In the near future, centralized systems are likely to become less effective against such a data-query load, thus suggesting the need for fully distributed search engines. Such engines need to maintain high quality answers, fast response time, high query throughput, high availability and scalability, in spite of network latency and scattered data. In this talk we present the main challenges behind the design of a distributed Web retrieval system and our research in all the components of such a Web search engine: crawling, indexing, and query processing.

Tutorial: Introduction to the Semantic Web and the Web of Linked Data

Oscar Corcho: Facultad de Informática, Universidad Politécnica de Madrid

The Semantic Web and the Web of Linked Data have become a reality in recent years, partly due to the W3C's effort to standardize the formats for representing data and ontologies (RDF, RDF Schema, RDFa, JSON-LD, OWL), and to standardize query languages (SPARQL). Likewise, many organisations have decided to start publishing linked data, to annotate their Web content using these formats, and to use public data in all types of applications. This lecture will introduce the fundamental notions of the Semantic Web and the Web of Linked Data and describe the main research challenges we face in this area.

Speaker Bio: Oscar Corcho is an Associate Professor at Departamento de Inteligencia Artificial (Facultad de Informática, Universidad Politécnica de Madrid), and he belongs to the Ontology Engineering Group. His research activities are focused on Semantic e-Science and Real World Internet, although he also works in the more general areas of Semantic Web and Ontological Engineering. Previously, he worked as a Marie Curie research fellow at the University of Manchester, and was a research manager at iSOCO. He holds a degree in Computer Science, an MSc in Software Engineering and a PhD in Computational Science and Artificial Intelligence from UPM. He was awarded the Third National Award by the Spanish Ministry of Education in 2001. He has published several books, among which “Ontological Engineering” stands out, as it is used as a reference book in many university courses worldwide, and more than 100 papers in journals, conferences and workshops. He regularly participates in the organisation or in the programme committees of relevant international conferences and workshops.

Tutorial: Introduction to Databases

Juan L. Reutter: Pontificia Universidad Católica de Chile

Databases are at the core of commercial software applications, and are essential for any application that requires storing, updating or consulting volumes of data in an efficient way. The purpose of this talk is to introduce participants to the theoretical foundations underlying the development of relational databases. We shall discuss the relationship of the Relational Algebra and Relational Calculus with SQL, the query language used in all relational database systems. We study some of the fundamental properties of these languages, and show how they are used in database systems to solve problems such as query optimisation or data integration.

Speaker Bio: Juan L. Reutter is an assistant professor at the Computer Science Department at Pontificia Universidad Católica de Chile and an associate investigator of the Chilean Center for Semantic Web Research. He received his PhD from the University of Edinburgh in May 2013. His research interests are in data management and automata theory. He was the recipient of the Ramon Salas Award for the best Chilean work in engineering and of the best paper award at the ACM PODS conference in 2011, and in 2014 he won the BCS distinguished dissertation competition and received the Cor Baayen Award from ERCIM, the European Research Consortium for Informatics and Mathematics. He has served on the program committees of various conferences and workshops, including SIGMOD and AAAI.

Tutorial: An introduction to Description Logics and Ontology Languages

Magdalena Ortiz: TU Wien

Recent years have seen enormous progress in the development of ontologies, which are sharable, machine-readable domain conceptualizations. Ontologies are making the Web smarter, improving research and practice in life-sciences, and opening the door for a new generation of semantic awareness in information systems. Description Logics (DLs) are a well-established family of languages for Knowledge Representation and Reasoning. They provide the formal foundations of the OWL languages for writing ontologies, and for the automated tools that are available for creating and using these ontologies. In this tutorial, we will survey the basics of representing knowledge in DLs and get to know the DLs that underlie the most popular OWL profiles. We will also give an overview of the main reasoning services needed for creating and using ontologies, and some of the algorithmic techniques that DL researchers have devised for providing these services.

Speaker Bio: Magdalena Ortiz is a tenure-track assistant professor for Knowledge Representation and Reasoning in the Institute of Information Systems of TU Wien, where she is also a Hertha Firnberg Scholar and principal investigator in the project “Recursive Queries over Semantically Enriched Data Repositories”. Her research interests are centered around logics for knowledge representation and reasoning, with a focus on description logics and their application to data access and management. She has received several prizes and awards, including the EMCL Distinguished Alumni Award for outstanding contributions to the field of Computational Logic, the Award of Excellence of the Austrian Federal Ministry for Science and Research, the Förderpreis of the Austrian Computer Society, the OeGAI Prize of the Austrian Society for Artificial Intelligence, and the Google Europe Anita Borg Memorial Scholarship.

Tutorial: Federation in SPARQL

Carlos Buil-Aranda: Universidad Técnica Federico Santa María

The amount of RDF data available on the Web has increased dramatically in recent years. This RDF data is stored in distributed database systems, allowing everybody to access it via dedicated query services called SPARQL endpoints. Examples of these endpoints are DBpedia (the Wikipedia in RDF format) or Bio2RDF, a set of more than 30 distributed RDF databases storing different types of biomedical data (such as medical publications data or gene data). This tutorial aims to provide participants with an overview of SPARQL Query Federation, i.e. how to query all these RDF databases as if they were a single one. We will discuss the foundations of the W3C SPARQL Federated Query recommendation, we will provide several algorithms that can be implemented in order to allow such data federation, and finally we will present an overview of the systems implementing such algorithms.

Speaker Bio: Carlos Buil-Aranda is an assistant professor and researcher at Universidad Técnica Federico Santa María, Chile. His work is focused on federated query processing for SPARQL and on improving users' SPARQL queries by looking at the endpoints' query logs. Carlos received his Ph.D. degree in 2012 from Universidad Politécnica de Madrid, and obtained the best Computer Science Ph.D. Thesis Award from that university. He received the Best Paper Award at the Extended Semantic Web Conference 2011 and the Best Evaluation Paper Award at the International Semantic Web Conference 2013.

Tutorial: Entity Resolution in Big Data

Lise Getoor: University of California Santa Cruz

Entity resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a long-standing challenge in database management, information retrieval, machine learning, natural language processing and statistics. Accurate and fast entity resolution has huge practical implications in a wide variety of commercial, scientific and security domains. Despite the long history of work on entity resolution, there is still a surprising diversity of approaches, and a lack of guiding theory. Meanwhile, in the age of big data, the need for high quality entity resolution is growing, as we are inundated with more and more data, all of which needs to be integrated, aligned and matched, before further utility can be extracted. In this tutorial, I’ll bring together perspectives on entity resolution from a variety of fields, including databases, information retrieval, natural language processing and machine learning, to provide, in one setting, a survey of a large body of work. I’ll discuss both the practical aspects and theoretical underpinnings of ER. I’ll describe existing solutions, current challenges and open research problems. In addition to giving attendees a thorough understanding of existing ER models, algorithms and evaluation methods, the tutorial will cover important research topics such as scalable ER, active and lightly supervised ER, and query-driven ER.

Speaker Bio: Lise Getoor is a Professor in the Computer Science Department at UC Santa Cruz. Her research areas include machine learning, data integration and reasoning under uncertainty, with an emphasis on graph and network data. She is a AAAI Fellow, serves on the Computing Research Association and International Machine Learning Society Boards, was co-chair of ICML 2011, and is a recipient of an NSF Career Award and ten best paper and best student paper awards. She received her PhD from Stanford University, her MS from UC Berkeley, and her BS from UC Santa Barbara, and was a Professor at the University of Maryland, College Park from 2001 to 2013.

Tutorial: Ontology Based Data Access

Diego Calvanese: Free University of Bozen-Bolzano

Speaker Bio: Diego Calvanese is a full professor at the Research Centre for Knowledge and Data (KRDB), Faculty of Computer Science, Free University of Bozen-Bolzano, where he teaches graduate and undergraduate courses on knowledge bases and databases, ontologies, theory of computing, and formal languages. He received a PhD from Sapienza University of Rome in 1996. His research interests include formalisms for knowledge representation and reasoning, ontology-based data access and integration, description logics, Semantic Web, graph data management, data-aware process verification, and service modeling and synthesis.

He has been actively involved in several national and international research projects in the above areas (including FP6-7603 TONES, FP7-257593 ACSI, FP7-318338 Optique).

He is the author of more than 300 refereed publications, including ones in the most prestigious international journals and conferences in Databases and Artificial Intelligence, with more than 22000 citations and an h-index of 62, according to Google Scholar. He is one of the editors of the Description Logic Handbook. He has served in over 100 program committee roles for international events, and he is a member of the editorial board of JAIR and of Big Data Research.

In 2012-2013 he was a visiting researcher at the Technical University of Vienna as Pauli Fellow of the “Wolfgang Pauli Institute”. He is the program chair of the 34th ACM Symposium on Principles of Database Systems (PODS 2015), program co-chair of the 28th Description Logic Workshop (DL 2015), and the general chair of the 28th European Summer School in Logic, Language and Information (ESSLLI 2016). He was nominated ECCAI Fellow in 2015.

Table 1: Schedule for AMW School 2016

Day 1

Time          Session
09:00–09:15   Introduction
09:15–10:30   Introduction to the Semantic Web and the Web of Linked Data (Oscar Corcho)
10:30–10:45   Coffee
10:45–12:00   Introduction to Databases (Juan L. Reutter)
12:00–13:30   Lunch
13:30–14:45   An introduction to Description Logics and Ontology Languages (Magdalena Ortiz)
14:45–16:00   Federation in SPARQL (Carlos Buil-Aranda)
16:00–16:15   Coffee
16:15–17:30   Big Data en la Web (Ricardo Baeza-Yates)
17:30–18:30   Poster

Day 2

Time          Session
09:00–10:30   Entity Resolution in Big Data I (Lise Getoor) (1.5 h)
10:30–10:45   Coffee
10:45–11:30   Entity Resolution in Big Data II (Lise Getoor) (0.75 h)
11:30–12:15   Ontology Based Data Access I (Diego Calvanese) (0.75 h)
12:30–14:00   Lunch
14:00–15:30   Ontology Based Data Access II (Diego Calvanese) (1.5 h)
15:30–16:15   Distributed Web Search I (Ricardo Baeza-Yates) (0.75 h)
16:15–16:30   Coffee
16:30–18:00   Distributed Web Search II (Ricardo Baeza-Yates) (1.5 h)

Schedule

Each tutorial will last three hours with a 15-minute break in the middle. Lunch is given additional length to allow for discussion amongst participants. On the first evening, there will be one hour of lightning talks, where all participants will give a short presentation on a topic of their interest; this slot was very successful last year, when all postgraduate students in attendance spoke briefly about their topics, later connecting with more senior researchers to talk further. On the second evening, we plan to hold an informal social event.

If you have any questions about the event, please don’t hesitate to contact the AMWS chairs:

Juan Sequeda (juanfederico@gmail.com) and Domagoj Vrgoc (domagojvrgoc@gmail.com).