Ontology matching (2nd edition)

Ontologies tend to be found everywhere. They are viewed as the silver bullet for many applications, such as database integration, peer-to-peer systems, e-commerce, semantic web services, or social networks. However, in open or evolving systems, such as the semantic web, different parties would, in general, adopt different ontologies. Thus, merely using ontologies, like using XML, does not reduce heterogeneity: it just raises heterogeneity problems to a higher level.

Euzenat and Shvaiko's book is devoted to ontology matching as a solution to the semantic heterogeneity problem faced by computer systems. Ontology matching aims at finding correspondences between semantically related entities of different ontologies. These correspondences may stand for equivalence as well as other relations, such as consequence, subsumption, or disjointness, between ontology entities. Many different matching solutions have been proposed so far from various viewpoints, e.g., databases, information systems, and artificial intelligence.

The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content. In particular, the book includes a new chapter dedicated to the methodology for performing ontology matching. It also covers emerging topics, such as data interlinking, ontology partitioning and pruning, context-based matching, matcher tuning, alignment debugging, and user involvement in matching, to mention a few. More than 100 state-of-the-art matching systems and frameworks were reviewed.

With Ontology Matching, researchers and practitioners will find a reference book that presents currently available work in a uniform framework. In particular, the work and the techniques presented in this book can be equally applied to database schema matching, catalog integration, XML schema matching and other related problems. The objectives of the book include presenting (i) the state of the art and (ii) the latest research results in ontology matching by providing a systematic and detailed account of matching techniques and matching systems from theoretical, practical and application perspectives.

1.1 Ontology engineering

1.1.1 Ontology editing and import

1.1.2 Ontology evolution and versioning

1.2 Information integration

1.2.1 Schema integration

1.2.2 Catalogue integration

1.2.3 Data integration

1.3 Linked data

1.4 Peer-to-peer information sharing

1.4.1 Semantic P2P systems

1.4.2 Emergent semantics between peers

1.5 Web service composition

1.6 Autonomous communication systems

1.6.1 Multiagent communication

1.6.2 Matching contexts in ambient computing

1.7 Navigation and query answering on the web

1.7.1 Navigation on the semantic web

1.7.2 Query answering on the web

1.7.3 Query answering on the deep web

1.8 Summary

2.1 Vocabularies, schemas and ontologies

2.1.1 Tags and folksonomies

2.1.2 Directories

2.1.3 Relational database schemas

2.1.4 XML schemas

2.1.5 Conceptual models

2.1.6 Ontologies

2.2 Ontology language

2.2.1 Ontology entities

2.2.2 Ontology language semantics

2.3 Types of heterogeneity

2.4 Terminology

2.5 The ontology matching problem

2.5.1 The matching process

2.5.2 Structure of an alignment

2.5.3 Towards a semantics for matching and alignments

2.6 Summary

3.1 The alignment life cycle

3.2 Identifying ontologies and characterising needs

3.3 Retrieving existing alignments

3.4 Selecting and composing a matcher

3.5 Matching ontologies

3.6 Evaluating alignments

3.7 Enhancing alignments

3.8 Storing and sharing

3.9 Rendering and processing alignments

3.10 Summary

4.1 Matching dimensions

4.1.1 Input dimensions

4.1.2 Process dimensions

4.1.3 Output dimensions

4.2 Classification of matching approaches

4.2.1 Methodology

4.2.2 Granularity/Input interpretation

4.2.3 Origin/Kind of input

4.3 Classes of concrete techniques

4.3.1 Element-level techniques

String-based techniques

Language-based techniques

Constraint-based techniques

Informal resource-based techniques

Formal resource-based techniques

4.3.2 Structure-level techniques

Graph-based techniques

Taxonomy-based techniques

Model-based techniques

Instance-based techniques

4.4 Other classifications

4.5 Summary

5.1 Similarity, distances and other measures

5.2 Name-based techniques

5.2.1 String-based methods

Normalisation

String equality

Substring test

Edit distance

Token-based distances

Path comparison

Summary on string-based methods

5.2.2 Language-based methods

Intrinsic methods: Linguistic normalisation

Extrinsic methods

Multilingual methods

Summary on linguistic methods

5.3 Internal structure-based techniques

5.3.1 Property comparison and keys

5.3.2 Data type comparison

5.3.3 Domain comparison

5.3.4 Comparing multiplicities and properties

5.3.5 Other features

Summary on internal structure-based techniques

5.4 Extensional techniques

5.4.1 Common extension comparison

Formal concept analysis

5.4.2 Instance identification techniques

Linkkey extraction

Similarity-based instance matching

5.4.3 Disjoint extension comparison

Statistical approach

Similarity-based extension comparison

Matching-based comparison

Summary on extensional techniques

5.5 Summary

6.1 Relational techniques

6.1.1 Taxonomic structure

6.1.2 Mereologic structure

6.1.3 Relations

6.1.4 Pattern-based matching

Summary on relational techniques

6.2 Iterative similarity computation

6.2.1 Similarity flooding

6.2.2 Similarity equation fixed point

Summary on global similarity computation

6.3 Matching as optimisation

6.3.1 Expectation maximisation

6.3.2 Particle swarm optimisation

Summary on optimisation techniques

6.4 Probabilistic matching

6.4.1 Bayesian networks

6.4.2 Markov networks and Markov logic networks

Summary on probabilistic matching

6.5 Semantic techniques

6.5.1 Propositional techniques

6.5.2 Description logic techniques

Summary on semantic techniques

6.6 Summary

7.1 Ontology partitioning and search-space pruning

7.1.1 Partitioning

7.1.2 Search-space pruning

7.2 Matcher composition

7.3 Context-based matching

7.4.1 Weighting

Triangular norms

Multidimensional distances and weighted sums

Fuzzy aggregation and weighted average

Harmonic adaptive weighted sum

Ordered weighted average

7.4.2 Voting

Dempster-Shafer theory

7.4.3 Arguing

Summary on similarity and alignment aggregation

7.5 Matching learning

7.5.1 Bayes learning

7.5.2 WHIRL learner

7.5.3 Neural networks

7.5.4 Support vector machines

7.5.5 Decision trees

Summary on matcher learning

7.6 Matcher tuning

7.6.1 Stacked generalisation

7.6.2 Genetic algorithms

Summary on matcher tuning

7.7 Alignment extraction

7.7.1 Thresholds

7.7.2 Strengthening and weakening

7.7.3 Optimising the result

Summary on alignment extraction

7.8 Alignment improvement

7.8.1 Alignment disambiguation

7.8.2 Alignment debugging

Summary on alignment improvement

7.9 Summary

8.1 Schema-based systems

8.1.1 DELTA (The MITRE Corporation)

8.1.2 Hovy (University of Southern California)

8.1.3 TransScm (Tel Aviv University)

8.1.4 DIKE (Università di Reggio Calabria and Università di Calabria)

8.1.5 SKAT and ONION (Stanford University)

8.1.6 Artemis (Università di Milano and Università di Modena e Reggio Emilia)

8.1.7 H-Match (Università degli Studi di Milano)

8.1.8 Tess (University of Massachusetts)

8.1.9 Anchor-Prompt (Stanford Medical Informatics)

8.1.10 OntoBuilder (Technion Israel Institute of Technology)

8.1.11 Cupid (University of Washington, Microsoft Corporation and University of Leipzig)

8.1.12 COMA and COMA++ (University of Leipzig)

8.1.13 QuickMig (SAP, Universität Leipzig)

8.1.14 Similarity flooding (Stanford University and University of Leipzig)

8.1.15 XClust (National University of Singapore)

8.1.16 MapOnto (University of Toronto and Rutgers University)

8.1.17 CtxMatch and CtxMatch2 (University of Trento and ITC-IRST)

8.1.18 S-Match (University of Trento)

8.1.19 HCONE (University of the Aegean)

8.1.20 MoA (Electronics and Telecommunications Research Institute, ETRI)

8.1.21 ASCO (INRIA Sophia-Antipolis)

8.1.22 Stroulia & Wang (University of Alberta)

8.1.23 MWSDI (University of Georgia)

8.1.24 SeqDisc (University of Leipzig, Queensland University of Technology, University of Magdeburg)

8.1.25 BayesOWL and BN mapping (University of Maryland)

8.1.26 OMEN (The Pennsylvania State University and Stanford University)

8.1.27 DCM framework (University of Illinois at Urbana-Champaign)

8.1.28 HSM (Hong Kong University of Science and Technology, City University of Hong Kong)

8.1.29 CBW (Sharif University of Technology, Tehran Institute for Studies in Theoretical Physics and Mathematics)

8.1.30 GeRoMeSuite (RWTH Aachen University)

8.1.31 AOAS (US National Library of Medicine)

8.1.32 Scarlet (The Open University)

8.1.33 OMviaUO (Università di Genova, Universidad Politécnica de Valencia)

8.1.34 BLOOMS/BLOOMS+ (Wright State University, Accenture Technology Labs and Ontotext AD)

8.1.35 CIDER (Universidad Politécnica de Madrid, University of Zaragoza)

8.1.36 Elmeleegy and colleagues (Purdue University)

8.1.37 BeMatch (Versailles Saint-Quentin en Yvelines, University of Cauca)

8.1.38 PORSCHE (University of Montpellier, ETH Zurich)

8.1.39 MatchPlanner (University of Montpellier)

8.1.40 Anchor-Flood (Toyohashi University of Technology)

8.1.41 Lily (Southeast University, Nanjing University)

8.1.42 AgreementMaker (University of Illinois at Chicago)

8.1.43 Homolonto (University of Lausanne, Swiss Institute of Bioinformatics)

8.1.44 DSSim (Open University, Poznan University of Economics)

8.1.45 MapPSO (FZI Research Center for Information Technology, Griffith University)

8.1.46 TaxoMap (University of Paris-Sud 11, INRIA)

8.1.47 iMatch (Ben-Gurion University)

8.2 Instance-based systems

8.2.1 T-tree (INRIA Rhône-Alpes)

8.2.2 CAIMAN (Technische Universität München and Universität Kaiserslautern)

8.2.3 FCA-merge (University of Karlsruhe)

8.2.4 LSD (University of Washington)

8.2.5 GLUE (University of Washington)

8.2.6 iMAP (University of Illinois and University of Washington)

8.2.7 Automatch (George Mason University)

8.2.8 SBI&NB (The Graduate University for Advanced Studies)

8.2.9 Kang and Naughton (University of Wisconsin-Madison)

8.2.10 Dumas (Technische Universität Berlin and Humboldt-Universität zu Berlin)

8.2.11 Wang and colleagues (Hong Kong University of Science and Technology and Microsoft Research Asia)

8.2.12 sPLMap (University of Duisburg-Essen, and ISTI-CNR)

8.2.13 FSM (Poland National Institute of Telecommunications, Humboldt-Universität zu Berlin, Max Planck Institute for Computer Science)

8.2.14 VSBM & GBM (École Centrale Paris)

8.2.15 ProbaMap (Université de Grenoble)

8.3 Mixed, schema-based and instance-based systems

8.3.1 SEMINT (Northwestern University, NEC and The MITRE Corporation)

8.3.2 IF-Map (University of Southampton and University of Edinburgh)

8.3.3 NOM and QOM (University of Karlsruhe)

8.3.4 oMap (CNR Pisa)

8.3.5 Xu and Embley (Brigham Young University)

8.3.6 Wise-Integrator (SUNY at Binghamton, University of Illinois at Chicago and University of Louisiana at Lafayette)

8.3.7 IceQ (University of Illinois at Urbana-Champaign, University of Illinois at Chicago, SUNY at Binghamton)

8.3.8 OLA (INRIA Rhône-Alpes and Université de Montréal)

8.3.9 Falcon-AO (China Southeast University)

8.3.10 RiMOM (Tsinghua University)

8.3.11 Corpus-based matching (University of Washington, Microsoft Research and University of Illinois)

8.3.12 iMapper (Norwegian University of Science and Technology)

8.3.13 SAMBO (Linköping University)

8.3.14 AROMA (University of Nantes, INRIA)

8.3.15 ILIADS (University of Maryland, University of Toronto)

8.3.16 SeMap (Georgia Tech, University of British Columbia)

8.3.17 ASMOV (INFOTECH Soft, Inc., University of Miami)

8.3.18 HAMSTER (University of Michigan, Microsoft Research)

8.3.19 Smart Matcher (Vienna University of Technology)

8.3.20 GEM/Optima/Optima+ (University of Georgia, Wright State University)

8.3.21 CSR (University of the Aegean, Institution of Informatics and Telecommunications)

8.3.22 Prior+ (SAP Labs, Yahoo!, University of Pittsburgh)

8.3.23 YAM & YAM++ (University of Montpellier, University of Toronto)

8.3.24 MoTo (University of Bari)

8.3.25 CODI (Universität Mannheim)

8.3.26 LogMap (University of Oxford)

8.3.27 PARIS (INRIA, Télécom ParisTech)

8.4 Metamatching systems

8.4.1 APFEL (University of Karlsruhe and University of Koblenz-Landau)

8.4.2 LCS (Queen's University Belfast)

8.4.3 Besana and Robertson (University of Edinburgh)

8.4.4 eTuner (University of Illinois and The MITRE Corporation)

8.4.5 mSeer (University of Wisconsin-Madison, The MITRE Corporation)

8.4.6 GOALS (Gecad -- Polytechnic of Porto)

8.4.7 ContentMap (Universitat Jaume I, University of Oxford)

8.4.8 SMB (Technion Israel Institute of Technology)

8.4.9 AMC (SAP Research, University of Leipzig)

8.4.10 AMS (SAP Research, Dresden University of Technology, University of Leipzig)

8.5 Summary

9.1 Evaluation principles

9.1.1 Goals

9.1.2 Principles

9.1.3 Examples of evaluations

Text REtrieval Conference

Ontology Alignment Evaluation Initiative

9.1.4 Types of evaluations

9.1.5 Automation

9.2 Data sets for evaluation

9.2.1 Dimensions and variability of alignment evaluation

Input ontologies

Input alignment

Parameters and resources

Output alignment

Matching process

9.2.2 Examples of data sets

OAEI systematic benchmark suite

Large scale ontology sets

Directory sets

Thesauri

Other test collections

9.2.3 Test generation

9.3 Evaluation measures

9.3.1 Compliance measures

9.3.2 Generalising precision and recall

Weighted precision and recall

Relaxed precision and recall

Semantic precision and recall

9.3.3 Sampling and relative precision and recall

9.3.4 Performance measures

Speed

Network

Memory

Scalability

9.3.5 User-related measures

Level of user input effort

Oracle-based measures

General subjective satisfaction

9.4 Application-specific evaluation

9.4.1 Aggregating measures

9.4.2 Evaluation setting

9.5 Summary

10.1 Alignment formats

10.1.1 MAFRA Semantic bridge ontology (SBO)

10.1.2 OWL

10.1.3 Contextualized OWL (C-OWL)

10.1.4 SWRL and RIF

10.1.5 Alignment format

10.1.6 Expressive and declarative ontology alignment language (EDOAL)

10.1.7 SKOS

Concept and relation descriptions

Mapping vocabulary

10.1.8 Comparison of existing formats

10.2 Alignment metadata

10.2.1 Identification metadata

10.2.2 Provenance metadata

10.2.3 Qualification metadata

10.3 Alignment frameworks

10.3.1 Model management

10.3.2 COMA++ (University of Leipzig)

10.3.3 GOMMA (University of Leipzig)

10.3.4 MAFRA (Instituto Politecnico do Porto and University of Karlsruhe)

10.3.5 The Protégé Prompt Suite (Stanford University)

10.3.6 Alignment API and implementation (INRIA)

Classes

Functions

10.3.7 FOAM (University of Karlsruhe)

10.3.8 Harmony (MITRE)

10.3.9 The NeOn Toolkit alignment plug-in

10.4 Summary

11.1 Individual matching

11.1.1 Providing input

11.1.2 Manual matcher composition

11.1.3 Relevance feedback

11.2 Collective matching

11.2.1 Community-driven ontology matching

11.2.2 Crowdsourcing ontology matching

11.3 Explaining alignments

11.3.1 Explanation approaches

The proof presentation approach

The strategic flow approach

The argumentation approach

11.3.2 A default explanation

The S-Match example

The iMAP example

An argumentation example

11.3.3 Explaining basic matchers

11.3.4 Explaining the matching process

Dependency graphs

Explaining logical reasoning

11.4 Alignment editors and visualisers

11.4.1 WSMT (DERI, University of Innsbruck)

11.4.2 Muse (University of California, University of Toronto)

11.4.3 iMerge (Duisburg U.)

11.4.4 Chimaera (Stanford University)

11.4.5 iPrompt (Stanford University)

11.4.6 AlViz (Vienna University of Technology, Norwegian University of Science and Technology)

11.4.7 CogZ (University of Victoria)

11.5 Summary

12.1 Ontology merging

OntoMerge (Yale University and University of Oregon)

12.2 Ontology transformation

12.3 Data translation

Clio (IBM Almaden and University of Toronto)

Spicy (Università della Basilicata, ICAR-CNR)

12.4 Data interlinking

KnoFuss (The Open University)

Silk (Chemnitz University of Technology, Freie Universität Berlin)

12.5 Mediation

12.6 Reasoning

12.7 Alignment services and repositories

BioPortal (Stanford University)

Alignment server (INRIA)

CATCH (Vrije Universiteit)

12.8 Alignment evolution

ToMAS (University of Toronto and IBM Almaden)

12.9 Summary

13.1 A brief outlook of the trends in the field

13.2 Future challenges

13.2.1 Large-scale and efficient matching

13.2.2 Matching with background knowledge

13.2.3 Matcher selection, combination and tuning

13.2.4 User involvement

13.2.5 Social and collaborative matching

13.2.6 Uncertainty in ontology matching

13.2.7 Reasoning with alignments

13.2.8 Alignment management: infrastructure and support

13.3 Final words

Index

A fully searchable index of the book is available. It covers more terms and more references than the index published with the book.
