A Toolbox for Record Linkage

Authors

  • Rainer Schnell, University of Konstanz
  • Tobias Bachteler, University of Konstanz
  • Stefan Bender, Institute for Employment Research (IAB)

DOI:

https://doi.org/10.17713/ajs.v33i1&2.434

Abstract

We developed a record-linkage toolbox to compare the performance of various string-similarity measures for German surnames. This "Matching Tool-Box" (MTB) consists of independent, highly portable Java programs. MTB is currently used for prototyping pre-processing tools and for the empirical comparison of string-similarity measures. It has also been used successfully in sociological, economic, and epidemiological research projects.

Published

2016-04-03

How to Cite

Schnell, R., Bachteler, T., & Bender, S. (2016). A Toolbox for Record Linkage. Austrian Journal of Statistics, 33(1&2), 125–133. https://doi.org/10.17713/ajs.v33i1&2.434

Section

Articles