A Toolbox for Record Linkage
DOI: https://doi.org/10.17713/ajs.v33i1&2.434
Abstract
We developed a record-linkage toolbox to compare the performance of various string-similarity measures for German surnames. This "Matching Tool-Box" (MTB) consists of independent, highly portable Java programs. MTB is currently used for prototyping pre-processing tools and for the empirical comparison of string-similarity measures. Furthermore, MTB has been used successfully in sociological, economic, and epidemiological research projects.
References
Borgman, C. L., & Siegfried, S. L. (1992). Getty’s Synoname and its cousins: A survey of applications of personal name-matching algorithms. Journal of the American Society for Information Science, 43(7), 459-476.
Fair, M. (1995). An overview of record linkage in Canada. In Proceedings of the social statistics section (p. 25-33). American Statistical Association.
Fürnrohr, M., Rimmelspacher, B., & Roncador, T. von. (2002). Zusammenführung von Datenbeständen ohne numerische Identifikatoren: ein Verfahren im Rahmen der Testuntersuchungen zu einem registergestützten Zensus. Bayern in Zahlen(7), 308-
Gill, L. (2001). Methods for automatic record matching and linkage and their use in national statistics. Norwich: HMSO.
Guth, G. J. A. (1976). Surname spellings and computerized record linkage. Historical Methods Newsletter, 10(1), 10-19.
Hirschberg, D. S. (1977). Algorithms for the longest common subsequence problem. Journal of the Association for Computing Machinery, 24(4), 664-675.
Knuth, D. E. (1998). Sorting and searching. In The art of computer programming (2nd ed., Vol. 3, p. 394-395). Reading, Mass.: Addison-Wesley.
Lait, A., & Randell, B. (1996). An assessment of name matching algorithms (Tech. Rep. No. 550). Department of Computing Science, University of Newcastle upon Tyne.
Monge, A. E., & Elkan, C. P. (1996). The field-matching problem: Algorithms and applications. In E. Simoudis, J. Han, & U. Fayyad (Eds.), Proceedings of the second international conference on knowledge discovery and data mining: KDD-96 (p. 267-270). Menlo Park: AAAI Press.
Philips, L. (1990). Hanging on the Metaphone. Computer Language, 7(12), 39-43.
Philips, L. (2000). The Double Metaphone search algorithm. C/C++ Users Journal, 18(6).
Pollock, J. J., & Zamora, A. (1984). Automatic spelling correction in scientific and scholarly text. Communications of the Association for Computing Machinery, 27(4), 358-368.
Porter, E. H., & Winkler, W. E. (1997). Approximate string comparison and its effect on an advanced record linkage system. In W. Alvey & B. Jamerson (Eds.), Record linkage techniques: Proceedings of an international workshop and exposition (p. -199). Arlington, VA: Office of Management and Budget.
Postel, H. J. (1969). Die Kölner Phonetik. Ein Verfahren zur Identifizierung von Personennamen auf der Grundlage der Gestaltanalyse. IBM-Nachrichten, 19, 925-931.
Reth, H.-P. von, & Schek, H.-J. (1977). Eine Zugriffsmethode für die phonetische Ähnlichkeitssuche (technical report No. 77.03.002). Heidelberg: IBM Scientific Center.
Schnell, R., Bachteler, T., & Bender, S. (2003). Record linkage using error prone strings. In Proceedings of the joint statistical meeting (p. 3713-3717). American Statistical Association.
Taft, R. L. (1970). Name searching techniques. Albany, N.Y.: Bureau of Systems Development.
Ukkonen, E. (1985). Algorithms for approximate string matching. Information and Control, 64(1-3), 100-118.
Ukkonen, E. (1992). Approximate string matching with q-grams and maximal matches. Theoretical Computer Science, 92(1), 191-211.
License
The Austrian Journal of Statistics publishes open access articles under the terms of the Creative Commons Attribution (CC BY) License.
The Creative Commons Attribution License (CC BY) allows users to copy, distribute, and transmit an article, to adapt it, and to make commercial use of it. The CC BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.
Copyright on any research article published by the Austrian Journal of Statistics is retained by the author(s). Authors grant the Austrian Journal of Statistics a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its original authors, citation details and publisher are identified.
Manuscripts should be unpublished and not be under consideration for publication elsewhere. By submitting an article, the author(s) certify that the article is their original work, that they have the right to submit the article for publication, and that they can grant the above license.