stringdist

From MaRDI portal

Software:45722



swMATH: 34013, CRAN: stringdist, MaRDI QID: Q45722

Approximate String Matching, Fuzzy Text Search, and String Distance Functions

Mark P. J. van der Loo

Last update: 28 November 2023

Copyright license: GNU General Public License, version 3.0

Software version identifier: 0.9.10, 0.4-0, 0.4-2, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4.1, 0.9.4.2, 0.9.4.4, 0.9.4.5, 0.9.4.6, 0.9.4.7, 0.9.4, 0.9.5.0, 0.9.5.1, 0.9.5.2, 0.9.5.3, 0.9.5.5, 0.9.6.3, 0.9.6, 0.9.7, 0.9.8, 0.9.9, 0.9.12


Source code repository: https://github.com/cran/stringdist

Implements an approximate string matching version of R's native 'match' function. Also offers fuzzy text search based on various string distance measures. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of Soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding, or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using 'OpenMP'. An API for C or C++ is exposed as well. Reference: MPJ van der Loo (2014) <doi:10.32614/RJ-2014-011>.
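
To make the idea of edit-based approximate matching concrete, here is a minimal Python sketch. It is not the package's implementation or API: the function names `levenshtein` and `approx_match` and the parameter `max_dist` are hypothetical, chosen only for illustration. It computes the Levenshtein distance with unit costs and then looks up the closest table entry within a maximum distance, which is the general behaviour the description attributes to an approximate version of 'match'.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def approx_match(x: str, table: Sequence[str], max_dist: int = 1) -> Optional[int]:
    """Return the 0-based index of the closest entry within max_dist, else None.

    Hypothetical helper: ties go to the earliest entry in the table.
    """
    best_idx, best_d = None, max_dist + 1
    for idx, candidate in enumerate(table):
        d = levenshtein(x, candidate)
        if d < best_d:
            best_idx, best_d = idx, d
    return best_idx


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(approx_match("leia", ["uhura", "leela"], max_dist=2))
```

This sketch uses 0-based indices and a plain Python loop. The package itself works on R character vectors and, as stated above, is built for speed with 'OpenMP', so the example only illustrates the concept.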