refinr

Software:110667



CRAN: refinr | MaRDI QID: Q110667 | FDO: Q110667

Cluster and Merge Similar Values Within a Character Vector

Chris Muir

Last update: 12 November 2023

Copyright license: GNU General Public License, version 3.0

Software version identifiers: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.3.3

These functions take a character vector as input, identify and cluster similar values, and then merge each cluster so that all of its values become identical. They implement the key collision and n-gram fingerprint algorithms from the open-source tool OpenRefine <https://openrefine.org/>. For details on both algorithms, see <https://openrefine.org/docs/technical-reference/clustering-in-depth>.
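
The clustering idea described above can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not refinr's code or API: the function names `fingerprint`, `ngram_fingerprint`, and `merge_by_key` are hypothetical, the normalisation steps loosely follow the OpenRefine documentation linked above, and choosing the most frequent value as the replacement for each cluster is an assumption made for illustration.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def _to_ascii_lower(value: str) -> str:
    """Lowercase the string and strip accents by dropping non-ASCII marks."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    return text.encode("ascii", "ignore").decode("ascii")


def fingerprint(value: str) -> str:
    """Key collision key: remove punctuation, then sort the unique tokens."""
    text = re.sub(r"[^\w\s]", "", _to_ascii_lower(value))
    return " ".join(sorted(set(text.split())))


def ngram_fingerprint(value: str, n: int = 2) -> str:
    """N-gram key: remove everything but letters and digits, then sort
    the unique character n-grams and join them."""
    text = re.sub(r"[^a-z0-9]", "", _to_ascii_lower(value))
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


def merge_by_key(values, key_func=fingerprint):
    """Group values that share a key and replace each one with the most
    frequent member of its group (assumed merge rule, for illustration)."""
    clusters = defaultdict(list)
    for v in values:
        clusters[key_func(v)].append(v)
    # Ties are resolved in favour of the value that occurs first.
    chosen = {k: Counter(m).most_common(1)[0][0] for k, m in clusters.items()}
    return [chosen[key_func(v)] for v in values]


names = ["Acme Pizza, Inc.", "ACME PIZZA INC", "acme pizza inc",
         "ACME PIZZA INC", "Bob's Burgers"]
merged = merge_by_key(names)
```

In this sketch the first four entries of `names` share the key `acme inc pizza` and are therefore merged into a single spelling, while `Bob's Burgers` has its own key and is left unchanged. Passing `key_func=ngram_fingerprint` instead groups values by their sorted character n-grams, which also tolerates differences in spacing.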




