How to: PostgreSQL Fuzzy String Matching In YugabyteDB

Before analyzing a large dataset that contains textual information, it’s important to scrub it and, where necessary, eliminate duplicates. Removing duplicates may mean comparing strings that refer to the same thing but are written slightly differently, contain typos, or are misspelled. Alternatively, you might need to join two tables on a column (say, a company name) whose values appear slightly differently in each table.

Fuzzy String Matching (or Approximate String Matching) is the process of finding strings that approximately match a pattern.
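To make "approximately match" concrete, one common measure is Levenshtein (edit) distance: the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another. The sketch below is a minimal, database-independent illustration in Python, not YugabyteDB's implementation. The helper `is_fuzzy_match`, its threshold of 2, and the sample company names are hypothetical choices made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # prev[j] holds the edit distance between the prefix of a processed
    # so far and b[:j]; before any characters of a, that is just j inserts.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string is i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def is_fuzzy_match(a: str, b: str, max_distance: int = 2) -> bool:
    # Hypothetical rule: strings within max_distance edits count as a match.
    # Both inputs are lower-cased so that case differences are not counted.
    return levenshtein(a.lower(), b.lower()) <= max_distance


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))           # 3
    print(is_fuzzy_match("Acme Corp", "ACME Corp."))  # True (1 edit)
    print(is_fuzzy_match("Acme Corp", "Globex"))      # False
```

The threshold is the key design choice: too low and genuine variants are missed, too high and unrelated names start matching. PostgreSQL's `fuzzystrmatch` extension ships a `levenshtein` function based on the same measure.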

