Python: find closest string (from a list) to another string

Use difflib.get_close_matches.

>>> import difflib
>>> words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']
>>> difflib.get_close_matches('Hello', words)
['hello', 'Hallo', 'hallo']

Check the documentation: by default the function returns at most 3 matches (n=3), and only those with a similarity score of at least 0.6 (cutoff=0.6).
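A short sketch of how the n and cutoff parameters change the result for the same word list. The scores come from difflib.SequenceMatcher.ratio(): 'hello' and 'Hallo' both score 0.8 against 'Hello', while 'hallo' scores exactly 0.6, so it only just passes the default cutoff.

```python
import difflib

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']

# Defaults: at most n=3 matches, each scoring at least cutoff=0.6
print(difflib.get_close_matches('Hello', words))              # ['hello', 'Hallo', 'hallo']

# Ask for only the single best match
print(difflib.get_close_matches('Hello', words, n=1))         # ['hello']

# A stricter cutoff drops 'hallo', whose score is 0.6
print(difflib.get_close_matches('Hello', words, cutoff=0.7))  # ['hello', 'Hallo']

# The similarity measure used internally; you can inspect it directly
print(difflib.SequenceMatcher(None, 'Hello', 'hallo').ratio())  # 0.6
```

Ties (here 'hello' and 'Hallo' at 0.8) are not broken by list order, so if you need a specific tie-breaking rule, score the candidates yourself with SequenceMatcher.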

Peter Norvig has an excellent article on spelling correction that includes complete source code (21 lines):

http://norvig.com/spell-correct.html

The idea is to generate every string that is one edit away from your word:

hello -> helo    (delete)
hello -> helol   (transpose)
hello -> hallo   (replace)
hello -> heallo  (insert)


alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    """Return the set of all strings one edit away from word."""
    # Every way to cut word into a (left, right) pair
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

Then look up each of these candidates in your list; any that are present are words exactly one edit away from the input.
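A minimal sketch of the lookup step, loosely modeled on the candidate step in Norvig's article but without his word-probability ranking. The helper names edits2 and candidates are hypothetical, the alphabet is assumed to be lowercase ASCII only, and the function returns an empty set (rather than the input word) when nothing within two edits is found.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'  # assumption: lowercase ASCII only

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """Hypothetical helper: all strings two edits away (edits of edits)."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def candidates(word, words):
    """Hypothetical helper: list entries with the fewest edits (0, 1 or 2)."""
    known = set(words)
    return (({word} & known)
            or (edits1(word) & known)
            or (edits2(word) & known)
            or set())

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']
print(sorted(candidates('Hello', words)))  # ['Hallo', 'hello']  (one edit each)
print(sorted(candidates('hxllx', words)))  # ['hallo', 'hello']  (two edits each)
```

Intersecting with a set keeps each lookup cheap; the expensive part is edits2, which grows to tens of thousands of strings even for a short word.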

Peter's article is well worth reading in full.

Create a sorted list of your words and use the bisect module to find the position where your word would be inserted to keep the list sorted. From that position, take the k neighbours above and the k neighbours below to get up to 2k candidates. Note that these are the closest words in sort (lexicographic) order, which reflects shared prefixes rather than edit distance.
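A minimal sketch of the bisect approach. The function name nearest_in_sort_order is hypothetical; it uses bisect.bisect_left to find the insertion point and slices up to k words on each side, clamped at the ends of the list.

```python
import bisect

def nearest_in_sort_order(sorted_words, word, k):
    """Hypothetical helper: up to k words on each side of word's insertion point."""
    pos = bisect.bisect_left(sorted_words, word)
    # Clamp the lower bound at 0; slicing past the end is already safe
    return sorted_words[max(0, pos - k):pos + k]

words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']
sorted_words = sorted(words)  # uppercase sorts before lowercase: 'Hallo' comes first

print(nearest_in_sort_order(sorted_words, 'helo', 2))
# ['hallo', 'hello', 'hi', 'house']
```

Because Python compares strings by code point, 'Hallo' and 'hallo' end up far apart; lowercasing the list and the query first avoids that. Neighbours in sort order only agree at the start of the string, so a misspelling in the first letter will not land near the intended word.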