string – How to get a database of all people's names (or at least common English ones)?

You can use a statistical Named Entity Recognizer (NER), such as Stanford's NER or LingPipe's. These are machine-learning-based recognizers that do not require huge dictionaries of names as input.

Alternatively, you can get a list of person names from the Web (there are plenty) and use the Aho-Corasick string-searching algorithm to efficiently find every name from that list that occurs in your text.
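To make the Aho-Corasick idea concrete, here is a minimal pure-Python sketch. The function names (build_automaton, find_names) and the sample names are made up for illustration, and the whole-word check is an extra assumption layered on top of the algorithm so that "Bob" is not reported inside "Bobby". Matching is case-sensitive. In practice you would probably reach for an existing library rather than this hand-rolled version.

```python
from collections import deque


def build_automaton(names):
    """Build a trie with failure links (the Aho-Corasick automaton)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # names recognised when a state is reached
    for name in names:
        state = 0
        for ch in name:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(name)
    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit names that end at the longest proper suffix.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_names(text, automaton):
    """Return (start_index, name) for every whole-word occurrence in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name in out[state]:
            start = i - len(name) + 1
            # Assumption: only accept matches that are not glued to other letters.
            before_ok = start == 0 or not text[start - 1].isalnum()
            after_ok = i + 1 == len(text) or not text[i + 1].isalnum()
            if before_ok and after_ok:
                matches.append((start, name))
    return matches


automaton = build_automaton(["Ann", "Anna", "Mary Ann", "Bob"])
print(find_names("Anna met Mary Ann and Bobby.", automaton))
```

The text is scanned once, character by character, no matter how many names are in the list. That is the main advantage over searching for each name separately.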

If you're on a *nix system, try looking at /usr/share/dict/propernames. Mac OS X has it, and I think at least Ubuntu does too.

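You could use this with grep (-w matches whole words only, and -F treats each name as a fixed string rather than a regular expression):

grep -wFf /usr/share/dict/propernames short_text.txt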
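grep prints the matching lines, not the names themselves. If you want the names, a rough Python equivalent could look like the sketch below. The helper names (load_names, names_in_text) are hypothetical, the default path is the word list mentioned above and may not exist on your system, and the ASCII-letters tokenizer is a simplifying assumption.

```python
import re


def load_names(path="/usr/share/dict/propernames"):
    """Read one name per line into a set (the path is an assumption; adjust as needed)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def names_in_text(text, names):
    """Return every token of text that appears in the names collection, in order."""
    tokens = re.findall(r"[A-Za-z]+", text)  # assumption: plain ASCII words only
    return [token for token in tokens if token in names]


# Example with an in-memory set instead of the system word list.
print(names_in_text("Alice emailed Bob about alice.", {"Alice", "Bob"}))
```

Because lookup is per token, this only finds single-word names and is case-sensitive, so the lowercase "alice" in the example is ignored.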

I found this reference: Extracting people’s names from RSS feeds using WordNet
