Westbury Lab: Orthographic Neighborhoods for over 111,000 English words

Processing: Orthographic frequencies were counted in the multi-billion-word Westbury Lab USENET corpus. The Westbury Lab's freely available LINGUA software was then used to tabulate the orthographic neighbors of every word in a large dictionary of English words. The standard output of LINGUA includes the following fields for each word in the lexicon:

    WORD:              The word in question.
    ONSIZE:            The number of orthographic neighbors of the word in question.
    ONFREQ:            The average of the orthographic frequencies of all the orthographic neighbors of the word in question.
    [NEIGHBOURFREQS]:  A list of the orthographic frequencies of all the orthographic neighbors of the word in question. [Variable in length]
    [NEIGHBOURS]:      A list of the orthographic neighbors of the word in question. [Variable in length]
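
To make the relationship between these fields concrete, here is a minimal Python sketch that derives them from a toy lexicon. It is not the LINGUA implementation. It assumes the common definition of an orthographic neighbor (a word of the same length that differs by exactly one substituted letter), which this page does not spell out, and it assumes ONFREQ is 0.0 for a word with no neighbors. The function name and the toy frequencies are hypothetical and are not taken from the data set.

```python
from collections import defaultdict
from statistics import mean


def neighbourhood_table(lexicon_freqs):
    """Return {word: (ONSIZE, ONFREQ, NEIGHBOURFREQS, NEIGHBOURS)}.

    Assumption: a neighbor is a same-length word that differs by one
    substituted letter. LINGUA's exact definition may differ.
    """
    # Group words by (position, prefix, suffix), i.e. the word with one
    # letter position left open. Words sharing a key differ only there.
    buckets = defaultdict(set)
    for word in lexicon_freqs:
        for i in range(len(word)):
            buckets[(i, word[:i], word[i + 1:])].add(word)

    table = {}
    for word in lexicon_freqs:
        neighbours = set()
        for i in range(len(word)):
            neighbours |= buckets[(i, word[:i], word[i + 1:])]
        neighbours.discard(word)  # a word is not its own neighbor
        ordered = sorted(neighbours)
        freqs = [lexicon_freqs[n] for n in ordered]
        # Assumption: report 0.0 when there are no neighbors to average.
        onfreq = mean(freqs) if freqs else 0.0
        table[word] = (len(ordered), onfreq, freqs, ordered)
    return table


# Hypothetical toy lexicon; the frequencies are invented for illustration.
toy = {"cat": 120.0, "cot": 15.0, "cut": 60.0, "bat": 45.0, "dog": 80.0}
table = neighbourhood_table(toy)
print(table["cat"])  # (3, 40.0, [45.0, 15.0, 60.0], ['bat', 'cot', 'cut'])
```

In this sketch the two variable-length lists always have ONSIZE entries each and are kept in the same order, so the i-th frequency belongs to the i-th neighbor. Whether the distributed file orders its lists this way is not stated on this page.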

List size: 111,624 words

Citation: Westbury, C. & Shaoul, C. (2007). Orthographic Neighborhoods for over 111,000 English words. Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/ON.download.html)

Acknowledgments: This work would not have been possible without the hardware and software provided by the TaPoR project. This research is also supported by NSERC.

If you have any questions about this data, please contact Cyrus Shaoul.

PLEASE NOTE:

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

Download the list:

Please fill out this form so that we can keep track of who has downloaded this file.