Word Vector Data
Please read the licence information carefully when downloading data.
Word vectors created from a Wikipedia corpus
This page allows for the download of a pre-computed word vector set designed for use with the HiDEx system. It contains co-occurrence vectors for 57,377 words. For more information on HiDEx, or to download the software, please click here.
Processing: The corpus used to create this vector set is the free WestburyLab Wikipedia corpus. It was processed using HiDEx with a 5-word forward window and a 5-word backward window. We used an inverse linear ramp as the weighting scheme, and the vectors were normalized using the PPMI (positive pointwise mutual information) method.
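The processing steps described above can be illustrated with a minimal Python sketch. This is not HiDEx code. It assumes that the inverse linear ramp gives a context word at distance d the weight (window - d + 1), so an adjacent word counts 5 and the farthest word counts 1, and that forward and backward contexts are kept as separate dimensions. The function names `cooccurrence` and `ppmi` and the toy sentence are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def cooccurrence(tokens, window=5):
    """Weighted co-occurrence counts, with forward and backward contexts kept apart."""
    counts = defaultdict(float)
    for i, target in enumerate(tokens):
        for dist in range(1, window + 1):
            # Assumed ramp: weight falls linearly from `window` (adjacent) to 1 (farthest).
            weight = window - dist + 1
            if i + dist < len(tokens):
                counts[(target, ("fwd", tokens[i + dist]))] += weight
            if i - dist >= 0:
                counts[(target, ("bwd", tokens[i - dist]))] += weight
    return counts


def ppmi(counts):
    """Positive PMI: max(0, log2(p(w, c) / (p(w) * p(c)))) for every observed cell."""
    total = sum(counts.values())
    row_sums = defaultdict(float)
    col_sums = defaultdict(float)
    for (word, context), value in counts.items():
        row_sums[word] += value
        col_sums[context] += value
    result = {}
    for (word, context), value in counts.items():
        pmi = math.log2(value * total / (row_sums[word] * col_sums[context]))
        result[(word, context)] = max(0.0, pmi)
    return result


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
counts = cooccurrence(tokens)
vectors = ppmi(counts)
```

In this toy example, "the" occurs twice before "mat" (at distances 5 and 1), so the forward cell for that pair accumulates 1 + 5 = 6 before PPMI is applied. The actual weighting and normalization details used by HiDEx may differ from this sketch.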
Transmission: Direct HTTP download.
Vector Size: 57,377 vectors of 20,000 elements each
Source Corpus Size: over 900 million words in over 2 million documents (The WestburyLab Wikipedia corpus)
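As a rough guide to what the vector dimensions quoted above imply, the following sketch estimates the memory needed to hold all 57,377 vectors of 20,000 elements as a dense array. The element widths (4 and 8 bytes) are assumptions. The on-disk format of the download is not described here, so the file itself may be smaller or larger. The name `dense_size_bytes` is hypothetical.

```python
N_WORDS = 57_377      # number of vectors, from the description above
N_ELEMENTS = 20_000   # elements per vector, from the description above


def dense_size_bytes(bytes_per_element):
    """Bytes needed to hold every vector in one dense array."""
    return N_WORDS * N_ELEMENTS * bytes_per_element


# Assumed element widths: 4 bytes (32-bit float) and 8 bytes (64-bit float).
for label, width in (("32-bit", 4), ("64-bit", 8)):
    size = dense_size_bytes(width)
    print(f"{label}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

The full set contains 57,377 × 20,000 = 1,147,540,000 elements, so a dense in-memory copy needs several gigabytes at either width. A sparse representation may be more practical if many PPMI values are zero.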
Citation: Shaoul, C. & Westbury, C. (2010). Word Vectors from the 2010 Westbury Lab Wikipedia corpus. Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.vectorset.download.html)
Acknowledgments: This work would not have been possible without the hardware and software provided by the TaPoR project. This research is also supported by NSERC.
If you have any questions about this data, please contact Cyrus Shaoul.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.
Please fill out this form so that we can keep track of who has downloaded this file.
©2005,2006,2007,2008,2009,2010,2011,2012,2013 WestburyLab chrisw at ualberta dot ca