Wikipedia Corpus
Please read the licence information carefully when downloading data.
The Westbury Lab Wikipedia corpus (2010)
This corpus was created from a snapshot of all the articles in the English part of Wikipedia, taken in April 2010. It was processed to remove all links and irrelevant material (navigation text, etc.). The corpus is untagged, raw text. It may be necessary to process the corpus further to put it in a format that suits your needs.
Method of Transmission: This file is now hosted on a cloud-based web download service. Please fill in the form below to be taken to the download page.
Data size: over 6 GB raw, 1.8 GB bzip compressed (delivered as a single file)
Citation: Shaoul, C. & Westbury, C. (2010) The Westbury Lab Wikipedia Corpus, Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html)
Also, please read about the Wikipedia CC license.
Download the WestburyLab Wikipedia corpus: Please fill out this form so that we can keep track of who has downloaded the corpus. The information that you enter below will be kept completely confidential. Please make sure to enter a valid e-mail address.
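Because the corpus is delivered as a single compressed file of raw text, it can be convenient to read it as a stream rather than decompressing all of it to disk first. The following is only a minimal sketch of one way to do that in Python, not a description of how the corpus was built. It assumes the archive is in bzip2 format and that the text is UTF-8 encoded. The file name `westburylab.wikicorp.txt.bz2` and the function `count_tokens` are hypothetical names chosen for illustration, and splitting on whitespace is a placeholder for whatever tokenization suits your needs.

```python
import bz2
from collections import Counter
from itertools import islice


def count_tokens(path, max_lines=None, encoding="utf-8"):
    """Stream a bzip2-compressed text file and count lowercase tokens.

    Tokens are split on whitespace only; this is an illustrative
    assumption, not the corpus authors' tokenization.
    """
    counts = Counter()
    # In "rt" mode bz2.open decompresses incrementally, so the full
    # uncompressed text is never held in memory at once.
    with bz2.open(path, mode="rt", encoding=encoding, errors="replace") as handle:
        # islice with max_lines=None reads the whole file.
        for line in islice(handle, max_lines):
            counts.update(line.lower().split())
    return counts


if __name__ == "__main__":
    # Hypothetical local file name; substitute the name of your download.
    sample = count_tokens("westburylab.wikicorp.txt.bz2", max_lines=100_000)
    for token, frequency in sample.most_common(10):
        print(token, frequency)
```

Setting `max_lines` to a small value is a quick way to inspect the format of the text before committing to a full pass over the file.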
©2010, 2011, 2012, 2013 WestburyLab
chrisw at ualberta dot ca