Data

Please read the licence information carefully when downloading data.


HiDEx: Current version 0.08 (released March 2014)

The High Dimensional Explorer is a piece of software that can process a corpus of text and create word vectors that contain co-occurrence information for all the words in the corpus. It allows the user to build and analyze many variations of the HAL model. It has several features that make it very useful for measuring the similarity of words in terms of their contexts. In particular, it can handle very large corpora and very large word vectors. It also takes advantage of multiple CPUs or cores to speed up processing. For more information about HiDEx and to learn about its hardware and software requirements, please read the document below:

HiDEx User Manual (READ THIS BEFORE DOWNLOADING!)
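HiDEx builds variations of the HAL (Hyperspace Analogue to Language) model mentioned above. As a rough illustration of the core idea, and not HiDEx's own code, here is a minimal Python sketch of HAL-style co-occurrence counting. It assumes a window of `window` words on each side of the target and a linear ramp weighting, so an adjacent word adds `window` and a word at the edge of the window adds 1; words that precede and follow the target are kept separate, tagged "L" and "R". The function name `hal_vectors` and the toy sentence are hypothetical, and HiDEx itself lets you vary the model in many more ways (for example, its normalization parameters).

```python
from collections import defaultdict


def hal_vectors(tokens, window=2):
    """Count weighted co-occurrences HAL-style (illustrative sketch, not HiDEx code).

    Returns {word: {("L", context): weight, ("R", context): weight}}, where "L"
    entries count words preceding `word` and "R" entries count words following it.
    A neighbour at distance d (1 <= d <= window) adds window - d + 1.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            weight = window - d + 1  # linear ramp: closer words weigh more
            if i - d >= 0:
                vectors[target][("L", tokens[i - d])] += weight
            if i + d < len(tokens):
                vectors[target][("R", tokens[i + d])] += weight
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(contexts) for word, contexts in vectors.items()}


tokens = "the cat sat on the mat".split()
vecs = hal_vectors(tokens, window=2)
print(vecs["cat"])  # -> {('L', 'the'): 2.0, ('R', 'sat'): 2.0, ('R', 'on'): 1.0}
```

Each word's vector is the collection of weighted counts for its left and right contexts; comparing two such vectors is what yields a context-based similarity between the words.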

What is the current version of HiDEx?
In March 2014, a new version of HiDEx was released. It does not contain any new features, but it fixes all known bugs and supports uppercase and lowercase word lists. This new version, v0.08, is the version you can download below.

Some key points from the User Manual to think about before downloading:
Before using HiDEx, there is some other software that you might want to obtain:
  • OS support: Linux 2.6+, Mac OS X 10.4.11+ (Other Unix systems have not been tested, but they might work. Windows does not work. Period.)
  • For compiling HiDEx, you will need gcc 4.2 or newer. Check your compiler version with the command g++ -v. If the version is less than 4.2, get the latest version of gcc from here.
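The compiler version check described above can also be scripted. Below is a minimal Python sketch, assuming `g++ -v` prints a line of the form `gcc version X.Y.Z ...` to standard error, as GCC does; the helper names `gcc_version` and `gcc_is_new_enough` are hypothetical. If `g++` on your system is actually a different compiler (some Macs ship clang under that name), the pattern will not match and the check reports False, so inspect the output by hand in that case.

```python
import re
import shutil
import subprocess

MIN_GCC = (4, 2)  # minimum gcc version stated in the HiDEx requirements


def gcc_version(text):
    """Extract (major, minor) from `g++ -v` output, or None if no gcc version line is found."""
    match = re.search(r"gcc version (\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def gcc_is_new_enough(text, minimum=MIN_GCC):
    """True only if a gcc version was found and it is at least `minimum`."""
    version = gcc_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    if shutil.which("g++") is None:
        print("g++ not found on PATH")
    else:
        # g++ -v writes its version information to stderr, not stdout.
        result = subprocess.run(["g++", "-v"], capture_output=True, text=True)
        print("gcc >= 4.2:", gcc_is_new_enough(result.stderr))
```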
What do I need besides the HiDEx source code to get started?
HiDEx needs some data to work on before it can produce measurements. We strongly recommend using HiDEx to process a corpus, but there is a way to begin using HiDEx without one.
  • Do you want to avoid downloading and processing a corpus? For a quick way to get started, download a pre-built set of word vectors. We provide one for free right here on this site. It is small. It does not allow much flexibility in setting normalization parameters, but it can be used to get similarity measures.
  • For the most power and flexibility, you will need a corpus of some kind. We provide two free CC-licensed corpora on this site. The smaller, easier-to-use corpus is the Wikipedia corpus mentioned above. It is available here. We also have a corpus that contains many billions of words of USENET discussions (split into 200-million-word segments). Download the USENET corpus here.
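Whichever source your word vectors come from, getting similarity measures means comparing vectors. Cosine similarity is one widely used measure for co-occurrence vectors; the minimal Python sketch below computes it and lists a word's nearest neighbours. It assumes the vectors are already loaded into a Python dict mapping each word to an equal-length list of numbers. It does not read HiDEx's file formats, the function names are hypothetical, and the three toy vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k=2):
    """Return the k (word, similarity) pairs most similar to `word`, best first."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy vectors, made up for illustration only.
vectors = {
    "cat": [2.0, 1.0, 0.0],
    "dog": [2.0, 1.0, 0.5],
    "mat": [0.0, 1.0, 3.0],
}
print(nearest_neighbours("cat", vectors, k=1))  # 'dog' ranks first
```

Cosine ignores vector length, so a frequent word and a rare word with proportionally similar contexts still score as similar.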

Warning: This software may contain bugs. We appreciate your patience and support as we work to improve it. If you would like to collaborate and make changes to HiDEx, we have set up a repository that is available for you to use. Just fork the repository, make your changes, and then tell us what you did!

The repository is available here:

http://github.com/cyrus/high-dimensional-explorer/

Do you already use git? Then do this to get the source code:

git clone git://github.com/cyrus/high-dimensional-explorer.git

Note: This is the most up-to-date version of HiDEx. It contains new features added in 2014. Please make sure to use this version if you are processing non-ASCII data, as this version contains Unicode support, whereas previous versions did not.
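If you are not sure whether your corpus contains any non-ASCII text, one quick check is to look for a byte above 0x7F, since every ASCII character is encoded as a single byte in the range 0x00 to 0x7F. Below is a minimal Python sketch; the function name `first_non_ascii` is hypothetical and not part of HiDEx. It reads a binary stream in chunks, so a large corpus does not need to fit in memory.

```python
import io


def first_non_ascii(stream, chunk_size=1 << 20):
    """Return the offset of the first byte above 0x7F in a binary stream, or None if all bytes are ASCII."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # reached the end without finding a non-ASCII byte
        for i, byte in enumerate(chunk):  # iterating over bytes yields ints
            if byte > 0x7F:
                return offset + i
        offset += len(chunk)


# "é" is encoded in UTF-8 as two bytes (0xC3 0xA9), the first at offset 3.
sample = io.BytesIO("café au lait\n".encode("utf-8"))
print(first_non_ascii(sample))  # -> 3
```

For a corpus on disk, open it in binary mode, e.g. `with open("corpus.txt", "rb") as f: print(first_non_ascii(f))` (the file name is a placeholder); a result of `None` means the file is pure ASCII.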

NEW (May 2014): A kind contributor has created a Python-based front end for HiDEx. If you love Python, try it out!


Citation:
 Shaoul, C. & Westbury, C. (2006). Word Frequency Effects in High-Dimensional Co-Occurrence Models: A New Approach. Behavior Research Methods, 38:2, 190–195.

and

Citation:
 Shaoul, C. & Westbury, C. (2010). Exploring lexical co-occurrence space using HiDEx. Behavior Research Methods, 42:2, 393–413.

Acknowledgments: This research was supported by NSERC and TAPoR.

If you have any questions about this software, please contact Cyrus Shaoul

PLEASE NOTE: Software License = GPL v3!
The GPL v3 license for this software is included in the file you will download below.

Please fill out this form so that we can keep track of who has downloaded this file and tell you of future releases.

Full Name:
Email Address:
Organization:
What do you intend to use the HiDEx for?
Comments/Questions:



 


©2005,2006,2007,2008,2009,2010,2011,2012,2013,2014  WestburyLab   chrisw at ualberta dot ca