Software for n-gram retrieval and analysis. Using the Risk Ratio or other collocability metrics, the software analyzes a corpus and determines which n-grams are characteristic of a given word or lemma. Engrammer is a single-purpose tool for extracting n-grams that contain a specific word or lemma, with the remaining slots left open. It was developed with the following questions in mind:

  1. what lexical patterns is a given word involved in; i.e. which n-grams are disproportionately collocated with a given word form/lemma?
  2. what are the contexts of these n-grams?
  3. what other words/lemmata collocate with these specific n-grams and what are their contexts?
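The first question can be illustrated with a minimal sketch of the extraction step: collect every n-gram that contains a given word, leaving the other slots open. This is not Engrammer's implementation; the function name `ngrams_with_word`, the whitespace tokenization, and the toy sentence are assumptions made for illustration only.

```python
from collections import Counter


def ngrams_with_word(tokens, target, n):
    """Count every n-gram (as a tuple of tokens) that contains `target` in any slot.

    Hypothetical helper for illustration; not part of Engrammer.
    """
    counts = Counter()
    # Slide a window of length n over the token sequence.
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        # Keep the n-gram only if the target fills at least one slot.
        if target in gram:
            counts[gram] += 1
    return counts


# Toy corpus, tokenized naively on whitespace (an assumption).
tokens = "the cat sat on the mat and the cat slept".split()
counts = ngrams_with_word(tokens, "cat", 2)

for gram, freq in counts.most_common():
    print(" ".join(gram), freq)
```

A real corpus would be lemmatized and far larger; the raw frequencies gathered this way are only the input to a collocability metric, not a measure of association in themselves.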
The tool runs as a standalone desktop application with no software dependencies, so installation is straightforward. Corpora do not need to be uploaded over the Internet; all data are stored locally, which suits users working with custom corpora under copyright. We have aimed to make the tool easy to operate, focussing on a user-friendly graphical interface and ensuring a quick real-time response even for medium-sized corpora (e.g. the BNC). Its collocation metrics are based on effect sizes rather than null-hypothesis significance testing. Crucially, Engrammer is designed to accommodate a range of typologically different languages. The application provides a graphical user interface (documentation: not yet available).
The application is freeware (licence).
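To make the effect-size idea concrete, here is a minimal sketch of a generic risk ratio computed from four counts. How Engrammer actually defines its contingency counts is not stated here, so the framing below (how often a pattern occurs among contexts of the target word versus among all other contexts), the function name `risk_ratio`, and the figures are assumptions for illustration.

```python
import math


def risk_ratio(hits_target, total_target, hits_other, total_other):
    """Generic risk ratio: (hits_target / total_target) / (hits_other / total_other).

    Hypothetical helper; the mapping of corpus counts to these four
    arguments is an assumption, not Engrammer's documented definition.
    """
    if total_target <= 0 or total_other <= 0:
        raise ValueError("group totals must be positive")
    risk_target = hits_target / total_target
    risk_other = hits_other / total_other
    if risk_other == 0:
        # The pattern never occurs outside the target group.
        return math.inf if risk_target > 0 else math.nan
    return risk_target / risk_other


# Invented figures: a pattern fills 30 of 1,000 contexts of the target word
# and 50 of 100,000 remaining contexts.
rr = risk_ratio(30, 1_000, 50, 100_000)
print(f"risk ratio = {rr:.1f}")
```

Unlike a p-value, the ratio expresses how many times more likely the pattern is with the target word than without it, and it does not grow simply because the corpus is larger.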

Download: Engrammer (Win 64bit) 6 MB.