Source Code Tour

bin/

The bin/ directory contains shell scripts that perform most upkeep tasks in AtD.  The most important file here is bin/all.sh, which rebuilds all the AtD models.

data/

The data directory has the data files used to train the AI and build the rule model.

data/corpus_extra

Download the bootstrap data and place your extra writing samples into this folder. Plain text and HTML formatted text are both acceptable.

data/rules/

This folder contains the rules used by the AtD grammar and style checker.

data/tests/

This directory contains formatted text files used to train the AtD spell checker and misused word models.

lib/

The lib/ directory contains most of the code AtD uses at runtime.

lib/dictionary.sl

This file has functions to load the spell checker dictionary and construct the Trie used to generate suggestions.  The Trie construction happens in Sleep.  Walking the Trie and generating suggestions happens in Java.  The edits2 function exists in this file as an easier-to-read reference for how the Trie walking algorithm works.
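
To make the idea of walking a Trie to generate suggestions concrete, here is a minimal Java sketch. It is not AtD's actual code: the class name TrieSketch, its methods, and the use of plain Levenshtein distance are all assumptions made for illustration. It builds a character Trie and collects every stored word within a given number of edits of the input, abandoning any branch that can no longer match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and algorithm details are NOT taken from AtD.
final class TrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String word; // non-null when a dictionary word ends at this node
    }

    private final Node root = new Node();

    // Add one dictionary word to the Trie, one character per level.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.word = word;
    }

    // Collect every stored word within maxEdits (Levenshtein) of input.
    List<String> suggest(String input, int maxEdits) {
        List<String> results = new ArrayList<>();
        // Row for the empty prefix: i deletions turn input[0..i) into "".
        int[] firstRow = new int[input.length() + 1];
        for (int i = 0; i < firstRow.length; i++) {
            firstRow[i] = i;
        }
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), input, firstRow, maxEdits, results);
        }
        return results;
    }

    // Extend the edit-distance table by one row for the Trie edge labelled c.
    private void walk(Node node, char c, String input, int[] prev,
                      int maxEdits, List<String> results) {
        int cols = input.length() + 1;
        int[] row = new int[cols];
        row[0] = prev[0] + 1;
        int best = row[0];
        for (int i = 1; i < cols; i++) {
            int insert = row[i - 1] + 1;
            int delete = prev[i] + 1;
            int replace = prev[i - 1] + (input.charAt(i - 1) == c ? 0 : 1);
            row[i] = Math.min(insert, Math.min(delete, replace));
            best = Math.min(best, row[i]);
        }
        if (node.word != null && row[cols - 1] <= maxEdits) {
            results.add(node.word);
        }
        // Prune: if every cell exceeds the budget, no descendant can match.
        if (best <= maxEdits) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), input, row, maxEdits, results);
            }
        }
    }
}
```

Sharing one table row per Trie node means the work for a common prefix is done once for every word under it, which is the usual reason to walk a Trie rather than compare the input against each dictionary word in turn.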

lib/engine.sl

This file contains the checkDocument function that is called when a document is checked.  The core logic for iterating through a document and applying the rule engine and spell checker to it lives here.

lib/fsm.sl

This file contains the abstract rule engine for AtD.  All rules are compiled into a tree structure and the deepest match is what is returned when running a sentence through it.
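
The "deepest match" idea can be sketched in Java as follows. This is an illustration only, not the code in lib/fsm.sl: the class RuleTreeSketch, its method names, and the assumption that a rule is a plain sequence of literal tokens are all hypothetical, and real AtD rules may match on much more than literal words.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rules are literal token sequences stored in a tree.
final class RuleTreeSketch {
    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        String rule; // label of the rule that ends at this node, if any
    }

    private final Node root = new Node();

    // Compile one rule into the tree, one token per level.
    void addRule(List<String> pattern, String label) {
        Node node = root;
        for (String token : pattern) {
            node = node.next.computeIfAbsent(token, k -> new Node());
        }
        node.rule = label;
    }

    // Return the label of the longest rule matching tokens from start,
    // or null when no rule matches there.
    String deepestMatch(List<String> tokens, int start) {
        Node node = root;
        String best = null;
        for (int i = start; i < tokens.size(); i++) {
            node = node.next.get(tokens.get(i));
            if (node == null) {
                break; // no rule continues with this token
            }
            if (node.rule != null) {
                best = node.rule; // a deeper match replaces a shallower one
            }
        }
        return best;
    }
}
```

With rules for "a lot" and "a lot of" both loaded, matching at the word "a" in "I have a lot of work" returns the label of the longer rule, because the walk keeps descending after the first rule ends and only remembers the last rule it passed.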

lib/neural.sl

Generic neural network code used by AtD.  The comments in the file show how to use the API.

lib/nlp.sl

This file contains the sentence and word segmentation code used by AtD.

lib/object.sl

This file contains a simple object system that some parts of AtD use.  This technique is covered in Chapter 5.3 of the Sleep 2.1 Manual.  If you see a function declared as sub someClass::functionName { } then you’ve encountered this system.  The neural network and some of the code in utils/ use this technique.

lib/quality.sl

This file contains utility functions for generating quality statistics about a document.

lib/spellcheck.sl

The heart of the spell checker and misused word detector is in this file.

lib/tagger.sl

This file contains functions to tag a sentence with its parts-of-speech.

lib/wordforms.sl

This file contains tools to stem words and convert between different tenses and counts.

models/

The models directory is where AtD dumps the models and data it uses during runtime.  The .bin objects are serialized Java objects.
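
As a rough illustration of what "serialized Java objects" means, the sketch below writes an object with java.io.ObjectOutputStream and reads it back with java.io.ObjectInputStream. The class ModelIoSketch and the HashMap stand-in are assumptions; the classes actually stored in AtD's .bin files are not described here, and reading a real model would require the classes it was written with to be on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: shows standard Java serialization, not AtD's own code.
final class ModelIoSketch {
    // Serialize any Serializable object to a byte array.
    static byte[] dump(Serializable model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read one serialized object back from a stream, for example a
    // FileInputStream opened on a .bin file.
    static Object load(InputStream in) {
        try (ObjectInputStream objects = new ObjectInputStream(in)) {
            return objects.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The class the object was written with is not on the classpath.
            throw new IllegalStateException(e);
        }
    }
}
```

Only deserialize files you trust: Java deserialization instantiates whatever classes the stream names, so a model file from an unknown source should be treated with care.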

service/

The service/ folder contains the view portion of the AtD service.  These files are used by Moconti (the Sleep Application Server) to implement the AtD API.

service/code/

After the Deadline is written in a combination of Sleep and Java.  Sleep is a Perl-like language that runs on top of Java.  Some methods were ported to Java for performance reasons.  The Java source code is located in service/code. To rebuild the Java code you’ll need Apache Ant and a symbolic link to the lib/ directory from service/code/lib.

service/root

This folder is the root directory for serving files used by Moconti.  If you want your AtD service instance to serve web files, dump them here.  Otherwise you can ignore this folder.

service/src/local.sl

This file should contain your local modifications to AtD. It is loaded by site.sl and the function service_init is called after everything else in site.sl has loaded. Here you can redefine parts of the AtD protocol and add your own functions. We will not touch this file in Subversion.

service/src/site.sl

The src/ folder contains the site.sl loaded by the Moconti Application Server.  I recommend reading the docs included with Moconti to understand this setup.  The service/src/site.sl implements the After the Deadline API.  To start understanding how AtD works and tracing functions, you’ll want to start with this file.

service/src/view

This folder contains templates for the AtD XML API.  These files are referenced by service/src/site.sl and by the info.slp call that retrieves more information on an error.

utils/

This folder is the ugliest part of AtD.  Fortunately it has little to do with the runtime operation of AtD.  Most of the code here is used to train and test models.  A lot of code here is left over from various experiments I tried when building AtD.  Navigate it at your own peril.

utils/bigrams/buildcorpus.sl

This is the code used to construct the bigram and trigram language model used by AtD.
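
For readers new to n-gram language models, here is a minimal Java sketch of the bigram half of the idea. It is not a port of buildcorpus.sl: the class BigramSketch, its method names, and the unsmoothed maximum-likelihood estimate are assumptions chosen to keep the example short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts word pairs and estimates P(next | prev).
final class BigramSketch {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    // Count every word and every adjacent word pair in one token sequence.
    void train(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            unigrams.merge(tokens.get(i), 1, Integer::sum);
            if (i + 1 < tokens.size()) {
                String pair = tokens.get(i) + " " + tokens.get(i + 1);
                bigrams.merge(pair, 1, Integer::sum);
            }
        }
    }

    // Unsmoothed estimate: count(prev next) / count(prev).
    double probability(String prev, String next) {
        int prevCount = unigrams.getOrDefault(prev, 0);
        if (prevCount == 0) {
            return 0.0; // prev was never seen during training
        }
        return bigrams.getOrDefault(prev + " " + next, 0) / (double) prevCount;
    }
}
```

A trigram model extends the same counting to three-word windows, conditioning each word on the two words before it.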

utils/common/

A lot of these files are common to the training and testing parts of the spell checker and misused word detection.  The organization here is poor.

utils/rules/rules.sl

This file is important to the rule engine: it loads all the AtD grammar and style rules and builds them into the models/rules.bin file.

utils/spell/

Code to train and test the spell checker and misused word detector.

utils/spelldata/

Code to generate data files used by AtD to train and test various features.

utils/tagger/

Code to train and test the part-of-speech tagger.
