Bootstrap Data

If you want to modify AtD, you’ll need to build your own language models, and that requires data 🙂 I can’t release all of the data we use, but this package has enough to get you started.

To install this data (note: the archive contains an atd/ folder):

tar zxvf atd_bootstrap_data.tgz

The bootstrap data comes from multiple sources. Each source is covered by its own license. See the licensing details included in the archive.

If you redistribute AtD, please keep this archive separate.  Thanks.

Remember: You’ll need more data to get the most accuracy out of AtD. Gather whatever text you have (HTML, plain text) and place it in data/corpus_extra. AtD will do the rest when you rebuild the models with ./bin/
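As a rough sketch of the step above, the following shell function copies HTML and plain-text files from a folder of your own into data/corpus_extra. The function name add_corpus_files and both of its arguments are made up for illustration; it also assumes that data/corpus_extra lives inside the atd/ folder. Adjust the paths to match your setup.

```shell
# Hypothetical helper, not part of AtD.
# Usage: add_corpus_files SRC_DIR ATD_DIR
#   SRC_DIR  folder holding your own .html and .txt files
#   ATD_DIR  your AtD folder (assumed to contain data/corpus_extra)
add_corpus_files() {
    src_dir="$1"
    atd_dir="$2"

    # Create the target folder if it does not exist yet.
    mkdir -p "$atd_dir/data/corpus_extra" || return 1

    # Copy every .html and .txt file found under SRC_DIR, including subfolders.
    # Files that share a name overwrite each other in the flat target folder.
    find "$src_dir" -type f \( -name '*.html' -o -name '*.txt' \) \
        -exec cp {} "$atd_dir/data/corpus_extra/" \;
}
```

For example, add_corpus_files ~/my_texts ./atd would copy the files; the originals stay where they are. You still need to rebuild the models afterwards.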

