Download Pages

Software for Natural Language Processing (with source code)

A recent snapshot of my source code can be found here. These are working versions with the latest features and bugfixes, but they are not systematically tested and may at times fail to compile. For more stable versions see the releases below.

RegAligner - A tool for regularized word alignment

RegAligner is a tool for word alignment that augments the traditional maximum likelihood criterion for single-word-based models with regularity terms. Right now it handles the models IBM 1-4 and HMM, with (optional) slight variations. In the future we will also go beyond these models.
A recent snapshot is available in the git repository. Get the latest release, version 1.21, now with IBM-5 and a nondeficient mode for IBM-3 and IBM-4.

VI3 - Computation of IBM-3 Viterbi Alignments (patch for GIZA++)

Computation of IBM-3 Viterbi alignments has been shown to be NP-hard, and the popular toolkit GIZA++ uses a hill-climbing strategy that can return suboptimal alignments. However, we have shown that by combining Integer Linear Programming (ILP) with a prior reasoning stage, computing exact Viterbi alignments is efficient enough for reasonably large corpora. This patch enables GIZA++ to compute the exact alignments. A recent snapshot is available in the git repository.
This patch requires Coin-OR CBC.

Software for Computer Vision (with source code)

Regioncurv: Region-based Curvature Regularity for Segmentation, Inpainting and Denoising

This is a toolkit for curvature regularity approaches to image segmentation, inpainting and denoising. The core principle of all approaches is a linear program with surface continuation constraints. A number of different solvers are supported. Regioncurv requires Coin-OR CBC.
A recent snapshot can be found here. Download the latest release 1.1, now with message passing solvers.

Toolboxes for Various Fields (with source code)

C++-Toolbox

The C++-Toolbox is meant to ease software development. It does not provide executable binaries. Instead, it offers the following classes:
  • Container classes for 1-, 2- and 3-dimensional access patterns, as well as a class for arbitrary dimensional storage patterns. In debug mode, access is checked to be inside the valid bounds. Objects can be assigned names to ease debugging.
  • Based on the container classes, classes for vectors, matrices and (three-dimensional) tensors. These offer mathematical operations in addition to plain storage.
  • Based on matrices and tensors: classes for grayscale and color images, with support for the PGM/PPM format.
  • Functionality for reading files and processing strings.
  • An intuitive application class to ease parsing of command lines.
A recent snapshot is available in the git repository. Get the latest release version 1.03.
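
To illustrate the idea of named containers with debug-mode bounds checking, here is a minimal sketch. It is not the toolbox's actual code: the class name NamedStorage1D and all of its members are hypothetical, and it assumes that "debug mode" means compiling without NDEBUG.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; NOT the toolbox's real class or API.
// A 1-dimensional container that carries a name and, unless NDEBUG is
// defined, checks every access against the valid bounds.
template <typename T>
class NamedStorage1D {
public:
  NamedStorage1D(std::size_t size, const T& init, std::string name = "unnamed")
      : data_(size, init), name_(std::move(name)) {}

  std::size_t size() const { return data_.size(); }
  const std::string& name() const { return name_; }

  T& operator[](std::size_t i) {
#ifndef NDEBUG
    check(i);  // compiled in only for debug builds
#endif
    return data_[i];
  }

  const T& operator[](std::size_t i) const {
#ifndef NDEBUG
    check(i);
#endif
    return data_[i];
  }

private:
  // Reports the container's name so the failing object is easy to identify.
  void check(std::size_t i) const {
    if (i >= data_.size()) {
      throw std::out_of_range("index " + std::to_string(i) +
                              " out of bounds for container \"" + name_ +
                              "\" of size " + std::to_string(data_.size()));
    }
  }

  std::vector<T> data_;
  std::string name_;
};
```

With a declaration such as NamedStorage1D<int> counts(3, 0, "counts"), an access like counts[3] in a debug build raises an error that names the container, while a build with NDEBUG defined skips the check entirely.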

Optimization-Toolbox

This is a collection of routines for optimization problems, meant as a library (and hence offering no executables). At present it covers the following areas:
  • Solving of discrete labeling problems with the help of message passing approaches. Most of the approaches are strongly related to linear programming relaxations: we offer MSD, MPLP, TRWS and Dual Decomposition. You can choose between singleton and pairwise separators. In addition, we offer Max-Product Belief Propagation.
    For low-order factors you can use generic routines. In addition we implement some factors that scale to high orders, such as 1-of-N and cardinality potentials and binary integer linear constraints.
  • Routines for minimizing a class of submodular functions, including terms of very high order (you will need additional code by Vladimir Kolmogorov).
  • N-best search in directed acyclic graphs.
A recent snapshot is available in the git repository. There is no stable release yet.
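
As a rough illustration of N-best search in a directed acyclic graph, the sketch below keeps the N cheapest partial paths at every node and traces them back from the sink. It is not taken from the toolbox: the names Edge, Path and n_best_paths are hypothetical, and the code assumes that nodes are numbered in topological order (every edge goes from a lower to a higher index), that node 0 is the source, that the last node is the sink, and that lower cost is better.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the toolbox's interface.
struct Edge { std::size_t from, to; double cost; };
struct Path { double cost; std::vector<std::size_t> nodes; };

// Returns up to n lowest-cost paths from node 0 to node num_nodes-1.
// Assumes from < to < num_nodes for every edge (topological numbering).
std::vector<Path> n_best_paths(std::size_t num_nodes,
                               const std::vector<Edge>& edges,
                               std::size_t n) {
  struct Entry { double cost; std::size_t pred_node; std::size_t pred_rank; };
  const std::size_t none = static_cast<std::size_t>(-1);
  if (num_nodes == 0 || n == 0) return {};

  std::vector<std::vector<Edge>> incoming(num_nodes);
  for (const Edge& e : edges) incoming[e.to].push_back(e);

  // best[v] holds the (at most n) cheapest partial paths ending in v,
  // sorted by cost, each with a back-pointer to its predecessor entry.
  std::vector<std::vector<Entry>> best(num_nodes);
  best[0].push_back({0.0, none, none});

  for (std::size_t v = 1; v < num_nodes; ++v) {
    std::vector<Entry> cand;
    for (const Edge& e : incoming[v])
      for (std::size_t r = 0; r < best[e.from].size(); ++r)
        cand.push_back({best[e.from][r].cost + e.cost, e.from, r});
    std::sort(cand.begin(), cand.end(),
              [](const Entry& a, const Entry& b) { return a.cost < b.cost; });
    if (cand.size() > n) cand.resize(n);
    best[v] = std::move(cand);
  }

  // Follow the back-pointers from the sink to recover each path.
  std::vector<Path> result;
  const std::size_t sink = num_nodes - 1;
  for (std::size_t r = 0; r < best[sink].size(); ++r) {
    Path p{best[sink][r].cost, {}};
    std::size_t node = sink, rank = r;
    while (node != none) {
      p.nodes.push_back(node);
      const Entry& e = best[node][rank];
      node = e.pred_node;
      rank = e.pred_rank;
    }
    std::reverse(p.nodes.begin(), p.nodes.end());
    result.push_back(std::move(p));
  }
  return result;
}
```

Keeping only the n best entries per node is sufficient because every one of the n best paths into a node extends one of the n best paths into a predecessor. Sorting all candidates is the simplest choice here; a heap-based merge would avoid generating candidates that are discarded anyway.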

Data

Gold Alignments for Europarl De-En

Annotations for 300 sentences of length up to 80. Provided are sure and possible alignments in the (machine-readable) standard format. We also provide visualizations in pdf and png formats. Get the new version 2 with corrections and twice as many sentence pairs here. The associated corpus (version 6) can be found here.
Last modified: May 08 2012 / tosch@phil.uni-duesseldorf.de