About the Congressional Memory Project

Developed by Deb Kumar Roy
Internet Multicasting Service



Overview

This is an experimental text and audio server that provides access to the proceedings of the U.S. House of Representatives. There are three methods for accessing the archives. We have developed a custom speech processing system that attempts to align the text of the congressional record with the corresponding audio, based on voice analysis.


The Databases

For each day that either house of Congress is in session, we archive both the text transcript of the proceedings (which is manually transcribed) and a digital audio recording of the entire proceedings.

Currently the server supports only the proceedings of the U.S. House of Representatives.

On a typical day on which the House of Representatives is in session, the text transcript runs to about 15,000 lines, and there are about 10 hours of audio.
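
To give a feel for the scale involved, the approximate figures quoted above can be turned into an average amount of audio per transcript line. The short Python sketch below is only an illustration: the function names are hypothetical, and the uniform-rate offset is a naive back-of-the-envelope guess, not the alignment method used by this project.

```python
# Rough scale of one day's archive, using the approximate figures quoted above.
TRANSCRIPT_LINES = 15_000  # approximate lines of text per session day
AUDIO_HOURS = 10           # approximate hours of audio per session day


def seconds_per_line(lines: int, hours: float) -> float:
    """Average audio duration, in seconds, per transcript line."""
    return hours * 3600 / lines


def naive_offset(line_index: int, lines: int, hours: float) -> float:
    """Hypothetical uniform-rate guess (in seconds) at where a line falls
    in the audio. It assumes every line takes the same time to speak,
    which real proceedings do not."""
    return line_index * hours * 3600 / lines


if __name__ == "__main__":
    print(seconds_per_line(TRANSCRIPT_LINES, AUDIO_HOURS))  # 2.4
    print(naive_offset(7_500, TRANSCRIPT_LINES, AUDIO_HOURS))
```

At roughly 2.4 seconds of audio per line, a uniform guess would drift quickly whenever a speaker pauses or a line is unusually long, which is one reason an alignment based on the audio itself is needed.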


Audio to Text Alignment

The audio and text are aligned using automatic speaker identification (speaker ID). The steps involved in performing speaker ID on the House audio archives are: