"Once Under Wraps, Supreme Court Audio Trove Now Online", NPR All Things Considered 4/24/2013:
The court has been releasing audio during the same week as arguments only since 2010. Before that, audio from one term generally wasn't available until the beginning of the next term. But the court has been recording its arguments for nearly 60 years, at first only for the use of the justices and their law clerks, and eventually also for researchers at the National Archives, who could hear — but couldn't duplicate — the tapes. As a result, until the 1990s, few in the public had ever heard recordings of the justices at work.
But as of just a few weeks ago, all of the archived historical audio — which dates back to 1955 — has been digitized, and almost all of those cases can now be heard and explored at an online archive called the Oyez Project.
Some of the funding for the digitization and transcription, and all of the funding for the technology used in text/audio alignment and speaker identification, came from an NSF grant, "ITR-SCOTUS: A Resource for Collaborative Research in Speech Technology, Linguistics, Decision Processes and the Law", which was due to run from 2003 to 2007 but actually ended (after a few no-cost extensions) in 2010. The P.I.s were Jerry Goldman (then at Northwestern), Tim Johnson (University of Minnesota) and me.
The NPR story embeds an instance of oyez.org's very nice Flash app for interacting with individual transcripts and recordings.
Documentation of the speech technology involved can be found in Jiahong Yuan and Mark Liberman, "Speaker Identification on the SCOTUS Corpus", Acoustics 2008. Jiahong did most of this work.
You may wonder why speaker identification was needed. Traditionally, transcripts of U.S. Supreme Court oral arguments did not identify individual justices by name, but instead referred to any one of them as "The Court".
In addition to developing the applied speech technology for text/audio alignment and speaker identification, Jiahong and I have used part of the SCOTUS corpus in some phonetic investigations, such as "Investigating /l/ variation in English through forced alignment", InterSpeech 2009. In that analysis, we used only the data from the 2001 term, which contained 21,706 tokens of /l/. The entire 60-year archive will soon be published in a form that will allow phoneticians as well as legal scholars to use it as a basis for research.