[Chugalug] intern

Ed King chevyiinova at bellsouth.net
Wed Sep 19 20:47:05 UTC 2012


sure, I'll take a crack at it.   are the audio recordings already digitized?   
if not, I can digitize them.   


I'm thinking I'd write a script to process the digitized file...
use sox to break the file into small chunks
feed the small audio chunks to google speech-to-text
save the text to a file, append mode
repeat until entire audio file is converted to text
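
a rough sketch of the loop above, in Python, just to make the idea concrete. 
assumptions: sox is installed and on the PATH, the chunk length and file names 
are placeholders, and transcribe is a hypothetical function you supply yourself 
(whatever ends up talking to the speech-to-text service); no real Google API 
call is shown here.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def sox_split_command(src: str, out_file: str, seconds: int = 10) -> List[str]:
    """Build a sox command that cuts src into fixed-length chunks.

    The "trim 0 N : newfile : restart" effects chain makes sox write
    numbered output files whose names are derived from out_file.
    """
    return ["sox", src, out_file,
            "trim", "0", str(seconds), ":", "newfile", ":", "restart"]


def split_audio(src: str, work_dir: str, seconds: int = 10) -> List[Path]:
    """Run sox and return the chunk files in playback order."""
    out_file = str(Path(work_dir) / "chunk.wav")
    subprocess.run(sox_split_command(src, out_file, seconds), check=True)
    chunks = Path(work_dir).glob("chunk*.wav")
    # Sort by name length first so chunk1000 lands after chunk999.
    return sorted(chunks, key=lambda p: (len(p.name), p.name))


def transcribe_chunks(chunks: Iterable[Path],
                      transcribe: Callable[[Path], str],
                      out_path: str) -> int:
    """Append the text of each chunk to out_path; return the chunk count.

    transcribe is a placeholder for the speech-to-text step (assumption):
    it takes a chunk path and returns the recognized text.
    """
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:  # append mode
        for chunk in chunks:
            out.write(transcribe(chunk).strip() + "\n")
            count += 1
    return count
```

usage would be something like 
transcribe_chunks(split_audio("interview.wav", "chunks"), my_transcribe, "interview.txt"), 
where interview.wav, the chunks directory and my_transcribe are made-up names. 
since each chunk is independent, the transcribe step is also the part that 
could be farmed out to other machines later.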

once I get a script that works, I'll share the code, perhaps other folks could 
jump in and we can turn this into a distributed, parallel processing project ;-)


what kind of "volume" are we looking at?  (how many oral histories, how long is 
a typical recording, etc)




________________________________
From: Nate Hill <nathanielhill at gmail.com>
To: Chattanooga Unix Gnu Android Linux Users Group <chugalug at chugalug.org>
Sent: Wed, September 19, 2012 3:19:40 PM
Subject: Re: [Chugalug] intern

Great idea! They are. Ed, are you volunteering? :)

On Wednesday, September 19, 2012, Ed King  wrote:

instead of OCR, I wonder if this project would be a good way to (ab)use Google's 
speech-to-text API.   Are the original audio recordings available?
>
>
>
>
>
________________________________
From: Nate Hill <nathanielhill at gmail.com>
>To: CHUGALUG <chugalug at chugalug.org>
>Sent: Wed, September 19, 2012 2:41:16 PM
>Subject: [Chugalug] intern
>
>Hi all,
>Over at the library in our local history department we've got some pretty neat 
>oral histories.
>The transcripts are all typed out on paper and the content is all burned to 
>CDs.
>I'd love to find an intern, perhaps a student, who would be interested in OCRing 
>all of those transcripts and making everything accessible on the web.
>If you have experience with this kind of thing and want to take on a project, 
>please drop me a note.
>Thanks
>Nate
>
>
>-- 
>Nate Hill
>nathanielhill at gmail.com
>http://4thfloor.chattlibrary.org/
>http://www.natehill.net
>
>

-- 
Nate Hill
nathanielhill at gmail.com
http://4thfloor.chattlibrary.org/
http://www.natehill.net