[Chugalug] intern

Sean Brewer seabre986 at gmail.com
Thu Sep 20 14:49:50 UTC 2012


That's what reCAPTCHA is doing, yeah. But as far as I know, they aren't
accepting collections outside of the New York Times and the books in
Google's collection for Google Books.

Basically it would go like this:

Scan items -> automatically extract word images from the scans and store
them (not sure how to do this) -> pair unknown words from the scans with
known ones for a user to transcribe, and repeat until certain requirements
are met
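
For the "extract word images" step, one simple approach (an assumption here, not something taken from reCAPTCHA) is a vertical projection profile: binarize a single line of text, mark which pixel columns contain ink, and cut wherever there is a wide enough run of blank columns. A minimal sketch, where `word_spans` and `min_gap` are made-up names and the input is assumed to be already split into lines and binarized:

```python
from typing import List, Tuple


def word_spans(line: List[List[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of words in one binarized text line.

    `line` is a list of pixel rows, where 1 means ink and 0 means background.
    A run of at least `min_gap` blank columns is treated as a word break.
    `end` is exclusive, so line_row[start:end] is the word's pixels.
    """
    if not line:
        return []
    width = len(line[0])
    # Vertical projection: True if any row has ink in this column.
    has_ink = [any(row[x] for row in line) for x in range(width)]

    spans = []
    start = None      # first column of the word being built
    last_ink = None   # most recent column that contained ink
    for x, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The blank run since the last inked column is wide enough.
            spans.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans
```

Real scans would need deskewing, noise removal, and line splitting first, and `min_gap` would have to be tuned to the typeface; narrow gaps between letters are kept inside a word only because they fall below that threshold.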

I'd have to check Von Ahn's paper for more details, but that's the gist.
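
The pairing step described above can be sketched roughly as follows. This is only an illustration: the `WordPool` class, its method names, and the `votes_needed` threshold are all hypothetical, and the real acceptance rules would have to come from the paper.

```python
import random
from collections import Counter, defaultdict


class WordPool:
    """Pairs unknown word images with known ones and tallies user answers."""

    def __init__(self, known, unknown, votes_needed=3):
        self.known = dict(known)           # image id -> verified text
        self.unknown = set(unknown)        # image ids still to be transcribed
        self.votes = defaultdict(Counter)  # image id -> answer -> count
        self.votes_needed = votes_needed   # made-up threshold, not from the paper

    def challenge(self, rng=random):
        """Pick one known (control) and one unknown word, or None if done."""
        if not self.known or not self.unknown:
            return None
        control = rng.choice(sorted(self.known))
        target = rng.choice(sorted(self.unknown))
        return control, target

    def submit(self, control, target, control_answer, target_answer):
        """Record a response; return True if the control word was correct."""
        expected = self.known[control].lower()
        if control_answer.strip().lower() != expected:
            return False  # failed the known word, so ignore the guess
        if target not in self.unknown:
            return True   # already resolved by earlier answers
        guess = target_answer.strip().lower()
        self.votes[target][guess] += 1
        if self.votes[target][guess] >= self.votes_needed:
            # Enough matching answers: promote to known (stored lowercased).
            self.known[target] = guess
            self.unknown.discard(target)
        return True
```

A word only moves from unknown to known after enough users who got the control word right agree on the same answer, which is the "repeat until certain requirements are met" part. Promoted words then become usable as control words for later challenges.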

There's also Distributed Proofreaders (http://www.pgdp.net/c/), which would
be another way to do it.

On Thu, Sep 20, 2012 at 8:40 AM, Nate Hill <nathanielhill at gmail.com> wrote:

> isn't that what CAPTCHA is doing now?  I thought that was the genius
> behind it... that every time you fill one out you are helping with
> character correction in a digitization project.
>
> This would be an interesting thing to make.  A lot of libraries and
> businesses have a 'labs' division.  I'm sort of toying with giving our
> library a 'public labs' division that could meet and work on things like
> this during regular events like this 'Hack the Library' thing I'm sort of
> cooking up right now (stay tuned).
>
> What kind of resources might go into making something like this?
>
>
> On Wed, Sep 19, 2012 at 11:29 PM, Sean Brewer <seabre986 at gmail.com> wrote:
>
>> I wish there was an open source reCAPTCHA. This would be a great way for
>> libraries to digitize their archives easily.
>>
>> On Wed, Sep 19, 2012 at 3:40 PM, Nate Hill <nathanielhill at gmail.com> wrote:
>>
>>> Hi all,
>>> Over at the library in our local history department we've got some
>>> pretty neat oral histories.
>>>  The transcripts are all typed out on paper and the content is all
>>> burned to CDs.
>>> I'd love to find an intern, perhaps a student, who would be interested
>>> in OCRing all of those transcripts and making everything accessible on the
>>> web.
>>> If you have experience with this kind of thing and want to take on a
>>> project, please drop me a note.
>>> Thanks
>>> Nate
>>>
>>> --
>>> Nate Hill
>>> nathanielhill at gmail.com
>>> http://4thfloor.chattlibrary.org/
>>> http://www.natehill.net
>>>
>>>
>>> _______________________________________________
>>> Chugalug mailing list
>>> Chugalug at chugalug.org
>>> http://chugalug.org/cgi-bin/mailman/listinfo/chugalug
>>>
>>>
>>
>
>
> --
> Nate Hill
> nathanielhill at gmail.com
> http://4thfloor.chattlibrary.org/
> http://www.natehill.net
>
>