Session 1 – Welcome

Aly Conteh of the British Library opened the day, explaining the context of the workshop.

Speaking from his own experience, he said that before he joined the British Library, OCR was just something that was done, and it was thought to do a pretty good job. Digitising historical texts, however, quickly exposes problems. The question then is what can be done to improve the quality of the text, and that is what today is about: looking at the technology being developed to improve OCR.

An important aspect is the digitisation workflow:

  • Selection of material
  • Capture of images
  • Process – this is where OCR and other enhancement processes come in
  • Access – typically through the web, with full text searching
  • Preservation – particularly for cultural heritage institutions

This workshop focuses on the process stage, which includes image enhancement, binarisation and segmentation; applying the OCR itself, which usually runs the text through an internal dictionary to improve accuracy; and providing metadata to match the OCR text to the image from which it was taken.
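To make two of these steps concrete, here is a minimal Python sketch of binarisation and dictionary-based correction. It is an illustration only, not a method presented at the workshop: the fixed threshold of 128, the function names, and the use of the standard library's difflib for fuzzy matching are all assumptions chosen for brevity. Real OCR engines use far more sophisticated adaptive thresholding and language models.

```python
from difflib import get_close_matches


def binarise(image, threshold=128):
    """Map each greyscale pixel (0-255) to 1 (ink) or 0 (background).

    A fixed global threshold is an assumption made for simplicity.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def correct_word(word, dictionary, cutoff=0.8):
    """Return the closest dictionary word, or the word unchanged if none is close."""
    if word.lower() in dictionary:
        return word
    matches = get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, dictionary):
    """Apply dictionary correction to each whitespace-separated word."""
    return " ".join(correct_word(word, dictionary) for word in text.split())


if __name__ == "__main__":
    # Hypothetical 2x2 greyscale "page" and a tiny hypothetical word list.
    page = [[0, 200], [130, 100]]
    words = ["historical", "texts", "library"]
    print(binarise(page))
    print(correct_text("histcrical texts", words))
```

The cutoff controls how aggressively words are replaced. Set it too low and genuine historical spellings may be "corrected" into modern words, which is one reason historical texts are harder for OCR than modern ones.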

This is a lot to discuss in just one day! Delegates were asked to think of questions throughout the day to ask at the panel session at the end.

Notes by Emma Huber, UKOLN.
