<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>     caa a22        4500</leader>
  <controlfield tag="001">445379642</controlfield>
  <controlfield tag="003">CHVBK</controlfield>
  <controlfield tag="005">20180317143013.0</controlfield>
  <controlfield tag="007">cr unu---uuuuu</controlfield>
  <controlfield tag="008">170323e20110601xx      s     000 0 eng  </controlfield>
  <datafield tag="024" ind1="7" ind2="0">
   <subfield code="a">10.1007/s10032-011-0150-z</subfield>
   <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="035" ind1=" " ind2=" ">
   <subfield code="a">(NATIONALLICENCE)springer-10.1007/s10032-011-0150-z</subfield>
  </datafield>
  <datafield tag="245" ind1="0" ind2="0">
   <subfield code="a">Robust named entity detection from optical character recognition output</subfield>
   <subfield code="h">[Elektronische Daten]</subfield>
   <subfield code="c">[Krishna Subramanian, Rohit Prasad, Prem Natarajan]</subfield>
  </datafield>
  <datafield tag="520" ind1="3" ind2=" ">
   <subfield code="a">In this paper, we focus on information extraction from optical character recognition (OCR) output. Since the content from OCR inherently has many errors, we present robust algorithms for information extraction from OCR lattices instead of merely looking them up in the top-choice (1-best) OCR output. Specifically, we address the challenge of named entity detection in noisy OCR output and show that searching for named entities in the recognition lattice significantly improves detection accuracy over 1-best search. While lattice-based named entity (NE) detection improves NE recall from OCR output, there are two problems with this approach: (1) the number of false alarms can be prohibitive for certain applications and (2) lattice-based search is computationally more expensive than 1-best NE lookup. To mitigate the above challenges, we present techniques for reducing false alarms using confidence measures and for reducing the amount of computation involved in performing the NE search. Furthermore, to demonstrate that our techniques are applicable across multiple domains and languages, we experiment with optical character recognition systems for videotext in English and scanned handwritten text in Arabic.</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
   <subfield code="a">Springer-Verlag, 2011</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2="7">
   <subfield code="a">Optical character recognition</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2="7">
   <subfield code="a">Hidden Markov Model</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2="7">
   <subfield code="a">Information extraction</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2="7">
   <subfield code="a">Named entity detection</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2="7">
   <subfield code="a">OCR : Optical character recognition</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2="7">
   <subfield code="a">HMM : Hidden Markov model</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2="7">
   <subfield code="a">ROC : Receiver operating characteristics</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Subramanian</subfield>
   <subfield code="D">Krishna</subfield>
   <subfield code="u">Raytheon BBN Technologies, 10 Moulton Street, 02138, Cambridge, MA, USA</subfield>
   <subfield code="4">aut</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Prasad</subfield>
   <subfield code="D">Rohit</subfield>
   <subfield code="u">Raytheon BBN Technologies, 10 Moulton Street, 02138, Cambridge, MA, USA</subfield>
   <subfield code="4">aut</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Natarajan</subfield>
   <subfield code="D">Prem</subfield>
   <subfield code="u">Raytheon BBN Technologies, 10 Moulton Street, 02138, Cambridge, MA, USA</subfield>
   <subfield code="4">aut</subfield>
  </datafield>
  <datafield tag="773" ind1="0" ind2=" ">
   <subfield code="t">International Journal on Document Analysis and Recognition (IJDAR)</subfield>
   <subfield code="d">Springer-Verlag</subfield>
   <subfield code="g">14/2(2011-06-01), 189-200</subfield>
   <subfield code="x">1433-2833</subfield>
   <subfield code="q">14:2&lt;189</subfield>
   <subfield code="1">2011</subfield>
   <subfield code="2">14</subfield>
   <subfield code="o">10032</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://doi.org/10.1007/s10032-011-0150-z</subfield>
   <subfield code="q">text/html</subfield>
   <subfield code="z">Onlinezugriff via DOI</subfield>
  </datafield>
  <datafield tag="908" ind1=" " ind2=" ">
   <subfield code="D">1</subfield>
   <subfield code="a">research-article</subfield>
   <subfield code="2">jats</subfield>
  </datafield>
  <datafield tag="950" ind1=" " ind2=" ">
   <subfield code="B">NATIONALLICENCE</subfield>
   <subfield code="P">856</subfield>
   <subfield code="E">40</subfield>
   <subfield code="u">https://doi.org/10.1007/s10032-011-0150-z</subfield>
   <subfield code="q">text/html</subfield>
   <subfield code="z">Onlinezugriff via DOI</subfield>
  </datafield>
  <datafield tag="950" ind1=" " ind2=" ">
   <subfield code="B">NATIONALLICENCE</subfield>
   <subfield code="P">700</subfield>
   <subfield code="E">1-</subfield>
   <subfield code="a">Subramanian</subfield>
   <subfield code="D">Krishna</subfield>
   <subfield code="u">Raytheon BBN Technologies, 10 Moulton Street, 02138, Cambridge, MA, USA</subfield>
   <subfield code="4">aut</subfield>
  </datafield>
  <datafield tag="950" ind1=" " ind2=" ">
   <subfield code="B">NATIONALLICENCE</subfield>
   <subfield code="P">700</subfield>
   <subfield code="E">1-</subfield>
   <subfield code="a">Prasad</subfield>
   <subfield code="D">Rohit</subfield>
   <subfield code="u">Raytheon BBN Technologies, 10 Moulton Street, 02138, Cambridge, MA, USA</subfield>
   <subfield code="4">aut</subfield>
  </datafield>
  <datafield tag="950" ind1=" " ind2=" ">
   <subfield code="B">NATIONALLICENCE</subfield>
   <subfield code="P">700</subfield>
   <subfield code="E">1-</subfield>
   <subfield code="a">Natarajan</subfield>
   <subfield code="D">Prem</subfield>
   <subfield code="u">Raytheon BBN Technologies, 10 Moulton Street, 02138, Cambridge, MA, USA</subfield>
   <subfield code="4">aut</subfield>
  </datafield>
  <datafield tag="950" ind1=" " ind2=" ">
   <subfield code="B">NATIONALLICENCE</subfield>
   <subfield code="P">773</subfield>
   <subfield code="E">0-</subfield>
   <subfield code="t">International Journal on Document Analysis and Recognition (IJDAR)</subfield>
   <subfield code="d">Springer-Verlag</subfield>
   <subfield code="g">14/2(2011-06-01), 189-200</subfield>
   <subfield code="x">1433-2833</subfield>
   <subfield code="q">14:2&lt;189</subfield>
   <subfield code="1">2011</subfield>
   <subfield code="2">14</subfield>
   <subfield code="o">10032</subfield>
  </datafield>
  <datafield tag="900" ind1=" " ind2="7">
   <subfield code="a">Metadata rights reserved</subfield>
   <subfield code="b">Springer special CC-BY-NC licence</subfield>
   <subfield code="2">nationallicence</subfield>
  </datafield>
  <datafield tag="898" ind1=" " ind2=" ">
   <subfield code="a">BK010053</subfield>
   <subfield code="b">XK010053</subfield>
   <subfield code="c">XK010000</subfield>
  </datafield>
  <datafield tag="949" ind1=" " ind2=" ">
   <subfield code="B">NATIONALLICENCE</subfield>
   <subfield code="F">NATIONALLICENCE</subfield>
   <subfield code="b">NL-springer</subfield>
  </datafield>
 </record>
</collection>
