Romance novel expert Sarah Wendell was perusing some romance ebooks, produced by running old romance novels through OCR (Optical Character Recognition) software, when she noticed something unusual: lines such as “Mrs. Tipton went over to him and put her anus around his neck,” and “When she spotted me, she flung her anus high in the air.”
That’s when Wendell, who refers to herself as “Smart Bitch Sarah” because she contributes to the blog Smart Bitches, Trashy Books, realized that the OCR software could not, at least with older texts, distinguish between the words “arms” and “anus”.
In fact, it read the printed “arms” as “anus”, and that is the word that ended up in the ebooks.
Of course, it may be telling that Google, perhaps the largest scanner of books in the world (although HarperCollins is not far behind), responded to a search for the line “When she spotted me, she flung her anus high in the air and kept them up until she reached me” not by suggesting “arms” in place of “anus”, but, inexplicably, by suggesting “reaches” in place of “reached”!
Indeed, some of the most unfortunate, or hilarious, depending on your point of view, arms-to-anus oopsies are in Google’s own collection of OCRed texts, such as this gem from an 1887 edition of Peterson Magazine:
Wendell announced her discovery on Twitter, sharing an image of the sort of older romance novel text that causes the problem for the OCR software.
Observed British author Becky Black, “People think OCR is a cheap way to get old books into ebook format. But to do it right means thorough proof reading is needed.”
Feel free to share your finds, below.