I just wanted to give a quick update on how transcription is going with BR1.
Transcription has slowed down recently as I’ve found the process very difficult.
Here’s an example of what I’m working with in terms of transcription.
So for the past week or so, I’ve been trying to find a way to OCR the files to make it go much more quickly. As I may have mentioned, I’m transcribing the files into a database, so I’m typing into a CSV file (well, a spreadsheet, currently). What I was hoping to find was an online tool that can take a page and turn it into a table. While I’m sure it’s possible, it’s not easy to find one that does it for free.
Part of the problem is that you can’t always chop the page up into lines properly, because the photos aren’t always straight. Even when the OCR programs have guidelines to chop the image into cells, it’s not possible to, for example, accurately split the BR numbers so that the BR number goes into one column and the letters/numbers that follow go into the next. Also, surprisingly, it’s been quite difficult for any of the OCR algorithms to properly get the R/U column by itself; often multiple rows end up in the same line of the CSV that is output.
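One possible workaround for the BR number split is to let the OCR return the whole cell as a single string and separate the two parts afterwards. Here is a minimal Python sketch of that idea. The cell shape it assumes ("BR", then digits, then an optional trailing suffix such as "A" or "(2)") is a guess for illustration only, and the names `BR_PATTERN` and `split_br` are made up for this example; the pattern would need adjusting to match what the catalogue pages actually contain.

```python
import re

# ASSUMPTION: a cell looks like "BR", an optional full stop, a run of digits,
# then whatever letters/numbers follow. The real catalogue may differ.
BR_PATTERN = re.compile(r"^\s*BR\.?\s*(\d+)\s*(.*?)\s*$", re.IGNORECASE)


def split_br(cell):
    """Split one OCR'd cell into (br_number, suffix).

    If the cell does not match the assumed shape, the trimmed text is
    returned unchanged in the first slot so it can be fixed by hand.
    """
    match = BR_PATTERN.match(cell)
    if match is None:
        return cell.strip(), ""
    return "BR " + match.group(1), match.group(2)


if __name__ == "__main__":
    # Made-up sample cells, not taken from the catalogue.
    for sample in ["BR 1234A", "BR 45 (2)", "br.12/3", "Title text"]:
        print(split_br(sample))
```

Anything that doesn't fit the assumed shape is passed through untouched rather than guessed at, which keeps the manual check at the end simple.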
Now, if you have the files in Google Drive, you can open them in the Word equivalent (Google Docs), and the OCR is actually pretty good, but it doesn’t interpret the images as tables, so the data is very spread out and hard to read. What I use instead is this website: https://www.table-reader.com/image-to-excel. The benefit of that service, while limited to 5 pages a day (more than I can get through in terms of transcription), is that you can crop the image down to just the part you want to OCR. So what I’ve decided to do is OCR only Column 4, Title. This is by far the most intricate and tedious part of the document to transcribe, especially when there are a lot of numbers. The other columns are actually fairly quick to transcribe, so I think this will be the quickest way forward.
My goal is to get this done by the end of January, because I’m hoping at that point to have photos of the 1951 BR1 Catalogue (thanks again to Roy Metcalfe). Hopefully transcribing that volume will go much quicker, and then I can start comparing the two volumes and see how things changed over nearly 20 years.