I have finished transcribing the BR1 Catalogue (1968), so I can now provide some of the basic stats.
In 1968, BR1 had 1623 unique BR numbers (titles), the highest number being 6004, spread across 3475 individual line items. The maximum character lengths for the different columns are as follows:
Suffixes (following BR number): 20 characters
Main Title: 193 characters
Part/Volume Title: 176 characters
Sponsoring Dept: 21 characters
In-text Notes: 239 characters
Transcription Notes: 148 characters
Given that there are 3475 line items, you can do some quick maths to figure out how large the database is going to have to be to accommodate all of these text fields.
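For anyone who wants the quick maths spelled out, here is a rough upper bound, sketched in Python purely for illustration (the real processing script will be PHP). It assumes every row fills every text column to its maximum length, which no real row does, and it counts characters rather than bytes, so the actual storage figure will depend on the character encoding and the database's own overhead. The dictionary keys are just my shorthand for the columns listed above.

```python
# Maximum observed lengths, in characters, from the 1968 transcription.
# The key names are shorthand for the columns listed in the post.
max_lengths = {
    "suffixes": 20,
    "main_title": 193,
    "part_volume_title": 176,
    "sponsoring_dept": 21,
    "in_text_notes": 239,
    "transcription_notes": 148,
}
LINE_ITEMS = 3475

# Worst case: every row uses the maximum length in every column.
chars_per_row = sum(max_lengths.values())
upper_bound_chars = chars_per_row * LINE_ITEMS

print(chars_per_row)      # 797
print(upper_bound_chars)  # 2769575
```

So the text fields for the 1968 catalogue come to roughly 2.8 million characters at the very most, and in practice far less, since most rows leave several columns short or blank.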
I can’t provide more detail than that until I have it up in the database. I could do counts in the spreadsheet, but I’d rather do them in the database. So what comes next?
Database Design.
I’m glad I finished the transcription this way, rather than designing the database first, doing the transcription through a web interface, and then having to redo both the database and the interface once I looked at the other BR1 catalogues. The design I describe here is intended to accommodate not just BR1 (1968) but also the versions from 1951, 1970 and 1972. Even if I find more editions, I’m probably going to stick to just these, unless I find an earlier edition, say one from immediately post-war. Or, even better, a 1944 edition from when BR1 was created.
Anyways, bear with me as I discuss the database design.
Okay, so recall that, very much like my ADM8 Database project, transcription is effectively done on a line-by-line basis. The process for inputting data into the database is iterative; in fact, it is a loop. The file, read in by the script, becomes an array of arrays, and I’ll be stepping through that array line-by-line (array-by-array).
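To make the array-of-arrays idea concrete, here is a minimal sketch of that loop. It is in Python rather than PHP just to keep it short, and everything in it is an assumption for illustration: the column names, the sample rows and the "new volume" check are invented, not taken from the actual spreadsheet.

```python
import csv
import io

# Stand-in for the exported spreadsheet. Column names and rows are invented.
SAMPLE_CSV = """br_number,suffix,main_title,part_title,sponsoring_dept
1,,Example Title A,,DEPT-X
2,(1),Example Title B,Part 1,DEPT-Y
2,(2),Example Title B,Part 2,DEPT-Y
"""

def load_rows(handle):
    """Read the whole CSV into a list of lists, header row first."""
    return [row for row in csv.reader(handle)]

rows = load_rows(io.StringIO(SAMPLE_CSV))
header, data = rows[0], rows[1:]

seen_numbers = set()
for line_no, row in enumerate(data, start=1):
    # Pair each cell with its column name so later steps can refer to fields by name.
    record = dict(zip(header, row))
    # Hypothetical rule: the first time a BR number appears, treat it as a new volume.
    is_new_volume = record["br_number"] not in seen_numbers
    seen_numbers.add(record["br_number"])
    status = "new volume" if is_new_volume else "existing volume"
    print(line_no, record["br_number"], record["main_title"], status)
```

In the real script each pass through the loop would end in database inserts rather than a print, but the shape is the same: one inner array per line item, handled in order.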
So the List of Tables:
1) catalogues: This will have four entries (one per edition), and will keep some basic details about the documents themselves.
2) document_photos: This table will have a list of all the photos, by file name, so when a page is called up, or even a single line, that page can be shown on the screen. (This may eventually be done instead by the NGG plugin or another plugin, but for now I’m keeping it in the database design.)
3) br_volumes: This will be a list of all the unique BR numbers, each associated with a year. I just don’t want to get myself into a trap where a volume number changes, or numbers are reused for different things in different catalogues. Somehow I’m going to have to use the prefixes here as well, but storing them is going to be a problem, since frankly the system is inconsistent in terms of the relationship between numbers and naming.
4) volume_associations: This is the table where connections between volumes in different catalogues will be documented.
5) br_line_items: This will contain the basic details for every line item for each catalogue. It’ll link back to several of the above tables. For many, the volume title will be blank, and the line item will be connected back to the br_volumes table, where the volume titles will be stored. This is tricky, though, since for some of the larger engineering texts different parts have different numbers.
6) establishment_allowances: This table doesn’t matter for 1968, since I’m not transcribing the allowances section of the catalogue (at this point), but it’ll matter for the other volumes, where it’s on the same page as the catalogue data.
7) sponsoring_depts: A list of the sponsoring departments, with explanations and longer names.
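As a very rough sketch of how those seven tables might hang together, here is a version built in an in-memory SQLite database from Python. This is only a thinking aid: the real database will live in WordPress, and every column name, type and foreign key below is my guess at this stage rather than a settled design.

```python
import sqlite3

# All columns and relationships here are provisional assumptions.
SCHEMA = """
CREATE TABLE catalogues (
    id INTEGER PRIMARY KEY,
    year INTEGER NOT NULL UNIQUE
);
CREATE TABLE sponsoring_depts (
    id INTEGER PRIMARY KEY,
    short_name TEXT NOT NULL UNIQUE,
    long_name TEXT
);
CREATE TABLE document_photos (
    id INTEGER PRIMARY KEY,
    catalogue_id INTEGER NOT NULL REFERENCES catalogues(id),
    file_name TEXT NOT NULL
);
-- One row per BR number per catalogue, so a reused number stays distinct.
CREATE TABLE br_volumes (
    id INTEGER PRIMARY KEY,
    catalogue_id INTEGER NOT NULL REFERENCES catalogues(id),
    br_number INTEGER NOT NULL,
    volume_title TEXT
);
-- Links a volume in one catalogue to its counterpart in another.
CREATE TABLE volume_associations (
    volume_id INTEGER NOT NULL REFERENCES br_volumes(id),
    related_volume_id INTEGER NOT NULL REFERENCES br_volumes(id),
    PRIMARY KEY (volume_id, related_volume_id)
);
CREATE TABLE br_line_items (
    id INTEGER PRIMARY KEY,
    volume_id INTEGER NOT NULL REFERENCES br_volumes(id),
    photo_id INTEGER REFERENCES document_photos(id),
    sponsoring_dept_id INTEGER REFERENCES sponsoring_depts(id),
    suffix TEXT,
    part_title TEXT,
    in_text_notes TEXT,
    transcription_notes TEXT
);
CREATE TABLE establishment_allowances (
    id INTEGER PRIMARY KEY,
    line_item_id INTEGER NOT NULL REFERENCES br_line_items(id),
    allowance TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created, as a quick sanity check.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

The one choice worth pointing out is that br_volumes hangs off catalogues, so the same BR number in 1951 and 1968 gets two separate rows, and volume_associations is what says whether they are actually the same publication. The prefix problem is deliberately left out, because I don't yet know how to store it.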
I’m going to write the script for processing the CSV in PHP, of course, and I intend to have a number of queries pre-programmed in the research interface, plus hopefully a way to search the database for specific terms (thankfully WPDB, when used with prepared queries, is good at not allowing Bobby Drop Tables attacks).
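For anyone unfamiliar with why prepared queries matter for a search box, here is a small illustration of the principle. It uses Python's sqlite3 module as a stand-in, with an invented one-column table and invented titles; in the actual interface the equivalent job would be done by WPDB's prepare method and its placeholders. The function name search_titles is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, invented version of the line-items table, just for this demo.
conn.execute("CREATE TABLE br_line_items (id INTEGER PRIMARY KEY, main_title TEXT)")
conn.executemany(
    "INSERT INTO br_line_items (main_title) VALUES (?)",
    [("Example Title Alpha",), ("Example Title Bravo",)],
)

def search_titles(connection, term):
    """Return titles containing term. The term is bound as a parameter,
    never pasted into the SQL string. Note that % and _ inside the term
    still act as LIKE wildcards in this simple version."""
    pattern = f"%{term}%"
    cursor = connection.execute(
        "SELECT main_title FROM br_line_items WHERE main_title LIKE ? ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in cursor]

print(search_titles(conn, "Alpha"))  # ['Example Title Alpha']

# A Bobby Tables style input is treated as plain text to search for.
hostile = "x'; DROP TABLE br_line_items; --"
print(search_titles(conn, hostile))  # []
print(conn.execute("SELECT COUNT(*) FROM br_line_items").fetchone()[0])  # 2
```

The hostile string simply fails to match anything, and the table is still there afterwards with both rows in it.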
Since the conference is in June, hopefully I’ll get this uploaded into the database soon and can have it available for people to look at. And as soon as the spreadsheet is fully done, I’ll make it available here as well in case people want that form of the data.