The Scottish Post Office directories are annual directories that
provide an alphabetical list of a town’s or county’s inhabitants
including their forename, surname, occupation and address(es); they
provide a solid basis for researching Scotland’s family, trade, and town
history. A large number of these, covering most of Scotland and dating
from 1773 to 1911, can be accessed in digitised form from the National Library of
Scotland. podcleaner attempts to clean optical character recognition
(OCR) errors in directory records after they have been parsed and
saved to “csv” files using a third-party tool[1]. The
package further attempts to match records from trades and general
directories. See the tests folder for examples running unexported
functions.
Load general and trades directory samples into memory from “csv” files:
library(podcleaner)
directories <- c("1861-1862")
progress <- TRUE; verbose <- FALSE
path_directories <- utils_make_path("data", "general-directories")
general_directory <- utils_load_directories_csv(
  type = "general", directories, path_directories, verbose
)
print.data.frame(general_directory)
#> directory page surname forename
#> 1 1861-1862 71 ABOT Wm.
#> 2 1861-1862 71 ABRCROMBIE Alex
#> occupation
#> 1 Wine and spirit mercht — See Advertisement in Appendix.
#> 2
#> addresses
#> 1 1S20 Londn rd; ho. 13<J Queun sq
#> 2 Bkr; I2 Dixon Street, & 29 Auderstn Qu.; res 2G5 Argul st.
path_directories <- utils_make_path("data", "trades-directories")
trades_directory <- utils_load_directories_csv(
  type = "trades", directories, path_directories, verbose
)
print.data.frame(trades_directory)
#> directory page rank occupation
#> 1 1861-1862 71 135 Wine and spirit mercht — See Advertisement in Appendix.
#> 2 1861-1862 71 326 Bkr
#> 3 1861-1862 71 586 Victualer
#> type surname forename address.trade.body address.trade.number
#> 1 OWN ACCOUNT ABOT Wm. Londn rd. 1S20
#> 2 OWN ACCOUNT ABRCROMBIE Alex Dixen pl I2
#> 3 OWN ACCOUNT BLAI Jon Hug High St. 2S0
Clean records from both datasets:
general_directory <- general_clean_directory(general_directory, progress, verbose)
print.data.frame(general_directory)
#> directory page surname forename occupation
#> 1 1861-1862 71 Abbott William Wine and spirit merchant
#> 2 1861-1862 71 Abercromby Alexander Baker
#> 3 1861-1862 71 Abercromby Alexander Baker
#> address.trade.number address.trade.body address.house.number
#> 1 18, 20 London Road. 136
#> 2 12 Dixon Street. 265
#> 3 29 Anderston Quay. 265
#> address.house.body
#> 1 Queen Square.
#> 2 Argyle Street.
#> 3 Argyle Street.
trades_directory <- trades_clean_directory(trades_directory, progress, verbose)
print.data.frame(trades_directory)
#> directory page rank surname forename occupation type
#> 1 1861-1862 71 135 Abbott William Wine and spirit merchant OWN ACCOUNT
#> 2 1861-1862 71 326 Abercromby Alexander Baker OWN ACCOUNT
#> 3 1861-1862 71 586 Blair John Hugh Victualler OWN ACCOUNT
#> address.trade.number address.trade.body
#> 1 18, 20 London Road.
#> 2 12 Dixon Place.
#> 3 280 High Street.
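The kind of substitution involved in cleaning occupation entries can be illustrated with base R regular expressions. This is only a sketch under stated assumptions: the lookup table `replacements` and the helper `clean_occupation` are hypothetical names made up for illustration, and this is not how podcleaner itself is implemented.

```r
# Hypothetical lookup table: regex pattern -> replacement.
# The entries mirror the sample records above; they are not taken from podcleaner.
replacements <- c(
  "\\bmercht\\b"    = "merchant",
  "\\bBkr\\b"       = "Baker",
  "\\bVictualer\\b" = "Victualler"
)

# Hypothetical helper: apply each substitution in turn, then strip the
# trailing "See Advertisement ..." note and surrounding whitespace.
clean_occupation <- function(x) {
  for (pattern in names(replacements)) {
    x <- gsub(pattern, replacements[[pattern]], x, perl = TRUE)
  }
  x <- sub("[^[:alnum:]]*See Advertisement.*$", "", x)
  trimws(x)
}

raw <- c(
  "Wine and spirit mercht \u2014 See Advertisement in Appendix.",
  "Bkr",
  "Victualer"
)
cleaned <- clean_occupation(raw)
print(cleaned)
```

A fixed lookup like this only covers errors that are known in advance; it is shown here purely to make the before and after tables above easier to relate to each other.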
Match general to trades directory records:
distance <- TRUE; matches <- TRUE
directory <- combine_match_general_to_trades(
  trades_directory, general_directory, progress, verbose, distance, matches,
  method = "osa", max_dist = 5L
)
print.data.frame(directory)
#> directory page rank surname forename occupation type
#> 1 1861-1862 71 135 Abbott William Wine and spirit merchant OWN ACCOUNT
#> 2 1861-1862 71 326 Abercromby Alexander Baker OWN ACCOUNT
#> 3 1861-1862 71 586 Blair John Hugh Victualler OWN ACCOUNT
#> address.trade.number address.trade.body address.house.number
#> 1 18, 20 London Road. 136
#> 2 12 Dixon Place. 265
#> 3 280 High Street.
#> address.house.body distance
#> 1 Queen Square. 0
#> 2 Argyle Street. 5
#> 3 Failed to match with general directory NA
#> match
#> 1 Abbott William - 18, 20, London Road
#> 2 Abercromby Alexander - 12, Dixon Street
#> 3 <NA>
Directory records are compared and matched using a string distance metric computed with the method and corresponding parameters passed as arguments (above, the optimal string alignment method "osa" with a maximum distance of 5). Under the hood, the fuzzyjoin package, and its stringdist_left_join function in particular, performs the matching operations.
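The distance-based matching described above can be sketched directly with fuzzyjoin. This is a minimal illustration, not podcleaner's internal code: the toy data frames, the `key` column and the way it is built from surname and address are assumptions made for the example. Only `stringdist_left_join` and its `by`, `method`, `max_dist` and `distance_col` arguments come from fuzzyjoin itself.

```r
library(fuzzyjoin)

# Toy records modelled on the cleaned samples above (illustrative only).
trades <- data.frame(
  surname = c("Abbott", "Abercromby", "Blair"),
  address = c("18, 20 London Road", "12 Dixon Place", "280 High Street"),
  stringsAsFactors = FALSE
)
general <- data.frame(
  surname = c("Abbott", "Abercromby"),
  address = c("18, 20 London Road", "12 Dixon Street"),
  house   = c("136 Queen Square", "265 Argyle Street"),
  stringsAsFactors = FALSE
)

# Assumed matching key: surname and trade address pasted together.
trades$key  <- paste(trades$surname, trades$address, sep = " - ")
general$key <- paste(general$surname, general$address, sep = " - ")

# Left join: keep every trades record, attach the general record whose key
# lies within an OSA distance of 5, and report that distance.
matched <- stringdist_left_join(
  trades, general[, c("key", "house")],
  by = "key", method = "osa", max_dist = 5, distance_col = "distance"
)
print(matched)
```

Because this is a left join, trades records with no general record within the distance threshold are kept, with missing values in the columns coming from the general side.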
utils_IO_write(directory, "dev", "post-office-directory")