... | ... | @@ -22,7 +22,7 @@ To possess a folder structure that has a fixed number of levels and yet which ca |
|
|
|
|
|
To be configurable so that it is possible to manage the collections of multiple organisations that each have different requirements for schema, upload target and download source at organisational, departmental, project and tranche level (a tranche being a sub-component of a project, and the “unit” that is processed by the toolkit scripts)
|
|
|
|
|
|
The toolkit is “open-source software” and is released under the [GNU General Public License v3.0](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/blob/master/LICENSE). It is currently written in the Python scripting language.
|
|
|
The toolkit is “open-source software” and is released under the [GNU General Public License v3.0](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/blob/master/LICENSE).
|
|
|
|
|
|
For users of [Arkivum's Digital Preservation Platform (Perpetua)](https://arkivum.com/), this version of the toolkit will work for all V6 customers, although Arkivum will have to configure the customer's system.
|
|
|
|
... | ... | @@ -38,19 +38,19 @@ If legacy collections are migrated into the GFS, there are three different outco |
|
|
|
|
|
- if the filenames contain spaces or have no extension, they can reside in the GFS but the toolkit scripts cannot process them
|
|
|
|
|
|
- if the filenames have no spaces in them and they have an extension, all the scripts except one will be able to process them (see the [Script groups](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#script-groups) section)
|
|
|
- if the filenames have no spaces in them and they have an extension, all the scripts except one will be able to process them (see the [Script groups](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#script-groups) section)
|
|
|
|
|
|
- if the migrated filenames can be renamed using the relevant toolkit script, all the scripts will be able to process them
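The minimum filename requirements described in the list above (no spaces, and an extension present) can be sketched as a simple check. This is an illustrative sketch only, not the toolkit's own code: the function name `meets_minimum_requirements` is hypothetical, and the check assumes that "spaces" means the literal space character.

```python
from pathlib import PurePath


def meets_minimum_requirements(filename: str) -> bool:
    """Return True if the filename has no spaces and has an extension.

    Hypothetical helper for illustration; the toolkit's real checks may differ.
    """
    has_space = " " in filename
    has_extension = PurePath(filename).suffix != ""
    return not has_space and has_extension


# Example: sort a handful of migrated filenames into the two basic groups.
names = ["page_0001.tif", "page 0001.tif", "page_0001"]
processable = [n for n in names if meets_minimum_requirements(n)]
unprocessable = [n for n in names if not meets_minimum_requirements(n)]
```

Files in the `unprocessable` group could still reside in the GFS, but, as noted above, the toolkit scripts would not be able to process them until they are renamed.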
|
|
|
|
|
|
The advantages and disadvantages of using the GFS file-naming convention for migrated files are indicated in the [The Generic Folder Structure (GFS)](https://git.lse.ac.uk/hub/lse_digitaltoolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
The advantages and disadvantages of using the GFS file-naming convention for migrated files are indicated in the [The Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digitaltoolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
|
|
|
If the toolkit is used to manage large digitisation projects with a digitisation company performing the digitisation, it is advisable to include certain items in the contract. These items are detailed in the [Managing your relationship with your Digitisation Provider](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#managing-your-relationship-with-your-digitisation-provider) section.
|
|
|
If the toolkit is used to manage large digitisation projects with a digitisation company performing the digitisation, it is advisable to include certain items in the contract. These items are detailed in the [Managing your relationship with your Digitisation Provider](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#managing-your-relationship-with-your-digitisation-provider) section.
|
|
|
|
|
|
**Resources**
|
|
|
|
|
|
[v2.0.0 release](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/releases/v2.0.0)
|
|
|
[v2.0.0 release](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/releases/v2.0.0)
|
|
|
|
|
|
[Repository](https://git.lse.ac.uk/hub/lse_digital_toolkit)
|
|
|
[Repository](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit)
|
|
|
|
|
|
[Example Generic Folder Structure](https://drive.google.com/file/d/17F8rtleD-213YfIMEC5hhHSTQ4qwQIr3/view)
|
|
|
|
... | ... | @@ -58,21 +58,21 @@ If the toolkit is used to manage large digitisation projects with a digitisation |
|
|
|
|
|
## Overview of the toolkit
|
|
|
|
|
|
The toolkit consists of a suite of scripts that are executed via command line, plus some configuration files. The scripts operate on [The Generic Folder Structure (GFS)](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) which provides an organisational hierarchy (organisation, department, project, tranche), represented by codes and numbers.
|
|
|
The toolkit consists of a suite of scripts that are executed via command line, plus some configuration files. The scripts operate on [The Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) which provides an organisational hierarchy (organisation, department, project, tranche), represented by codes and numbers.
|
|
|
|
|
|
The use of codes and numbers allows for the automatic creation of unique IDs at every level, and of unique upload slugs (a slug being a string of characters that forms part of a [URL](https://en.wikipedia.org/wiki/URL)). The easiest way to understand the capabilities of the toolkit, and to determine its utility for your organisation, is to follow the instructions in the [Getting started with the toolkit](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) section.
|
|
|
The use of codes and numbers allows for the automatic creation of unique IDs at every level, and of unique upload slugs (a slug being a string of characters that forms part of a [URL](https://en.wikipedia.org/wiki/URL)). The easiest way to understand the capabilities of the toolkit, and to determine its utility for your organisation, is to follow the instructions in the [Getting started with the toolkit](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) section.
|
|
|
|
|
|
The levels of the GFS are reflected in an (optional) naming convention for the asset files. If this naming convention is adopted, the files can be manipulated more easily by the scripts. It also ensures that each filename is unique. All but one of the scripts will still function if the file-naming convention is not adopted for a tranche within a project so long as the filenames abide by some minimum requirements that are listed in [The Generic Folder Structure (GFS)](https://git.lse.ac.uk/hub/lse-digital-toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
The levels of the GFS are reflected in an (optional) naming convention for the asset files. If this naming convention is adopted, the files can be manipulated more easily by the scripts. It also ensures that each filename is unique. All but one of the scripts will still function if the file-naming convention is not adopted for a tranche within a project so long as the filenames abide by some minimum requirements that are listed in [The Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
|
|
|
The documentation is currently skewed towards archival processing, and specifically, upload to, and download from, Arkivum’s Digital Preservation Platform (Perpetua), which uses the ISAD(G) schema. However, if external developers wish to write scripts for other upload targets and download sources that use different schema, the documentation could become more generic, with all such scripts given their own sections in the documentation. For example, if there is a need to migrate a legacy collection of images of algae to the GFS, the Darwin Core Schema could be used, and an upload script could be written that has a biological database as a target.
|
|
|
|
|
|
One of the first things to consider when embarking on either a digitisation project, or a migration, is to what level the material should be divided up (the granularity). For example, should a bound volume of pamphlets be treated as a single item, and given just one metadata entry, or should each pamphlet have its own metadata entry? If it is the latter, the discoverability of the material will be improved once it has been uploaded to a website, and the download size of the files will be more convenient. However, it will require more of a cataloguer's time to achieve this outcome. The toolkit provides a mechanism for expressing the required granularity. The smallest division becomes the child of a parent. So in the example mentioned above, each pamphlet would be the child of the parent, a bound volume of pamphlets. The tranche csv files contain columns that allow this parent-child relationship to be expressed.
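As a rough illustration of how a parent-child relationship might be expressed in tabular form, the sketch below groups child rows under their parent. The column names `item_id` and `parent_id`, and the sample rows, are assumptions made purely for this example; they are not the column names used in the toolkit's tranche csv files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: one bound volume (the parent) and two pamphlets (its children).
SAMPLE = """item_id,parent_id,title
vol1,,Bound volume of pamphlets
vol1-p1,vol1,First pamphlet
vol1-p2,vol1,Second pamphlet
"""


def children_by_parent(csv_text: str) -> dict:
    """Map each parent ID to the list of its child IDs, in file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["parent_id"]:  # rows with an empty parent_id are top-level items
            groups[row["parent_id"]].append(row["item_id"])
    return dict(groups)


relationships = children_by_parent(SAMPLE)
```

With finer granularity, each pamphlet gets its own row (and therefore its own metadata entry) while still being tied back to the volume it belongs to.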
|
|
|
|
|
|
It is important that the granularity aspect of a project is considered before a tranche folder structure is created and populated because it is very time-consuming to rectify mistakes made in this aspect of a project post-digitisation. The information gained from assessing the granularity will allow a project manager to take account of the resourcing levels that will be required for a project. See the [Workflows](https://git.lse.ac.uk/hub/lse-digital-toolkit/-/wikis/LSE-Digital-Toolkit#workflows) section for more information about assessing the appropriate level of granularity for material.
|
|
|
It is important that the granularity aspect of a project is considered before a tranche folder structure is created and populated because it is very time-consuming to rectify mistakes made in this aspect of a project post-digitisation. The information gained from assessing the granularity will allow a project manager to take account of the resourcing levels that will be required for a project. See the [Workflows](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#workflows) section for more information about assessing the appropriate level of granularity for material.
|
|
|
|
|
|
Once the fields in the tranche csv file(s) have been filled out and validated, a script is run that creates a corresponding folder structure. It is this folder structure (along with the tranche csv file) that can either be given to the Digitisation Provider to populate, or be the receptacle for migrated files.
|
|
|
|
|
|
There is a script to validate that the Digitisation Provider (or migration script) has populated the tranche folder structure correctly, in terms of both the existence of files in the correct folders. It also checkes whether the number of each set of derivative files matches the number of master files. For example, it will check that the number of jpg files matches the number of tif files in each child-item-set.
|
|
|
There is a script to validate that the Digitisation Provider (or migration script) has populated the tranche folder structure correctly, in terms of the existence of files in the correct folders. It also checks whether the number of files in each set of derivative files matches the number of master files. For example, it will check that the number of jpg files matches the number of tif files in each child-item-set.
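The derivative-versus-master count check can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's validation script: the function name `mismatched_derivatives` is hypothetical, the master and derivative files are assumed to be distinguishable by extension alone, and the function is given the filenames of one child-item-set rather than walking the real folder layout.

```python
from collections import Counter
from pathlib import PurePath


def mismatched_derivatives(filenames, master_ext=".tif", derivative_exts=(".jpg",)):
    """Return {extension: count} for each derivative type whose file count
    differs from the number of master files. An empty dict means all match.
    """
    # Count files per (lower-cased) extension.
    counts = Counter(PurePath(name).suffix.lower() for name in filenames)
    master_total = counts[master_ext]
    return {
        ext: counts[ext]
        for ext in derivative_exts
        if counts[ext] != master_total
    }


# Example: two masters but only one jpg derivative, so the jpg set is reported.
item_set = ["a_0001.tif", "a_0002.tif", "a_0001.jpg"]
problems = mismatched_derivatives(item_set)
```

In practice the list of filenames would be gathered from each child-item-set folder in turn, and any non-empty result would be reported back to whoever populated the tranche.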
|
|
|
|
|
|
The toolkit is a relatively mature piece of software in some respects. The LSE has used it to process [many collections](https://lse-atom.arkivum.net/informationobject/browse) over the last four years, both for digitised and born-digital material.
|
|
|
|
... | ... | @@ -84,13 +84,13 @@ When the toolkit is used with the ISAD(G) schema, it can be configured for "Libr |
|
|
|
|
|
When an upload target such as Arkivum's Perpetua is used, the contents of these columns are combined and formatted within the “isadg.scopeAndContent” column.
|
|
|
|
|
|
Tags, plus their content, can be created “on the fly” by entering them in the “gfs.contextualInformation” column of the tranche csv files. These tags are formatted and added to the content of the “isadg.scopeAndContent” column, as can be seen [here](https://lse-atom.arkivum.net/uklse-ex1zt01001001). This is a dissemination to Perpetua’s Atom module that contains the uploaded content of the example GFS used in the [Getting started with the toolkit](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) guide.
|
|
|
Tags, plus their content, can be created “on the fly” by entering them in the “gfs.contextualInformation” column of the tranche csv files. These tags are formatted and added to the content of the “isadg.scopeAndContent” column, as can be seen [here](https://lse-atom.arkivum.net/uklse-ex1zt01001001). This is a dissemination to Perpetua’s Atom module that contains the uploaded content of the example GFS used in the [Getting started with the toolkit](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) guide.
|
|
|
|
|
|
This feature is documented in the "Library Processing" sub-section of the Word document that can be downloaded from the [Generic Folder Structure (GFS)](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
This feature is documented in the "Library Processing" sub-section of the Word document that can be downloaded from the [Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
|
|
|
## Getting started with the toolkit
|
|
|
|
|
|
[Getting started with the toolkit](https://git.lse.ac.uk/hub/lse_digital-toolkit/-/wikis/Getting-started-with-the-toolkit)
|
|
|
[Getting started with the toolkit](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit)
|
|
|
|
|
|
## The Generic Folder Structure (GFS)
|
|
|
|
... | ... | @@ -98,7 +98,7 @@ This feature is documented in the the "Library Processing" sub-section of the wo |
|
|
|
|
|
## Configuration
|
|
|
|
|
|
[Configuration](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Configuration)
|
|
|
[Configuration](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Configuration)
|
|
|
|
|
|
## Workflows
|
|
|
|
... | ... | @@ -126,11 +126,11 @@ The cataloguing guide that can be downloaded via the link below is the LSE's int |
|
|
|
|
|
You may wish to consult this guide while evaluating the toolkit and then, if you decide to use the toolkit, modify the document so that it suits the requirements of your own organisation.
|
|
|
|
|
|
The content of this guide matches the default configuration of the [example GFS](https://drive.google.com/file/d/17F8rtleD-213YfIMEC5hhHSTQ4qwQIr3/view) that is used in the [Getting started with the toolkit](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) section.
|
|
|
The content of this guide matches the default configuration of the [example GFS](https://drive.google.com/file/d/17F8rtleD-213YfIMEC5hhHSTQ4qwQIr3/view) that is used in the [Getting started with the toolkit](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) section.
|
|
|
|
|
|
## Script groups
|
|
|
|
|
|
[Script groups](https://git.lse.ac.uk/hub/lse_digital-toolkit/-/wikis/Script-groups)
|
|
|
[Script groups](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Script-groups)
|
|
|
|
|
|
## Managing your relationship with your Digitisation Provider
|
|
|
|
... | ... | @@ -165,9 +165,9 @@ The Digitisation Provider is not willing to directly populate the folder-types w |
|
|
|
|
|
**Scenario 3**
|
|
|
|
|
|
The requirement is for the Digitisation Provider to directly populate the folder-types within the GFS, but not to create the files with names that accord with the GFS File-naming Convention. This can be done, but with the proviso that those scripts listed in the "Scripts that can only be used if all the files in a tranche comply with the GFS File-naming Convention" sub-section within the [Script groups](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Script-groups) section will not function.
|
|
|
The requirement is for the Digitisation Provider to directly populate the folder-types within the GFS, but not to create the files with names that accord with the GFS File-naming Convention. This can be done, but with the proviso that those scripts listed in the "Scripts that can only be used if all the files in a tranche comply with the GFS File-naming Convention" sub-section within the [Script groups](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Script-groups) section will not function.
|
|
|
|
|
|
If subsequently deemed appropriate, the files could be renamed to comply with the GFS File-naming Convention by using the gfs_rename_tranche_files.py script. This is perhaps not an ideal scenario because of potential issues with renaming files. These issues are indicated in the [Generic Folder Structure (GFS)](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
If subsequently deemed appropriate, the files could be renamed to comply with the GFS File-naming Convention by using the gfs_rename_tranche_files.py script. This is perhaps not an ideal scenario because of potential issues with renaming files. These issues are indicated in the [Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
|
|
|
**Scenario 4**
|
|
|
|
... | ... | @@ -205,7 +205,7 @@ Version 1 of the toolkit was written in the Perl 5 scripting language, but the c |
|
|
|
|
|
I am grateful to the following colleagues for their input into the development of the toolkit:
|
|
|
|
|
|
- **Fabi Barticioti**, whose archival expertise has been invaluable in developing various aspects of the toolkit. Fabi also developed the [LSE's internal cataloguing guide](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#cataloguing-guide)
|
|
|
- **Fabi Barticioti**, whose archival expertise has been invaluable in developing various aspects of the toolkit. Fabi also developed the [LSE's internal cataloguing guide](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#cataloguing-guide)
|
|
|
|
|
|
- **Neil Stewart**, my line manager, who has always proved to be a wise sounding board, and who (along with those further up the management hierarchy) gave me the time to develop a generic toolkit, rather than one that was specific to the LSE's requirements
|
|
|
|
... | ... | |