General:

To provide a generic framework, independent of schema or upload target, in which files can be stored, manipulated, and have their fixity validated.

Specific:

To provide a mechanism to control the end-to-end workflow for digitised materials.

To provide a mechanism to control the latter part of the workflow for born-digital materials, at scale, from storage to upload.

**Characteristics**

To be configurable so that any schema, and any upload target, can be accommodated.

To be scalable. The maximum number of child-item-sets per project is just under …

To possess a folder structure that has a fixed number of levels and yet can accommodate a sufficient number of archival levels to cater for most requirements. This structure is referred to in this document as the [Generic Folder Structure (GFS)](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs).

To be configurable so that it is possible to manage the collections of multiple organisations, each having different requirements for schema, upload target and download source at organisational, departmental, project and tranche level (a tranche being a sub-component of a project, and the “unit” that is processed by the toolkit scripts).

**Attributes**

The toolkit is “open-source software” and is released under the [GNU General Public License v3.0](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/blob/main/LICENSE).

If the toolkit is used to manage large digitisation projects with a digitisation provider …

**Resources**

[v2.1.0 release](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/archive/v2.1.0/lse_digital_toolkit-v2.1.0.zip)

[Repository](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit)

The toolkit consists of a suite of scripts that are executed via the command line …

The use of codes and numbers allows for the automatic creation of unique IDs at every level, and of unique upload slugs (a slug being a string of characters that forms part of a [URL](https://en.wikipedia.org/wiki/URL)). The easiest way to understand the capabilities of the toolkit, and to determine its utility for your organisation, is to follow the instructions in the [Getting started with the toolkit](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) section.
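
As an illustration, the slug `uklse-ex1zt01001001` linked later in this document appears to combine an organisation code, collection-level codes, and zero-padded parent and child numbers. Below is a minimal sketch of how such a slug might be assembled; the function name and the field breakdown are assumptions based on the example slugs shown here, not the toolkit's actual scheme.

```python
def make_slug(org_code: str, collection_code: str,
              parent_no: int, child_no: int) -> str:
    """Join level codes and zero-padded numbers into a unique, URL-safe slug.

    Hypothetical sketch: the field breakdown is inferred from example
    slugs seen in this document, not taken from the toolkit itself.
    """
    return f"{org_code}-{collection_code}{parent_no:03d}{child_no:03d}".lower()

print(make_slug("uklse", "ex1zt01", 1, 1))  # -> uklse-ex1zt01001001
```

Because every level contributes a code or a zero-padded number, two different items can never produce the same slug, which is what makes the IDs usable as stable URL components.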

The levels of the GFS are reflected in an (optional) naming convention for the asset-files. If this naming convention is adopted, the asset-files can be manipulated more easily by the scripts. It also ensures that each asset-filename is unique. All but one of the scripts will still function if the asset-file-naming convention is not adopted for a tranche within a project, so long as the asset-filenames abide by some minimum requirements that are listed in [The Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.

The documentation is currently skewed towards archival processing, and specifically, upload to, and download from, Arkivum’s Digital Preservation Platform, which uses the ISAD(G) schema. However, if external developers wish to write scripts for other upload targets and download sources that use different schema, the documentation could become more generic, with all such scripts given their own sections in the documentation. For example, if there is a need to migrate a legacy collection of images of algae to the GFS, the Darwin Core Schema could be used, and an upload script could be written that has a biological database as a target.

One of the first things to consider when embarking on either a digitisation project, or a migration of born-digital material, is the level to which the material should be divided up (the granularity). For example, in the case of a project that involves the digitisation of bound volumes of pamphlets, should a bound volume be treated as a single item, and given just one metadata entry, or should each pamphlet have its own metadata entry? If it is the latter, the discoverability of the material will be improved once it has been uploaded to a website, and the download size of the files will be smaller. However, it will require more cataloguer-time to achieve this outcome. The toolkit provides a mechanism for expressing the required granularity. The smallest division is the child-folder of a parent-folder. So in the example of the bound volume of pamphlets, if we wished to take the more granular of the two approaches, each pamphlet would correspond to a child-folder, and the bound volume to the parent-folder. The tranche csv files contain columns that allow this parent-child relationship to be expressed.
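
To make the parent-child idea concrete, here is a toy sketch of rows that such a csv could contain and how they group children under parents. The column names (`parent_no`, `child_no`) and the titles are invented for the illustration; the real tranche csv columns are documented in the GFS section.

```python
# Toy illustration of parent-child columns in a tranche csv.
# Column names and values are invented for this example.
import csv
import io
from collections import defaultdict

TRANCHE_CSV = """\
parent_no,child_no,title
1,1,Pamphlet: On the Poor Laws
1,2,Pamphlet: On the Corn Laws
2,1,Pamphlet: On Free Trade
"""

def children_per_parent(csv_text: str) -> dict:
    """Count how many child entries each parent (e.g. bound volume) has."""
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["parent_no"]] += 1
    return dict(counts)

print(children_per_parent(TRANCHE_CSV))  # {'1': 2, '2': 1}
```

Here the first bound volume (parent 1) holds two pamphlets, each with its own metadata row, which is the more granular of the two approaches described above.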

For those familiar with archival terminology, a project might equate to the collection level, a tranche to the series level, a parent-folder to the subseries level, and a child-folder to the folder level. The archival "folder level" contains one or more digital files.

It is important that the granularity aspect of a project is considered before a tranche folder structure is created and populated, because it is very time-consuming to rectify mistakes made in this aspect of a project post-digitisation. The information gained from assessing the granularity will allow a project manager to predict the resourcing levels that will be required to complete a project. See the [Workflows](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#workflows) section for more information about assessing the appropriate level of granularity for material.

Once the column-entries in the tranche csv file have been filled out and validated, a script is run that creates a corresponding folder structure. It is this folder structure (along with the tranche csv file) that can either be given to the digitisation provider to populate, or be the receptacle for migrated files.
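
The folder-creation step described above can be sketched as follows. This is a hypothetical outline, not the toolkit's script: the column names (`parent_no`, `child_no`) and the `parent_NNN/child_NNN` folder-name format are illustrative assumptions standing in for the real GFS layout.

```python
# Hypothetical sketch: create one child-folder per validated csv row.
import csv
from pathlib import Path

def row_to_relative_path(row: dict) -> Path:
    """Derive a parent/child folder path from one tranche-csv row (assumed columns)."""
    return Path(f"parent_{int(row['parent_no']):03d}",
                f"child_{int(row['child_no']):03d}")

def create_tranche_folders(csv_path: str, tranche_root: str) -> list:
    """Create the folder structure for every row; return the paths created."""
    created = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            target = Path(tranche_root) / row_to_relative_path(row)
            target.mkdir(parents=True, exist_ok=True)
            created.append(target)
    return created
```

The key design point is that the folder structure is derived mechanically from the csv, so the metadata and the folders can never drift out of step with each other.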

There is a script to validate that the digitisation provider (or migration script, or archivist responsible for born-digital material) has populated the tranche folder structure correctly, in terms of both the existence of asset-files in the correct folders and, if the GFS naming convention is used for the files, that the asset-filenames contain continuous sequence numbers. It also checks whether the number of each set of derivative files matches the number of master files. For example, it will check that the number of jpg files matches the number of tif files in each child-item-set.
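
The two checks just described, continuous sequence numbers and matching master/derivative counts, can be sketched as below. The trailing `_NNN` filename pattern and the choice of tif as the master format are assumptions for the example; the real convention is defined in the GFS section.

```python
# Hypothetical sketch of the validation checks described above.
import re
from collections import defaultdict

SEQ = re.compile(r"_(\d+)\.(\w+)$")  # assumed: trailing _<sequence>.<extension>

def validate_child_item_set(filenames):
    """Return a list of problems found in one child-item-set."""
    problems = []
    by_ext = defaultdict(list)
    for name in filenames:
        m = SEQ.search(name)
        if m:
            by_ext[m.group(2).lower()].append(int(m.group(1)))
    # Check 1: each file-type's sequence numbers must run 1..n with no gaps.
    for ext, seqs in by_ext.items():
        if sorted(seqs) != list(range(1, len(seqs) + 1)):
            problems.append(f"{ext}: sequence numbers are not continuous")
    # Check 2: each derivative set must match the master (tif) file count.
    masters = len(by_ext.get("tif", []))
    for ext, seqs in by_ext.items():
        if ext != "tif" and len(seqs) != masters:
            problems.append(f"{ext}: {len(seqs)} files vs {masters} tif masters")
    return problems
```

Running such checks per child-item-set is what makes it practical to spot a single missed scan among thousands of items.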

The toolkit is a relatively mature piece of software in some respects. The LSE has used it to process [many collections](https://lse-atom.arkivum.net/informationobject/browse) over the last six years, for both digitised and born-digital material.

The [Economic History Collection](https://lse-atom.arkivum.net/uklse-dl1eh01) is an example of a large collection that the LSE processed using the toolkit. It contains around 6300 child-item-sets. Each child-item-set, in this particular collection, contains around ten to twenty alto, jpg, msword, text and tif files, plus one pdf file. Its total disk-space usage is around 7TB. Only the pdf files are disseminated through to the AtoM module of Arkivum's Digital Preservation Platform.

The toolkit is only mature in the relatively narrow band of activity for which the LSE has used it. The toolkit has nineteen scripts, but only about ten of these are used in day-to-day work.

When the toolkit is used with the ISAD(G) schema, it can be configured for "Library Processing". This allows certain fields that are commonly used in bibliographic cataloguing but are not present in the ISAD(G) schema (such as “Personal author”, “Corporate author”, “Publisher”, and “Note”) to have their own columns in the tranche csv files.

When an upload target such as Arkivum's Perpetua is used, the content of these columns is combined and formatted within the “isadg.scopeAndContent” column.

Tags, plus their content, can be created “on the fly” by entering them in the “gfs.contextualInformation” column of the tranche csv files. These tags are formatted and added to the content of the “isadg.scopeAndContent” column, as can be seen [here](https://lse-atom.arkivum.net/uklse-ex1zt01001001). This is a dissemination to Arkivum’s AtoM module that contains the uploaded content of the example GFS used in the [Getting started with the toolkit](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Getting-started-with-the-toolkit) guide.

This feature is documented in the "Library Processing" sub-section of the Word document that can be downloaded from the [Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.

[Configuration](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Configuration)

## File Fixity Validation

[File Fixity Validation](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/File-Fixity-Validation)

## Workflows

**Digitisation Workflow**

[Digitisation workflow](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Digitisation-workflow)

**Migration workflow**

[Migration workflow](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Migration-workflow)

**Arkivum tranche-cycle workflow**

[Arkivum tranche cycle workflow](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Arkivum-tranche-cycle-workflow)

## Cataloguing guides

The cataloguing guides for the digitisation and born-digital workflows can be downloaded via the links below. They are the LSE's internal cataloguing guides and are specific to the LSE's own requirements.

[LSE_Digital_Toolkit_Born_Digital_User_Guide_v2_1_0.docx](uploads/acdd08959e5eada4e6a5f514cf98e58e/LSE_Digital_Toolkit_Born_Digital_User_Guide_v2_1_0.docx)

[LSE_Digital_Toolkit_Digitisation_User_Guide__v2_1_0.docx](uploads/1f0a1e4bb11904a4c55c5c35faffe9b2/LSE_Digital_Toolkit_Digitisation_User_Guide__v2_1_0.docx)

[Command_line_examples_v2_1_0.txt](uploads/0ad0f5e945306608272d0034fe02c59a/Command_line_examples_v2_1_0.txt)

[Project_folder_template_v2_1_0.zip](uploads/767d32e27ca51a81697300fed2f36179/Project_folder_template_v2_1_0.zip)

You may wish to consult these guides while evaluating the toolkit and then, if you decide to use the toolkit, modify the documents so that they correspond with the requirements of your own organisation.

The content of the digitisation guide matches the default configuration of the …

[Script groups](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Script-groups)

## Managing your relationship with your digitisation provider

The digitisation provider could be a department within your own organisation, or it could be a company that provides such services. If it is the former, although it will be necessary to communicate a clear set of requirements to the department, contracts are unlikely to be involved.

If you use this toolkit and sign a contract with a company to provide digitisation services, it is advisable for the contract to state that the company will be expected to populate the relevant asset-folders within the tranches of the LSE's Generic Folder Structure with the required master files and derivative files, according to the granularity indicated in the tranche csv files.

It should also state that the files should be named according to the toolkit's Generic File-naming Convention, and a table listing the required file-types and derivative files should be included in the contract.

Finally, it should be stated that one small test tranche should be populated by the company prior to commencing "production mode". This is so that the personnel of the digitisation provider have a chance to develop their own workflows, and any teething problems encountered can be resolved. The customer is then able to verify that the outcome has met expectations.

The legalistic approach indicated above is not indicative of the LSE having a problem with contracting a digitisation provider. However, populating the GFS is likely to require the provider to develop new workflows, so ensuring that the provider is bound to this requirement is advisable.

In fact, the LSE found that its digitisation provider took a positive view of the GFS, and requested that the scripts be installed on their own devices so that their personnel could run the tranche validation script to ensure the tranches had been populated correctly.

Having a validation script can prove beneficial to both parties. In the course of implementing a large digitisation project, with thousands of child-item-sets, it would be easy for the staff of the digitisation provider to miss some items. The provider will not want to bring all their equipment and staff back on site a month or two after the project has finished, just to digitise a small number of items that were missed, so it is in the provider's interest to have the output validated.

The LSE's digitisation provider made suggestions for script enhancements, and evolved their own workflows for populating the asset-folders with the master and derivative asset-files. These workflows involved the use of the following scripts:

- gfs_copy_asset_folder_to_target.py
- gfs_distribute_files_to_tranche.py

There are a number of scenarios that could apply to the population of the asset-folders in the tranches of the GFS by the digitisation provider:

**Scenario 1**

Direct population of the asset-folders in the tranches of the GFS by the digitisation provider while applying the GFS File-naming Convention. This is the ideal scenario.

**Scenario 2**

The digitisation provider is not willing to directly populate the asset-folders within the GFS, but is willing to create the files with names that accord with the GFS File-naming Convention. In this scenario, the desired outcome can still be achieved if the digitisation provider can deliver an entire tranche's worth of digitised files in one folder. The gfs_distribute_files_to_tranche.py script can be used to move the asset-files to the asset-folders within the tranche of the GFS. This scenario relies on the staff of the digitisation provider being very accurate in their naming of the files.
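
The "deliver everything in one folder, then distribute" step can be sketched as follows. This is not the gfs_distribute_files_to_tranche.py script itself: the `<parent>_<child>_<sequence>.<ext>` name pattern and the target folder layout are invented for the illustration, standing in for the real GFS File-naming Convention.

```python
# Hypothetical sketch: route delivered files to asset-folders by parsed name.
import re
import shutil
from pathlib import Path

NAME = re.compile(r"^(?P<parent>\d{3})_(?P<child>\d{3})_\d+\.\w+$")  # assumed

def target_folder(filename: str):
    """Map a GFS-style filename to its asset-folder, or None if non-conforming."""
    m = NAME.match(filename)
    if not m:
        return None
    return Path(f"parent_{m['parent']}", f"child_{m['child']}")

def distribute(delivery_dir: str, tranche_root: str) -> int:
    """Move each conforming file from the delivery folder into the tranche."""
    moved = 0
    for f in Path(delivery_dir).iterdir():
        rel = target_folder(f.name)
        if rel is None:
            continue  # leave non-conforming names for manual review
        dest = Path(tranche_root) / rel
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(dest / f.name))
        moved += 1
    return moved
```

This also shows why the scenario depends on accurate naming: a file whose name does not parse cannot be routed at all, and one with a transposed digit will be routed to the wrong child-folder without any error.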

**Scenario 3**

The requirement is for the digitisation provider to directly populate the asset-folders within the GFS, but not to create the asset-files with names that accord with the GFS File-naming Convention. This can be done, but with the proviso that those scripts listed in the "Scripts that can only be used if all the files in a tranche comply with the GFS File-naming Convention" sub-section of the [Script groups](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/Script-groups) section will not function.

If subsequently deemed appropriate, the asset-files could be renamed to comply with the GFS File-naming Convention by using the gfs_rename_tranche_files.py script. This is perhaps not an ideal scenario because of potential issues with renaming asset-files. These issues are indicated in the [Generic Folder Structure (GFS)](https://itsagit.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.

**Scenario 4**

The digitisation provider cannot directly populate the asset-folders within the GFS, or create the files with names that accord with the GFS File-naming Convention. In such a scenario, a migration script would have to be written to move the digitised asset-files to the GFS. The gfs_migrate_tranche_files.py script might form the basis of such a script, but the degree of customisation required would depend upon the nature of the folder structure created by the digitisation provider, and the asset-file-naming convention that was used.

**Note**

When a digitisation provider is in "production mode", it may have multiple staff on site digitising the material. It is therefore advisable to minimise any delays to production by having someone on your team who is familiar with the toolkit, script-writing, metadata-wrangling, and character sets, and who is available, at short notice, to troubleshoot any problems encountered by the digitisation provider.

## Future developments

A Graphical User Interface (GUI) for the toolkit - this is currently under development.

A script that mints Digital Object Identifiers (DOIs) from the metadata in the department, project, and tranche csv files, and writes the DOIs back to those csv files.

Development of a configurable utility to delete and substitute non-standard ASCII characters in a file.

An attempt to improve the ability of the toolkit to cater for non-English metadata text. Unfortunately, nothing can be guaranteed in this regard.

A facility to cater for archival levels of unlimited depth.

## Contact

Email: n.bywell@lse.ac.uk

## Author's note

When I first joined the Digital Library Team at the LSE, six years ago, there was a pressing need to control the workflow for the digitisation of the LSE's [Economic History Collection](https://lse-atom.arkivum.net/uklse-dl1eh01). I looked around for a suitable tool to help me achieve this but could not find one, and so began creating this toolkit. At the same time, the library was in the latter stages of a tender process for a hosted digital preservation platform. The library opted for Arkivum's Perpetua, which provided for both the preservation and dissemination of digital assets. Creating a suitable upload package for this platform became a requirement of the toolkit.

Version 1 of the toolkit was written in the Perl 5 scripting language, but the current version (Version 2) is written in Python.

I am grateful to the following colleagues for their input into the development of the toolkit:

- **Robert Miles** and **Fabi Barticioti**, our current and former digital assets managers, whose archival expertise has been invaluable in developing various aspects of the toolkit. Fabi developed the LSE's internal cataloguing guide for digitised material.

- **Henry Rowsell** and **Neil Stewart**, my current and former line managers, who (along with those further up the management hierarchy) gave me the time to develop a generic toolkit, rather than one that is specific to the LSE's requirements.

- **Silvia Gallotti**, who developed the LSE's internal cataloguing guide for born-digital material.

- **Emma Pizarro**

- **Clare Mulhall**

I am also grateful to:

- the staff of our digitisation provider, who took a positive approach to the GFS, when I feared that the population of the tranches, using the GFS's naming convention, might prove to be a stumbling block.

- the staff at Arkivum, who kindly made some modifications to their system so that the upload-folder could be processed and displayed appropriately.

It would be interesting to hear from anyone who starts using the toolkit, or has problems with it, or is willing to give some feedback on its functionality. It would also be encouraging to hear from any developers who wish to contribute new scripts for additional upload targets. I can be contacted at n.bywell@lse.ac.uk.

**Nick Bywell** (6th Feb 2024)