[[_TOC_]]
|
|
|
|
|
|
|
|
This guide takes you through the process of:

- installing the necessary software and the "Generic Folder Structure (GFS)"
- running toolkit scripts to validate the installation process
- setting up a folder hierarchy within the GFS that corresponds to the requirements of your organisation
- running toolkit scripts to validate that this has been successful
- for those wishing to use the toolkit with Arkivum's Perpetua, running a script to create an upload package that contains your own files and metadata
|
|
|
|
|
|
|
|
The toolkit has been tested on the Windows operating system. It will probably work with a macOS or Linux operating system but it hasn't been tested on them. The installation instructions are for a Windows system.
|
|
|
|
|
|
|
|
## 1. Install Python
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
## 1.1 Check whether the PC/laptop already has Python installed by opening a command window and typing the following (put a space and two hyphens before “version”):
|
|
|
|
|
|
|
|
python --version
|
|
|
|
|
|
|
|
If the message returned is "Python 3.10.0" or later, this is a sufficiently advanced version that the user can jump to step 1.3. However, if the command is unrecognised, or an earlier version of Python is present, click [here](https://www.python.org/downloads/) and download the latest version applicable to Windows.
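The same check can be made from inside Python. The following is a minimal sketch, assuming only the 3.10.0 threshold stated above; the function name `is_sufficient` is illustrative and is not part of the toolkit.

```python
import sys

# Minimum version named in this guide.
MINIMUM = (3, 10, 0)


def is_sufficient(version_info=sys.version_info, minimum=MINIMUM):
    """Return True when the given version is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "sufficient" if is_sufficient() else "too old, please upgrade"
    print(f"Python {current}: {verdict}")
```

Saving this as a file and running it with `python` reports whether the installed interpreter meets the threshold.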
|
|
|
|
|
|
|
|
## 1.2 Click on the downloaded file to install Python on your PC. IMPORTANT! When you see the dialogue box below during the installation procedure, tick the "Add Python <version no> to PATH" box.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Install “Perl Version 5.28” by clicking on [this](https://www.activestate.com/products/perl/downloads/) link (note: please do NOT click on the “Download Perl 5.32” button after clicking on the link).
|
|
|
|
|
|
|
|
Select the link “ActiveState Perl 5.28 for Windows” from the download icons in the middle of the page. (again, please do NOT click on the “Download Perl 5.32” button).
|
|
|
|
|
|
|
|
It is necessary to create an account with ActiveState to download the "Perl V5.28".
|
|
|
|
|
|
|
|
Download the relevant installation “.exe” file for your device, and run the executable file.
|
|
|
|
|
|
|
|
To test whether the installation of Perl has been successful, open a command window (type “cmd” in the field where it states “Type here to search” in the bottom-left of the desktop) and at the prompt in the command window that appears, type “perl -v” - it should report some text that begins “This is perl 5, version 28, subversion 1 (v5.28.1)”.
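If you want to check the reported version programmatically, the text printed by `perl -v` can be parsed. This is only a sketch: the function name `parse_perl_version` is hypothetical, and it assumes the output contains a version in the form `(v5.28.1)` as in the example text above.

```python
import re


def parse_perl_version(banner):
    """Extract (major, minor, patch) from the text printed by `perl -v`.

    Returns None when no version of the form (vX.Y.Z) is found.
    """
    match = re.search(r"\(v(\d+)\.(\d+)\.(\d+)\)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# Example text quoted in this guide.
banner = "This is perl 5, version 28, subversion 1 (v5.28.1)"
print(parse_perl_version(banner))
```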
|
|
|
|
|
|
|
|
## 2. Install 7-Zip
|
|
|
|
|
|
|
|
This section is only mandatory for users that wish to use the scripts that create/process upload and download packages for Arkivum's Digital Preservation Platform (Perpetua).
|
|
|
|
|
|
|
|
Type "7-Zip" in the search-box on the desktop. If the "7-Zip File Manager App" appears, 7-Zip has already been installed.
|
|
|
|
|
|
|
|
If it has not already been installed, open [this link](https://www.7-zip.org/) and download the appropriate executable for your device from the “7-Zip 19.00 (2019-02-21) for Windows:” options. The 7-Zip executable will probably be installed by default at:
|
|
|
|
|
|
|
|
C:\Program Files\7-Zip\7z.exe
|
|
|
|
|
|
|
|
If this is not the path to the 7z.exe file, once the toolkit has been installed in step (3), it will be necessary to edit the toolkit scripts “gfs_create_arkivum_upload.pl” and "gfs_distribute_arkivum_export_to_tranche.pl" according to the instructions found there to change the path to the 7z.exe file, so that it matches the installation path.
|
|
|
|
|
|
|
|
It is advisable to keep the path to the 7z.exe file short because it takes up space on a command line, and the Windows operating system has a limit on the length of a command line.
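To get a feel for how much room a command line takes up, its length can be measured before it is run. This sketch assumes the 8191-character maximum that Microsoft documents for the Windows command prompt (cmd.exe); the function name and the example command are illustrative only.

```python
# Documented maximum command-line length for cmd.exe (assumption: the
# commands are typed into, or launched through, the Windows command prompt).
CMD_EXE_LIMIT = 8191


def command_length_ok(parts, limit=CMD_EXE_LIMIT):
    """Join the parts with spaces and report (fits_within_limit, length)."""
    command = " ".join(parts)
    return len(command) <= limit, len(command)


ok, length = command_length_ok(
    ["perl", r"H:\LSE_TK_PERL\gfs_validate_project_csv.pl",
     r"Z:\GFS\UKLSE\EX1\ZT01", "default"]
)
print(ok, length)
```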
|
|
|
|
|
|
|
|
## 3. Install the Toolkit
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
Create a folder close to the root of a drive.
|
|
|
|
|
|
|
|
For example:
|
|
|
|
|
|
|
|
H:\LSE_TK_PERL\
|
|
|
|
|
|
|
|
Click on the "Source code (zip)" file for the v1.0.2 release from the [GitLab Repository release](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/releases/v1.0.2) and unzip it into the folder. The path to the folder should be short because it forms a component of the command lines by which the Perl scripts are executed, and some operating systems have a limit on the length of a command line.
|
|
|
|
|
|
|
|
If you installed 7-Zip, and the path to the executable file was different from:
|
|
|
|
|
|
|
|
C:\Program Files\7-Zip\7z.exe
|
|
|
|
|
|
|
|
Edit line 187 of the toolkit script “gfs_create_arkivum_upload.pl”, and line 109 of toolkit script "gfs_distribute_arkivum_export_to_tranche.pl" using a text editor (such as Notepad). Change the path listed there to match the installation path. Each backslash within the path should be doubled.
|
|
|
|
|
|
|
|
For example:
|
|
|
|
|
|
|
|
C:\Program Files\7-Zip\7z.exe
|
|
|
|
|
|
|
|
Should be entered at line 187 as:
|
|
|
|
|
|
|
|
C:\\\Program Files\\\7-Zip\\\7z.exe
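The doubling rule can be illustrated with a short sketch. The helper name `double_backslashes` is hypothetical; it simply replaces every backslash in a path with two, producing the form to type into the script.

```python
def double_backslashes(path):
    """Return the path with every backslash doubled."""
    return path.replace("\\", "\\\\")


# The default installation path quoted in this guide.
print(double_backslashes(r"C:\Program Files\7-Zip\7z.exe"))
```

If 7-Zip was installed elsewhere, pass that path instead and copy the printed result into the two scripts.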
|
|
|
|
|
|
|
|
## 4. Install the example Generic Folder Structure (GFS)
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
Either create a folder near the root of a drive, with a short name, to contain the GFS, or preferably, unzip the download file indicated below at the root level of a drive. Paths within the GFS will form part of command line calls, so the shorter the paths, the less likely it is that the length limit of the Windows command line will be exceeded.
|
|
|
|
|
|
|
|
Download the example GFS file from [here](https://drive.google.com/file/d/17F8rtleD-213YfIMEC5hhHSTQ4qwQIr3/view).
|
|
|
|
|
|
|
|
Unzipping the file at the root level should create a folder hierarchy in which the top level is “GFS”.
|
|
|
|
|
|
|
|
For example:
|
|
|
|
|
|
|
|
Z:\GFS\\*
|
|
|
|
|
|
|
|
The "Example GFS" contains some digitised journal issues. It only contains a small number of asset files, just a subset of those that would normally be present. This is to limit the download size of the zip file. For the same reason, it contains no tif files (which would normally be the master files for such a digitisation project).
|
|
|
|
|
|
|
|
## 5. Run some scripts against the sample GFS to check that everything is working
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
The next step is to run the script that validates that the project csv file has been set up correctly. This is achieved by typing “cmd” in the search box on the desktop (where it says “Type here to search”). This brings up a command window into which command lines can be entered that execute the toolkit’s scripts.
|
|
|
|
|
|
|
|
The instructions for executing each script can be found in the header information at the top of each script file. To read these instructions, open the script file using a text editor (such as Notepad).
|
|
|
|
|
|
|
|
The example command lines will be like those shown below. The command lines will have to be modified slightly, in accordance with wherever you have installed the toolkit and GFS on your device.
|
|
|
|
|
|
|
|
Execute the script that validates the project csv file.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_validate_project_csv.pl Z:\GFS\UKLSE\EX1\ZT01 default
|
|
|
|
|
|
|
|
Note that the script creates a “logs” folder within the project code folder, and that a log file is created within the folder by the execution of the script.
|
|
|
|
|
|
|
|
The output in the command window after the script has been executed should be similar to that indicated in this screenshot:
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
Execute the script that validates the tranche csv file.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_validate_tranche_csv.pl Z:\GFS\UKLSE\EX1\ZT01\001 default
|
|
|
|
|
|
|
|
Note that the script creates a “logs” folder within the tranche number folder (folder 001), and that a log file has been created within the folder by the execution of the script.
|
|
|
|
|
|
|
|
The output in the command window after the script has been executed should be similar to that indicated in this screenshot:
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
Execute the script that validates the tranche folder structure.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_validate_tranche_folder.pl y Z:\GFS\UKLSE\EX1\ZT01\001 jpg alto text pdf
|
|
|
|
|
|
|
|
The output in the command window after the script has been executed should be similar to that indicated in this screenshot:
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
## 6. Create a folder hierarchy for your own organisation
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
A folder hierarchy, equivalent to the one for the “UKLSE” organisation-code folder, can now be set up for your own organisation.
|
|
|
|
|
|
|
|
Choose a suitable code for your organisation. The code must consist only of uppercase alphabetic characters (plus hyphens if necessary). For the purposes of this exercise, it is not important that you choose a definitive code at this point because you can always change it later.
|
|
|
|
|
|
|
|
However, if your organisation has a [UK MARC organisation code](https://www.bl.uk/bibliographic/pdfs/marc-codes-directory.pdf) or an [international code](https://www.loc.gov/marc/organizations/org-search.php), it would be appropriate to use one of these. “UKLSE” is the UK MARC organisation code for the London School of Economics and Political Science.
|
|
|
|
|
|
|
|
If your organisation has not been allocated such a code by either of these bodies, you can just make one up. Although this code is not of a fixed length, it should ideally be from five to eight characters in length.
|
|
|
|
|
|
|
|
It is not advisable to use a code greater than eight characters because the code will be combined with other elements to form various IDs, and very long IDs can create visual clutter when material is disseminated.
|
|
|
|
|
|
|
|
Try to ensure that the code hasn’t already been assigned to another organisation. It is a good idea to have your country code at the start of it, in the way that the “UK” is at the start of “UKLSE”. This increases the likelihood that it will be a unique code.
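The rules for an organisation code can be checked with a few lines of code. This is a sketch under the rules stated above (uppercase letters plus optional hyphens, ideally five to eight characters); the function name is hypothetical and the toolkit's own validation may differ.

```python
import re

# Uppercase alphabetic characters, plus hyphens if necessary.
ORG_CODE_PATTERN = re.compile(r"[A-Z-]+")


def check_organisation_code(code):
    """Return a list of problems found with the code (empty when none)."""
    problems = []
    if not ORG_CODE_PATTERN.fullmatch(code):
        problems.append("use only uppercase letters and hyphens")
    if not 5 <= len(code) <= 8:
        problems.append("ideally five to eight characters long")
    return problems


print(check_organisation_code("UKLSE"))
print(check_organisation_code("uk1"))
```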
|
|
|
|
|
|
|
|
Create a folder with the name of your code at the same level of the hierarchy as the “UKLSE” folder.
|
|
|
|
|
|
|
|
Copy the entire folder hierarchy and contents that exists below the “UKLSE” folder into your organisation code folder.
|
|
|
|
|
|
|
|
## 7. Modify the copied folder structure so that it corresponds to a department and project relevant to your organisation
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
- Move down the folder structure renaming the department code and project code folders to those applicable to your organisation.
|
|
|
|
|
|
|
|
- Change the names of the csv files so that the codes that form their names match the folder names you have created.
|
|
|
|
|
|
|
|
- Edit the content of the csv files so that the codes listed there match the folder code names you have created, and the full names that correspond to the codes are appropriate.
|
|
|
|
|
|
|
|
For example, if your chosen organisation code is _XXXXX_, and you have created a folder of that name, and within that folder, you have renamed the _EX1_ department code folder from the copied hierarchy, to _YY1_, you need to rename the _UKLSE_organisation.csv_ file, so that its name is _XXXXX_organisation.csv_, and rename the _UKLSE_EX1_department.csv_ file, that resides in the _YY1_ folder, so that its name is _XXXXX_YY1_department.csv_.
|
|
|
|
|
|
|
|
Similarly, if you have renamed the _ZT01_ project code folder from the copied hierarchy to _ZZ01_, you need to rename the _UKLSE_EX1_ZT01_project.csv_ file, so that its name is _XXXXX_YY1_ZZ01_project.csv_.
|
|
|
|
|
|
|
|
It is also necessary to rename the _UKLSE_EX1_ZT01_001_tranche.csv_ file so that its name is equivalent to _XXXXX_YY1_ZZ01_001_tranche.csv_.
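The naming pattern of the four csv files can be summarised in a short sketch. The function name `gfs_csv_names` is hypothetical; the patterns are taken from the example file names in this section.

```python
def gfs_csv_names(org, dept, project, tranche):
    """Build the csv file names expected at each level of the hierarchy."""
    return {
        "organisation": f"{org}_organisation.csv",
        "department": f"{org}_{dept}_department.csv",
        "project": f"{org}_{dept}_{project}_project.csv",
        "tranche": f"{org}_{dept}_{project}_{tranche}_tranche.csv",
    }


# Placeholder codes used in this guide; substitute your own.
for level, name in gfs_csv_names("XXXXX", "YY1", "ZZ01", "001").items():
    print(level, name)
```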
|
|
|
|
|
|
|
|
Continuing with the same renaming logic, edit the equivalent of the _XXXXX_organisation.csv_ file, and change the content of the gfs.departmentCode and gfs.departmentName columns in row 2 from _EX1_ and “Example department for LSE Digital Toolkit users” to your equivalent of _YY1_ and a corresponding department name.
|
|
|
|
|
|
|
|
The department code should be three characters in length, the first two alphabetic, the third a sequential number. More rows and folders, for other departments, can be added later, when you have become familiar with the toolkit.
|
|
|
|
|
|
|
|
Continuing with the same renaming logic, edit the equivalent of the _XXXXX_YY1_department.csv_ file that resides in the equivalent of the _YY1_ folder, and change the content of the gfs.projectCode and gfs.projectName columns in row 2 from _ZT01_ and "Example project for LSE Digital Toolkit users" to the equivalent of _ZZ01_ and a corresponding project name (one that is relevant to your organisation).
|
|
|
|
|
|
|
|
The project code should be four characters in length, the first two characters alphabetic, the last two a sequential number.
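The department and project code formats described above can be expressed as patterns. This is a sketch only: it assumes uppercase letters, as in the _EX1_ and _ZT01_ examples, and the function names are hypothetical.

```python
import re

# Three characters: two letters followed by one digit (for example EX1).
DEPARTMENT_CODE = re.compile(r"[A-Z]{2}[0-9]")
# Four characters: two letters followed by two digits (for example ZT01).
PROJECT_CODE = re.compile(r"[A-Z]{2}[0-9]{2}")


def is_department_code(code):
    """Return True when the code has the department code format."""
    return DEPARTMENT_CODE.fullmatch(code) is not None


def is_project_code(code):
    """Return True when the code has the project code format."""
    return PROJECT_CODE.fullmatch(code) is not None


print(is_department_code("EX1"), is_project_code("ZT01"))
```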
|
|
|
|
|
|
|
|
Continuing with the same renaming logic, edit the equivalent of the file named _XXXXX_YY1_ZZ01_project.csv_ that resides in the equivalent of the “_ZZ01_” folder, and change the content of the “isadg.title” and “isadg.repository” columns in row 2 to be appropriate for a tranche within your project. A tranche is a sub-component of a project. For example, a tranche could contain all the issues for a particular journal. The project could contain tranches for multiple journals.
|
|
|
|
|
|
|
|
## 8. Check that the project csv file validates correctly in the new hierarchy you have created for your organisation
|
|
|
|
|
|
|
|
Execute the script that validates your project csv file.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_validate_project_csv.pl Z:\GFS\\\<organisation code>\\\<department code>\\\<project code> default
|
|
|
|
|
|
|
|
## 9. Create a new tranche folder structure that reflects your content
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
Delete everything contained within the tranche folder named _001_ (which resides in the project code folder) EXCEPT for the tranche csv file which, according to the example naming logic, is called the equivalent of: _XXXXX_YY1_ZZ01_001_tranche.csv_ (where the codes are actually those relevant to your organisation).
|
|
|
|
|
|
|
|
Change the content of the renamed tranche csv file to reflect the content of a sub-component of the project that is appropriate to your organisation.
|
|
|
|
|
|
|
|
The numerical content of the “gfs.parentFolder” column represents a grouping mechanism for the content of the “gfs.childFolder” column.
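The grouping can be pictured by reading the two columns and collecting the child folders under each parent number. In this sketch the column names come from the guide, but the sample rows are invented for illustration, and a real tranche csv file contains further columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the two column names come from the guide.
sample = """gfs.parentFolder,gfs.childFolder
1,1
1,2
2,1
"""


def group_children(csv_text):
    """Map each gfs.parentFolder value to its list of gfs.childFolder values."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["gfs.parentFolder"]].append(row["gfs.childFolder"])
    return dict(groups)


print(group_children(sample))
```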
|
|
|
|
|
|
|
|
Keep things as simple as possible for the purposes of this exercise.
|
|
|
|
|
|
|
|
After each change, run the gfs_validate_tranche_csv.pl script to check that the tranche csv file still passes validation.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_validate_tranche_csv.pl Z:\GFS\\\<organisation code>\\\<department code>\\\<project code>\001 default
|
|
|
|
|
|
|
|
Once the tranche csv file is complete and validates correctly, decide which folder-types should be created within the tranche (tif, jpg, text, mp4, etc) and run the “gfs_create_tranche_folder.pl” script, quoting the folder-types on the command line parameter list, rather than those indicated in the example below, and substituting in your own codes in the GFS path.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_create_tranche_folder.pl Z:\GFS\\\<organisation code>\\\<department code>\\\<project code>\001 jpg text alto pdf
|
|
|
|
|
|
|
|
If it runs without error, you will observe that a folder structure has been created beneath the tranche folder, and that it reflects the content of the tranche csv file.
|
|
|
|
|
|
|
|
If the script reports that the folder-type was not recognised, open the gfs_folder_type_list.csv file in a suitable spreadsheet editor (such as Excel), and insert a new row for each of the folder-types that were not recognised into the alphabetical listing. Enter values in both the "gfs.folderName" and "gfs.fileNameExtension" columns, and then try running the gfs_create_tranche_folder.pl script again.
|
|
|
|
|
|
|
|
## 10. Populate the new tranche folder with asset files
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
In a production environment, the folders might be populated by a Digitisation Provider (which could be either an external contractor, or an internal department). The Digitisation Provider will probably have used the GFS File-naming Convention when creating the asset files. The folders could equally have been populated by a migration script that transfers the files from a legacy collection to a tranche folder.
|
|
|
|
|
|
|
|
For the purposes of this exercise, manually copy some files from a collection of your own to the appropriate folder-types.
|
|
|
|
|
|
|
|
## 11. Validate that the folder structure has been correctly populated
|
|
|
|
|
|
|
|
This section is mandatory.
|
|
|
|
|
|
|
|
Run the folder validation script as follows, substituting in your chosen folder-types rather than those shown below:
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_validate_tranche_folder.pl n Z:\GFS\\\<organisation code>\\\<department code>\\\<project code>\001 jpg text alto pdf
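Because the command lines in this guide all follow the same shape, they can be assembled from their parts to avoid typing mistakes. The sketch below is illustrative: `build_command` is a hypothetical helper, the installation paths are the examples used in this guide, and `ntpath` is used so that Windows-style paths are produced on any operating system.

```python
import ntpath


def build_command(toolkit_dir, script, tranche_path, before=(), after=()):
    """Return the command as a list: perl, script, leading args, path, trailing args."""
    return ["perl", ntpath.join(toolkit_dir, script), *before, tranche_path, *after]


# Example codes from the sample GFS; substitute your own codes and folder-types.
tranche = ntpath.join(r"Z:\GFS", "UKLSE", "EX1", "ZT01", "001")
command = build_command(
    r"H:\LSE_TK_PERL",
    "gfs_validate_tranche_folder.pl",
    tranche,
    before=["n"],
    after=["jpg", "text", "alto", "pdf"],
)
print(" ".join(command))
```

The printed line can be pasted into the command window.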
|
|
|
|
|
|
|
|
View the screen output or script log file to ensure that the outcome meets with expectations.
|
|
|
|
|
|
|
|
## 12. Run the script that renames the files so that they comply with the GFS File-naming Convention for this tranche
|
|
|
|
|
|
|
|
This section is not mandatory.
|
|
|
|
|
|
|
|
All the scripts indicated in this "Getting started with the toolkit" guide will still work if these files are not renamed to correspond with their location in the new tranche. See the [Script Groups](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/wikis/Script-groups) section to find out which scripts will not function when the files do not abide by the GFS file-naming convention. There are some minimum filename requirements that files must meet in order to be processed by the scripts. These requirements are stated in the [The Generic Folder Structure (GFS)](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_rename_tranche_files.pl Z:\GFS\\\<organisation code>\\\<department code>\\\<project code>\001 jpg text alto pdf
|
|
|
|
|
|
|
|
If that script runs with no errors, all the files in the specified folders will have been renamed.
|
|
|
|
|
|
|
|
## 13. Check that the file renaming was successful
|
|
|
|
|
|
|
|
This section is only mandatory if the instructions in section 12 were followed.
|
|
|
|
|
|
|
|
Run the gfs_validate_tranche_folder.pl script again, but this time with the first parameter set to “y”, rather than “n”. This instructs the script to check that the asset files abide by the GFS naming convention, and that the sequence numbers in the filenames in each folder-type are continuous.
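What "continuous" means for a set of sequence numbers can be shown with a small sketch. It does not parse GFS filenames, because the naming convention is not described in this guide; it only takes sequence numbers that have already been extracted and reports any that are missing between the lowest and the highest. The function name is hypothetical.

```python
def missing_sequence_numbers(numbers):
    """Return the numbers absent between the lowest and highest values given."""
    present = set(numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


print(missing_sequence_numbers([1, 2, 4, 5, 7]))
```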
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_validate_tranche_folder.pl y Z:\GFS\\\<organisation code>\\\<department code>\\\<project code>\001 jpg text alto pdf
|
|
|
|
|
|
|
|
## 14. Create a package suitable for upload to Arkivum’s Digital Preservation Platform (Perpetua)
|
|
|
|
|
|
|
|
This section is not mandatory.
|
|
|
|
|
|
|
|
If you have installed 7-Zip, and made any necessary edits to the path for the 7z.exe executable file in the two relevant scripts, run the script “gfs_create_arkivum_upload.pl” to create the upload package.
|
|
|
|
|
|
|
|
perl H:\LSE_TK_PERL\gfs_create_arkivum_upload.pl y y "Z:\GFS\\\<organisation code>\\\<department code>\\\<project code>\001" preservation_and_access default pdf
|
|
|
|
|
|
|
|
The start of the output to the command window when the script is executed should look similar to that indicated in this screenshot:
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
The end of the output to the command window when the script is executed should look similar to that indicated in this screenshot:
|
|
|
|
|
|
|
|

|
|
|
|
|
|
|
|
In between, there will be output produced by the repeated execution of the 7z.exe file. Each time, the 7-Zip output is preceded by a line output of the form:
|
|
|
|
|
|
|
|
INFO: Processing parent folder: x child folder y
|
|
|
|
|
|
|
|
The upload-package-container-folder will be created automatically within the project folder. The upload-package-container-folder will have the following path:
|
|
|
|
|
|
|
|
Z:\GFS\\\<organisation code>\\\<department code>\\\<project code>\arkivum_v5_<organisation code>_<department code>_<project code>_001_preservation_and_access
|
|
|
|
|
|
|
|
Within this folder can be found a zip file:
|
|
|
|
|
|
|
|
"arkivum_v5_<organisation code>_<department code>_<project code>_001_preservation_and_access_upload.zip"
|
|
|
|
|
|
|
|
This is the zip file that can be uploaded to Perpetua.
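The location of the upload package follows directly from the codes and the mode given on the command line. The following sketch derives both paths from the naming pattern shown above; the function name `upload_package_paths` is hypothetical and the `Z:\GFS` root is the example location used in this guide.

```python
import ntpath


def upload_package_paths(gfs_root, org, dept, project, tranche, mode):
    """Return (container_folder, zip_file) for an upload package."""
    folder_name = f"arkivum_v5_{org}_{dept}_{project}_{tranche}_{mode}"
    folder = ntpath.join(gfs_root, org, dept, project, folder_name)
    zip_file = ntpath.join(folder, folder_name + "_upload.zip")
    return folder, zip_file


folder, zip_file = upload_package_paths(
    r"Z:\GFS", "UKLSE", "EX1", "ZT01", "001", "preservation_and_access"
)
print(folder)
print(zip_file)
```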
|
|
|
|
|
|
|
|
As "preservation_and_access" was specified on the command line, before uploading the file, a corresponding slug would be entered manually into the AtoM module. The slug that should be entered manually into AtoM can be found in the second row of the "atom.qubitParentSlug" column in the metadata.csv file that is contained in the upload zip file.
|
|
|
|
|
|
|
|
If using the default configuration, the slug will take the form "<organisation code>-<department code><project code>". Using the value that was automatically created in the metadata.csv as a slug will ensure that the naming conventions present in the records of the AtoM module are consistent with those in the GFS.
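The default slug can be previewed with a one-line helper. This is a sketch: the function name is hypothetical, and the conversion to lowercase is an assumption inferred from the example AtoM address for the sample tranche (`uklse-ex1zt01`). Always take the actual value from the metadata.csv file.

```python
def default_slug(org, dept, project):
    """Combine the codes as <organisation code>-<department code><project code>."""
    # Lowercasing is an assumption based on the example slug "uklse-ex1zt01".
    return f"{org}-{dept}{project}".lower()


print(default_slug("UKLSE", "EX1", "ZT01"))
```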
|
|
|
|
|
|
|
|
For further information about slug creation see the [The Generic Folder Structure (GFS)](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/wikis/LSE-Digital-Toolkit#the-generic-folder-structure-gfs) section.
|
|
|
|
|
|
|
|
The way in which the tranche in the example GFS displays when uploaded to Perpetua can be seen [here](https://lse-atom.arkivum.net/uklse-ex1zt01).
|
|
|
|
|
|
|
|
A successful upload of this zip file will only occur if the content of the metadata.csv file matches the way in which the user's Perpetua instance has been configured. It will be necessary to contact Arkivum Technical Support to ensure that this is the case.
|
|
|
|
|
|
|
|
The composition of the metadata.csv file can be controlled by editing the “gfs_arkivum_column_header_info.csv” configuration file. Details of how to configure the toolkit can be found in the [Configuration](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/wikis/Configuration) section.
|
|
|
|
|
|
|
|
## 15. Tidying up your GFS
|
|
|
|
|
|
|
|
This section is optional.
|
|
|
|
|
|
|
|
The “UKLSE” folder, and everything beneath it, can now be deleted, so that the GFS only contains folders that are relevant to your organisation.
|
|
|
|
|
|
|
|
## 16. What next?
|
|
|
|
|
|
|
|
After some more experimentation with the toolkit scripts (the instructions for each script can be found at the top by opening the scripts with a text editor such as Notepad), it would be appropriate to consult the [Configuration](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/wikis/Configuration) section in order to decide how best to set up the GFS so that it meets the requirements of your organisation. You can then add more departments, projects and tranches as required.
|
|
|
|
|
|
|
|
If you upload to Arkivum's Perpetua, once you are confident that your version of the GFS is well founded, contact Arkivum Technical Support to ask them to configure your system so that upload files produced by the gfs_create_arkivum_upload.pl script can be processed.
|
|
|
|
|
|
|
|
The [Workflows](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/wikis/LSE-Digital-Toolkit#workflows) section indicates how some scripts can be used in conjunction with others to achieve particular outcomes.
|
|
|
|
|
|
|
|
[Return to documentation home page](https://git.lse.ac.uk/bywell/lse-digital-toolkit-perl-version/-/wikis/LSE-Digital-Toolkit)