[[_TOC_]]

The workflows described below assume that file fixity validation is required as part of the processing.

Instructions on how to form the command line for each script can be found at the top of that script.

Both workflows begin with an assessment of whether a script-based migration of the material is possible. A script-based migration may not be feasible if the asset-files in their pre-migration legacy location lack structure in any of the following respects: the naming convention of the folders; the consistency of the depth of the folder structure; the naming convention of the asset-files. In that case, a manual migration of the asset-files into the asset-folders of the tranche should be considered (assuming someone has the time available to complete such a task).

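As an aid to the assessment described above, the consistency of the folder depth in a legacy location can be checked with a few lines of Python. This is only a minimal sketch and is not one of the GFS scripts: the function names and the idea of measuring depth at the folders that directly hold files are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def leaf_folder_depths(legacy_root: Path) -> Counter:
    """Count, per depth below legacy_root, the folders that directly contain files."""
    depths = Counter()
    for folder in [legacy_root, *legacy_root.rglob("*")]:
        if folder.is_dir() and any(child.is_file() for child in folder.iterdir()):
            # Depth 0 is legacy_root itself, depth 1 its immediate sub-folders, etc.
            depths[len(folder.relative_to(legacy_root).parts)] += 1
    return depths


def has_consistent_depth(legacy_root: Path) -> bool:
    """True when every folder that holds files sits at the same depth."""
    return len(leaf_folder_depths(legacy_root)) <= 1
```

If `has_consistent_depth` returns `False`, the counts returned by `leaf_folder_depths` show how the files are spread across depths, which may help in judging whether a script-based migration is realistic. Naming conventions would need a separate check.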
## Migration workflow for digitised material
Assess a) whether a script-based migration is possible and b) whether the asset-files will be renamed to the GFS Filenaming Convention as part of the workflow.

Determine the required “Child Item Set” granularity for the material to be migrated into the project.

Conceptually divide the project into tranches according to subject matter, manageable size, etc.

Enter rows into the project csv file that correspond to the tranches and add the parent-child metadata to the tranche csv file(s).

Run gfs_validate_project_csv.py.

The following section of the workflow should be repeated for each tranche within the project.

Run gfs_validate_tranche_csv.py.

Run gfs_create_tranche_folder.py and list all the types of asset-folders that should be in the tranche on the command line.

Run gfs_create_or_delete_tranche_checksum_folder.py with the <action> parameter set to “create_folder” and list the same set of asset-folders that were present in the previous command line.

It is only possible to migrate asset-files for one type of asset-folder at a time. Before running the gfs_migrate_tranche_folder.py script, the child-folder cells in the “gfs.legacyPath” column must be set to template-values that are appropriate for the legacy locations of the asset-files to be migrated.

Therefore, the following steps should be performed for each asset-folder type that is contained in the tranche.

- Populate the “gfs.legacyPath” column for the child-folder rows in the tranche csv file, following the instructions at the top of the gfs_migrate_tranche_folder.py script, to provide a path to the asset-files in their legacy location.
- Run gfs_validate_tranche_csv.py (this is needed in case you have accidentally deleted or modified some metadata within the tranche csv file while populating the “gfs.legacyPath” column).
- Run gfs_migrate_tranche_folder.py with <action> set to “test” (this flags any asset-folder within a child-folder that will not be populated with any asset-files; if any are flagged, the relevant entry in the “gfs.legacyPath” column of the tranche csv file should be corrected).
- Run gfs_create_tranche_checksum_file.py with <asset file location> set to “legacy” (this creates checksums in checksum-manifest-files in the checksum-folders that correspond to the content of the asset-files in their legacy location).
- Run gfs_validate_tranche_checksum.py with <asset file location> set to “legacy”.
- Run gfs_migrate_tranche_folder.py with <action> set to “copy”.
- Run gfs_validate_tranche_folder.py for the asset-folder, setting the <gfs filenaming convention flag> parameter to “n”.
- Run gfs_validate_tranche_checksum.py with <asset file location> set to “gfs” and <asset-folder to process> set to the asset-folder type.

Run gfs_validate_tranche_folder.py with the <gfs filenaming convention flag> parameter set to “n” for all the asset-folder types that exist in the tranche (this checks that all the asset-folders have been populated with at least one asset-file).

Run gfs_validate_tranche_checksum.py with <asset file location> set to “gfs”, listing all the asset-folder types.

If no errors are reported, it has now been established that all the asset-files have retained their fixity after being copied over from the legacy file location. However, if the asset-files are to be renamed so that they accord with the “GFS Filenaming Convention”, all the checksum-files must now be deleted prior to the renaming, because the names contained in the checksum-files need to match the names of the actual asset-files.

Run gfs_create_or_delete_tranche_checksum_folder.py with the <action> parameter set to “delete_only_content_of_folder”, listing all the asset-folder types.

Run gfs_rename_tranche_files.py for all the asset-folder types contained in the tranche.

Run gfs_validate_tranche_folder.py with the <gfs filenaming convention flag> parameter set to “y” for all the asset-folder types contained in the tranche.

Run gfs_create_tranche_checksum_file.py with <asset file location> set to “gfs”, listing all the asset-folder types contained in the tranche.

Run gfs_validate_tranche_checksum.py with <asset file location> set to “gfs”, listing all the asset-folder types contained in the tranche.

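The reason the checksum-files have to be deleted and recreated around the renaming can be illustrated with a small, self-contained sketch. It does not reproduce the GFS scripts or the format of their checksum-manifest-files: the use of SHA-256, the in-memory dictionary standing in for a manifest, and the function names are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file name in folder to the SHA-256 digest of its content."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.iterdir())
        if path.is_file()
    }


def validate_manifest(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means fixity is confirmed."""
    current = build_manifest(folder)
    problems = []
    for name, digest in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != digest:
            problems.append(f"checksum mismatch: {name}")
    # Files present in the folder but absent from the manifest are also reported.
    problems += [f"not in manifest: {name}" for name in current if name not in manifest]
    return problems
```

A manifest built in the legacy location validates cleanly against an exact copy. Once a file in the copy is renamed, validation against that same manifest reports the old name as missing and the new name as unlisted, even though the content is unchanged. Rebuilding the manifest after the renaming resolves this, which mirrors the delete, rename, recreate and validate sequence above.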
## Migration workflow for born-digital material
This workflow assumes the user does not wish to rename the asset-files according to the GFS Filenaming Convention; this is the norm for born-digital material because keeping the original names retains the archival integrity of the files. However, if the user does wish to rename them, the checksum-files can be deleted once the asset-files have been copied across from the legacy location and the checksums verified, and then recreated after the asset-files have been renamed (these steps are not included in this workflow but can be seen in the equivalent workflow for digitised material).

Assess whether a script-based migration is possible.

Determine the required “Child Item Set” granularity for the material to be migrated into the project.

Conceptually divide the project into tranches according to subject matter, manageable size, etc.

Enter rows into the project csv file that correspond to the tranches and add the parent-child metadata to the tranche csv file(s).

Run gfs_validate_project_csv.py.

The following section of the workflow should be repeated for each tranche within the project.

Run gfs_validate_tranche_csv.py.

Run gfs_create_tranche_folder.py and list all the types of asset-folders that should be in the tranche on the command line.

Run gfs_create_or_delete_tranche_checksum_folder.py with the <action> parameter set to “create_folder” and list the same set of asset-folders that were present in the previous command line.

It is only possible to migrate asset-files for one type of asset-folder at a time. Before running the gfs_migrate_tranche_folder.py script, the child-folder cells in the “gfs.legacyPath” column must be set to template-values that are appropriate for the legacy locations of the asset-files to be migrated.

Therefore, the following steps should be performed for each asset-folder type that is contained in the tranche.

- Populate the “gfs.legacyPath” column for the child-folder rows in the tranche csv file, following the instructions at the top of the gfs_migrate_tranche_folder.py script, to provide a path to the asset-files in their legacy location.
- Run gfs_validate_tranche_csv.py (this is needed in case you have accidentally deleted or modified some metadata within the tranche csv file while populating the “gfs.legacyPath” column).
- Run gfs_migrate_tranche_folder.py with <action> set to “test” (this flags any asset-folder within a child-folder that will not be populated with any asset-files; if any are flagged, the relevant entry in the “gfs.legacyPath” column of the tranche csv file should be corrected).
- Run gfs_create_tranche_checksum_file.py with <asset file location> set to “legacy” (this creates checksums in checksum-manifest-files in the checksum-folders that correspond to the content of the asset-files in their legacy location).
- Run gfs_validate_tranche_checksum.py with <asset file location> set to “legacy”.
- Run gfs_migrate_tranche_folder.py with <action> set to “copy”.
- Run gfs_validate_tranche_folder.py for the asset-folder, setting the <gfs filenaming convention flag> parameter to “n”.
- Run gfs_validate_tranche_checksum.py with <asset file location> set to “gfs” and <asset-folder to process> set to the asset-folder type.

Run gfs_validate_tranche_folder.py with the <gfs filenaming convention flag> parameter set to “n” for all the asset-folder types that exist in the tranche (this will indicate any asset-folders that have no asset-files in them so that you can judge whether this accords with expectations).

Run gfs_validate_tranche_checksum.py with <asset file location> set to “gfs”, listing all the asset-folder types (this will indicate any mismatches between the asset-files and their corresponding checksum-files).

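Because the per-asset-folder steps are identical in both workflows and must be carried out in a fixed order, it may help to keep them as a checklist. The sketch below records the script names and parameter settings exactly as they are worded in the steps above. It does not build real command lines, since the actual argument syntax is only documented at the top of each script. The function names and the asset-folder type names in the usage example are hypothetical.

```python
def per_asset_folder_steps(asset_folder_type: str) -> list[tuple[str, dict[str, str]]]:
    """Ordered (script, parameter settings) pairs for one asset-folder type.

    The dictionary keys are the parameter placeholders as worded in the
    workflow, not real command-line flags.
    """
    return [
        ("gfs_validate_tranche_csv.py", {}),
        ("gfs_migrate_tranche_folder.py", {"<action>": "test"}),
        ("gfs_create_tranche_checksum_file.py", {"<asset file location>": "legacy"}),
        ("gfs_validate_tranche_checksum.py", {"<asset file location>": "legacy"}),
        ("gfs_migrate_tranche_folder.py", {"<action>": "copy"}),
        ("gfs_validate_tranche_folder.py", {"<gfs filenaming convention flag>": "n"}),
        (
            "gfs_validate_tranche_checksum.py",
            {"<asset file location>": "gfs", "<asset-folder to process>": asset_folder_type},
        ),
    ]


def format_checklist(asset_folder_types: list[str]) -> list[str]:
    """Render the steps for every asset-folder type as printable lines."""
    lines = []
    for folder_type in asset_folder_types:
        lines.append(f"Asset-folder type: {folder_type}")
        lines.append("  [ ] populate gfs.legacyPath in the tranche csv file (manual step)")
        for script, settings in per_asset_folder_steps(folder_type):
            detail = ", ".join(f"{name} = {value}" for name, value in settings.items())
            lines.append(f"  [ ] {script}" + (f" ({detail})" if detail else ""))
    return lines
```

For example, `print("\n".join(format_checklist(["tiff", "jpeg"])))` prints one checklist per asset-folder type, where `tiff` and `jpeg` are placeholder type names.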
## Possible post-migration actions
The user might wish to upload the tranches to a preservation and dissemination platform.

One example of achieving this, in relation to the Arkivum platform, would be to run the gfs_create_arkivum_upload.py script, which creates a BagIt folder package that can be ingested into the Arkivum platform.

The user will probably also wish to validate the file fixity of the asset-files periodically. To do so, add the appropriate lines for the tranches to the gfs_batch_validate_checksum.bat script, which can then be run periodically to validate the file fixity of all the asset-files in the entire GFS.