... | ... | @@ -45,7 +45,9 @@ For example, inside the checksum-folder named "jpg_checksum_md5" mentioned above |
An example of the content of a manifest-md5.txt file with three rows is shown below.

116907a4ca1efc40a57d48ab1db7adfc5 UKLSE_EX1_ZT01_001_001_0001_0001.jpg

5501bc5ef3f7dc0d09e7e4d073d4902d7 UKLSE_EX1_ZT01_001_001_0001_0002.jpg

f9fd6bd53b67bf22188ba1597ced3ee7d UKLSE_EX1_ZT01_001_001_0001_0003.jpg

Once the checksum-folders and checksum-manifest-files have both been created for a tranche, the gfs_validate_tranche_checksum.py script can be run to check that newly calculated checksums for the asset-files match the checksum values contained in the checksum-file.
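
The validation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of gfs_validate_tranche_checksum.py: the function names, the separate `asset_folder` and `manifest_path` arguments, and the assumption that each manifest row is a checksum followed by whitespace and a filename are all hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(manifest_path):
    """Parse 'checksum filename' rows into a {filename: checksum} dict (assumed layout)."""
    entries = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank rows
        checksum, filename = line.split(maxsplit=1)
        entries[filename.strip()] = checksum.lower()
    return entries


def validate_folder(asset_folder, manifest_path):
    """Return a list of (filename, problem) tuples; an empty list means every file matches."""
    problems = []
    for filename, expected in read_manifest(manifest_path).items():
        asset_path = Path(asset_folder) / filename
        if not asset_path.is_file():
            problems.append((filename, "missing"))
        elif md5_of_file(asset_path) != expected:
            problems.append((filename, "checksum mismatch"))
    return problems
```

Reading each file in chunks keeps memory use flat for large asset-files such as tiff or wav masters.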
... | ... | @@ -54,6 +56,7 @@ If the two values are not identical for any asset-file, the script will report t |
The gfs_batch_validate_checksum.bat script can be set up so that, with a single click on its file icon, the gfs_validate_tranche_checksum.py script is run against every tranche in the GFS.

If the asset-files are located on a legacy drive and are to be migrated into the GFS using the gfs_migrate_tranche_folder.py script, it is possible to check that their fixity has been retained as they are copied or moved. To do this, first create the appropriate checksum-folders, and then run the gfs_create_tranche_checksum_file.py script with the <asset file location> parameter set to "legacy" (rather than "gfs").

The path for the asset-files is then taken from the "gfs.legacyPath" column of the tranche.csv file, rather than from the asset-folders within the GFS.

After the asset-files have been copied or moved into the asset-folders of the GFS, the gfs_validate_tranche_checksum.py script can be run to check that the pre-migration checksums match the post-migration checksums.
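
The pre-migration and post-migration comparison can be pictured as a comparison of two filename-to-checksum mappings: the values recorded from the legacy location and the values recalculated inside the GFS. The sketch below is illustrative only; `compare_checksums` and the report keys are hypothetical names and do not describe how the GFS scripts report their results.

```python
def compare_checksums(recorded, recalculated):
    """Compare two {filename: checksum} dicts and group any differences.

    recorded     -- checksums taken before migration (e.g. from the legacy drive)
    recalculated -- checksums computed after the files reached the GFS
    """
    report = {"mismatched": [], "missing": [], "unexpected": []}
    for filename, checksum in recorded.items():
        if filename not in recalculated:
            # File was listed before migration but not found afterwards
            report["missing"].append(filename)
        elif recalculated[filename].lower() != checksum.lower():
            # File exists in both places but its content has changed
            report["mismatched"].append(filename)
    # Files present after migration that were never recorded beforehand
    report["unexpected"] = sorted(set(recalculated) - set(recorded))
    return report
```

A report in which all three lists are empty indicates that fixity was retained for every recorded asset-file.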
... | ... | @@ -61,11 +64,13 @@ So, after the asset-files have been copied or moved into the asset-folders of th |
It is recommended that fixity checking be applied to all the asset-folders within a tranche.

It could be argued that it is only worth checking the fixity of asset-files such as tiff and wav files, because the asset-files derived from them (such as jpg and mp3) could be reconstituted. Operationally, however, it is simpler to check them all and not have to decide which types of asset-file merit inclusion.

The storage space required for the checksum-manifest-files is trivial, and the derived-files tend to be small, so the time taken to create their checksums is unlikely to be a significant component of the time taken for the entire process.

Since the md5 checksum was developed, longer checksums have been introduced. The need for increased length came from security-related applications of checksums.

File fixity validation does not benefit from longer checksums, and the longer checksum types take more time to compute.

It is therefore recommended that the md5 checksum type is used, because the scripts will take less time to complete their processing than when using longer checksums.

The only other factor to take into account when selecting a checksum type is whether the platform into which a tranche may be ingested validates the checksum values and, if so, which checksum types it supports.
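
To see how the checksum types compare on a particular machine, the digest length and computation time can be measured with Python's standard hashlib module. This is a rough sketch: `digest_report`, the choice of algorithms and the 16 MiB sample are assumptions made for illustration, and relative timings vary with hardware and Python build, so they are best measured rather than assumed.

```python
import hashlib
import time


def digest_report(data, algorithms=("md5", "sha1", "sha256", "sha512")):
    """Return {algorithm: (hex_digest_length, seconds_taken)} for the given bytes."""
    report = {}
    for name in algorithms:
        start = time.perf_counter()
        hex_digest = hashlib.new(name, data).hexdigest()
        report[name] = (len(hex_digest), time.perf_counter() - start)
    return report


if __name__ == "__main__":
    sample = b"\0" * (16 * 1024 * 1024)  # 16 MiB of placeholder data
    for name, (length, seconds) in digest_report(sample).items():
        print(f"{name}: {length} hex characters, {seconds:.4f} s")
```

Running the timing against a representative asset-file, rather than placeholder bytes, gives a more realistic picture for a given tranche.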

[Return to documentation home page](https://git.lse.ac.uk/hub/lse_digital_toolkit/-/wikis/LSE-Digital-Toolkit)