Use HTAR with your SDA account



Before you begin

Important:

Files containing PHI must be encrypted when they are stored (at rest) and when they are transferred between networked systems (in transit). To ensure that files containing PHI are encrypted when they are stored, encrypt them before transferring them to storage. To ensure that files containing PHI remain encrypted during transit, use SFTP/SCP or the IU Globus Web App. For more, see Recommended tools for encrypting data containing HIPAA-regulated PHI.

About HTAR

The HPSS TAR (HTAR) command-line utility lets you create and work with .tar archives in HPSS. With HTAR, you can aggregate files stored on your local filesystem into .tar archives that are written directly to HPSS, without first creating intermediate archives on your local system or using HSI (or another HPSS data transfer tool) to place them in HPSS.

As HTAR creates each archive, it automatically builds a corresponding external index (.idx) file and stores it in the same HPSS directory as the archive. HTAR also can build (or rebuild) an index file for an HPSS .tar archive that does not have one, either because the archive was created using some other utility, or because the index was accidentally deleted.

Additionally, you can use HTAR to extract the entire contents of an HPSS archive to your local filesystem, or to retrieve only specified files and directories.

HTAR command syntax

The general syntax for HTAR commands is as follows:

htar [action_options] -f [archive_name] [control_options] [file_list]

At least one action option and the -f option (which specifies the archive's filename) are always required. For [file_list], indicate which files should be archived, extracted, or processed, using a space-delimited list of file or directory names; wildcard characters are accepted when you create an archive, because your shell expands them against your local files. By default, HTAR copies files from your current local directory into an archive file it creates in your HPSS home directory. To target a different source or destination directory, specify the local path relative to your current working directory and the SDA path relative to your SDA home directory.

HTAR at IU

At Indiana University, HTAR is available on the IU research supercomputers, allowing you to create and work with .tar archives on the Scholarly Data Archive (SDA). To use HTAR, you first must add it to your user environment by loading the hpss module; on the command line, enter:

module load hpss

You can save your customized user environment so that it loads every time you start a new session; for instructions, see Use modules to manage your software environment on IU research supercomputers.

Once the hpss module is loaded, you can execute HTAR commands from the system's command line.

Alternatively, for use on your personal workstation, you may email the UITS Research Storage team (store-admin@iu.edu) to request HTAR bundled with HSI. Bundles are available for Red Hat Enterprise Linux 5 and 6, Ubuntu Linux, macOS, and Windows (running Cygwin).

HTAR limitations

While the archive created by HTAR can be of unlimited size (within the SDA's capacity), be aware of the following limitations:

  • File size: An individual file within the archive may not be larger than 68 GB.
  • Directory paths: The directory path of any file may not exceed 154 characters in length.
  • File names: File names may not exceed 99 characters in length.
  • Number of files: A single HTAR archive may contain a maximum of 1 million files.
Note:
File name and directory path limits in HTAR are separate from (and more restrictive than) similar limits in HPSS on the SDA; for more, see Directory path and file name limits in About the Scholarly Data Archive (SDA) at Indiana University.
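Before creating a large archive, you may want to check a local directory against these limits. The following is a minimal sketch, not an IU-provided tool: the function name htar_preflight is hypothetical, it assumes GNU find, and it treats "68 GB" as 68,000,000,000 bytes (a conservative reading of the limit). It also counts every entry (files and directories) toward the 1 million limit. Paths are checked exactly as you would pass them to HTAR, so run it from the same directory and with the same relative path you plan to archive.

```shell
# Hypothetical pre-flight check (not part of HTAR): report entries in a local
# directory that would exceed the HTAR limits. Assumes GNU find; "68 GB" is
# treated as 68,000,000,000 bytes.
htar_preflight() {
  local dir="${1%/}"   # strip a trailing slash, if any
  local status=0

  # 1. Individual file size
  local big
  big=$(find "$dir" -type f -size +68000000000c)
  if [ -n "$big" ]; then
    echo "Files larger than 68 GB:"
    echo "$big"
    status=1
  fi

  # 2 and 3. Directory path (max 154 chars) and file name (max 99 chars)
  if ! find "$dir" -print | awk -v maxdir=154 -v maxname=99 '
      {
        n = split($0, parts, "/")
        name = parts[n]
        parent = substr($0, 1, length($0) - length(name) - 1)
        if (length(name) > maxname)  { print "File name too long: " $0; bad = 1 }
        if (length(parent) > maxdir) { print "Directory path too long: " $0; bad = 1 }
      }
      END { exit bad }'; then
    status=1
  fi

  # 4. Number of entries (files and directories) in one archive
  local count
  count=$(find "$dir" -print | wc -l)
  if [ "$count" -gt 1000000 ]; then
    echo "Too many entries for one archive: $count"
    status=1
  fi

  return "$status"
}
```

For example, running htar_preflight my_files from your home directory prints any problem entries; a non-zero exit status means at least one limit would be exceeded.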

Use HTAR to create an archive on the SDA

The following examples demonstrate how to use HTAR on IU research supercomputers to create .tar archives on the SDA.

  • To copy all files in the current local directory into an archive (for example, my_archive.tar) that's created in your SDA home directory, on the command line, enter:
    htar -c -f my_archive.tar *

    In this command, the -c action option opens a connection to the SDA and copies all files in your current local directory (matched by the * wildcard character, which your shell expands) into an archive that's created in your SDA home directory. The -f option assigns the archive a name (my_archive.tar).

    Note:
    HTAR will overwrite a pre-existing archive of the same name without prompting you.
  • To copy every file stored in a local subdirectory (for example, ~/my_files) into an archive (for example, my_files.tar) that's created in a pre-existing SDA subdirectory (for example, my_archives), specify each path relative to the respective home directory (this example assumes you run HTAR from your local home directory); for example:
    htar -c -f my_archives/my_files.tar "my_files"

    In this command, the -c action option opens a connection to HPSS and copies all files from the specified local directory (~/my_files) into an archive that's created in the specified SDA subdirectory; the -f option specifies the path to the archive and its name.

    Note:

    Do not include a tilde to represent your home directory (~/) in the path to your local subdirectory. If you include the tilde (~) representing your local home directory in your HTAR file list, each entry in the resulting archive's index file will be prepended with the absolute path from your local system's root directory. This becomes an issue when you use HTAR to extract files from that archive, as HTAR uses the absolute paths prepended to the archive's index entries to create a new set of nested subdirectories locally, and then stores the extracted files in the bottom-level directory.

    For example, suppose user darvader on Big Red 200 archives files from the ~/death_star directory but includes the tilde in the HTAR file list (entering "~/death_star" instead of "death_star"). All index entries for the resulting archive will be prepended with N/u/darvader/BigRed200. Later, when the user extracts files from that archive, HTAR reads those index entries and saves the extracted files locally to ~/N/u/darvader/BigRed200/death_star (not to ~/death_star).

  • To create an HPSS archive (for example, my_archive.tar) in an SDA directory that does not already exist, add the -P control option to automatically create any non-existing subdirectories included in the archive file's pathname:
    htar -c -f new_directory/new_subdirectory/my_archive.tar -P "local_dir"
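The tilde and -P guidance above can be combined into a small wrapper. This is a hypothetical sketch (archive_dir is not an HTAR command); it assumes HTAR is on your PATH (for example, after module load hpss) and that passing -P is harmless when the SDA directory already exists.

```shell
# Hypothetical wrapper (not part of HTAR): archive one local directory into
# an SDA archive, creating any missing SDA directories with -P.
archive_dir() {
  local src="${1%/}" dest="$2"

  # Reject absolute paths (including an unquoted ~ that the shell has
  # already expanded) and a literal leading tilde, so that index entries
  # are not prepended with the local absolute path.
  case "$src" in
    /*|"~"*)
      echo "Give '$src' relative to the current directory, without ~/" >&2
      return 2 ;;
  esac

  if [ ! -d "$src" ]; then
    echo "No such local directory: $src" >&2
    return 1
  fi

  # Caution: HTAR overwrites an existing archive at $dest without prompting.
  htar -c -f "$dest" -P "$src"
}
```

For example, from your local home directory: archive_dir my_files my_archives/my_files.tar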

Use HTAR to create an index for an HPSS archive

For each archive created in the above examples, HTAR simultaneously creates a corresponding index file (for example, my_archive.tar.idx) and stores it in the same HPSS directory as the archive.

You can use HTAR to recreate an index that has been accidentally deleted, or to create an index for an existing .tar archive that was created with another application.

If the index file for an archive (for example, archive_name.tar) is missing, you will see the following error when you try to list or extract the files it contains:

"No such file: archive_name.idx"

To (re)build an index file for an HPSS .tar archive (for example, old_archive.tar) that's missing its index, on the command line of your local system, enter:

htar -Xf old_archive.tar

In this command, the -X action option opens a connection to HPSS, reads the old_archive.tar file indicated by the -f option, builds an index file for the archive (for example, old_archive.tar.idx), and stores it in the same directory as the archive.
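If several archives need indexes (for example, .tar files you placed on the SDA with another tool), you could loop over them. This is a minimal sketch, assuming HTAR is on your PATH; rebuild_indexes is a hypothetical helper name, and -X -f is the same as the combined -Xf form shown above.

```shell
# Hypothetical helper (not part of HTAR): run "htar -X" on each named HPSS
# archive and report any that fail. Returns non-zero if any build failed.
rebuild_indexes() {
  local archive rc=0
  for archive in "$@"; do
    echo "Building index for $archive"
    if ! htar -X -f "$archive"; then
      echo "Index build failed: $archive" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example: rebuild_indexes old_archive.tar other_archive.tar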

Use HTAR to extract files from an SDA archive

The following examples demonstrate how to use HTAR to extract files from an archive stored on your SDA account.

Note:
HTAR extracts files into the current working directory on your local host. To extract files into a new directory, create the new directory first, and then change (cd) into the new directory before running HTAR.
  • To extract all files from an archive (for example, my_archive.tar) stored in your SDA home directory, on your local system's command line, enter:
    htar -x -f my_archive.tar

    In this command, the -x action option opens a connection to HPSS and extracts the entire contents of the archive specified by the -f option (my_archive.tar).

  • To extract one or more specific files or directories from an archive without retrieving the entire archive, on your local system's command line, enter:
    htar -xvf test.tar file1 file4 file7

    In this command, the -x action option opens a connection to HPSS and, from the archive specified by the -f option (test.tar), extracts only the files listed (file1, file4, and file7). The -v option (verbose mode) displays the name of each file as it is processed.

    Note:

    Because HTAR leaves processing of wildcard characters to the shell, which matches them against files on your local system rather than the contents of the archive, you cannot use * to select multiple filenames when retrieving files from an archive stored in HPSS. To display the names of the files contained in an archive (for example, archive_10.tar) stored in your HPSS home directory, on your local system's command line, enter:

    htar -tf archive_10.tar

    In this command, the -t action option lists the files contained in the archive specified by the -f option (archive_10.tar). Files are listed in the order in which they appear in the archive.
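The earlier note about extracting into a new directory (create it, then change into it) can be scripted as follows. This is a hypothetical sketch: extract_into is not an HTAR command, it assumes HTAR is on your PATH, and the archive path is still interpreted relative to your SDA home directory, because changing local directories does not affect HPSS paths.

```shell
# Hypothetical helper (not part of HTAR): create a local directory, then run
# "htar -x" from inside it so the extracted files land there. Arguments after
# the archive name are passed through as the file list, so you can retrieve
# only specific files or directories.
extract_into() {
  local dest="$1" archive="$2"
  shift 2
  mkdir -p -- "$dest" || return 1
  # The subshell leaves your own working directory unchanged afterward.
  ( cd -- "$dest" && htar -x -f "$archive" "$@" )
}
```

For example, extract_into restored test.tar file1 file4 file7 creates ./restored (if needed) and extracts those three files into it; omit the file names to extract the whole archive.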

Alternative authentication methods

By default, HTAR will prompt for login information (known as the "combo" authentication method). You also can set the authentication method explicitly by defining the HPSS_AUTH_METHOD environment variable; for example:

  • In the csh or tcsh shell, enter:
    setenv HPSS_AUTH_METHOD combo
  • In the ksh or bash shell, enter:
    export HPSS_AUTH_METHOD=combo

Alternatively, if your HTAR binaries were built with support for the appropriate method, you can use the HPSS_AUTH_METHOD environment variable to enable authentication based on either existing Kerberos credentials (known as the "Kerberos" method) or Kerberos keytabs (known as the "keytab" method):

  • Kerberos: To define the HPSS_AUTH_METHOD environment variable to enable the "kerberos" authentication method:
    • In the csh or tcsh shell, enter:
      setenv HPSS_AUTH_METHOD kerberos
    • In the ksh or bash shell, enter:
      export HPSS_AUTH_METHOD=kerberos
  • Keytab: To use the "keytab" method, you also must define the HPSS_KEYTAB_PATH environment variable (using the path to your keytab file) and the HPSS_PRINCIPAL environment variable (using the appropriate login name). For example, to define the required environment variables to enable the "keytab" authentication method:
    • In the csh or tcsh shell, enter the following (replace username with the appropriate login name and /path/to/my_keytab with the path to your keytab file):
      setenv HPSS_PRINCIPAL username
      setenv HPSS_AUTH_METHOD keytab
      setenv HPSS_KEYTAB_PATH /path/to/my_keytab
    • In the ksh or bash shell, enter the following (replace username with the appropriate login name and /path/to/my_keytab with the path to your keytab file):
      export HPSS_PRINCIPAL=username
      export HPSS_AUTH_METHOD=keytab
      export HPSS_KEYTAB_PATH=/path/to/my_keytab
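To avoid typing the three keytab variables in every bash session, you could wrap them in a function (for example, in your ~/.bashrc). This is a minimal sketch for the bash shell; use_keytab_auth is a hypothetical name, and the check that the keytab file is readable is an added safeguard, not an HTAR requirement.

```shell
# Hypothetical helper (not part of HTAR): set the environment variables for
# the "keytab" authentication method after confirming the keytab is readable.
use_keytab_auth() {
  local principal="$1" keytab="$2"
  if [ ! -r "$keytab" ]; then
    echo "Keytab not readable: $keytab" >&2
    return 1
  fi
  export HPSS_PRINCIPAL="$principal"
  export HPSS_AUTH_METHOD=keytab
  export HPSS_KEYTAB_PATH="$keytab"
}
```

For example: use_keytab_auth username /path/to/my_keytab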

This is document awgg in the Knowledge Base.
Last modified on 2023-10-03 09:55:10.