Verify where files are stored

The following script prints the storage location of every file in a Dataset: the cloud storage location for data from a private cloud integration, and the Encord storage location for data uploaded locally. Knowing where your files are stored helps you cross-verify that all data from a cloud bucket has been added to the Dataset.

Note: To learn how to view the storage locations of all files in a Project, see our documentation here.

In the following script, ensure that you:

  • Replace <private_key_path> with the path to your private key.
  • Replace <dataset_hash> with the hash of the Dataset whose file storage locations you want to list.
# Import dependencies (only the classes this script uses)
from encord import EncordUserClient, Dataset

# Instantiate client
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Collect the file link (storage location) of every data row in the Dataset
dataset: Dataset = user_client.get_dataset("<dataset_hash>")
dataset_level_file_links = [data_row.file_link for data_row in dataset.list_data_rows()]

# Print the storage locations
print(dataset_level_file_links)

Example output:

['https/my-aws-bucket/iaDJxNNrMuQMtFPcP9oszw2OCHm2/0f31636f-54cc-4f8f-b556-baba47bfbda1', 'https/my-aws-bucket/iaDJxNNrMuQMtFPcP9oszw2OCHm2/9c202a5a-7c79-4a95-91ef-37adcb331d92']
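The script only lists what is in the Dataset. To cross-verify it against a cloud bucket, you also need a list of the files in the bucket. The following is a minimal sketch that assumes you already have both lists as plain strings in the same format (for example, the object URLs from the file you used to register your cloud data). `find_missing_links` is a hypothetical helper, not part of the Encord SDK, and the example links are made up.

```python
from typing import Iterable, List


def find_missing_links(dataset_links: Iterable[str], bucket_links: Iterable[str]) -> List[str]:
    """Return bucket links that do not appear among the Dataset's file links.

    Hypothetical helper: both inputs are compared as plain strings, surrounding
    whitespace is ignored, and the result is sorted for stable output.
    """
    in_dataset = {link.strip() for link in dataset_links}
    return sorted({link.strip() for link in bucket_links} - in_dataset)


# Made-up example data: replace with the list printed by the script above and
# with a listing of your bucket's files in the same format.
dataset_level_file_links = [
    "https://my-bucket.example/videos/a.mp4",
    "https://my-bucket.example/videos/b.mp4",
]
bucket_file_links = [
    "https://my-bucket.example/videos/a.mp4",
    "https://my-bucket.example/videos/b.mp4",
    "https://my-bucket.example/videos/c.mp4",
]

missing = find_missing_links(dataset_level_file_links, bucket_file_links)
print(missing)  # ['https://my-bucket.example/videos/c.mp4']
```

A set difference is used because order does not matter when checking membership. An empty result means every file in the bucket listing is present in the Dataset; swapping the two arguments instead finds Dataset links that are not in the bucket listing.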