2. Import Data using the SDK


Upload private cloud data

All types of data (videos, images, image groups, image sequences, and DICOM) from a private cloud are added to a Dataset in the exact same way.

Use the script below to upload your private cloud data to a specified Dataset.

The script has several possible outputs:

  • "Upload is still in progress, try again later!": The upload has not finished. Use the script under Check data upload to query the status again later. Do not rerun this script, because doing so starts a new upload.

  • "Upload completed": The upload completed. If any files failed to upload, the URLs are listed.

  • "Upload failed": The entire upload failed, and not just individual files. Ensure your JSON file is formatted correctly.


# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Instantiate user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")

# Specify the dataset you want to upload data to by replacing <dataset_hash> with the dataset hash
dataset = user_client.get_dataset("<dataset_hash>")

# Specify the cloud integration that gives Encord access to your data by replacing <integration_title> with the integration's title
integrations = user_client.get_cloud_integrations()
integration_idx = [i.title for i in integrations].index("<integration_title>")
integration = integrations[integration_idx].id

# Initiate cloud data upload. Replace path/to/json/file.json with the path to your JSON file
upload_job_id = dataset.add_private_data_to_dataset_start(
    integration, "path/to/json/file.json", ignore_errors=True
)

# timeout_seconds sets how many seconds this call waits for the upload job to finish before returning its current status
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")

if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
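elif res.status == LongPollingStatus.DONE:
    print("Upload completed")
    # List the URLs of any individual files that failed to upload
    if res.data_unit_errors:
        print("The following URLs failed to upload:")
        for e in res.data_unit_errors:
            print(e.object_urls)
else:
    print(f"Upload failed: {res.errors}")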
Example output:

add_private_data_to_dataset job started with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
SDK process can be terminated, this will not affect successful job execution.
You can follow the progress in the web app via notifications.
add_private_data_to_dataset job completed with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
Execution result: DatasetDataLongPolling(status=<LongPollingStatus.DONE: 'DONE'>, data_hashes_with_titles=[DatasetDataInfo(data_hash='cd42333d-8014-46q7-837b-5bf68b9b5', title='funny_image.jpg')], errors=[], units_pending_count=0, units_done_count=1, units_error_count=0)
Upload completed
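The result object printed in the example output exposes per-unit counters (units_pending_count, units_done_count, units_error_count). As a minimal sketch, assuming these attributes are present on the result as shown in that output, a hypothetical helper such as summarize_upload (not part of the SDK) could turn them into a one-line progress message. The stand-in object uses types.SimpleNamespace so the sketch runs without the SDK installed.

```python
from types import SimpleNamespace


def summarize_upload(res) -> str:
    """Hypothetical helper: build a progress line from the per-unit counters.

    Assumes `res` has the units_pending_count, units_done_count and
    units_error_count attributes shown in the example output.
    """
    done = res.units_done_count
    failed = res.units_error_count
    pending = res.units_pending_count
    total = done + failed + pending
    return f"{done}/{total} done, {failed} failed, {pending} pending"


# Stand-in for the SDK result, mirroring the counters in the example output
example = SimpleNamespace(units_pending_count=0, units_done_count=1, units_error_count=0)
print(summarize_upload(example))  # 1/1 done, 0 failed, 0 pending
```

With the SDK, you would pass the `res` returned by add_private_data_to_dataset_get_result() instead of the stand-in.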

Check data upload

If the code returns "Upload is still in progress, try again later!", run the following code to query the Encord server again. Replace <upload_job_id> with the upload_job_id printed by the previous script. In the example above, this is c4026edb-4fw2-40a0-8f05-a1af7f465727.

The script has several possible outputs:

  • "Upload is still in progress, try again later!": The upload has not finished. Run this script again later to check if the upload has finished.

  • "Upload completed": The upload completed. If any files failed to upload, the URLs are listed.

  • "Upload failed": The entire upload failed, and not just individual files. Ensure your JSON file is formatted correctly.

# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Replace <upload_job_id> with the ID printed when the upload started
upload_job_id = "<upload_job_id>"

# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify the dataset you want to upload data to by replacing <dataset_hash> with the dataset hash
dataset = user_client.get_dataset("<dataset_hash>")

# Check the upload status, waiting up to 5 seconds for the job to finish
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")

if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed")
    if res.data_unit_errors:
        print("The following URLs failed to upload:")
        for e in res.data_unit_errors:
            print(e.object_urls)
else:
    print(f"Upload failed: {res.errors}")
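When an upload completes with some failures, you may want the failed URLs as a single flat list, for example to retry them. The sketch below assumes each entry in res.data_unit_errors has an object_urls attribute holding a list of URL strings (the script above prints that attribute); collect_failed_urls is a hypothetical helper, not part of the SDK, and the stand-in error objects and URLs are placeholders.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def collect_failed_urls(data_unit_errors: Optional[Iterable]) -> List[str]:
    """Hypothetical helper: flatten object_urls from every data unit error.

    Assumes each error exposes `object_urls` as a list of URL strings.
    Duplicates are dropped while preserving first-seen order.
    """
    seen = set()
    urls = []
    for err in data_unit_errors or []:
        for url in err.object_urls:
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Stand-ins for SDK error objects (placeholder URLs)
errors = [
    SimpleNamespace(object_urls=["s3://bucket/a.jpg"]),
    SimpleNamespace(object_urls=["s3://bucket/b.jpg", "s3://bucket/a.jpg"]),
]
print(collect_failed_urls(errors))  # ['s3://bucket/a.jpg', 's3://bucket/b.jpg']
```

In the script above, you could call collect_failed_urls(res.data_unit_errors) inside the DONE branch and write the result to a file for a later retry.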


Tip

If you omit the timeout_seconds argument, add_private_data_to_dataset_get_result() keeps checking the status until the upload has finished.
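If you would rather cap the total waiting time than wait indefinitely, one option is to call add_private_data_to_dataset_get_result() repeatedly with a short timeout_seconds until the status is no longer pending. The sketch keeps the loop generic so it runs without the SDK: wait_for_upload, check and is_pending are hypothetical names, and the stand-in check simulates a job that finishes on its third call. With the SDK, check could be `lambda: dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)` and is_pending could compare `res.status` with `LongPollingStatus.PENDING`.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_upload(
    check: Callable[[], Any],
    is_pending: Callable[[Any], bool],
    max_wait_seconds: float = 600.0,
    pause_seconds: float = 10.0,
) -> Any:
    """Hypothetical helper: call `check` until its result is no longer pending.

    Returns the last result, which may still be pending if
    max_wait_seconds elapsed first.
    """
    deadline = time.monotonic() + max_wait_seconds
    res = check()
    while is_pending(res) and time.monotonic() < deadline:
        time.sleep(pause_seconds)
        res = check()
    return res


# Stand-in for the SDK call: reports PENDING twice, then DONE
statuses = iter(["PENDING", "PENDING", "DONE"])
fake_check = lambda: SimpleNamespace(status=next(statuses))

final = wait_for_upload(fake_check, lambda r: r.status == "PENDING", pause_seconds=0.01)
print(final.status)  # DONE
```

Because the helper returns the last result rather than raising, the caller can reuse the same PENDING / DONE / failed branching shown in the scripts above.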