All Datasets are identified using a unique ID called a <dataset_hash>, which can be found in the Encord platform.
Tip
To learn how to import cloud data into Encord, see our documentation here.
Upload private cloud data
All types of data (videos, images, image groups, image sequences, and DICOM) from a private cloud are added to a Dataset in the exact same way.
Use the script below to upload your private cloud data to a specified Dataset.
- Replace <dataset_hash> with the ID of the Dataset you want to upload your data to.
- Replace <integration_title> with the title of the integration you want to use. You can see all available integrations in the Encord platform, or using the SDK.
- Replace path/to/json/file.json with the path to your JSON file.
Tip
If the following script returns "Upload is still in progress, try again later!", check the upload status at a later time.
# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus
# Instantiate user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to upload data to by replacing <dataset_hash> with the dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Specify the integration to use by replacing <integration_title> with the integration title
integrations = user_client.get_cloud_integrations()
integration_idx = [i.title for i in integrations].index("<integration_title>")
integration = integrations[integration_idx].id
# Initiate cloud data upload. Replace path/to/json/file.json with the path to your JSON file
upload_job_id = dataset.add_private_data_to_dataset_start(
    integration, "path/to/json/file.json"
)
# timeout_seconds sets how long the call waits for the upload to finish before returning the current status
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")
if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed without errors")
else:
    print(f"Errors: {res.errors}")
add_private_data_to_dataset job started with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
SDK process can be terminated, this will not affect successful job execution.
You can follow the progress in the web app via notifications.
add_private_data_to_dataset job completed with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
Execution result: DatasetDataLongPolling(status=<LongPollingStatus.DONE: 'DONE'>, data_hashes_with_titles=[DatasetDataInfo(data_hash='cd42333d-8014-46q7-837b-5bf68b9b5', title='funny_image.jpg')], errors=[], units_pending_count=0, units_done_count=1, units_error_count=0)
Upload completed without errors
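The lookup in the script above uses list.index(), which raises a bare ValueError when no integration has the given title. The following is a minimal sketch of a friendlier lookup. The helper name find_integration_id and the stand-in objects are hypothetical and not part of the SDK; the sketch only assumes that each integration exposes the title and id attributes used in the script above.

```python
from types import SimpleNamespace


def find_integration_id(integrations, title):
    """Return the id of the first integration whose title matches exactly."""
    for integration in integrations:
        if integration.title == title:
            return integration.id
    available = ", ".join(sorted(i.title for i in integrations))
    raise LookupError(f"No integration titled {title!r}. Available: {available}")


# Stand-in objects so the sketch runs without an Encord account.
# With the SDK, pass the list returned by user_client.get_cloud_integrations().
demo_integrations = [
    SimpleNamespace(id="id-aws", title="AWS bucket"),
    SimpleNamespace(id="id-gcp", title="GCP bucket"),
]
print(find_integration_id(demo_integrations, "GCP bucket"))
```

Listing the available titles in the error message makes a typo in <integration_title> easier to spot.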
Check data upload
If the code returns "Upload is still in progress, try again later!", run the following code to query the Encord server again. Replace upload_job_id with the job ID output by the previous code. In the example above, upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus
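# Instantiate user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the Dataset the upload was started for. Replace <dataset_hash> with the dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Replace <upload_job_id> with the job ID output by the previous script
upload_job_id = "<upload_job_id>"
# Check upload status
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")
if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed without errors")
else:
    print(f"Errors: {res.errors}")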
Tip
Omitting the timeout_seconds argument from the add_private_data_to_dataset_get_result() method performs status checks until the upload has finished.
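If you prefer to keep a short timeout_seconds and control the retries yourself, the status check can be wrapped in a small loop. This is a sketch only: wait_for_result, its parameters, and the retry limits are hypothetical names chosen for illustration, not part of the SDK.

```python
import time


def wait_for_result(check, is_pending, max_attempts=10, delay_seconds=30.0):
    """Call check() until is_pending(result) is False, or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = check()
        if not is_pending(result):
            return result
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # pause before asking the server again
    raise TimeoutError(f"Still pending after {max_attempts} checks")


# Possible use with the objects from the scripts above (not executed here):
# res = wait_for_result(
#     check=lambda: dataset.add_private_data_to_dataset_get_result(
#         upload_job_id, timeout_seconds=5
#     ),
#     is_pending=lambda r: r.status == LongPollingStatus.PENDING,
# )
```

The function returns the first result that is not pending, so the caller still needs to distinguish a DONE status from an error, as in the scripts above.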
Local data
Uploading videos
Use the upload_video() method to upload a video to a Dataset specified using the <dataset_hash>.
# Import dependencies
from encord import Dataset, EncordUserClient
# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)
# Specify the Dataset you want to upload your video(s) to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
)
# Upload the video to the Dataset by specifying the file path to the video
dataset.upload_video(
    "path/to/your/video.mp4"
)
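To upload several videos, upload_video() can be called once per file. The sketch below collects video files from a folder and passes each to an upload callable. The helper names, the folder layout, and the extension list are assumptions made for illustration; they are not part of the SDK.

```python
from pathlib import Path

# Assumed set of extensions; adjust it to the files you actually have.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm"}


def find_videos(folder):
    """Return paths of video files directly inside folder, sorted by name."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def upload_all(paths, upload):
    """Call upload(path) for each path and collect failures instead of stopping."""
    failed = []
    for path in paths:
        try:
            upload(path)
        except Exception as exc:  # broad on purpose: record the error and continue
            failed.append((path, str(exc)))
    return failed


# Possible use with the dataset object from the script above (not executed here):
# failed = upload_all(find_videos("path/to/your/videos"), dataset.upload_video)
```

Collecting failures in a list lets one unreadable file be reported at the end rather than aborting the remaining uploads.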
Uploading single images
Use the upload_image() method to upload a single image to a Dataset specified using the <dataset_hash>.
# Import dependencies
from encord import Dataset, EncordUserClient
# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)
# Specify the Dataset you want to upload your images to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
)
# Upload the image to the Dataset by specifying the file path to the image
dataset.upload_image(
    "path/to/your/image.jpeg"
)
Uploading image groups & image sequences
Tip
Confused about the difference between image groups and image sequences? Click here to learn more!
Use the create_image_group() method to combine images into image groups and image sequences, and add them to a Dataset.
Image groups
Image groups are created using the create_image_group() method with create_video=False as an argument. Specify the file path of each image you want to include in the image group in the script below.
Tip
Images in an image group will be assigned a data_sequence number, which is based on the order of the files listed in the argument to create_image_group(). If the ordering is important to you, make sure that your filenames are listed in the correct order.
# Import dependencies
from encord import Dataset, EncordUserClient
# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)
# Specify the Dataset you want to upload your image group to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
)
# Create the image group. Include the paths of all images that are to be included in the image group.
# The create_video flag must be set to False
dataset.create_image_group(
    [
        "path/to/your/img1.jpeg",
        "path/to/your/img2.jpeg",
    ],
    create_video=False
)
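Because the data_sequence number follows the order of the file list, numbered filenames are worth sorting before they are passed to create_image_group(). A plain alphabetical sort places img10 before img2, so the sketch below uses a natural sort instead. The helper natural_key and the example paths are illustrative and not part of the SDK.

```python
import re


def natural_key(path):
    """Split a path into text and integer chunks so 'img2' sorts before 'img10'."""
    parts = re.split(r"(\d+)", path)
    # re.split with a capturing group puts the digit runs at odd indices.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


image_paths = [
    "path/to/your/img10.jpeg",
    "path/to/your/img2.jpeg",
    "path/to/your/img1.jpeg",
]
ordered_paths = sorted(image_paths, key=natural_key)
print(ordered_paths)

# Possible use with the dataset object from the script above (not executed here):
# dataset.create_image_group(ordered_paths, create_video=False)
```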
Image sequences
Image sequences are created using the create_image_group() method. Image sequences can only be composed of images that have the same dimensions. Images with different dimensions are made into separate image sequences. Learn more about image sequences here.
Note
create_video is set to True by default and can therefore be omitted when creating an image sequence.
Tip
Learn the difference between image groups and image sequences here.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)
# Specify the Dataset you want to upload your image sequence to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
)
# Create the image sequence. Include the paths of all images that are to be included in the image sequence.
# The create_video flag must be set to True (the default)
dataset.create_image_group(
    [
        "path/to/your/img1.jpeg",
        "path/to/your/img2.jpeg",
    ],
    create_video=True
)
Note
Image sequences are composed of images with the same resolution. If img1.jpeg and img2.jpeg are of shape [1920, 1080] and [1280, 720], respectively, each ends up in its own image sequence.
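To anticipate how many image sequences a set of files will produce, the images can be grouped by resolution before uploading. The sketch below assumes the sizes are already known and supplied as a dictionary; reading them from the files is left out, and the helper group_by_size is a hypothetical name, not part of the SDK.

```python
from collections import defaultdict


def group_by_size(sizes):
    """Map each (width, height) to the image paths that share it, keeping input order."""
    groups = defaultdict(list)
    for path, size in sizes.items():
        groups[tuple(size)].append(path)
    return dict(groups)


# Example sizes, entered by hand for illustration.
sizes = {
    "img1.jpeg": (1920, 1080),
    "img2.jpeg": (1280, 720),
    "img3.jpeg": (1920, 1080),
}
for size, paths in group_by_size(sizes).items():
    print(size, paths)
```

Each key in the result corresponds to one image sequence, so in this example img1.jpeg and img3.jpeg would share a sequence while img2.jpeg would get its own.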
Uploading DICOM series
In the following script, replace path/to/your/dicom-img1.jpeg and the other example file paths with the paths to the files you want to include in your DICOM series.
# Import dependencies
from encord import Dataset, EncordUserClient
# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)
# Specify the Dataset you want to upload your DICOM files to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
)
# Add a DICOM series to the Dataset by specifying the file path to all files to include.
dataset.create_dicom_series(
    [
        "path/to/your/dicom-img1.jpeg",
        "path/to/your/dicom-img2.jpeg",
        "path/to/your/dicom-img3.jpeg"
    ]
)
Reading and updating data
To inspect data within a Dataset, use the .data_rows property of the Dataset class. .data_rows returns a list of DataRows. Check our documentation for the DataRow class for information on which fields can be accessed and updated.
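As an illustration of working with such a list, the sketch below builds a lookup from title to row IDs. It assumes that each row exposes title and uid attributes; confirm the available fields in the DataRow class documentation. The helper index_by_title and the stand-in rows are hypothetical and only there so the sketch runs without an Encord account.

```python
from types import SimpleNamespace


def index_by_title(rows):
    """Map each title to the list of row IDs that carry it (titles may repeat)."""
    index = {}
    for row in rows:
        index.setdefault(row.title, []).append(row.uid)
    return index


# Stand-in rows for illustration; with the SDK, pass the Dataset's list of DataRows.
demo_rows = [
    SimpleNamespace(uid="row-1", title="funny_image.jpg"),
    SimpleNamespace(uid="row-2", title="video.mp4"),
    SimpleNamespace(uid="row-3", title="funny_image.jpg"),
]
print(index_by_title(demo_rows))
```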