darwin.dataset package
Submodules
darwin.dataset.download_manager module
Holds helper functions that deal with downloading videos and images.
- darwin.dataset.download_manager.download_all_images_from_annotations(api_key: str, api_url: str, annotations_path: Path, images_path: Path, force_replace: bool = False, remove_extra: bool = False, annotation_format: str = 'json', use_folders: bool = False, video_frames: bool = False, force_slots: bool = False, ignore_slots: bool = False) → Tuple[Callable[[], Iterable[Any]], int] [source]
Downloads all images corresponding to a project.
- Parameters:
api_key (str) – API key of the current team
api_url (str) – URL of the darwin API (e.g. "https://darwin.v7labs.com/api/")
annotations_path (Path) – Path where the annotations are located
images_path (Path) – Path where to download the images
force_replace (bool, default: False) – Forces the re-download of an existing image
remove_extra (bool, default: False) – Removes existing images for which there is no corresponding annotation
annotation_format (str, default: "json") – Format of the annotations. Currently only JSON and XML are expected
use_folders (bool, default: False) – Recreates the folder structure
video_frames (bool, default: False) – Pulls video frame images instead of video files
force_slots (bool) – Pulls all slots of items into a deeper file structure ({prefix}/{item_name}/{slot_name}/{file_name})
- Returns:
generator (function) – Generator for doing the actual downloads
count (int) – The file count
- Raises:
ValueError – If the given annotation file is not in darwin (json) or pascalvoc (xml) format.
Deprecated since version 0.7.5: This will be removed in 0.8.0. The api_url parameter will be removed.
- darwin.dataset.download_manager.download_image_from_annotation(api_key: str, api_url: str, annotation_path: Path, images_path: Path, annotation_format: str, use_folders: bool, video_frames: bool, force_slots: bool, ignore_slots: bool = False) → None [source]
Dispatches functions to download an image given an annotation.
- Parameters:
api_key (str) – API key of the current team
api_url (str) – URL of the darwin API (e.g. "https://darwin.v7labs.com/api/")
annotation_path (Path) – Path where the annotation is located
images_path (Path) – Path where to download the image
annotation_format (str) – Format of the annotations. Currently only JSON is supported
use_folders (bool) – Recreates the folder structure
video_frames (bool) – Pulls video frame images instead of video files
force_slots (bool) – Pulls all slots of items into a deeper file structure ({prefix}/{item_name}/{slot_name}/{file_name})
- Raises:
NotImplementedError – If the format of the annotation is not supported.
Deprecated since version 0.7.5: This will be removed in 0.8.0. The api_url parameter will be removed.
- darwin.dataset.download_manager.lazy_download_image_from_annotation(api_key: str, annotation_path: Path, images_path: Path, annotation_format: str, use_folders: bool, video_frames: bool, force_slots: bool, ignore_slots: bool = False) → Iterable[Callable[[], None]] [source]
Returns functions to download an image given an annotation. Same as download_image_from_annotation, but returns Callables that trigger the download instead of fetching the files internally.
- Parameters:
api_key (str) – API key of the current team
annotation_path (Path) – Path where the annotation is located
images_path (Path) – Path where to download the image
annotation_format (str) – Format of the annotations. Currently only JSON is supported
use_folders (bool) – Recreates the folder structure
video_frames (bool) – Pulls video frame images instead of video files
force_slots (bool) – Pulls all slots of items into a deeper file structure ({prefix}/{item_name}/{slot_name}/{file_name})
- Raises:
NotImplementedError – If the format of the annotation is not supported.
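The callables returned by lazy_download_image_from_annotation let the caller decide when and how each download runs. A minimal sketch of that pattern, with a hypothetical run_downloads driver standing in for whatever loop the caller writes:

```python
from typing import Callable, Iterable

def run_downloads(jobs: Iterable[Callable[[], None]]) -> int:
    """Invoke each zero-argument download callable and count them."""
    count = 0
    for job in jobs:
        job()  # the download only happens when the callable is invoked
        count += 1
    return count
```

Because nothing runs until a callable is invoked, the caller controls ordering, retries, and parallelism (e.g. submitting each job to a thread pool instead of calling it inline).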
- darwin.dataset.download_manager.download_image_from_json_annotation(api_key: str, api_url: str, annotation_path: Path, image_path: Path, use_folders: bool, video_frames: bool) → None [source]
Downloads an image given a .json annotation path and renames the JSON after the image's filename.
- Parameters:
api_key (str) – API key of the current team
api_url (str) – URL of the darwin API (e.g. "https://darwin.v7labs.com/api/")
annotation_path (Path) – Path where the annotation is located
image_path (Path) – Path where to download the image
use_folders (bool) – Recreates folders
video_frames (bool) – Pulls video frame images instead of video files
Deprecated since version 0.7.5: This will be removed in 0.8.0. Use download_image_from_annotation instead.
- darwin.dataset.download_manager.download_image(url: str, path: Path, api_key: str) → None [source]
Helper function: downloads one image from a url.
- Parameters:
url (str) – Url of the image to download
path (Path) – Path where to download the image, including the filename
api_key (str) – API key of the current team
Deprecated since version 0.7.5: This will be removed in 0.8.0. Use download_image_from_annotation instead.
- darwin.dataset.download_manager.download_manifest_txts(urls: List[str], api_key: str, folder: Path) → List[Path] [source]
- darwin.dataset.download_manager.get_segment_manifests(slot: Slot, parent_path: Path, api_key: str) → List[SegmentManifest] [source]
darwin.dataset.identifier module
- class darwin.dataset.identifier.DatasetIdentifier(dataset_slug: str, team_slug: str | None = None, version: str | None = None)[source]
Bases:
object
Formal representation of a dataset identifier for the SDK.
A dataset identifier is a string that uniquely identifies a dataset on Darwin. A dataset identifier is made of the following substrings:
<team-slug>/<dataset-slug>:<version>
If version is missing, it defaults to latest.
- Parameters:
dataset_slug (str) – The slugified name of the dataset.
team_slug (Optional[str], default: None) – The slugified name of the team.
version (Optional[str], default: None) – The version of the identifier.
- dataset_slug
The slugified name of the dataset.
- Type:
str
- team_slug
The slugified name of the team.
- Type:
Optional[str], default: None
- version
The version of the identifier.
- Type:
Optional[str], default: None
- classmethod parse(identifier: str | DatasetIdentifier) → DatasetIdentifier [source]
Parses the given identifier and returns the corresponding DatasetIdentifier.
- Parameters:
identifier (Union[str, DatasetIdentifier]) – The identifier to be parsed.
- Returns:
The SDK representation of a DatasetIdentifier.
- Return type:
DatasetIdentifier
- Raises:
ValueError – If the given identifier is invalid.
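The format above can be parsed with plain string splitting. A minimal sketch of the logic (the parse_identifier function and ParsedIdentifier dataclass are illustrative, not the SDK's actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedIdentifier:
    dataset_slug: str
    team_slug: Optional[str] = None
    version: Optional[str] = None

def parse_identifier(identifier: str) -> ParsedIdentifier:
    # "<team-slug>/<dataset-slug>:<version>"; team and version are optional.
    team, _, rest = identifier.rpartition("/")
    slug, _, version = rest.partition(":")
    if not slug:
        raise ValueError(f"invalid dataset identifier: {identifier!r}")
    return ParsedIdentifier(slug, team or None, version or None)
```

A missing version is left as None here; per the docstring above, the SDK treats that as latest.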
darwin.dataset.local_dataset module
- class darwin.dataset.local_dataset.LocalDataset(dataset_path: Path, annotation_type: str, partition: str | None = None, split: str = 'default', split_type: str = 'random', release_name: str | None = None, keep_empty_annotations: bool = False)[source]
Bases:
object
Base class representing a V7 Darwin dataset that has already been pulled locally. It can be used with PyTorch dataloaders. See the darwin.torch module for more specialized dataset classes extending this one.
- Parameters:
dataset_path (Path) – Path to the location of the dataset on the file system.
annotation_type (str) – The type of annotation classes ["tag", "bounding_box", "polygon"].
partition (Optional[str], default: None) – Selects one of the partitions ["train", "val", "test"].
split (str, default: "default") – Selects the split that defines the percentages used (use "default" to select the default split).
split_type (str, default: "random") – Heuristic used to do the split ["random", "stratified"].
release_name (Optional[str], default: None) – Version of the dataset.
- dataset_path
Path to the location of the dataset on the file system.
- Type:
Path
- annotation_type
The type of annotation classes ["tag", "bounding_box", "polygon"].
- Type:
str
- partition
Selects one of the partitions ["train", "val", "test"].
- Type:
Optional[str], default: None
- split
Selects the split that defines the percentages used (use "default" to select the default split).
- Type:
str, default: "default"
- split_type
Heuristic used to do the split ["random", "stratified"].
- Type:
str, default: "random"
- release_name
Version of the dataset.
- Type:
Optional[str], default: None
- Raises:
ValueError –
- If partition, split_type or annotation_type have an invalid value.
- If an annotation has no corresponding image.
- If an image has multiple extensions (meaning it is present in multiple formats).
- If no images are found.
- get_img_info(index: int) → Dict[str, Any] [source]
Returns the annotation information for a given image.
- Parameters:
index (int) – The index of the image.
- Returns:
A dictionary with the image's class and annotation information.
- Return type:
Dict[str, Any]
- Raises:
ValueError – If there are no annotations downloaded on this machine. You can pull them by using the command darwin dataset pull $DATASET_NAME --only-annotations in the CLI.
- get_height_and_width(index: int) → Tuple[float, float] [source]
Returns the height and width of the image with the given index.
- Parameters:
index (int) – The index of the image.
- Returns:
A tuple where the first element is the height of the image and the second is the width.
- Return type:
Tuple[float, float]
- extend(dataset: LocalDataset, extend_classes: bool = False) → LocalDataset [source]
Extends the current dataset with another one.
- Parameters:
dataset (Dataset) – Dataset to merge.
extend_classes (bool, default: False) – Extend the current set of classes by merging it with the set of classes belonging to the given dataset.
- Returns:
This LocalDataset extended with the classes of the given one.
- Return type:
LocalDataset
- Raises:
ValueError –
- If the annotation_type of this LocalDataset differs from the annotation_type of the given one.
- If the set of classes from this LocalDataset differs from the set of classes from the given one AND extend_classes is False.
- get_image(index: int) → Image [source]
Returns the corresponding PILImage.Image.
- Parameters:
index (int) – The index of the image in this LocalDataset.
- Returns:
The image.
- Return type:
PILImage.Image
- get_image_path(index: int) → Path [source]
Returns the path of the image with the given index.
- Parameters:
index (int) – The index of the image in this LocalDataset.
- Returns:
The Path of the image.
- Return type:
Path
- parse_json(index: int) → Dict[str, Any] [source]
Load an annotation and filter out the extra classes according to what is specified in self.classes and the annotation_type.
- Parameters:
index (int) – Index of the annotation to read.
- Returns:
A dictionary containing the index and the filtered annotation.
- Return type:
Dict[str, Any]
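The filtering parse_json describes can be pictured as keeping only the annotations whose class name appears in self.classes and that carry data for the requested annotation_type. A sketch under an assumed annotation layout (a list of dicts with a "name" key and per-type payload keys; this is illustrative, not the SDK's actual data model):

```python
def filter_annotations(annotations, classes, annotation_type):
    """Keep annotations whose class is known and that have the wanted type."""
    return [
        a for a in annotations
        if a["name"] in classes and annotation_type in a
    ]
```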
- measure_mean_std(multi_processed: bool = True) → Tuple[ndarray, ndarray] [source]
Computes the mean and std of the train images, given the train loader.
- Parameters:
multi_processed (bool, default: True) – Uses multiprocessing to download the dataset in parallel.
- Returns:
mean (ndarray[double]) – Mean value (for each channel) of all pixels of the images in the input folder.
std (ndarray[double]) – Standard deviation (for each channel) of all pixels of the images in the input folder.
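Per-channel mean and std over all pixels, as this method computes, can be sketched with NumPy. This is illustrative only and assumes HxWxC image arrays; the real method works over the dataset's train loader:

```python
import numpy as np

def mean_std(images):
    """Per-channel mean and std over every pixel of every image."""
    # Flatten each image to (num_pixels, C), then stack all images.
    pixels = np.concatenate([im.reshape(-1, im.shape[-1]) for im in images])
    return pixels.mean(axis=0), pixels.std(axis=0)
```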
- darwin.dataset.local_dataset.get_annotation_filepaths(release_path: Path, annotations_dir: Path, annotation_type: str, split: str, partition: str | None = None, split_type: str = 'random') → Iterator[str] [source]
Returns a list of annotation filepaths for the given release & partition.
- Parameters:
release_path (Path) – The path of the Release saved locally.
annotations_dir (Path) – The path of the directory where the annotations are located.
annotation_type (str) – The type of the annotations.
split (str) – The split name.
partition (Optional[str], default: None) – How to partition files. If no partition is specified, then it takes all the json files in the annotations directory. The resulting generator prepends parent directories relative to the main annotation directory. E.g. ["annotations/test/1.json", "annotations/2.json", "annotations/test/2/3.json"] yields annotations/test/1, annotations/2, annotations/test/2/3.
split_type (str, default: "random") – The type of split. Can be "random" or "stratified".
- Returns:
An iterator with the path for the stem files.
- Return type:
Iterator[str]
- Raises:
ValueError – If the provided split_type is invalid.
FileNotFoundError – If no dataset partitions are found.
darwin.dataset.release module
- class darwin.dataset.release.Release(dataset_slug: str, team_slug: str, version: str, name: str, url: str | None, export_date: datetime, image_count: int | None, class_count: int | None, available: bool, latest: bool, format: str)[source]
Bases:
object
Represents a release/export. Releases created this way can only contain items with "completed" status.
- Parameters:
dataset_slug (str) – The slug of the dataset.
team_slug (str) – The slug of the team.
version (str) – The version of the Release.
name (str) – The name of the Release.
url (Optional[str]) – The full URL used to download the Release.
export_date (datetime.datetime) – The datetime of when this release was created.
image_count (Optional[int]) – Number of images in this Release.
class_count (Optional[int]) – Number of distinct classes in this Release.
available (bool) – Whether or not this Release is downloadable.
latest (bool) – Whether or not this Release is the latest one.
format (str) – Format for the file of this Release should it be downloaded.
- dataset_slug
The slug of the dataset.
- Type:
str
- team_slug
The slug of the team.
- Type:
str
- version
The version of the Release.
- Type:
str
- name
The name of the Release.
- Type:
str
- url
The full URL used to download the Release.
- Type:
Optional[str]
- export_date
The datetime of when this release was created.
- Type:
datetime.datetime
- image_count
Number of images in this Release.
- Type:
Optional[int]
- class_count
Number of distinct classes in this Release.
- Type:
Optional[int]
- available
Whether or not this Release is downloadable.
- Type:
bool
- latest
Whether or not this Release is the latest one.
- Type:
bool
- format
Format for the file of this Release should it be downloaded.
- Type:
str
- classmethod parse_json(dataset_slug: str, team_slug: str, payload: Dict[str, Any]) → Release [source]
Given a JSON payload, parses it into a Release object instance.
- Parameters:
dataset_slug (str) – The slug of the dataset this Release belongs to.
team_slug (str) – The slug of the team this Release's dataset belongs to.
payload (Dict[str, Any]) – A dictionary with the Release information. It must have a minimal format similar to: { "version": "a_version", "name": "a_name" }
If no format key is found in payload, the default will be json.
If payload has no download_url key, then url, available, image_count, class_count and latest will default to either None or False depending on the type.
A more complete format for this parameter would be similar to: { "version": "a_version", "name": "a_name", "metadata": { "num_images": 1, "annotation_classes": [] }, "download_url": "http://www.some_url_here.com", "latest": false, "format": "a_format" }
- Returns:
A Release created from the given payload.
- Return type:
Release
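The defaulting rules described above can be sketched as follows. The parse_release_payload helper is hypothetical and returns a plain dict; the real classmethod builds a Release instance:

```python
def parse_release_payload(payload):
    """Apply the documented defaults to a release payload dict."""
    url = payload.get("download_url")
    metadata = payload.get("metadata", {})
    return {
        "version": payload["version"],
        "name": payload["name"],
        "format": payload.get("format", "json"),  # "json" when the key is absent
        "url": url,                               # None without "download_url"
        "available": url is not None,
        "image_count": metadata.get("num_images"),
        "class_count": (len(metadata["annotation_classes"])
                        if "annotation_classes" in metadata else None),
        "latest": payload.get("latest", False),
    }
```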
- download_zip(path: Path) → Path [source]
Downloads the release content into a zip file located at the given path.
- Parameters:
path (Path) – The path where the zip file will be located.
- Returns:
The same Path as provided in the parameters.
- Return type:
Path
- Raises:
ValueError – If this Release object does not have a specified url.
- property identifier: DatasetIdentifier
The DatasetIdentifier for this Release.
- Type:
DatasetIdentifier
darwin.dataset.remote_dataset module
- class darwin.dataset.remote_dataset.RemoteDataset(*, client: Client, team: str, name: str, slug: str, dataset_id: int, item_count: int = 0, progress: float = 0, version: int = 1, release: str | None = None)[source]
Bases:
ABC
Manages the remote and local versions of a dataset hosted on Darwin. It allows several dataset management operations such as syncing between remote and local, pulling a remote dataset, removing the local files, etc.
- Parameters:
client (Client) – Client to use for interaction with the server.
team (str) – Team the dataset belongs to.
name (str) – Name of the dataset as originally displayed on Darwin. It may contain white spaces, capital letters and special characters, e.g. Bird Species!.
slug (str) – The dataset name with everything lower-case, special characters removed and spaces replaced by dashes, e.g. bird-species. This string is unique within a team.
dataset_id (int) – Unique internal reference from the Darwin backend.
item_count (int, default: 0) – Dataset size (number of items).
progress (float, default: 0) – How much of the dataset has been annotated, 0.0 to 1.0 (1.0 == 100%).
- team
Team the dataset belongs to.
- Type:
str
- name
Name of the dataset as originally displayed on Darwin. It may contain white spaces, capital letters and special characters, e.g. Bird Species!.
- Type:
str
- slug
The dataset name with everything lower-case, special characters removed and spaces replaced by dashes, e.g. bird-species. This string is unique within a team.
- Type:
str
- dataset_id
Unique internal reference from the Darwin backend.
- Type:
int
- item_count
Dataset size (number of items).
- Type:
int, default: 0
- progress
How much of the dataset has been annotated, 0.0 to 1.0 (1.0 == 100%).
- Type:
float, default: 0
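The name-to-slug relationship described above can be illustrated with a small sketch. The slugify helper is hypothetical; Darwin's actual slug rules may differ in edge cases:

```python
import re

def slugify(name: str) -> str:
    """Lower-case, drop special characters, replace spaces with dashes."""
    cleaned = re.sub(r"[^a-z0-9 -]", "", name.lower())
    return re.sub(r"\s+", "-", cleaned.strip())
```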
- abstract push(files_to_upload: Sequence[str | Path | LocalFile] | None, *, blocking: bool = True, multi_threaded: bool = True, max_workers: int | None = None, fps: int = 0, as_frames: bool = False, extract_views: bool = False, files_to_exclude: List[str | Path] | None = None, path: str | None = None, preserve_folders: bool = False, progress_callback: Callable[[int, float], None] | None = None, file_upload_callback: Callable[[str, int, int], None] | None = None) → UploadHandler [source]
- split_video_annotations(release_name: str = 'latest') → None [source]
Splits the video annotations from this RemoteDataset using the given release.
- Parameters:
release_name (str, default: "latest") – The name of the release to use.
- pull(*, release: Release | None = None, blocking: bool = True, multi_processed: bool = True, only_annotations: bool = False, force_replace: bool = False, remove_extra: bool = False, subset_filter_annotations_function: Callable | None = None, subset_folder_name: str | None = None, use_folders: bool = False, video_frames: bool = False, force_slots: bool = False, ignore_slots: bool = False) → Tuple[Callable[[], Iterator[Any]] | None, int] [source]
Downloads a remote dataset (images and annotations) to the datasets directory.
- Parameters:
release (Optional[Release], default: None) – The release to pull.
blocking (bool, default: True) – If False, the dataset is not downloaded and a generator function is returned instead.
multi_processed (bool, default: True) – Uses multiprocessing to download the dataset in parallel. If blocking is False this has no effect.
only_annotations (bool, default: False) – Download only the annotations and no corresponding images.
force_replace (bool, default: False) – Forces the re-download of an existing image.
remove_extra (bool, default: False) – Removes existing images for which there is no corresponding annotation.
subset_filter_annotations_function (Optional[Callable], default: None) – This function receives the directory where the annotations are downloaded and can perform any operation on them, i.e. filtering them with custom rules. If it needs to receive other parameters, it is advised to use functools.partial().
subset_folder_name (Optional[str], default: None) – Name of the folder with the subset of the dataset. If not provided, a timestamp is used.
use_folders (bool, default: False) – Recreates folders from the dataset.
video_frames (bool, default: False) – Pulls video frame images instead of video files.
force_slots (bool) – Pulls all slots of items into a deeper file structure ({prefix}/{item_name}/{slot_name}/{file_name})
- Returns:
generator (function) – Generator for doing the actual downloads. This is None if blocking is True.
count (int) – The number of files.
- Raises:
UnsupportedExportFormat – If the given release has an invalid format.
ValueError – If darwin is unable to get Team configuration.
- abstract fetch_remote_files(filters: Dict[str, str | List[str]] | None = None, sort: str | ItemSorter | None = None) → Iterator[DatasetItem] [source]
Fetches and lists all files on the remote dataset.
- Parameters:
filters (Optional[Dict[str, Union[str, List[str]]]], default: None) – The filters to use. Files excluded by the filter won't be fetched.
sort (Optional[Union[str, ItemSorter]], default: None) – A sorting direction. It can be a string with the values "asc", "ascending", "desc", "descending" or an ItemSorter instance.
- Yields:
Iterator[DatasetItem] – An iterator of DatasetItem.
- abstract archive(items: Iterator[DatasetItem]) → None [source]
Archives (soft-deletes) the given DatasetItems belonging to this RemoteDataset.
- Parameters:
items (Iterator[DatasetItem]) – The DatasetItems to be archived.
- abstract restore_archived(items: Iterator[DatasetItem]) → None [source]
Restores the archived DatasetItems that belong to this RemoteDataset.
- Parameters:
items (Iterator[DatasetItem]) – The DatasetItems to be restored.
- abstract move_to_new(items: Iterator[DatasetItem]) → None [source]
Changes the given DatasetItems' status to new.
- Parameters:
items (Iterator[DatasetItem]) – The DatasetItems whose status will change.
- abstract reset(items: Iterator[DatasetItem]) → None [source]
Resets the given DatasetItems.
- Parameters:
items (Iterator[DatasetItem]) – The DatasetItems to be reset.
- abstract complete(items: Iterator[DatasetItem]) → None [source]
Completes the given DatasetItems.
- Parameters:
items (Iterator[DatasetItem]) – The DatasetItems to be completed.
- abstract delete_items(items: Iterator[DatasetItem]) → None [source]
Deletes the given DatasetItems.
- Parameters:
items (Iterator[DatasetItem]) – The DatasetItems to be deleted.
- fetch_annotation_type_id_for_name(name: str) → int | None [source]
Fetches the annotation type id for an annotation type name, such as bounding_box.
- Parameters:
name (str) – The name of the annotation we want the id for.
- Returns:
The id of the annotation type, or None if it doesn't exist.
- Return type:
Optional[int]
- create_annotation_class(name: str, type: str, subtypes: List[str] = []) → Dict[str, Any] [source]
Creates an annotation class for this RemoteDataset.
- Parameters:
name (str) – The name of the annotation class.
type (str) – The type of the annotation class.
subtypes (List[str], default: []) – Annotation class subtypes.
- Returns:
Dictionary with the server response.
- Return type:
Dict[str, Any]
- Raises:
ValueError – If a given annotation type or subtype is unknown.
- add_annotation_class(annotation_class: AnnotationClass | int) → Dict[str, Any] | None [source]
Adds an annotation class to this RemoteDataset.
- Parameters:
annotation_class (Union[AnnotationClass, int]) – The annotation class to add, or its id.
- Returns:
Dictionary with the server response, or None if the annotation class already exists.
- Return type:
Optional[Dict[str, Any]]
- Raises:
ValueError – If the given annotation_class does not exist in this RemoteDataset's team.
- fetch_remote_classes(team_wide=False) → List[Dict[str, Any]] [source]
Fetches all the Annotation Classes from this RemoteDataset.
- Parameters:
team_wide (bool, default: False) – If True, returns all Annotation Classes that belong to the team. If False, only returns Annotation Classes which have been added to the dataset.
- Returns:
List of Annotation Classes (can be empty).
- Return type:
List[Dict[str, Any]]
- fetch_remote_attributes() → List[Dict[str, Any]] [source]
Fetches all remote attributes on the remote dataset.
- Returns:
A list with the attributes, where each attribute is a dictionary.
- Return type:
List[Dict[str, Any]]
- abstract export(name: str, annotation_class_ids: List[str] | None = None, include_url_token: bool = False, include_authorship: bool = False, version: str | None = None) → None [source]
Creates a new release for this RemoteDataset.
- Parameters:
name (str) – Name of the release.
annotation_class_ids (Optional[List[str]], default: None) – List of the classes to filter.
include_url_token (bool, default: False) – Whether the image URL in the export should include a token enabling access without team membership.
include_authorship (bool, default: False) – If set, includes annotator and reviewer metadata for each annotation.
version (Optional[str], default: None, enum: ["1.0", "2.0"]) – When used for a V2 dataset, allows forcing generation of either Darwin JSON 1.0 (Legacy) or the newer 2.0. Omit this option to get your team's default.
- abstract get_report(granularity: str = 'day') → str [source]
Returns a string representation of a CSV report for this RemoteDataset.
- Parameters:
granularity (str, default: "day") – The granularity of the report; can be "day", "week" or "month".
- Returns:
A CSV report.
- Return type:
str
- abstract get_releases() → List[Release] [source]
Get a sorted list of releases, with the most recent first.
- Returns:
A sorted list of available Releases, with the most recent first.
- Return type:
List["Release"]
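"Most recent first" amounts to sorting on the export date in descending order. A sketch with plain dicts standing in for Release objects (sort_releases is a hypothetical helper):

```python
def sort_releases(releases):
    """Newest export first, matching the ordering get_releases() describes."""
    return sorted(releases, key=lambda r: r["export_date"], reverse=True)
```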
- get_release(name: str = 'latest') → Release [source]
Get a specific Release for this RemoteDataset.
- split(val_percentage: float = 0.1, test_percentage: float = 0, split_seed: int = 0, make_default_split: bool = True, release_name: str | None = None) → None [source]
Creates lists of file names for each split: train, validation, and test. Note: this function needs a local copy of the dataset.
- Parameters:
val_percentage (float, default: 0.1) – Percentage of images used in the validation set.
test_percentage (float, default: 0) – Percentage of images used in the test set.
split_seed (int, default: 0) – Fixed seed for random split creation.
make_default_split (bool, default: True) – Makes this split the default split.
release_name (Optional[str], default: None) – Version of the dataset.
- Raises:
NotFound – If this RemoteDataset is not found locally.
- classes(annotation_type: str, release_name: str | None = None) → List[str] [source]
Returns the list of class_type classes.
- Parameters:
annotation_type (str) – The type of annotation classes, e.g. "tag" or "polygon".
release_name (Optional[str], default: None) – Version of the dataset.
- Returns:
classes – List of classes in the dataset of type class_type.
- Return type:
List[str]
- annotations(partition: str, split: str = 'split', split_type: str = 'stratified', annotation_type: str = 'polygon', release_name: str | None = None, annotation_format: str | None = 'darwin') → Iterable[Dict[str, Any]] [source]
Returns all the annotations of a given split and partition in a single dictionary.
- Parameters:
partition (str) – Selects one of the partitions [train, val, test].
split (str, default: "split") – Selects the split that defines the percentages used (use "split" to select the default split).
split_type (str, default: "stratified") – Heuristic used to do the split [random, stratified].
annotation_type (str, default: "polygon") – The type of annotation classes [tag, polygon].
release_name (Optional[str], default: None) – Version of the dataset.
annotation_format (Optional[str], default: "darwin") – Re-formatting of the annotation when loaded [coco, darwin].
- Yields:
Dict[str, Any] – Dictionary representing an annotation from this RemoteDataset.
- abstract workview_url_for_item(item: DatasetItem) → str [source]
Returns the darwin URL for the given DatasetItem.
- Parameters:
item (DatasetItem) – The DatasetItem for which we want the URL.
- Returns:
The URL.
- Return type:
str
- abstract post_comment(item: DatasetItem, text: str, x: float, y: float, w: float, h: float) → None [source]
Adds a comment to an item in this dataset. The comment will be added with a bounding box. Creates the workflow for said item if necessary.
- Parameters:
item (DatasetItem) – The DatasetItem which will receive the comment.
text (str) – The text of the comment.
x (float) – The x coordinate of the bounding box containing the comment.
y (float) – The y coordinate of the bounding box containing the comment.
w (float) – The width of the bounding box containing the comment.
h (float) – The height of the bounding box containing the comment.
- abstract import_annotation(item_id: str | int, payload: Dict[str, Any]) → None [source]
Imports the annotation for the item with the given id.
- Parameters:
item_id (ItemId) – Identifier of the Item that we are importing the annotation to.
payload (Dict[str, Any]) – A dictionary with the annotation to import. The default format is: {"annotations": serialized_annotations, "overwrite": "false"}
- property remote_path: Path
Returns a URL specifying the location of the remote dataset.
- property local_path: Path
Returns a Path to the local dataset.
- property local_releases_path: Path
Returns a Path to the local dataset releases.
- property local_images_path: Path
Returns a local Path to the images folder.
- property identifier: DatasetIdentifier
The DatasetIdentifier of this RemoteDataset.
darwin.dataset.remote_dataset_v2 module
- class darwin.dataset.remote_dataset_v2.RemoteDatasetV2(*, client: Client, team: str, name: str, slug: str, dataset_id: int, item_count: int = 0, progress: float = 0)[source]
Bases:
RemoteDataset
Manages the remote and local versions of a dataset hosted on Darwin. It allows several dataset management operations such as syncing between remote and local, pulling a remote dataset, removing the local files, etc.
- Parameters:
client (Client) – Client to use for interaction with the server.
team (str) – Team the dataset belongs to.
name (str) – Name of the dataset as originally displayed on Darwin. It may contain white spaces, capital letters and special characters, e.g. Bird Species!.
slug (str) – The dataset name with everything lower-case, special characters removed and spaces replaced by dashes, e.g. bird-species. This string is unique within a team.
dataset_id (int) – Unique internal reference from the Darwin backend.
item_count (int, default: 0) – Dataset size (number of items).
progress (float, default: 0) – How much of the dataset has been annotated, 0.0 to 1.0 (1.0 == 100%).
- team
Team the dataset belongs to.
- Type:
str
- name
Name of the dataset as originally displayed on Darwin. It may contain white spaces, capital letters and special characters, e.g. Bird Species!.
- Type:
str
- slug
The dataset name with everything lower-case, special characters removed and spaces replaced by dashes, e.g. bird-species. This string is unique within a team.
- Type:
str
- dataset_id
Unique internal reference from the Darwin backend.
- Type:
int
- item_count
Dataset size (number of items).
- Type:
int, default: 0
- progress
How much of the dataset has been annotated, 0.0 to 1.0 (1.0 == 100%).
- Type:
float, default: 0
- get_releases() → List[Release] [source]
Get a sorted list of releases, with the most recent first.
- Returns:
A sorted list of available Releases, with the most recent first.
- Return type:
List["Release"]
- push(files_to_upload: Sequence[str | Path | LocalFile] | None, *, blocking: bool = True, multi_threaded: bool = True, max_workers: int | None = None, fps: int = 0, as_frames: bool = False, extract_views: bool = False, files_to_exclude: List[str | Path] | None = None, path: str | None = None, preserve_folders: bool = False, progress_callback: Callable[[int, float], None] | None = None, file_upload_callback: Callable[[str, int, int], None] | None = None) UploadHandler [source]ο
Uploads a local dataset (images ONLY) in the datasets directory.
- Parameters:
files_to_upload (Optional[List[Union[PathLike, LocalFile]]]) β List of files to upload. Those can be folders.
blocking (bool, default: True) β If False, the dataset is not uploaded and a generator function is returned instead.
multi_threaded (bool, default: True) β Uses multiprocessing to upload the dataset in parallel. If blocking is False this has no effect.
max_workers (int, default: None) β Maximum number of workers to use for parallel upload.
fps (int, default: 0) β When the uploading file is a video, specify its framerate.
as_frames (bool, default: False) β When the uploading file is a video, specify whether it's going to be uploaded as a list of frames.
extract_views (bool, default: False) β When the uploading file is a volume, specify whether it's going to be split into orthogonal views.
files_to_exclude (Optional[List[PathLike]], default: None) β Optional list of files to exclude from the file scan. Those can be folders.
path (Optional[str], default: None) β Optional path to store the files in.
preserve_folders (bool, default: False) β Specify whether or not to preserve folder paths when uploading
progress_callback (Optional[ProgressCallback], default: None) β Optional callback, called every time the progress of an uploading file is reported.
file_upload_callback (Optional[FileUploadCallback], default: None) β Optional callback, called every time a file chunk is uploaded.
- Returns:
handler β Class for handling uploads, progress and error messages.
- Return type:
UploadHandler
- Raises:
ValueError β If files_to_upload is None. - If a path is specified when uploading a LocalFile object. - If there are no files to upload (because the path is wrong or the exclude filter excludes everything).
- fetch_remote_files(filters: Dict[str, str | List[str]] | None = None, sort: str | ItemSorter | None = None) Iterator[DatasetItem] [source]ο
Fetches and lists all files on the remote dataset.
- Parameters:
filters (Optional[Dict[str, Union[str, List[str]]]], default: None) β The filters to use. Files excluded by the filter won't be fetched.
sort (Optional[Union[str, ItemSorter]], default: None) β A sorting direction. It can be a string with the values "asc", "ascending", "desc", "descending", or an ItemSorter instance.
- Yields:
Iterator[DatasetItem] β An iterator of DatasetItems.
- archive(items: Iterator[DatasetItem]) None [source]ο
Archives (soft-deletes) the given DatasetItems belonging to this RemoteDataset.
- Parameters:
items (Iterator[DatasetItem]) β The DatasetItems to be archived.
- restore_archived(items: Iterator[DatasetItem]) None [source]ο
Restores the archived DatasetItems that belong to this RemoteDataset.
- Parameters:
items (Iterator[DatasetItem]) β The DatasetItems to be restored.
- move_to_new(items: Iterator[DatasetItem]) None [source]ο
Changes the given DatasetItems' status to new.
- Parameters:
items (Iterator[DatasetItem]) β The DatasetItems whose status will change.
- reset(items: Iterator[DatasetItem]) None [source]ο
Deprecated. Resets the given DatasetItems.
- Parameters:
items (Iterator[DatasetItem]) β The DatasetItems to be reset.
- complete(items: Iterator[DatasetItem]) None [source]ο
Completes the given DatasetItems.
- Parameters:
items (Iterator[DatasetItem]) β The DatasetItems to be completed.
- delete_items(items: Iterator[DatasetItem]) None [source]ο
Deletes the given DatasetItems.
- Parameters:
items (Iterator[DatasetItem]) β The DatasetItems to be deleted.
- export(name: str, annotation_class_ids: List[str] | None = None, include_url_token: bool = False, include_authorship: bool = False, version: str | None = None) None [source]ο
Create a new release for this RemoteDataset.
- Parameters:
name (str) β Name of the release.
annotation_class_ids (Optional[List[str]], default: None) β List of the classes to filter.
include_url_token (bool, default: False) β Whether the image URLs in the export should include a token enabling access without team membership.
include_authorship (bool, default: False) β If set, include annotator and reviewer metadata for each annotation.
version (Optional[str], default: None, enum: ["1.0", "2.0"]) β When used with a V2 dataset, allows forcing generation of either Darwin JSON 1.0 (Legacy) or the newer 2.0. Omit this option to get your team's default.
- get_report(granularity: str = 'day') str [source]ο
Returns a string representation of a CSV report for this RemoteDataset.
- Parameters:
granularity (str, default: "day") β The granularity of the report; it can be "day", "week" or "month".
- Returns:
A CSV report.
- Return type:
str
- workview_url_for_item(item: DatasetItem) str [source]ο
Returns the Darwin URL for the given DatasetItem.
- Parameters:
item (DatasetItem) β The DatasetItem for which we want the URL.
- Returns:
The URL.
- Return type:
str
- post_comment(item: DatasetItem, text: str, x: float, y: float, w: float, h: float, slot_name: str | None = None)[source]ο
Adds a comment to an item in this dataset; tries to infer slot_name if left out.
- import_annotation(item_id: str | int, payload: Dict[str, Any]) None [source]ο
Imports the annotation for the item with the given id.
- Parameters:
item_id (ItemId) β Identifier of the item that we are importing the annotation to.
payload (Dict[str, Any]) β A dictionary with the annotation to import. The default format is: {"annotations": serialized_annotations, "overwrite": "false"}
- register(object_store: ObjectStore, storage_keys: List[str], fps: str | float | None = None, multi_planar_view: bool = False, preserve_folders: bool = False) Dict[str, List[str]] [source]ο
Register files in the dataset in a single slot.
- Parameters:
object_store (ObjectStore) β Object store to use for the registration.
storage_keys (List[str]) β List of storage keys to register.
fps (Optional[Union[str, float]], default: None) β When the uploading file is a video, specify its framerate.
multi_planar_view (bool, default: False) β Uses multiplanar view when uploading files.
preserve_folders (bool, default: False) β Specify whether or not to preserve folder paths when uploading
- Returns:
A dictionary with the list of registered files.
- Return type:
Dict[str, List[str]]
- Raises:
ValueError β If storage_keys is not a list of strings.
TypeError β If the file type is not supported.
- register_multi_slotted(object_store: ObjectStore, storage_keys: Dict[str, List[str]], fps: str | float | None = None, multi_planar_view: bool = False, preserve_folders: bool = False) Dict[str, List[str]] [source]ο
Register files in the dataset in multiple slots.
- Parameters:
object_store (ObjectStore) β Object store to use for the registration.
storage_keys (Dict[str, List[str]]) β Storage keys to register. The keys are the item names and the values are lists of storage keys.
fps (Optional[Union[str, float]], default: None) β When the uploading file is a video, specify its framerate.
multi_planar_view (bool, default: False) β Uses multiplanar view when uploading files.
preserve_folders (bool, default: False) β Specify whether or not to preserve folder paths when uploading
- Returns:
A dictionary with the list of registered files.
- Return type:
Dict[str, List[str]]
- Raises:
ValueError β If storage_keys is not a dictionary with keys as item names and values as lists of storage keys.
TypeError β If the file type is not supported.
darwin.dataset.split_manager moduleο
- class darwin.dataset.split_manager.Split(random: Dict[str, Path] | None = None, stratified: Dict[str, Dict[str, Path]] | None = None)[source]ο
Bases:
object
A Split object holds the state of a split as a set of attributes. For each split type (namely, random and stratified), the Split object will keep a record of the paths where the splits are going to be stored as files.
If a dataset can be split randomly, then the random attribute will be set as a dictionary between a particular partition (e.g. train, val, test) and the Path of the file where that partition's split file is going to be stored:
{
    "train": Path("/path/to/split/random_train.txt"),
    "val": Path("/path/to/split/random_val.txt"),
    "test": Path("/path/to/split/random_test.txt")
}
If a dataset can be split with a stratified strategy based on a given annotation type, then the stratified attribute will be set as a dictionary between a particular annotation type and a dictionary between a particular partition (e.g. train, val, test) and the Path of the file where that partition's split file is going to be stored:
{
    "polygon": {
        "train": Path("/path/to/split/stratified_polygon_train.txt"),
        "val": Path("/path/to/split/stratified_polygon_val.txt"),
        "test": Path("/path/to/split/stratified_polygon_test.txt")
    },
    "tag": {
        "train": Path("/path/to/split/stratified_tag_train.txt"),
        "val": Path("/path/to/split/stratified_tag_val.txt"),
        "test": Path("/path/to/split/stratified_tag_test.txt")
    }
}
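The nested layout described above can be sketched with plain dictionaries. Assuming hypothetical file paths, resolving the split file for a given annotation type and partition is a simple double lookup:

```python
from pathlib import Path

# Hypothetical stratified layout mirroring the attribute described above.
stratified = {
    "polygon": {
        "train": Path("/path/to/split/stratified_polygon_train.txt"),
        "val": Path("/path/to/split/stratified_polygon_val.txt"),
        "test": Path("/path/to/split/stratified_polygon_test.txt"),
    },
}

def split_file(annotation_type: str, partition: str) -> Path:
    """Resolve the split file for an annotation type and partition."""
    return stratified[annotation_type][partition]

print(split_file("polygon", "train"))  # /path/to/split/stratified_polygon_train.txt
```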
- random: Dict[str, Path] | None = Noneο
Stores the type of split (e.g. train, val, test) and the file path where the split is stored if the split is of type random.
- stratified: Dict[str, Dict[str, Path]] | None = Noneο
Stores the relation between an annotation type and the partition-filepath key value of the split if its type is stratified.
- darwin.dataset.split_manager.split_dataset(dataset_path: str | Path, release_name: str | None = None, val_percentage: float = 0.1, test_percentage: float = 0.2, split_seed: int = 0, make_default_split: bool = True, stratified_types: List[str] = ['bounding_box', 'polygon', 'tag']) Path [source]ο
Given a local dataset (pulled from Darwin), split it by creating lists of filenames. The partitions to split the dataset into are called train, val and test.
The dataset is always split randomly, and can be additionally split according to the stratified strategy by providing a list of stratified types.
Requires scikit-learn to split a dataset.
- Parameters:
dataset_path (PathLike) β Local path to the dataset.
release_name (Optional[str], default: None) β Version of the dataset.
val_percentage (float, default: 0.1) β Percentage of images used in the validation set.
test_percentage (float, default: 0.2) β Percentage of images used in the test set.
split_seed (int, default: 0) β Fix seed for random split creation.
make_default_split (bool, default: True) β Makes this split the default split.
stratified_types (List[str], default: ["bounding_box", "polygon", "tag"]) β List of annotation types to split with the stratified strategy.
- Returns:
Path to the folder containing the generated split files.
- Return type:
Path
- Raises:
ImportError β If sklearn is not installed.
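Not an implementation of split_dataset itself, but a minimal sketch of how the val/test percentages translate into partition sizes under a seeded shuffle (filenames and seed below are made up):

```python
import random

def partition_filenames(filenames, val_percentage=0.1, test_percentage=0.2, split_seed=0):
    """Shuffle deterministically with the given seed, then carve off val and test."""
    shuffled = sorted(filenames)
    random.Random(split_seed).shuffle(shuffled)
    n = len(shuffled)
    n_val = round(n * val_percentage)
    n_test = round(n * test_percentage)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }

parts = partition_filenames([f"img_{i}.json" for i in range(10)])
print({k: len(v) for k, v in parts.items()})  # {'val': 1, 'test': 2, 'train': 7}
```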
darwin.dataset.upload_manager moduleο
- class darwin.dataset.upload_manager.ItemPayload(*, dataset_item_id: int, filename: str, path: str, reason: str | None = None, slots: any | None = None)[source]ο
Bases:
object
Represents an item's payload.
- Parameters:
dataset_item_id (int) β The id of the dataset this item belongs to.
filename (str) β The filename of where this ItemPayload's data is.
path (str) β The path to filename.
reason (Optional[str], default: None) β A reason to upload this ItemPayload.
- dataset_item_idο
The id of the dataset this item belongs to.
- Type:
int
- filenameο
The filename of where this ItemPayload's data is.
- Type:
str
- pathο
The path to filename.
- Type:
str
- reasonο
A reason to upload this ItemPayload.
- Type:
Optional[str], default: None
- property full_path: strο
The full Path (with filename included) to the file.
- class darwin.dataset.upload_manager.UploadStage(value)[source]ο
Bases:
DocEnum
The different stages of uploading a file.
- REQUEST_SIGNATURE = 0ο
- UPLOAD_TO_S3 = 1ο
- CONFIRM_UPLOAD_COMPLETE = 2ο
- OTHER = 3ο
- exception darwin.dataset.upload_manager.UploadRequestError(file_path: Path, stage: UploadStage, error: Exception | None = None)[source]ο
Bases:
Exception
Error thrown when uploading a file fails with an unrecoverable error.
- file_path: Pathο
The Path of the file being uploaded.
- stage: UploadStageο
The UploadStage when the failure happened.
- error: Exception | None = Noneο
The Exception that triggered this unrecoverable error.
- class darwin.dataset.upload_manager.LocalFile(local_path: str | Path, **kwargs)[source]ο
Bases:
object
Represents a file locally stored.
- Parameters:
local_path (PathLike) β The Path of the file.
kwargs (Any) β Data relative to this file. Can be anything.
- local_pathο
The Path of the file.
- Type:
PathLike
- dataο
Dictionary with metadata relative to this file. It has the following format:
{
    "filename": "a_filename",
    "path": "a path"
}
data["filename"] will hold the value passed as filename from kwargs or default to self.local_path.name.
data["path"] will hold the value passed as path from kwargs or default to "/".
- Type:
Dict[str, str]
- property full_path: strο
The full Path (with filename included) to the file.
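The defaulting rules for data described above can be sketched as a small helper (a sketch of the documented behaviour, not the library code itself):

```python
from pathlib import Path

def build_data(local_path, **kwargs):
    """Sketch of the documented defaulting: filename falls back to the
    local file's name, path falls back to "/"."""
    local_path = Path(local_path)
    return {
        "filename": kwargs.get("filename", local_path.name),
        "path": kwargs.get("path", "/"),
    }

data = build_data("/tmp/images/cat.jpg")
print(data)  # {'filename': 'cat.jpg', 'path': '/'}
```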
- class darwin.dataset.upload_manager.FileMonitor(io: BinaryIO, file_size: int, callback: Callable[[FileMonitor], None])[source]ο
Bases:
object
Monitors the progress of a BufferedReader.
To use this monitor, construct your BufferedReader as you normally would, then construct this object with it as argument.
- Parameters:
io (BinaryIO) β IO object used by this class. Dependency injection.
file_size (int) β The size of the file in bytes.
callback (Callable[["FileMonitor"], None]) β Callable function used by this class. Dependency injection via constructor.
- ioο
IO object used by this class. Dependency injection.
- Type:
BinaryIO
- callbackο
Callable function used by this class. Dependency injection.
- Type:
Callable[[βFileMonitorβ], None]
- bytes_readο
Amount of bytes read from the IO.
- Type:
int
- lenο
Total size of the IO.
- Type:
int
- read(size: int = -1) Any [source]ο
Reads the given amount of bytes from the configured IO and calls the configured callback for each block read. The callback is passed a reference to this object, which can be used to get the current self.bytes_read.
- Parameters:
size (int, default: -1) β The number of bytes to read. Defaults to -1, so all bytes until EOF are read.
- Returns:
data β Data read from the IO.
- Return type:
Any
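The monitoring pattern described above can be sketched with a self-contained stand-in (not the library class itself): wrap a binary stream, count bytes, and invoke the callback after each read:

```python
import io

class ProgressReader:
    """Minimal stand-in for FileMonitor: wraps a binary IO and reports progress."""
    def __init__(self, raw, file_size, callback):
        self.io = raw
        self.len = file_size
        self.bytes_read = 0
        self.callback = callback

    def read(self, size=-1):
        data = self.io.read(size)
        self.bytes_read += len(data)
        self.callback(self)  # the callback receives the monitor itself
        return data

seen = []
payload = b"x" * 1024
reader = ProgressReader(io.BytesIO(payload), len(payload),
                        lambda m: seen.append(m.bytes_read))
while reader.read(256):
    pass
# The final empty read at EOF also triggers the callback once.
print(seen)  # [256, 512, 768, 1024, 1024]
```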
- class darwin.dataset.upload_manager.UploadHandler(dataset: RemoteDataset, local_files: List[LocalFile])[source]ο
Bases:
ABC
Holds responsibilities for file upload management and failure handling into RemoteDatasets.
- Parameters:
dataset (RemoteDataset) β Target RemoteDataset where we want to upload our files to.
local_files (List[LocalFile]) β List of LocalFiles to be uploaded.
- datasetο
Target RemoteDataset where we want to upload our files to.
- Type:
RemoteDataset
- errorsο
List of errors that happened during the upload process.
- Type:
List[UploadRequestError]
- blocked_itemsο
List of items that were not able to be uploaded.
- Type:
List[ItemPayload]
- pending_itemsο
List of items waiting to be uploaded.
- Type:
List[ItemPayload]
- static build(dataset: RemoteDataset, local_files: List[LocalFile])[source]ο
- property dataset_identifier: DatasetIdentifierο
The DatasetIdentifier of this UploadHandler's RemoteDataset.
- property blocked_count: intο
Number of items that could not be uploaded successfully.
- property error_count: intο
Number of errors that prevented items from being uploaded.
- property pending_count: intο
Number of items waiting to be uploaded.
- property total_count: intο
Total number of blocked and pending items.
- property progressο
Current level of upload progress.
- class darwin.dataset.upload_manager.UploadHandlerV2(dataset: RemoteDataset, local_files: List[LocalFile])[source]ο
Bases:
UploadHandler
darwin.dataset.utils moduleο
- darwin.dataset.utils.get_release_path(dataset_path: Path, release_name: str | None = None) Path [source]ο
Given a dataset path and a release name, returns the path to the release.
- Parameters:
dataset_path (Path) β Path to the location of the dataset on the file system.
release_name (Optional[str], default: None) β Version of the dataset.
- Returns:
Path to the location of the dataset release on the file system.
- Return type:
Path
- Raises:
NotFound β If no dataset is found in the location provided by dataset_path.
- darwin.dataset.utils.extract_classes(annotations_path: Path, annotation_type: str | List[str]) Tuple[Dict[str, Set[int]], Dict[int, Set[str]]] [source]ο
Given the GT as json files, extracts all classes and maps image indices to classes.
- Parameters:
annotations_path (Path) β Path to the json files with the GT information of each image.
annotation_type (Union[str, List[str]]) β Type(s) of annotation to use to extract the GT information.
- Returns:
A tuple where the first element is a dictionary whose keys are the classes found in the GT and whose values are the sets of file indices which contain them; and the second element is a dictionary whose keys are image indices and whose values are all classes contained in that image.
- Return type:
Tuple[Dict[str, Set[int]], Dict[int, Set[str]]]
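The two mappings returned above are inverse views of the same data. A self-contained sketch (operating on in-memory class sets rather than the actual json files) shows how they relate:

```python
from collections import defaultdict

def index_classes(per_image_classes):
    """Build class->image-indices and image-index->classes mappings
    from a list of per-image class sets (a simplified stand-in for
    reading the GT json files)."""
    classes_to_images = defaultdict(set)
    images_to_classes = defaultdict(set)
    for image_index, class_names in enumerate(per_image_classes):
        for name in class_names:
            classes_to_images[name].add(image_index)
            images_to_classes[image_index].add(name)
    return dict(classes_to_images), dict(images_to_classes)

c2i, i2c = index_classes([{"cat", "dog"}, {"cat"}, {"bird"}])
print(c2i["cat"])  # {0, 1}
```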
- darwin.dataset.utils.make_class_lists(release_path: Path) None [source]ο
Support function to extract classes and save the output to file.
- Parameters:
release_path (Path) β Path to the location of the dataset on the file system.
- darwin.dataset.utils.get_classes_from_file(path: Path) List[str] [source]ο
Helper function to read class names from a file.
- darwin.dataset.utils.available_annotation_types(release_path: Path) List[str] [source]ο
Returns a list of available annotation types based on the existing files.
- darwin.dataset.utils.get_classes(dataset_path: str | Path, release_name: str | None = None, annotation_type: str | List[str] = 'polygon', remove_background: bool = True) List[str] [source]ο
Given a dataset and an annotation_type returns the list of classes.
- Parameters:
dataset_path (PathLike) β Path to the location of the dataset on the file system.
release_name (Optional[str], default: None) β Version of the dataset.
annotation_type (str, default: "polygon") β The type of annotation classes [tag, polygon, bounding_box].
remove_background (bool, default: True) β Removes the background class (if exists) from the list of classes.
- Returns:
List of classes in the dataset of type classes_type.
- Return type:
List[str]
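A sketch of the remove_background behaviour on a list of class names. The background class name "__background__" and its position at the head of the list are assumptions here, not stated on this page:

```python
def read_classes(lines, remove_background=True):
    """Strip class names from file lines and optionally drop the
    background class. "__background__" is an assumed name."""
    classes = [line.strip() for line in lines if line.strip()]
    if remove_background and classes and classes[0] == "__background__":
        classes = classes[1:]
    return classes

print(read_classes(["__background__\n", "cat\n", "dog\n"]))  # ['cat', 'dog']
```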
- darwin.dataset.utils.exhaust_generator(progress: Generator, count: int, multi_processed: bool, worker_count: int | None = None) Tuple[List[Dict[str, Any]], List[Exception]] [source]ο
Exhausts the generator passed as parameter. Can be done multi processed if desired.
- Returns:
A tuple with the list of results produced by the generator and the list of exceptions raised, if any.
- Return type:
Tuple[List[Dict[str, Any]], List[Exception]]
- darwin.dataset.utils.get_coco_format_record(annotation_path: Path, annotation_type: str = 'polygon', image_path: Path | None = None, image_id: str | int | None = None, classes: List[str] | None = None) Dict[str, Any] [source]ο
Creates and returns a coco record from the given annotation. Uses BoxMode.XYXY_ABS from detectron2.structures if available, defaults to box_mode = 0 otherwise.
- Parameters:
annotation_path (Path) β Path to the annotation file.
annotation_type (str, default: "polygon") β Type of the annotation we want to retrieve.
image_path (Optional[Path], default: None) β Path to the image the annotation refers to.
image_id (Optional[Union[str, int]], default: None) β Id of the image the annotation refers to.
classes (Optional[List[str]], default: None) β Classes of the annotation.
- Returns:
A coco record with the following keys:
{
    "height": 100,
    "width": 100,
    "file_name": "a file name",
    "image_id": 1,
    "annotations": [ ... ]
}
- Return type:
Dict[str, Any]
- darwin.dataset.utils.get_annotations(dataset_path: str | Path, partition: str | None = None, split: str | None = 'default', split_type: str | None = None, annotation_type: str = 'polygon', release_name: str | None = None, annotation_format: str | None = 'coco', ignore_inconsistent_examples: bool = False) Iterator[Dict[str, Any]] [source]ο
Returns all the annotations of a given dataset and split in a single dictionary.
- Parameters:
dataset_path (PathLike) β Path to the location of the dataset on the file system.
partition (Optional[str], default: None) β Selects one of the partitions [train, val, test].
split (Optional[str], default: "default") β Selects the split that defines the percentages used (use "default" to select the default split).
split_type (Optional[str], default: None) β Heuristic used to do the split [random, stratified, None].
annotation_type (str, default: "polygon") β The type of annotation classes [tag, bounding_box, polygon].
release_name (Optional[str], default: None) β Version of the dataset.
annotation_format (Optional[str], default: "coco") β Re-formatting of the annotation when loaded [coco, darwin].
ignore_inconsistent_examples (bool, default: False) β Ignore examples for which we have annotations, but either images are missing, or more than one image exists for the same annotation. If set to True, then filter those examples out of the dataset. If set to False, then raise an error as soon as such an example is found.
- Returns:
An iterator over dictionaries containing the annotations of the dataset.
- Return type:
Iterator[Dict[str, Any]]
- Raises:
ValueError β If the partition given is not valid. - If the split_type given is not valid. - If the annotation_type given is not valid. - If an annotation has no corresponding image. - If an image is present with multiple extensions.
FileNotFoundError β If no dataset in dataset_path is found.
- darwin.dataset.utils.load_pil_image(path: Path, to_rgb: bool | None = True) Image [source]ο
Loads a PIL image and converts it into RGB (optional).
- Parameters:
path (Path) β Path to the image file.
to_rgb (Optional[bool], default: True) β Converts the image to RGB.
- Returns:
The loaded image.
- Return type:
PILImage.Image
- darwin.dataset.utils.convert_to_rgb(pic: Image) Image [source]ο
Converts a PIL image to RGB.
- Parameters:
pic (PILImage.Image) β The image to convert.
- Returns:
The converted image, with values between 0 and 255.
- Return type:
PIL Image
- Raises:
TypeError β If the image given via
pic
has an unsupported type.
- darwin.dataset.utils.compute_max_density(annotations_dir: Path) int [source]ο
Calculates the maximum density of all of the annotations in the given folder. Density is calculated as the number of polygons / complex_polygons present in an annotation file.
- Parameters:
annotations_dir (Path) β Directory where the annotations are present.
- Returns:
The maximum density.
- Return type:
int
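The density rule described above can be sketched against simplified annotation payloads. The JSON layout below is an assumption for illustration, not the exact Darwin schema:

```python
import json

def max_density(annotation_files):
    """Sketch: density = number of polygon / complex_polygon annotations
    in a file; return the maximum across all files."""
    densities = []
    for contents in annotation_files:
        data = json.loads(contents)
        polygons = [a for a in data.get("annotations", [])
                    if "polygon" in a or "complex_polygon" in a]
        densities.append(len(polygons))
    return max(densities, default=0)

files = [
    json.dumps({"annotations": [{"polygon": {}}, {"tag": {}}]}),
    json.dumps({"annotations": [{"polygon": {}}, {"complex_polygon": {}}, {"polygon": {}}]}),
]
print(max_density(files))  # 3
```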
- darwin.dataset.utils.compute_distributions(annotations_dir: Path, split_path: Path, partitions: List[str] = ['train', 'val', 'test'], annotation_types: List[str] = ['polygon']) Dict[str, Dict[str, Counter]] [source]ο
- Builds and returns the following dictionaries:
class_distribution: count of all files in which at least one instance of a given class exists, for each partition
instance_distribution: count of all instances of a given class, for each partition
Note that this function can only be used after a dataset has been split with the "stratified" strategy.
- Parameters:
annotations_dir (Path) β Directory where the annotations are.
split_path (Path) β Path to the split.
partitions (List[str], default: ["train", "val", "test"]) β Partitions to use.
annotation_types (List[str], default: ["polygon"]) β Annotation types to consider.
- Returns:
class_distribution: count of all files in which at least one instance of a given class exists, for each partition
instance_distribution: count of all instances of a given class, for each partition
- Return type:
Dict[str, AnnotationDistribution]
- darwin.dataset.utils.is_relative_to(path: Path, *other) bool [source]ο
Returns True if the path is relative to another path, or False otherwise. It also returns False in the event of an exception, making False the default value.
- Parameters:
path (Path) β The path to evaluate.
other (Path) β The other path to compare against.
- Returns:
True if the path is relative to other, False otherwise.
- Return type:
bool
- darwin.dataset.utils.sanitize_filename(filename: str) str [source]ο
Sanitizes the given filename, removing/replacing forbidden characters.
- Parameters:
filename (str) β The filename to sanitize.
- Returns:
The sanitized filename.
- Return type:
str
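A sketch of the sanitization idea: replace forbidden characters with underscores. The exact character set used by the library is an assumption here; this one covers common filesystem-unsafe characters:

```python
import re

def sanitize(filename):
    """Replace filesystem-unsafe characters with underscores.
    The forbidden set [<>:"|?*] is an assumption, not the library's exact list."""
    return re.sub(r'[<>:"|?*]', "_", filename)

print(sanitize('scan<1>:v2?.png'))  # scan_1__v2_.png
```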
- darwin.dataset.utils.get_external_file_type(storage_key: str) str | None [source]ο
Returns the type of file given a storage key.
- Parameters:
storage_key (str) β The storage key to get the type of file from.
- Returns:
The type of file, or None if the file type is not supported.
- Return type:
Optional[str]
- darwin.dataset.utils.parse_external_file_path(storage_key: str, preserve_folders: bool) str [source]ο
Returns the Darwin dataset path given a storage key.
- Parameters:
storage_key (str) β The storage key to parse.
preserve_folders (bool) β Whether to preserve folders or place the file in the Dataset root.
- Returns:
The parsed external file path.
- Return type:
str
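A sketch of how preserve_folders might map a storage key to a dataset path: keep the key's folder structure when True, otherwise place the file at the dataset root. The exact rules are assumptions for illustration:

```python
from pathlib import PurePosixPath

def external_file_path(storage_key, preserve_folders):
    """Sketch: derive a dataset path from a storage key.
    With preserve_folders, keep the key's folders; otherwise use the root."""
    if not preserve_folders:
        return "/"
    return str(PurePosixPath("/" + storage_key).parent)

print(external_file_path("folder/sub/file.jpg", True))   # /folder/sub
print(external_file_path("folder/sub/file.jpg", False))  # /
```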
- darwin.dataset.utils.get_external_file_name(storage_key: str) str [source]ο
Returns the name of the file given a storage key.
- Parameters:
storage_key (str) β The storage key to get the file name from.
- Returns:
The name of the file.
- Return type:
str
- darwin.dataset.utils.chunk_items(items: List[Any], chunk_size: int = 500) Iterator[List[Any]] [source]ο
Splits the list of items into chunks of specified size.
- Parameters:
items (List[Any]) β The list of items to split.
chunk_size (int, default: 500) β The size of each chunk.
- Returns:
An iterator that yields lists of items, each of length at most chunk_size.
- Return type:
Iterator[List[Any]]
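A likely-equivalent slicing sketch of the chunking behaviour described above (not the library's own implementation):

```python
from typing import Any, Iterator, List

def chunk_items(items: List[Any], chunk_size: int = 500) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]

print(list(chunk_items([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```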