V7 Darwin Python SDK
====================

.. image:: https://static.pepy.tech/personalized-badge/darwin-py?period=total&units=international_system&left_color=black&right_color=blue&left_text=Downloads
   :target: https://pepy.tech/project/darwin-py
   :alt: Downloads

.. image:: https://static.pepy.tech/personalized-badge/darwin-py?period=month&units=international_system&left_color=black&right_color=blue&left_text=This%20month
   :target: https://pepy.tech/project/darwin-py
   :alt: Downloads

.. image:: https://img.shields.io/github/stars/v7labs/darwin-py?style=social
   :target: https://github.com/v7labs/darwin-py/stargazers
   :alt: GitHub Repo stars

.. image:: https://img.shields.io/twitter/follow/V7Labs?style=social
   :target: https://twitter.com/V7Labs
   :alt: Twitter Follow

.. image:: https://api.scorecard.dev/projects/github.com/v7labs/darwin-py/badge
   :target: https://scorecard.dev/viewer/?uri=github.com/v7labs/darwin-py
   :alt: OpenSSF Scorecard

⚡️ Official library for annotating data and managing datasets and models on `V7's Darwin Training Data Platform `_. ⚡️

Darwin-py can be used both from the `command line <#usage-as-a-command-line-interface-cli>`_ and as a `Python library <#usage-as-a-python-library>`_.

Main functions include (but are not limited to):

* Client authentication
* Listing local and remote datasets
* Creating/removing datasets
* Uploading/downloading data to/from remote datasets
* Direct integration with PyTorch dataloaders
* Extracting video artifacts

Support is tested for Python 3.9 - 3.12.

🏁 Installation
---------------

.. code-block::

   pip install darwin-py

You can now type ``darwin`` in your terminal to access the command line interface.

If you wish to use the PyTorch bindings, use the ``ml`` extra to install all the additional requirements:

.. code-block::

   pip install darwin-py[ml]

If you wish to use video frame extraction, use the ``ocv`` extra to install all the additional requirements:

.. code-block::

   pip install darwin-py[ocv]

If you wish to use video artifacts extraction, you need to install `FFmpeg `_.

To run the tests, first install the ``test`` extra:

.. code-block::

   pip install darwin-py[test]

Configuration
^^^^^^^^^^^^^

Retry Configuration
~~~~~~~~~~~~~~~~~~~

The SDK includes a retry mechanism for handling API rate limits (429) and server errors (500, 502, 503, 504). You can configure the retry behavior using the following environment variables:

* ``DARWIN_RETRY_INITIAL_WAIT``: Initial wait time in seconds between retries (default: 60)
* ``DARWIN_RETRY_MAX_WAIT``: Maximum wait time in seconds between retries (default: 300)
* ``DARWIN_RETRY_MAX_ATTEMPTS``: Maximum number of retry attempts (default: 10)

Example configuration:

.. code-block:: bash

   # Configure shorter retry intervals and fewer attempts
   export DARWIN_RETRY_INITIAL_WAIT=30
   export DARWIN_RETRY_MAX_WAIT=120
   export DARWIN_RETRY_MAX_ATTEMPTS=5

The retry mechanism automatically handles:

* Rate limiting (HTTP 429)
* Server errors (HTTP 500, 502, 503, 504)

For each retry attempt, you'll see a message indicating the type of error and the wait time before the next attempt.

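To get a feel for how the three retry variables interact, here is a minimal standalone sketch. It does not use the SDK's own code: ``read_retry_settings``, ``wait_schedule`` and ``should_retry`` are hypothetical helpers written for illustration, and the doubling of the wait between attempts is an assumption, since this README only documents the initial and maximum wait.

```python
import os

# Defaults as documented above. The helper names in this block are hypothetical.
DEFAULTS = {
    "DARWIN_RETRY_INITIAL_WAIT": 60,
    "DARWIN_RETRY_MAX_WAIT": 300,
    "DARWIN_RETRY_MAX_ATTEMPTS": 10,
}

# Status codes the README lists as retried.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}


def read_retry_settings(environ=None):
    """Return the three retry settings as integers, falling back to the defaults."""
    environ = os.environ if environ is None else environ
    return {name: int(environ.get(name, default)) for name, default in DEFAULTS.items()}


def wait_schedule(settings):
    """List the wait before each retry, ASSUMING the wait doubles up to the maximum."""
    initial = settings["DARWIN_RETRY_INITIAL_WAIT"]
    maximum = settings["DARWIN_RETRY_MAX_WAIT"]
    attempts = settings["DARWIN_RETRY_MAX_ATTEMPTS"]
    return [min(initial * 2**i, maximum) for i in range(attempts)]


def should_retry(status_code):
    """Return True for the HTTP status codes listed as retryable."""
    return status_code in RETRYABLE_STATUS_CODES


if __name__ == "__main__":
    settings = read_retry_settings()
    print(settings)
    print(wait_schedule(settings))
```

Under that doubling assumption, the default settings add up to a long worst-case wait, so lowering ``DARWIN_RETRY_MAX_ATTEMPTS`` may be worthwhile in interactive scripts.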
Development
^^^^^^^^^^^

See our development and QA environment installation recommendations `here `_

----

Usage as a Command Line Interface (CLI)
---------------------------------------

`Here you can find V7 labs doc on the CLI usage `_

Once installed, ``darwin`` is accessible as a command line tool.
A useful way to navigate the CLI is through the help option ``-h/--help``, which provides additional information for each available command.

Client Authentication
^^^^^^^^^^^^^^^^^^^^^

To perform remote operations on Darwin you first need to authenticate.
This requires a `team-specific API-key `_.
If you do not already have a Darwin account, you can `contact us `_ and we can set one up for you.

To start the authentication process:

.. code-block::

   $ darwin authenticate
   API key:
   Make example-team the default team? [y/N] y
   Datasets directory [~/.darwin/datasets]:
   Authentication succeeded.

You will then be prompted to enter your API key, whether you want to set the corresponding team as the default, and finally the desired location on the local file system for the datasets of that team.
This process creates a configuration file at ``~/.darwin/config.yaml``.
This file is updated by later authentications for different teams.

Listing local and remote datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

List a summary of existing local datasets:

.. code-block::

   $ darwin dataset local
   NAME            IMAGES     SYNC_DATE         SIZE
   mydataset       112025     yesterday     159.2 GB

List a summary of remote datasets accessible by the current user:

.. code-block::

   $ darwin dataset remote
   NAME                       IMAGES     PROGRESS
   example-team/mydataset     112025        73.0%

Create/remove a dataset
^^^^^^^^^^^^^^^^^^^^^^^

To create an empty dataset remotely:

.. code-block::

   $ darwin dataset create test
   Dataset 'test' (example-team/test) has been created.
   Access at https://darwin.v7labs.com/datasets/579

The dataset will be created in the team you're authenticated for.

To delete the dataset on the server:

.. code-block::

   $ darwin dataset remove test
   About to delete example-team/test on darwin.
   Do you want to continue? [y/N] y

Upload/download data to/from a remote dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``darwin dataset push`` uploads data to an existing remote dataset.
It takes the dataset name and a single image (or a directory of images/videos) to upload as parameters.

The ``-e/--exclude`` argument specifies file extensions to ignore in the data directory, e.g. ``-e .jpg``.

For videos, the frame extraction rate can be specified with the ``--fps`` option.

Supported extensions:

* Video files: [\ ``.mp4``\ , ``.bpm``\ , ``.mov`` formats].
* Image files: [\ ``.jpg``\ , ``.jpeg``\ , ``.png`` formats].

.. code-block::

   $ darwin dataset push test /path/to/folder/with/images
   100%|████████████████████████| 2/2 [00:01<00:00,  1.27it/s]

Before a dataset can be downloaded, a release needs to be generated:

.. code-block::

   $ darwin dataset export test 0.1
   Dataset test successfully exported to example-team/test:0.1

This version is immutable. If new images or annotations have been added, you will have to create a new release to include them.

To list all available releases:

.. code-block::

   $ darwin dataset releases test
   NAME                       IMAGES     CLASSES                   EXPORT_DATE
   example-team/test:0.1           4           0     2019-12-07 11:37:35+00:00

And finally, to download a release:

.. code-block::

   $ darwin dataset pull test:0.1
   Dataset example-team/test:0.1 downloaded at /directory/choosen/at/authentication/time .

----

Usage as a Python library
-------------------------

`Here you can find V7 labs doc on the usage as Python library `_

The framework is designed to be usable as a standalone Python library.
Usage can be inferred from the operations performed in ``darwin/cli_functions.py``.
A minimal example that downloads a dataset is provided below, and a more extensive one can be found in `./darwin_demo.py `_.

.. code-block:: python

   from darwin.client import Client

   client = Client.local()  # use the configuration in ~/.darwin/config.yaml
   dataset = client.get_remote_dataset("example-team/test")
   dataset.pull()  # downloads annotations and images for the latest exported version

Follow `this guide `_ for how to integrate Darwin datasets directly in PyTorch.
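The examples in this README refer to datasets as ``test``, ``example-team/test`` and ``example-team/test:0.1``, which suggests a ``[team/]dataset[:version]`` shape. The sketch below shows one way to split such a string in your own scripts. ``parse_identifier`` and ``ParsedIdentifier`` are hypothetical names invented for this illustration and are not part of darwin-py. The accepted format is an assumption inferred from the examples, not a specification.

```python
from typing import NamedTuple, Optional


class ParsedIdentifier(NamedTuple):
    """Parts of a dataset identifier (hypothetical type, for illustration only)."""

    team: Optional[str]
    dataset: str
    version: Optional[str]


def parse_identifier(identifier: str) -> ParsedIdentifier:
    """Split '[team/]dataset[:version]' into its parts.

    ASSUMPTION: the format is inferred from the examples in this README.
    """
    # Everything after the first ':' is treated as the release version.
    name, _, version = identifier.partition(":")
    # Everything before the last '/' is treated as the team slug.
    team, _, dataset = name.rpartition("/")
    if not dataset:
        raise ValueError(f"missing dataset name in {identifier!r}")
    return ParsedIdentifier(team or None, dataset, version or None)


if __name__ == "__main__":
    for text in ("test", "example-team/test", "example-team/test:0.1"):
        print(text, "->", parse_identifier(text))
```

Missing parts come back as ``None``, so a caller can decide whether to fall back to the default team or to the latest release.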