Too much time is spent on setting up the data
Well-designed data pipelines let you iterate rapidly on machine learning experiments, and faster iteration is what leads to more accurate models.
> pip install hub
Generate datasets using plug-and-play data pipelines
Use the Python-native framework to build data pipelines for feature extraction, machine learning, and deep learning. Automatically ingest, clean, and transform your raw data as new data comes in.
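The ingest, clean, and transform stages described above can be sketched with plain Python generators. This is only an illustration of the streaming idea and does not use or represent the Hub API; the function names `ingest`, `clean`, and `transform`, and the comma-separated record format, are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def ingest(lines: Iterable[str]) -> Iterator[str]:
    """Yield raw records one at a time as they arrive (hypothetical stage)."""
    for line in lines:
        yield line


def clean(records: Iterable[str]) -> Iterator[str]:
    """Strip surrounding whitespace and drop empty records."""
    for record in records:
        record = record.strip()
        if record:
            yield record


def transform(records: Iterable[str]) -> Iterator[List[float]]:
    """Parse each comma-separated record into a list of floats."""
    for record in records:
        yield [float(value) for value in record.split(",")]


# Stages are chained lazily: each record flows through all three stages
# before the next one is read, so new data can be processed as it comes in.
raw = ["1, 2, 3\n", "\n", " 4,5,6 "]
features = list(transform(clean(ingest(raw))))
print(features)
```

Because every stage is a generator, nothing is loaded until the final consumer asks for it, which is the property that makes a pipeline streamable.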
Test locally, then scale to the cloud with no code change
Snark enables building streamable data pipelines that work locally and scale to thousands of machines in the cloud. No need to configure cloud infrastructure.
Leverage the most cost-efficient hardware in the cloud with support for preemptible/spot instances.
Collaborate with your team
Data versioning and a synchronization protocol are implemented for you, so data can be accessed across teams. User access management comes with encryption at rest and in transit. Access your data from anywhere.
Visualize data at any step
View results with our visualization engine, deployed on premises or in the cloud. Preview slices of data with no load time and keep track of your feature engineering pipeline.
Create an array
Create a large array that you can read from and write to anywhere. When you write one slice of the array, it automatically syncs to the cloud. You can lazy-load an existing array on demand or connect to any other storage.
import hub
import numpy as np

# Create a large array that you can read/write from anywhere.
datahub = hub.fs('./data').connect()
bigarray = datahub.array(
    'your_array_name',
    shape=(100000, 512, 512, 3),
    chunk=(100, 512, 512, 3),
    dtype='int32',
)

# Write to one slice of the array. Automatically syncs to the cloud.
# Integer pixel values are used so they match the array's int32 dtype;
# floats in [0, 1) would all be truncated to 0.
image = np.random.randint(0, 256, size=(512, 512, 3), dtype='int32')
bigarray[0, :, :, :] = image

# Lazy-load an existing array from the cloud on demand.
bigarray = datahub.open('your_array_name')
bigarray[0, :, :, :].mean()
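To get a feel for what the shape and chunk arguments in the example above imply, the arithmetic can be worked out in a few lines. This is a rough sketch that assumes chunks tile the array in a regular grid and are stored uncompressed; how Hub actually lays out and stores chunks is not stated here, and the helper `chunk_index` is hypothetical.

```python
import math

# Values taken from the example above.
shape = (100000, 512, 512, 3)
chunk = (100, 512, 512, 3)
ITEMSIZE = 4  # bytes per int32 element

# Number of chunks along each axis (ceiling division), and in total.
chunks_per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunk))
total_chunks = math.prod(chunks_per_axis)

# Uncompressed size of one chunk and of the whole array, in bytes.
chunk_bytes = math.prod(chunk) * ITEMSIZE
array_bytes = math.prod(shape) * ITEMSIZE


def chunk_index(sample_index, chunk_len=chunk[0]):
    """Return which chunk along the first axis holds a given sample
    (assumes a regular chunk grid; hypothetical helper)."""
    return sample_index // chunk_len


print(chunks_per_axis)        # chunks along each axis
print(total_chunks)           # total number of chunks
print(chunk_bytes / 2**20)    # size of one chunk in MiB
print(array_bytes / 2**30)    # size of the full array in GiB
print(chunk_index(250))       # chunk that holds sample 250
```

Under these assumptions, reading or writing `bigarray[0, :, :, :]` touches a single chunk of roughly 300 MiB rather than the full array, which is why slice-level access to a very large array stays practical.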
Connect to the storage service of your choice
Connect your pipelines to any type of structured or unstructured data in the cloud-native array data warehouse.
Google Cloud Storage