61 results for “topic:dask-distributed”
A full-pipeline AutoML tool for tabular data
Evaluation Tool for Anomaly Detection Algorithms on Time Series
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
Unified Distributed Execution
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Parallel LAMMPS Python interface: control an mpi4py-parallel LAMMPS instance from a serial Python process or a Jupyter notebook, based on executorlib
Perform I/O-intensive workloads on high-volume data sparsely distributed across multiple AWS regions using Dask.
Loop like a pro, make parameter studies fun.
Code for the post "Training models when data doesn't fit in memory"
Test LightGBM's Dask integration on different cluster types
Tutorial for scaling up image analysis with Dask
Scalable Cytometry Image Processing (SCIP) is an open-source tool that implements an image processing pipeline on top of Dask, a distributed computing framework written in Python. SCIP performs projection, illumination correction, image segmentation and masking, and feature extraction.
Launch a Dask cluster from a Poetry environment
Tracking urban growth by comparing satellite images over time and visualising changes on a map.
Procurement: Dask Cluster as a Process.
Open Data Profiling, Quality and Analysis on the NYC OpenData dataset, with semantic profiling using fuzzy ratio, Levenshtein distance, and regex
HPC cluster deployment and management for the Hetzner Cloud
Fraud detection ML pipeline and serving POC using Dask and hopeit.engine. Project created with nbdev: https://www.fast.ai/2019/12/02/nbdev/
Python library to query and transform genomic data from indexed files
Scale up concurrent requests to Earth Engine interactive endpoints with Dask
Magic commands for running MPI Python code and multi-node Dask workloads in Jupyter notebooks.
Efficiently read climate/meteorology data into Xarray using Dask for parallelization. Transform the data for your modelling needs.
Preserve the runtime data of a Dask client needed to "replay" it and analyze its performance and behavior after the fact
📖 Automate e-book conversion into illustrated images with local AI, simplifying the creative process from text extraction to stunning visuals.
A project using the National Library of Medicine's Semantic Medline Database to create a graphical-relational database.
User documentation website for the Sulis tier 2 HPC service. Built using Jekyll.
Testing access performance of Sentinel-1 RTC metadata catalogs
Wukong: a fast and efficient serverless DAG engine.
Python 3 tools for distributed analysis and visualisation of big climate data on HPC systems.
A custom Dask remote jobqueue for HTCondor.