TemryL/UKB-Tools

Tools in Python to quickly start using the UK-BioBank dataset before UKB RAP.

UKB-Tools

Introduction

This repository provides tools in Python to quickly start using the UK-BioBank dataset before UKB RAP. The repository is organized as follows:

├── commands/
│   ├── create_data.py
│   ├── create_eu_set.py
│   └── get_newest_baskets.py
└── ukb_tools/
    ├── preprocess/
    │   ├── filtering.py
    │   ├── labeling.py
    │   └── utils.py
    ├── __init__.py
    ├── data.py
    ├── logger.py
    └── tools.py

Installation

Clone the repository:

git clone https://github.com/TemryL/UKB-Tools.git

Move to the directory:

cd UKB-Tools

Create and activate a virtual environment that uses Python 3.11, then install the dependencies:

pip install -r requirements.txt

Usage

UK-BioBank data is organized by projects and baskets. Each project ID can have several basket IDs associated with it: when someone requests new fields or a data update under the same project ID, a new basket is created. Data from different projects cannot be merged, because participant IDs (eids) are randomized per project. Data from different baskets of the same project can be merged, and for any given UKB field it is preferable to take the data from the most recent basket that contains it.

Let's say we want to create a dataset with UKB fields 31, 131369 and 3066. Store the field IDs in a text file, one per line, as follows:

ukb_fields.txt:

31
131369
3066
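A file in this format is easy to read back and validate. The helper below is a minimal sketch, not code from this repository: `parse_fields` is a hypothetical name, and it assumes one numeric field ID per line, with blank lines ignored.

```python
def parse_fields(text: str) -> list[str]:
    """Return the field IDs listed one per line, skipping blank lines."""
    fields = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        field = line.strip()
        if not field:
            continue  # tolerate blank lines and a trailing newline
        if not field.isdigit():
            raise ValueError(
                f"line {line_number}: {field!r} is not a numeric field ID"
            )
        fields.append(field)
    return fields


if __name__ == "__main__":
    # Same content as the ukb_fields.txt example above.
    print(parse_fields("31\n131369\n3066\n"))
```

Running a check like this before calling the command catches stray characters early.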

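Run the following command to find, for a given project ID, the most recent basket that contains each of the listed UKB fields. The ${...} arguments are placeholders: replace each one with your own value, without the ${ } wrapper. In order, they are the UKB data folder, the project ID, the input fields file, and the output JSON file: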

python commands/get_newest_baskets.py ${/dir/to/ukb_folder} ${project_id} ${data/ukb_fields.txt} ${data/field_to_basket.json}

The results will be stored in a JSON file as follows:

field_to_basket.json:

{
    "31": "project_52887_41230",
    "131369": "project_52887_676883",
    "3066": "project_52887_669338"
}
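The selection step can be pictured as follows. This is a minimal sketch, not the logic of get_newest_baskets.py: the function name `newest_basket_per_field`, the in-memory `baskets` input, and the use of a release date to decide which basket is newest are all assumptions made for illustration. The dates and field contents in the usage example are invented; how the script actually discovers and ranks baskets is not described here.

```python
from datetime import date


def newest_basket_per_field(
    fields: list[str],
    baskets: dict[str, tuple[date, set[str]]],
) -> dict[str, str]:
    """Map each requested field to the most recent basket that contains it.

    `baskets` maps a basket name to (release date, fields in that basket).
    Both this structure and the date-based ordering are assumptions.
    """
    result = {}
    for field in fields:
        candidates = [
            (released, name)
            for name, (released, available) in baskets.items()
            if field in available
        ]
        if not candidates:
            raise KeyError(f"field {field} not found in any basket")
        # max() compares release dates first, so the latest basket wins.
        result[field] = max(candidates)[1]
    return result


if __name__ == "__main__":
    # Hypothetical dates and field contents, for illustration only.
    baskets = {
        "project_52887_41230": (date(2020, 1, 1), {"31", "3066"}),
        "project_52887_669338": (date(2022, 1, 1), {"3066"}),
        "project_52887_676883": (date(2023, 1, 1), {"131369"}),
    }
    print(newest_basket_per_field(["31", "131369", "3066"], baskets))
```

Field 3066 appears in two of the hypothetical baskets here, and the later one is chosen, which mirrors the preference for the most recent basket.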

Finally, to merge the data into a single CSV file, run the following command:

python commands/create_data.py ${/dir/to/ukb_folder} ${data/field_to_basket.json} ${data.csv}
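Conceptually, the merge joins the per-basket tables on the participant ID. The sketch below illustrates that idea using only the standard library; it is not the implementation of create_data.py. The function name `merge_on_eid`, the `eid` column name, the column labels such as `31-0.0`, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def merge_on_eid(tables: list[list[dict[str, str]]]) -> list[dict[str, str]]:
    """Outer-join tables of rows on the "eid" column.

    Each table is a list of row dicts, as produced by csv.DictReader.
    Participants missing from a table get empty strings for its columns.
    """
    columns = ["eid"]
    merged: dict[str, dict[str, str]] = {}
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
            merged.setdefault(row["eid"], {}).update(row)
    return [
        {name: merged[eid].get(name, "") for name in columns}
        for eid in sorted(merged)
    ]


if __name__ == "__main__":
    # Made-up rows standing in for two baskets of the same project.
    basket_a = list(csv.DictReader(io.StringIO("eid,31-0.0\n1000001,0\n1000002,1\n")))
    basket_b = list(csv.DictReader(io.StringIO("eid,3066-0.0\n1000002,5\n")))

    rows = merge_on_eid([basket_a, basket_b])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    print(out.getvalue())
```

A join like this is only meaningful for baskets of the same project, since eids do not match across projects.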

Contribute

Feel free to contribute to this repo by fixing issues, improving performance, or adding new features!

Languages

Python 100.0%

MIT License
Created March 12, 2024
Updated August 9, 2025