tomlincr/simSDE
A simulacrum of the NHS Secure Data Environment, to facilitate external code development & testing
simSDE
A simulacrum of the NHS Secure Data Environment, to facilitate education, exploration, external code development and testing.
NB: this is unofficial and is not endorsed by NHS England or the HDR UK / BHF Data Science Centre.
MVP
- 🧑‍💻 Environment: based on Google Colab for ease of use.
- Synthetic data
  - 🏥 Hospital Episode Statistics Admitted Patient Care (HES APC): see NHS Digital: Artificial data pilot.
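As a first exercise with the artificial HES APC data, a common starting point is counting episodes per financial year. The sketch below uses only the standard `csv` module so it runs in a fresh Colab runtime without extra installs. It assumes the extract is delimited text with a header row and a financial-year column named `FYEAR`; the column names and the sample rows are illustrative assumptions, not taken from the pilot files.

```python
import csv
import io
from collections import Counter


def episodes_per_year(csv_text: str, year_column: str = "FYEAR",
                      delimiter: str = ",") -> dict:
    """Count rows (episodes) per financial year in HES APC-style text.

    Assumes a header row and a year column named `year_column`
    (FYEAR here is an assumption; check the header of your extract).
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    counts = Counter(row[year_column] for row in reader)
    # Sort by year so the output is stable and easy to read.
    return dict(sorted(counts.items()))


# Made-up rows for illustration only; not taken from the Artificial Data Pilot.
sample = """EPIKEY,FYEAR,ADMIDATE
1,1920,2019-05-01
2,1920,2019-07-12
3,2021,2020-06-30
"""
print(episodes_per_year(sample))  # {'1920': 2, '2021': 1}
```

For a downloaded file, read it first (for example `with open(path, newline="") as f: episodes_per_year(f.read())`). On a Spark-backed platform the same count would be a `groupBy` on the year column; plain `csv` just keeps this sketch dependency-free.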
Platform options
- Databricks Community Edition ❌
  - Doesn't support Git.
- Google Colab ✅
  - Supports Spark.
  - Poor GitHub integration: can open/save notebooks, but can't easily sync.
  - Doesn't persist between sessions?
- GitHub Codespaces ❌
  - Flexible, but too complex for beginners.
  - https://aka.ms/configure-codespace
  - https://github.com/education/codespaces-project-template-py
- Local ❌
  - Way too complex for beginners!
  - Environment: pip, conda, possibly poetry.
  - Containers: Docker, devcontainer.
  - https://github.com/jplane/pyspark-devcontainer
  - Should be manageable in exactly the same way as Codespaces?
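The platform notes above raise two practical questions for notebook code: where it is running, and where files should live so they survive. Files written to a Colab runtime's local disk are lost when the runtime is recycled, so persistent data usually goes on a mounted Google Drive. A minimal sketch, assuming hypothetical directory layouts for each platform: `detect_platform`, `data_dir` and the paths in `DATA_DIRS` are illustrative names, not part of simSDE. It relies on the commonly used `"google.colab" in sys.modules` check and on the `CODESPACES=true` variable that GitHub Codespaces sets.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical per-platform data locations; adjust to your own setup.
# Colab: assumes Google Drive was mounted first with
#   from google.colab import drive; drive.mount("/content/drive")
# Codespaces: assumes the repo is checked out at /workspaces/simSDE.
DATA_DIRS = {
    "colab": Path("/content/drive/MyDrive/simSDE"),
    "codespaces": Path("/workspaces/simSDE/data"),
    "local": Path("data"),
}


def detect_platform(env: Optional[Mapping[str, str]] = None,
                    modules: Optional[Mapping[str, object]] = None) -> str:
    """Guess the platform; env/modules can be passed in for testing."""
    env = os.environ if env is None else env
    modules = sys.modules if modules is None else modules
    # Colab runtimes have the google.colab package loaded.
    if "google.colab" in modules:
        return "colab"
    # GitHub Codespaces sets CODESPACES=true inside the container.
    if env.get("CODESPACES") == "true":
        return "codespaces"
    # Anything else (local Python, local devcontainer) falls back here.
    return "local"


def data_dir(platform_name: Optional[str] = None) -> Path:
    """Return the (hypothetical) data directory for a platform."""
    return DATA_DIRS[platform_name or detect_platform()]


print(detect_platform(), data_dir())
```

Keeping the platform check in one function means the rest of a notebook can call `data_dir()` and stay identical across Colab, Codespaces and a local setup.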
Roadmap
- Other HES datasets:
  - HES OP (Outpatients)
  - HES A&E (Accident & Emergency)
  - HES CC (Critical Care; NB not part of the Artificial Data Pilot)
  - HES MAT (Maternity; NB not part of the Artificial Data Pilot)
- Synthetic ONS Deaths.
- Synthetic GDPPR.
Created August 3, 2023
Updated June 20, 2024