Haoran Qiu
James-QiuHaoran
Systems Research at Microsoft Azure Research | UIUC CS PhD
Top Repositories
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)
Website for our final year project - FRING: FAST BLOCKCHAIN ON SGX-FACILITATED PEER-TO-PEER NETWORK. Project FRing includes a new peer-to-peer network protocol that improves communication performance among peers and an implementation of a fast, consistent blockchain system on top of this P2P network.
Final Year Project @HKU Department of Computer Science | HGFRR includes a new peer-to-peer network protocol that improves communication efficiency and security among peers, and an implementation of a fast, secure blockchain system on top of this P2P network.
This is an image matching system for scalable and efficient matching of images from a large database. The basic idea is to compute a perceptual hash (pHash) value for each image and compare image similarity based on the computed pHash values. Search scales by using Elasticsearch as the backend database.
This is my (old) personal website built with Bootstrap. Feel free to clone it and use it as your website template :-) New homepage:
This repository contains a simple Hadoop-like (MapReduce) distributed computing platform implemented in Java. It extends a course project at UIUC that was awarded the best Java implementation, and it is open-sourced for reference.
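The image matching system above compares images by their perceptual hashes. A minimal sketch of one common pHash variant follows, assuming the image has already been converted to grayscale and resized to 32x32 (that preprocessing is omitted). It keeps the low-frequency 8x8 block of a 2-D DCT and sets one bit per coefficient above the block's median. The function names `dct_1d`, `dct_2d`, `phash`, and `hamming` are hypothetical; this is not that repository's actual code.

```python
import math
import random
import statistics
from typing import List

Matrix = List[List[float]]


def dct_1d(vec: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II, computed naively in O(n^2)."""
    n = len(vec)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(vec))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def dct_2d(pixels: Matrix) -> Matrix:
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(list(r)) for r in pixels]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to [row][col]


def phash(pixels: Matrix, hash_size: int = 8) -> int:
    """64-bit perceptual hash of an already-resized grayscale image
    (assumption: 32x32 input). One bit per low-frequency coefficient,
    set when the coefficient is above the median of the block."""
    coeffs = dct_2d(pixels)
    low = [coeffs[u][v] for u in range(hash_size) for v in range(hash_size)]
    med = statistics.median(low)
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > med else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
    brighter = [[p + 10 for p in row] for row in img]  # uniform brightness shift
    other = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
    # A uniform shift only moves the DC term, so these hashes should match.
    print(hamming(phash(img), phash(brighter)))
    # An unrelated image should land far away in Hamming distance.
    print(hamming(phash(img), phash(other)))
```

Because similarity reduces to a Hamming distance between 64-bit integers, a threshold on that distance (chosen empirically) decides whether two images "match", which is what makes the comparison cheap enough to run against a large indexed database.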
Repositories
This repository collects useful tools and guides for systems software development, plus anything else interesting.
Azure HPC/AI VM Images
Microsoft Azure Traces
[NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
LLM Serving Performance Evaluation Harness
ChatGPT API
[BigTwo game written in Java] This repository is version 2.0 of BigTwo. You can now play the card game with your friends!
This is a game project built in Java. RMI supports communication between the server and clients and provides the login, registration, and logout functions. JMS supports the remaining message delivery and communication between clients and the server; it is asynchronous and safer. JDBC and MySQL are used to build the game database. The game logic follows the usual 24 game.
CoCo: Coordinated Container Scheduling with Last-Level Cache and Memory Bandwidth Partitioning
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)
Instruct-tune LLaMA on consumer hardware
Several simple compilers and interpreters implemented in C/C++ or Haskell, e.g., a music notation compiler (simple music notation to ABC notation), several compilers for toy programming languages, and an interpreter for a JavaScript-like programming language.
An intelligent LLM serving system based on ML-driven scheduling, load-balancing, request migration and preemption, and KV cache management for high throughput, low latency, and fault tolerance.
Website for the systems seminar at UIUC
Othello game (versus a computer AI agent) implemented in Python. See whether you can beat it!
This project is about self-driving vehicles and consists of object detection, geometry (overlap detection), and event decision (driving behavior analysis).
Autoscaling components for Kubernetes
This repository consists of the code and data in the WoSC 2021 paper "Is Function-as-a-Service a Good Fit for Latency-Critical Services?".
Kubernetes community content
https://arxiv.org/pdf/1707.03141.pdf
CS 598 ML4SE
Multi Type Mean Field Reinforcement Learning
A collection of (mostly) technical things every software developer should know
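The Hadoop-like (MapReduce) platform listed above is a distributed Java system; the map, shuffle, and reduce data flow such a platform follows can be sketched in a single process. This is a minimal Python illustration under that simplification, using word count as the job. The names `run_job`, `partition`, `word_count_mapper`, and `sum_reducer` are hypothetical and are not that repository's API; crc32-based partitioning is an assumed choice that keeps key-to-reducer routing deterministic.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]
Mapper = Callable[[str], Iterable[Pair]]
Reducer = Callable[[str, List[int]], int]


def partition(key: str, num_reducers: int) -> int:
    """Route a key to a reducer; crc32 is deterministic across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(inputs: Iterable[str], mapper: Mapper, reducer: Reducer,
            num_reducers: int = 3) -> Dict[str, int]:
    partitions: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    # Map + shuffle: each input split emits (key, value) pairs, and every
    # pair is routed immediately to the partition that owns its key.
    for split in inputs:
        for key, value in mapper(split):
            partitions[partition(key, num_reducers)][key].append(value)
    # Reduce: each partition is reduced independently (in parallel on a
    # real cluster), then the per-partition outputs are merged.
    result: Dict[str, int] = {}
    for part in partitions:
        for key in sorted(part):
            result[key] = reducer(key, part[key])
    return result


def word_count_mapper(line: str) -> Iterable[Pair]:
    """Emit (word, 1) for every whitespace-separated word, lowercased."""
    return ((word.lower(), 1) for word in line.split())


def sum_reducer(key: str, values: List[int]) -> int:
    """Add up all counts collected for one word."""
    return sum(values)


if __name__ == "__main__":
    lines = ["the quick brown fox", "The lazy dog", "the end"]
    print(run_job(lines, word_count_mapper, sum_reducer))
```

Because a key always hashes to the same partition, all values for one word meet at a single reducer, so the final counts do not depend on `num_reducers`.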