fumiama/base16384-sycl
Base16384-SYCL
A high-performance Base16384 encoding library implemented in SYCL (Intel oneAPI DPC++) for accelerated computation on heterogeneous hardware platforms.
Overview
Note
This library requires Intel oneAPI DPC++/SYCL runtime. Please ensure proper environment setup before building and running the applications.
Base16384-SYCL implements the Base16384 encoding algorithm in SYCL, built with Intel oneAPI Data Parallel C++ (DPC++), so that encoding and decoding can run in parallel on both CPU and GPU devices. The same source builds on Windows and Unix-like systems.
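For readers new to the encoding, the core idea can be sketched in plain host-side C++: 16384 is 2^14, so each output character carries 14 bits, and 7 input bytes (56 bits) map to exactly 4 characters. The sketch below is illustrative only and is not this library's API. The names `encode_group` and `decode_group` are hypothetical, and the `0x4E00` offset and most-significant-bits-first packing are assumptions based on the reference Base16384 project. Handling of inputs that are not a multiple of 7 bytes is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed alphabet base: U+4E00, the first CJK Unified Ideograph.
constexpr std::uint16_t kBase = 0x4E00;

// Hypothetical helper: pack 7 input bytes (56 bits) into four 14-bit values,
// most significant bits first, and offset each value by kBase.
std::array<std::uint16_t, 4> encode_group(const std::array<std::uint8_t, 7>& in) {
    std::uint64_t bits = 0;
    for (std::uint8_t b : in) {
        bits = (bits << 8) | b;  // build one 56-bit big-endian value
    }
    std::array<std::uint16_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        const int shift = 14 * (3 - i);
        out[i] = static_cast<std::uint16_t>(kBase + ((bits >> shift) & 0x3FFF));
    }
    return out;
}

// Hypothetical helper: inverse of encode_group.
std::array<std::uint8_t, 7> decode_group(const std::array<std::uint16_t, 4>& in) {
    std::uint64_t bits = 0;
    for (std::uint16_t c : in) {
        assert(c >= kBase && c - kBase <= 0x3FFF);  // must lie in the 14-bit alphabet
        bits = (bits << 14) | static_cast<std::uint64_t>(c - kBase);
    }
    std::array<std::uint8_t, 7> out{};
    for (int i = 0; i < 7; ++i) {
        out[i] = static_cast<std::uint8_t>(bits >> (8 * (6 - i)));
    }
    return out;
}
```

Because every 7-byte group is encoded independently of its neighbours, groups can be processed in parallel, which is what makes the algorithm a natural fit for a SYCL kernel.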
Features
- Hardware Acceleration: Uses SYCL for parallel processing on CPUs, GPUs, and other accelerators
- Cross-Platform Support: Compatible with Windows and Unix-like systems
- Performance Optimized: Includes vectorization and memory optimization for maximum throughput
- Robust Error Handling: Comprehensive exception handling with detailed error reporting
- Modern C++: Written in C++20 with modern programming practices
Prerequisites
Required Dependencies
- Intel oneAPI Toolkit: DPC++/SYCL compiler and runtime
- CMake: Version 3.4 or higher
Windows-Specific Requirements
- Visual Studio Build Tools or Visual Studio IDE
- Intel DPC++ compiler (icx-cl)
- NMake (included with Visual Studio)
Unix/Linux Requirements
- Intel DPC++ compiler (icpx)
- Standard build tools (make, etc.)
Installation
1. Environment Setup
Tip
For VS Code Users: If you're using Visual Studio Code, the environment variable setup commands will be executed automatically when you open a terminal. If this fails, it may be due to a non-standard installation path. Please modify the paths in .vscode/settings.json accordingly.
Windows:
# Navigate to your Intel oneAPI installation directory
# Typically: C:\Program Files (x86)\Intel\oneAPI\
setvars.bat
Linux/Unix:
# Navigate to your Intel oneAPI installation directory
# Typically: /opt/intel/oneapi/
source setvars.sh
2. Build Process
Clone and navigate to the project:
git clone https://github.com/fumiama/base16384-sycl.git
cd base16384-sycl
mkdir build
cd build
Configure the build system:
Add -DBUILD=test to enable testing.
- Windows
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release ..
- Unix-Like
cmake -DCMAKE_BUILD_TYPE=Release ..
Compile the project:
cmake --build .
3. Testing
Run the test suite:
ctest
4. Performance Analysis with Intel VTune
Intel VTune Profiler is a performance analysis tool that can help you identify bottlenecks in the library, its tests, or your own application.
Prerequisites
- Intel VTune Profiler (included in Intel oneAPI Base Toolkit)
- Compiled Base16384-SYCL application or tests with debug symbols (use the RelWithDebInfo build type)
Running VTune Analysis
1. Launch VTune GUI:
vtune-gui
2. Create a New Project:
- Click "New Project" in the welcome screen
- Set project name and location
- Configure the target application path
3. Configure Analysis Type:
Choose an analysis type based on your profiling goals:
- Hotspots Analysis: Identify CPU-intensive functions
- GPU Offload Analysis: Analyze GPU kernel performance and host-device data transfer
- Memory Consumption: Track memory usage patterns
- Threading Analysis: Detect threading issues and analyze parallelism
4. Run the Analysis:
- Click the "Start" button to begin profiling
- VTune will execute your application and collect performance data
5. Analyze Results:
Key metrics to examine:
- Kernel Execution Time: Time spent in SYCL kernels
- Memory Transfer Overhead: Host-to-device and device-to-host data transfer time
- CPU Utilization: Host CPU usage during GPU operations
- GPU Utilization: GPU compute unit occupancy
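As a quick cross-check on the numbers VTune reports, a coarse host-side measurement can be useful. The following is a minimal sketch in standard C++, not part of this library. `time_seconds` and `throughput_mib_per_s` are hypothetical helper names, and the measured interval covers everything the callable does, including any host-device transfers and kernel launches it waits on.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `work` once and return elapsed wall-clock seconds.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: convert a byte count and a duration into MiB per second.
double throughput_mib_per_s(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}
```

Timing the same call once with a small input and once with a large one gives a rough idea of how much of the total is fixed overhead (launch and transfer setup) rather than encoding work. The callable should block until the device has finished, otherwise only the submission time is measured.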
Optimization Tips
Based on VTune analysis, consider these optimization strategies:
- Reduce Host-Device Transfer: Minimize data copying between CPU and GPU
- Increase Kernel Occupancy: Optimize work-group sizes and global range
- Use Local Memory: Stage frequently accessed data in work-group local memory (called shared local memory on Intel GPUs)
- Batch Operations: Process larger data chunks to amortize kernel launch overhead
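The work-group sizing and batching tips above involve a little arithmetic that is easy to get wrong. A SYCL `nd_range` requires the global range to be a multiple of the work-group (local) size, so the launch is usually padded and the extra work-items are guarded inside the kernel. The sketch below is plain host-side C++ under a stated assumption: one work-item per 7-byte input group. That mapping, and the names `LaunchShape` and `plan_launch`, are illustrative and may not match how this library actually schedules work.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one kernel launch over a batch of input bytes.
struct LaunchShape {
    std::size_t groups;       // full 7-byte input groups, one per work-item (assumption)
    std::size_t global_size;  // work-items to launch, padded to a multiple of local_size
    std::size_t tail_bytes;   // leftover bytes (fewer than 7) handled outside the kernel
};

LaunchShape plan_launch(std::size_t input_bytes, std::size_t local_size) {
    assert(local_size > 0);
    const std::size_t groups = input_bytes / 7;
    // Round up to the next multiple of local_size.
    const std::size_t padded = (groups + local_size - 1) / local_size * local_size;
    return {groups, padded, input_bytes % 7};
}
```

For example, 7000 input bytes with a work-group size of 256 give 1000 groups and a padded global size of 1024. The kernel then needs a check such as "work-item index < groups" so the 24 padding work-items do nothing. Splitting large inputs at multiples of 7 bytes keeps each batch independently encodable.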
Build Configuration
The project supports multiple build configurations:
- Release: Optimized for maximum performance (-O3, /O2)
- Debug: Includes debugging symbols and reduced optimization
- RelWithDebInfo: Release optimization with debug information
- MinSizeRel: Optimized for minimal binary size
Compatibility
- Operating Systems: Windows 10/11, Linux, macOS
- Architectures: x86-64, ARM64 (where Intel oneAPI is supported)
- Hardware: Intel CPUs, Intel GPUs (via Level Zero), NVIDIA GPUs (via the CUDA backend plugin), AMD GPUs (experimental)
Contributing
Contributions are welcome! Please ensure that:
- Code follows the existing style and conventions
- All tests pass (ctest)
- New features include appropriate test coverage
- Documentation is updated for significant changes
License
This project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for detailed information.
Acknowledgments
- Intel oneAPI team for the SYCL implementation
- Base16384 algorithm developers
- Contributors to the open-source community
