Roadmap
rm-gregg follows an incremental development approach organized into cycles. The project is currently in Cycle 0/1, building the data foundation and first classifier.
Development Cycles
Cycle 0: Data Foundation
Goal: Extract stroke data from reMarkable .rm files and convert it to a normalized, ML-ready format.
| Task | Status |
|---|---|
| Build extraction pipeline using rmscene | Done |
| Normalize coordinates to [0, 1] range | Done |
| Implement stroke segmentation (gap-based and grid-based) | Done |
| Define data schema with Pydantic models | Done |
| Build synthetic stroke generator for Gregg primitives | Done |
| Feature extraction (15 geometric features) | Done |
Cycle 1: Stroke-Level Classifier (Unit 1)
Goal: Classify individual Gregg strokes from Unit 1 with >90% accuracy on held-out test data.
| Task | Status |
|---|---|
| Define Unit 1 stroke vocabulary (~10-15 classes) | Done |
| Create training data (synthetic + handwritten) | In Progress |
| Implement Random Forest / SVM classifiers | Done |
| Implement sequence model (1D CNN or LSTM) | Planned |
| Evaluate with k-fold cross-validation | Planned |
| Implement stroke-to-label prediction API | Planned |
Cycle 2: Word-Level Recognition
Goal: Map a sequence of strokes to an English word from the current unit’s vocabulary (top-3 accuracy >85%).
| Task | Status |
|---|---|
| Word-level segmentation from page data | Planned |
| Sequence classification (CTC or holistic) | Planned |
| Vocabulary-constrained decoder per unit | Planned |
| Integrate Gregg-1916 dataset for transfer learning | Planned |
Cycle 3: Feedback Engine
Goal: Compare user strokes to references and produce actionable feedback.
| Task | Status |
|---|---|
| Define canonical reference strokes | Planned |
| DTW-based stroke alignment and comparison | Done |
| Frechet distance computation | Done |
| Proportional analysis (relative sizing) | Done |
| Natural language feedback generation | In Progress |
| Scoring rubric per unit | Planned |
Cycle 4: App Integration
Goal: End-to-end workflow from reMarkable practice to web-based feedback.
| Task | Status |
|---|---|
| PDF practice sheet template generator | Planned |
| File upload and processing pipeline | Planned |
| Web frontend (rendered strokes + feedback) | Planned |
| Progress tracking per unit | Planned |
| Export model to ONNX for serving | Planned |
Data Strategy
The project bootstraps from zero real data through four phases:
- Synthetic only (current) – Parameterized Gregg primitives with controlled variation
- Self-play – The developer’s own practice data, labeled by intent
- Additional writers – A second writer’s data for generalization
- Community – Open-source contributions from the r/shorthand community
How to Contribute
See the Contributing guide for ways to help move the roadmap forward.