Grammarly is developing a compact model for on-device spelling and grammar correction, addressing constraints such as limited device memory and processing power. By building a 1B-parameter model, they aim to deliver real-time, high-quality suggestions while preserving the user's voice. Llama was chosen as the base model for its efficiency and its ability to handle diverse writing styles. Synthetic training data was generated to cover a range of writing contexts and error types. Inference optimizations raised processing speed to roughly 210 tokens/second. Despite strong results in accuracy and meaning preservation, the model still struggles with proper nouns, article placement, and tense consistency. Future work will refine the training data and further improve model performance.
https://www.grammarly.com/blog/engineering/efficient-on-device-writing-assistance/
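The ~210 tokens/second figure is a decoding-throughput measurement. As a minimal sketch of how such a number can be obtained, the snippet below times a token-by-token decode loop and divides tokens produced by elapsed wall-clock time; `fake_decode_step` and `measure_throughput` are hypothetical stand-ins, not Grammarly's actual benchmarking code.

```python
import time

def measure_throughput(decode_step, prompt, n_tokens):
    """Run n_tokens single-token decode steps and return tokens/second."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        decode_step(prompt)  # one autoregressive decode step (stubbed here)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

def fake_decode_step(prompt):
    """Stub standing in for one decode step of an on-device model."""
    time.sleep(0.0001)  # simulate per-token latency
    return "tok"

tps = measure_throughput(fake_decode_step, "Fix: She go to school yesterday.", 200)
print(f"throughput: {tps:.0f} tokens/s")
```

In a real benchmark, the decode step would be a forward pass of the quantized 1B model, and the loop would be warmed up first so that one-time setup costs do not skew the rate.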