In a pen-based computing system, a microphone on the smart pen device records audio to produce audio data and a gesture capture system on the smart pen device records writing gestures to produce writing gesture data. Both the audio data and the writing gesture data include a time component. The audio data and writing gesture data are combined or synchronized according to their time components to create audio ink data. The audio ink data can be uploaded to a computer system attached to the smart pen device and displayed to a user through a user interface. The user makes a selection in the user interface to play the audio ink data, and the audio ink data is played back by animated the captured writing gestures and playing the recorded audio in synchronization.