Botzart
Botzart is a Trossen WidowX 250s robot arm that plays sheet music from an encoded MIDI file while dynamically tracking the keyboard's position with a camera, all built on a ROS2-based architecture.
Botzart -
A multi-threaded ROS2 architecture allows continuous control of the robot arm alongside continuous live perception updates. The robot arm can be directed to play specific notes while a separate CV node maintains a live estimate of the keyboard's pose. As a result, the user can move the keyboard from its original position and the robot arm dynamically adjusts to continue playing.
The system consists of 4 nodes: Robot Arm Control, Computer Vision, MIDI Parser, and Motion Planner.
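A minimal sketch of this layout is shown below, assuming the standard rclpy MultiThreadedExecutor; the node classes and callbacks are placeholders for illustration, not the project's actual implementation.

```python
# Sketch of the multi-threaded ROS2 layout (node names and callbacks are placeholders).
import rclpy
from rclpy.executors import MultiThreadedExecutor
from rclpy.node import Node


class ComputerVisionNode(Node):
    """Re-estimates the keyboard pose on a timer, independent of arm control."""
    def __init__(self):
        super().__init__('computer_vision')
        self.timer = self.create_timer(0.1, self.update_keyboard_pose)

    def update_keyboard_pose(self):
        # ArUco detection and keyboard-pose publishing would go here.
        pass


class MotionPlannerNode(Node):
    """Consumes MIDI note events and issues arm commands."""
    def __init__(self):
        super().__init__('motion_planner')


def main():
    rclpy.init()
    nodes = [ComputerVisionNode(), MotionPlannerNode()]
    executor = MultiThreadedExecutor()
    for node in nodes:
        executor.add_node(node)
    try:
        executor.spin()  # perception and control callbacks run concurrently
    finally:
        for node in nodes:
            node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```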
Spring 2025 CS5335 Final Project, Northeastern University
Demo video: Mary Had A Little Lamb
Robot Arm Control -
Trossen Robotics provides a base ROS2 package with simple interface commands for the robot arm. A thin wrapper was built on top of these commands: one to move the end-effector above the desired key and another to strike it. This enabled fast movements and helped reduce radial distortion of the arm so that key centers are struck accurately. The same commands drive an initial calibration, striking the center of a key at each octave to build a relational transform from the robot to the keyboard.
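A hedged sketch of such a wrapper, assuming the ROS2 Interbotix Python interface, is shown below; the hover height, strike depth, and helper names are illustrative values, not the project's actual code.

```python
# Illustrative wrapper around the Interbotix Python interface (not the project's actual code).
from interbotix_xs_modules.xs_robot.arm import InterbotixManipulatorXS

HOVER_HEIGHT = 0.05   # hover height above the key surface in meters (example value)
STRIKE_DEPTH = 0.02   # extra descent to press the key in meters (example value)


class ArmWrapper:
    def __init__(self):
        # "wx250s" is the Interbotix model name for the WidowX 250s.
        self.bot = InterbotixManipulatorXS("wx250s", "arm", "gripper")

    def move_above_key(self, x, y, z):
        """Move the end-effector to a hover pose above the target key center."""
        self.bot.arm.set_ee_pose_components(x=x, y=y, z=z + HOVER_HEIGHT)

    def strike(self):
        """Descend onto the key, press it, and return to the hover pose."""
        self.bot.arm.set_ee_cartesian_trajectory(z=-(HOVER_HEIGHT + STRIKE_DEPTH))
        self.bot.arm.set_ee_cartesian_trajectory(z=(HOVER_HEIGHT + STRIKE_DEPTH))
```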
Computer Vision -
Using OpenCV in Python, the node detects the ArUco tags placed on either side of the keyboard, as seen in the video above. Combined with the known camera calibration parameters, this allows a bounding box to be drawn around the keys and each key's position to be segmented. The corresponding poses are stored in a hash map for the Motion Planner to look up. The keyboard position is re-estimated periodically, and a threshold determines whether the stored poses, and with them the Robot-to-Keyboard transform, should be updated.
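The sketch below illustrates the idea, assuming roughly OpenCV 4.7's ArUco API (which differs across versions); the tag dictionary, camera intrinsics, key count, and note range are placeholder assumptions rather than the project's actual configuration.

```python
# Illustrative ArUco-based key localization; intrinsics, tag dictionary, and key
# layout are example values, not the project's calibration.
import cv2
import numpy as np

camera_matrix = np.array([[900.0, 0.0, 640.0],
                          [0.0, 900.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
MARKER_LENGTH = 0.04  # tag side length in meters (example)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())


def keyboard_key_poses(frame):
    """Detect the two edge tags and interpolate key positions between them."""
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is None or len(ids) < 2:
        return None
    order = np.argsort(ids.ravel())  # assume the lower tag id marks the left edge
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_LENGTH, camera_matrix, dist_coeffs)
    left = tvecs[order[0]].ravel()
    right = tvecs[order[-1]].ravel()
    # Interpolate key centers along the line between the tags and hash them by
    # MIDI note number for the Motion Planner to look up (61 keys from note 36, as an example).
    num_keys = 61
    return {note: left + (right - left) * i / (num_keys - 1)
            for i, note in enumerate(range(36, 36 + num_keys))}
```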
MIDI Parser -
This node takes a MIDI file of the desired song, parses it for note and timing information, and encodes it into a ROS message. The Motion Planner interfaces with it over a custom pub/sub message to signal when to start and stop the encoded stream.
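The project does not name its MIDI library; as one possible sketch, the common mido package can extract the note and timing information before it is packed into the custom ROS message (whose fields are not shown here).

```python
# Sketch of MIDI parsing with the `mido` package; the resulting tuples would be
# encoded into the project's custom ROS message, which is not reproduced here.
import mido


def parse_midi(path):
    """Return a list of (note, velocity, start_time, duration) tuples in seconds."""
    midi = mido.MidiFile(path)
    events, active, clock = [], {}, 0.0
    for msg in midi:  # iterating a MidiFile yields messages with delta times in seconds
        clock += msg.time
        if msg.type == 'note_on' and msg.velocity > 0:
            active[msg.note] = (clock, msg.velocity)
        elif msg.type in ('note_off', 'note_on') and msg.note in active:
            start, velocity = active.pop(msg.note)
            events.append((msg.note, velocity, start, clock - start))
    return events
```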
Motion Planner -
This node performs the initial calibration: the Robot-to-Keyboard transform and the initial CV poses of the keyboard. Once both are complete, a start message is sent to the MIDI Parser to begin streaming the song. The incoming tempo and note information is passed into a key lookup to retrieve the keyboard pose of the desired note. That pose is transformed into robot coordinates and run through a 20-iteration A* inverse-kinematics solver to produce the robot commands. The process loops through the song, finding each note and how long to hold it to keep tempo, with dynamic tempo adjustments in the Control node if the arm falls behind or gets ahead, and uses the CV-tracked key positions in real time to play the desired song.
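The per-note loop might look like the sketch below, where T_robot_keyboard, key_poses, solve_ik, and send_command stand in for the project's actual transform, CV lookup table, 20-iteration A* IK solver, and arm command interface.

```python
# Illustrative per-note planning loop; the transform, lookup, IK solver, and
# command interface are placeholders, not the project's actual code.
import time
import numpy as np


def keyboard_to_robot(position_kb, T_robot_keyboard):
    """Transform a 3D key position from the keyboard frame to the robot frame."""
    p = np.append(position_kb, 1.0)           # homogeneous coordinates
    return (T_robot_keyboard @ p)[:3]


def play_song(events, key_poses, T_robot_keyboard, solve_ik, send_command):
    start = time.monotonic()
    for note, velocity, onset, duration in events:
        # Look up the latest CV-tracked pose of the key for this note.
        target_robot = keyboard_to_robot(key_poses[note], T_robot_keyboard)
        joint_cmd = solve_ik(target_robot)     # e.g. the 20-iteration A* IK solver
        # Wait until the note's onset time; if the arm is behind schedule, play immediately.
        delay = onset - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        send_command(joint_cmd, duration)
```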