Systems and methods are described for machine-based, collective control of one or more displayed virtual objects by two individuals. Collective interactions may be implemented by combining an ability of a first user to specify one or more locations on a touch-sensitive display using one or more digits with an ability to monitor a portable, handheld controller manipulated by a second user. Alternatively or in addition, pointing by the first user to the one or more locations on the display may be enhanced by a stylus or other pointing device. The handheld controller may be tracked within camera-acquired images by following camera-trackable controller components and/or by acquiring measurements from one or more embedded inertial measurement units (IMUs). Optionally, one or more switches or sensors may be included within the handheld controller, operable by one or more digits of the second user, to enable alternative virtual object displays and/or menu selections during collective interactions.