Welcome to NAS: The Neural Audio Sculptor

Explore Immersive Audio-Visual Experiences

The Neural Audio Sculptor (NAS) is a tool for artistic expression, developed in the context of a master's thesis by Bui Minh Hao Nguyen. NAS lets you create unique audio-visual experiences by combining deep learning, music information retrieval, and techniques from human-computer interaction. It transforms auditory sources into immersive, real-time dynamic imagery, synchronizing music features with visuals of the user's choosing. This blend of imagery and sound creates a unified experience under the user's artistic control.


Deep Learning Technologies

NAS uses state-of-the-art deep learning models, including StyleGAN and StreamDiffusion, to create real-time visualizations that dynamically respond to audio input. Spleeter separates the music into individual sources such as vocals, drums, and bass. By also extracting audio features such as timbre, pitch, harmony, and rhythm, NAS generates shifting visuals that align with the sound, providing a multi-sensory, immersive experience.
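To illustrate the kind of audio features such a pipeline extracts, here is a minimal, self-contained sketch (not NAS's actual code) that computes two classic descriptors per frame: RMS energy, a proxy for loudness, and the spectral centroid, a proxy for timbral brightness. Frame length, hop size, and the demo tone are illustrative choices.

```python
import numpy as np

def frame_features(signal, sr, frame_len=2048, hop=512):
    """Per-frame RMS energy and spectral centroid: two simple
    descriptors of loudness and timbral brightness."""
    feats = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        # Window the frame before the FFT to reduce spectral leakage.
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        # Magnitude-weighted mean frequency = spectral centroid (Hz).
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-9)
        feats.append((rms, centroid))
    return np.array(feats)

# Demo: a 440 Hz tone should yield a centroid near 440 Hz
# and an RMS near 0.5 / sqrt(2) ~ 0.354.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
feats = frame_features(tone, sr)
```

In a full system these per-frame values would be streamed to the renderer; libraries such as librosa offer production-grade versions of these and many more features.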


Interactive and Intuitive User Interface

The user interface (UI) of NAS is designed to empower users to develop their own creative expression while remaining easy to use. Artists, performers, sound enthusiasts, and anyone interested in immersive experiences can interactively direct the image-generation process; NAS offers a playground for creative exploration and expression. The UI allows you to:

  • Map Audio Features: Link specific audio characteristics, instruments, or sounds to visual elements.
  • Adjust the Visual Output in Real Time: Modify visual themes on the fly through text prompts or by selecting image sources during the audio-visual experience.
  • Incorporate Movements: Use human pose estimation to let physical movements influence visualizations.
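A mapping from audio feature to visual element typically needs smoothing, or the visuals jitter with every frame. The following is a hypothetical sketch of such a mapper (the class name and parameters are illustrative, not NAS's API): it normalizes a raw feature value into a bounded visual parameter and applies exponential smoothing.

```python
class FeatureMapper:
    """Maps a raw audio feature stream to a bounded visual parameter,
    with exponential smoothing so visuals respond without jitter."""

    def __init__(self, lo=0.0, hi=1.0, smoothing=0.8):
        self.lo, self.hi = lo, hi
        self.smoothing = smoothing   # 0 = no smoothing, ~1 = very sluggish
        self.state = lo

    def update(self, value, value_max):
        # Normalize the feature into [0, 1], then scale to [lo, hi].
        norm = min(max(value / value_max, 0.0), 1.0)
        target = self.lo + norm * (self.hi - self.lo)
        # Exponential moving average toward the target.
        self.state = self.smoothing * self.state + (1 - self.smoothing) * target
        return self.state
```

For example, the RMS energy of the drum stem could drive a zoom parameter: `zoom = FeatureMapper(lo=1.0, hi=1.5)` followed by `zoom.update(drum_rms, rms_max)` once per frame.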


Creative Possibilities

NAS serves as a canvas for your imagination. Whether you're a musician seeking to enhance live performances, an artist exploring new media, or an event organizer aiming to curate original experiences, NAS opens a wide range of creative possibilities. The seamless integration of sound with visuals, responsive to each user's input, makes every experience unique. Beyond artistic applications, NAS has been trialled for immersive environments in design teams, where a VJ curates nature sounds alongside imagery of serene or polluted nature to foster more sustainable design practices.


Immerse Yourself

Study results show NAS's effectiveness in producing synchronized, dynamic visuals that enrich the overall experience. From live performances to interactive installations, NAS has demonstrated its potential to transform the way we perceive and interact with sound and imagery.

For full transparency, the tool's source code is available on GitHub. Explore some impressions here:

  • Video 1: Several StyleGAN models
  • Video 2: StreamDiffusion with text prompt "bright sun rays with ocean view"
  • Video 3: StreamDiffusion with text prompt "magic mushrooms" and manual transformations
  • Video 4: StreamDiffusion with text prompt "industrial, dark, horror creature" and unsettling audio
  • Video 5: Pose estimation; nose position mapped to horizontal image rotation
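The mapping shown in Video 5 can be sketched as a single pure function (a hypothetical illustration, not NAS's implementation): the nose's horizontal pixel position, as reported by a pose estimator such as MediaPipe, is normalized to [-1, 1] around the frame center and scaled to a rotation angle.

```python
def nose_to_rotation(nose_x, frame_width, max_angle_deg=45.0):
    """Map horizontal nose position (in pixels) to a symmetric image
    rotation: centered nose -> 0 deg, left/right edge -> -/+ max_angle."""
    centered = (nose_x / frame_width) * 2.0 - 1.0  # -> [-1, 1]
    centered = max(-1.0, min(1.0, centered))        # clamp off-frame values
    return centered * max_angle_deg
```

With a 640-pixel-wide frame, a nose at x = 320 yields no rotation, while a nose at either edge yields the maximum rotation in that direction.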


Hao presents a demo of NAS at HPI.