Enhancements to the pd developer branch initiated by the vibrez project
Thomas Grill, Hannes Köcher, Tim Blechmann
vibrez.net
pd~convention 2007, Montréal

Outline
  vibrez
  pure data CVS
  audio and MIDI sub-system
  DSP performance
  Loader hooks
  Background processing

vibrez
  media performance system based on pd kernel
  commercially funded, partly open source
  OpenGL GUI, detached from patcher system
  many further extensions (pd externals)
  multi-threaded (GUI, Python scripting)
  targeted for Windows and MacOS

Does the pd kernel need to be improved?
  Audio performance (latency)
  DSP performance (support of current CPU architectures)
  Low priority processing
  Better extensibility (scripted externals)
  Fixing bugs

pd source is hosted on the SourceForge CVS
    ‣ one main branch but multiple devel_0_xx branches
  Additional features, improvements
    ‣ a lot of work has already been put into them
    ‣ many features haven't been promoted to the main branch
  devel_0_39 branch used for production

Unified audio subsystem
  Using portaudio only
    ‣ Host specifics and bug-fixing are outsourced to portaudio
  Callback-based scheduler
    ‣ No more ringbuffer (temporary storage)
    ‣ DSP processing done in audio thread
    ‣ Lower latency (for pro gear)

Callback-based scheduler

pd thread:

    int m_scheduler()
    {
        sys_lock();
        for(;;)
        {
            double time,rtime;
            time = sys_getrealtime();

            /* allow the audio callback to run */
            sys_unlock();
            sys_microsleep(sys_sleepgrain);
            sys_lock();

            sys_pollmidiqueue();
            sys_setmiditimediff(0,1e-6*sys_schedadvance);

            rtime = (sys_getrealtime()-time)*1e6;

            /* calculate remaining time */
            time = sys_schedadvance/4-rtime;
            if(time > 0)
                run_timed_idle_callbacks(time);

            if(sys_pollgui()) continue;

            /* do graphics updates */
            sched_pollformeters();
        }
        sys_unlock();
        return 0;
    }

audio thread:

    int process(void *input,void *output, int frameCount,...)
    {
        /* how much time do we have? */
        int timeout = (float)frameCount/sys_dacsr*1e6;
        double time,rtime;
        time = sys_getrealtime();

        if(sys_timedlock(timeout) == ETIMEDOUT)
        {
            /* we're late */
            sys_lock_timeout_notification();
            return 0;
        }

        for(i = 0; i < frameCount/sys_dacblocksize; ++i)
        {
            audio_copy(sys_soundin,input,sys_inchannels);
            sched_tick(sys_time + sys_time_per_dsp_tick);
            audio_copy(output,sys_soundout,sys_outchannels);
        }

        rtime = (sys_getrealtime()-time)*1e6;

        /* calculate remaining time */
        time = timeout*0.5-rtime;
        if(time > 0)
            run_timed_idle_callbacks(time);

        sys_unlock();
        return 0;
    }

Callback-based scheduler

pd thread:

    int m_scheduler()
    {
        sys_lock();
        for(;;)
        {
            double time,rtime;
            time = sys_getrealtime();

            sys_unlock();
            sys_microsleep(sys_sleepgrain);
            sys_lock();

            rtime = (sys_getrealtime()-time)*1e6;

            /* calculate remaining time */
            time = sys_schedadvance/4-rtime;
            if(time > 0)
                run_timed_idle_callbacks(time);
        }
        sys_unlock();
        return 0;
    }

audio thread:

    int process(void *input,void *output, int frameCount,...)
    {
        int timeout = (float)frameCount/sys_dacsr*1e6;
        double time,rtime;
        time = sys_getrealtime();

        sys_lock();
        for(i = 0; i < frameCount/sys_dacblocksize; ++i)
        {
            audio_copy(sys_soundin,input,sys_inchannels);
            sched_tick (sys_time+sys_time_per_dsp_tick);
            audio_copy(output,sys_soundout,sys_outchannels);
        }

        rtime = (sys_getrealtime()-time)*1e6;

        /* calculate remaining time */
        time = timeout*0.5-rtime;
        if(time > 0)
            run_timed_idle_callbacks(time);

        sys_unlock();
        return 0;
    }
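
The timing arithmetic of the audio callback above can be isolated into a few small functions. This is only a sketch of the calculation shown on the slide (callback period from frame count and sample rate, then half of that period minus the time already spent as the idle budget). The function names are hypothetical and not part of pd.

```c
#include <assert.h>

/* Duration of one audio callback in microseconds.
   Mirrors "frameCount/sys_dacsr*1e6" from the slide. */
double callback_period_us(int frame_count, double sample_rate)
{
    assert(frame_count > 0 && sample_rate > 0.0);
    return (double)frame_count / sample_rate * 1e6;
}

/* Number of DSP ticks run per callback.
   Mirrors the loop bound "frameCount/sys_dacblocksize". */
int dsp_ticks_per_callback(int frame_count, int block_size)
{
    assert(block_size > 0);
    return frame_count / block_size;
}

/* Time left for idle callbacks inside the audio thread:
   half the callback period minus the time already used for DSP.
   Returns 0 when nothing is left (the slide code tests "time > 0"). */
double idle_budget_us(double period_us, double elapsed_us)
{
    double remaining = period_us * 0.5 - elapsed_us;
    return remaining > 0.0 ? remaining : 0.0;
}
```

For example, 64 frames at 44.1 kHz give a period of about 1451 µs, so idle callbacks only run if the DSP ticks finished in less than roughly 725 µs.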

Latency measurements (analog loop-back)
  OS / Audio interface           DAD latency
  WinXP / RME Fireface 400       ASIO: 6.5 ms
  WinXP / RME HDSP Multiface     ASIO: 5.0 ms
  WinXP / M-Audio FW410          ASIO: 6.5 ms
  OS X.4 / RME Fireface 400      CoreAudio: 13.8 ms
  no improvement for MME or DirectSound

Changes to MIDI subsystem
  Using portmidi only (analogous to portaudio)
  MIDI input not working for "exotic" message types with varying byte counts
    ‣ system exclusive, system common, real-time messages, needed e.g. for syncing

Message-based audio and MIDI configuration
  Traditionally: command-line or menu-based system configuration
  Several new messages to the pd receiver allowing configuration and querying of audio and MIDI status
  Message feedback from the pd sender
  Portaudio only!

Audio configuration

SIMD processing
  Single instruction, multiple data
  Supported by current x86 and PPC CPUs
  For contiguous, aligned blocks of data
    ‣ Perfect for DSP vectors
    ‣ Considerable speed-ups

SIMD processing contd.
  Requires reformulation of DSP algorithms
    ‣ Sometimes easy, sometimes very hard
    ‣ "Auto-vectorization" rarely effective
  Coded using assembly or "compiler intrinsics"
  Needs memory alignment (16 bytes)
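
The 16-byte alignment requirement can be checked at run time before choosing a SIMD code path. The following is a minimal sketch, assuming C11 `aligned_alloc` is available (it is not on every platform) and that a SIMD routine handles four 32-bit floats per 16-byte register. All function names are hypothetical and not taken from the pd sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 16  /* bytes, as required on the slide */

/* True if the pointer sits on a 16-byte boundary. */
int is_simd_aligned(const void *p)
{
    return ((uintptr_t)p % SIMD_ALIGN) == 0;
}

/* Allocate n floats on a 16-byte boundary.
   aligned_alloc wants a size that is a multiple of the alignment,
   so the byte count is rounded up. Release with free(). */
float *alloc_dsp_vector(size_t n)
{
    size_t bytes, rounded;
    if (n == 0)
        return NULL;
    bytes = n * sizeof(float);
    rounded = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    return aligned_alloc(SIMD_ALIGN, rounded);
}

/* Decide whether a SIMD perform routine may be used:
   all vectors aligned and the length a multiple of 4 floats.
   Otherwise the caller should fall back to the scalar routine. */
int can_use_simd(const float *in1, const float *in2,
                 const float *out, int n)
{
    assert(in1 != NULL && in2 != NULL && out != NULL);
    return n > 0 && n % 4 == 0 &&
           is_simd_aligned(in1) &&
           is_simd_aligned(in2) &&
           is_simd_aligned(out);
}
```

A DSP object could run such a check once when the signal graph is built and register either the SIMD or the scalar perform routine, so that no test is needed per block.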

SIMD example

unrolled scalar version:

    t_int *plus_perf8(t_int *w)
    {
        t_float *in1 = (t_float *)(w[1]);
        t_float *in2 = (t_float *)(w[2]);
        t_float *out = (t_float *)(w[3]);
        int n = (int)(w[4]);
        for (; n; n -= 8, in1 += 8, in2 += 8, out += 8)
        {
            float f0 = in1[0], f1 = in1[1], f2 = in1[2], f3 = in1[3];
            float f4 = in1[4], f5 = in1[5], f6 = in1[6], f7 = in1[7];

            float g0 = in2[0], g1 = in2[1], g2 = in2[2], g3 = in2[3];
            float g4 = in2[4], g5 = in2[5], g6 = in2[6], g7 = in2[7];

            out[0] = f0 + g0; out[1] = f1 + g1;
            out[2] = f2 + g2; out[3] = f3 + g3;
            out[4] = f4 + g4; out[5] = f5 + g5;
            out[6] = f6 + g6; out[7] = f7 + g7;
        }
        return (w+5);
    }

SIMD version (SSE intrinsics):

    #define INNER 4

    t_int *plus_perf_simd (t_int *w)
    {
        t_float *in1 = (t_float *)(w[1]);
        t_float *in2 = (t_float *)(w[2]);
        t_float *out = (t_float *)(w[3]);
        int n = w[4];
        int i,j;
        for(i = 0; i < n; )
            for(j = 0; j < INNER; ++j,i += 4)
                _mm_store_ps (out+i,
                    _mm_add_ps (
                        _mm_load_ps (in1+i),
                        _mm_load_ps (in2+i)
                    )
                );
        return w+5;
    }

DSP performance (mix of pd help patches)
  System / CPU type             CPU load
  WinXP / Core2Duo 2.4GHz       60% → 40%
  WinXP / P4 2.53GHz            85% → 63%
  MBP / Core2Duo 2.16GHz        69% → 56%
  MiniMac / PPC 1.66GHz         59% → 38%
  only simple DSP functions optimized
  unfair conditions for SIMD (large DSP graph)

Loader hooks
  Hooks provide plugin functionality for loading strategies
    ‣ Enables externals written using script languages or virtual machines
    ‣ Allows new loading strategies
  Used/extended for the libdir loader of pd-extended
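
The idea of pluggable loading strategies can be illustrated with a small chain of hooks that are tried in order until one of them accepts the requested class. This is a generic sketch only: the hook signature, the registry and the two example loaders (which merely match a name prefix) are assumptions made for illustration and do not reproduce the actual pd loader API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook signature: return 1 if the hook loaded the class,
   0 to let the next hook try. */
typedef int (*loader_hook_fn)(const char *classname);

#define MAX_HOOKS 8
static loader_hook_fn hooks[MAX_HOOKS];
static int hook_count = 0;

/* Append a loading strategy; returns 1 on success, 0 if the table is full. */
int register_loader_hook(loader_hook_fn fn)
{
    if (fn == NULL || hook_count >= MAX_HOOKS)
        return 0;
    hooks[hook_count++] = fn;
    return 1;
}

/* Try every registered hook in registration order. */
int load_class(const char *classname)
{
    int i;
    assert(classname != NULL);
    for (i = 0; i < hook_count; ++i)
        if (hooks[i](classname))
            return 1;
    return 0;   /* no strategy could load the class */
}

/* Example strategies (placeholders): a real hook would search the
   file system or hand the name to a script interpreter. */
int script_loader(const char *classname)
{
    return strncmp(classname, "py.", 3) == 0;
}

int native_loader(const char *classname)
{
    return strncmp(classname, "ext.", 4) == 0;
}
```

After registering `script_loader` and `native_loader`, a call such as `load_class("py.foo")` is answered by the first hook, `load_class("ext.bar")` falls through to the second, and an unknown name is rejected by both.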

Background processing
  For long-lasting operations
  Avoiding CPU spikes that cause dropouts
  a) Background thread (running continuously)
  b) Idle message processing (called multiple times)
  [Patch figure with the objects: vasp arrayL arrayR, vasp.cfft @detach 1, vasp.polar @detach 1, vasp.update]

Multi-threading
  Communication with pd kernel thread
    ‣ Message passing (inlet/outlet, send/receive)
    ‣ Use lock-free FIFO
  pd API not easily accessible for background thread ⌛
    ‣ sys_lock must be used
    ‣ Symbol lookup (gensym) is lock-free ✌
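
A lock-free FIFO between one background thread and the pd kernel thread can be built as a single-producer, single-consumer ring buffer. The sketch below uses C11 atomics and carries plain `int` values. It is a generic illustration of the technique, not the FIFO used in the devel branch, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FIFO_SIZE 64   /* must be a power of two */
_Static_assert((FIFO_SIZE & (FIFO_SIZE - 1)) == 0, "FIFO_SIZE must be 2^k");

typedef struct {
    int items[FIFO_SIZE];
    atomic_size_t head;   /* next slot to read, written by the consumer only */
    atomic_size_t tail;   /* next slot to write, written by the producer only */
} spsc_fifo;

void fifo_init(spsc_fifo *f)
{
    assert(f != NULL);
    atomic_init(&f->head, 0);
    atomic_init(&f->tail, 0);
}

/* Producer side (e.g. background thread). Returns 0 if the FIFO is full. */
int fifo_push(spsc_fifo *f, int value)
{
    size_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (tail - head == FIFO_SIZE)
        return 0;
    f->items[tail % FIFO_SIZE] = value;
    /* release: the item must be visible before the new tail is */
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer side (e.g. pd kernel thread). Returns 0 if the FIFO is empty. */
int fifo_pop(spsc_fifo *f, int *value)
{
    size_t head = atomic_load_explicit(&f->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head == tail)
        return 0;
    *value = f->items[head % FIFO_SIZE];
    /* release: the slot is free for the producer only after the read */
    atomic_store_explicit(&f->head, head + 1, memory_order_release);
    return 1;
}
```

Neither side ever blocks, so the consumer can drain the FIFO once per scheduler iteration without risking a stall in the audio path. The scheme is only safe with exactly one producer and one consumer.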

Idle processing
  idle_hook already existing in main pd (run once per scheduler iteration)
  We want timed idle callback processing
  Callback dynamically installable (thread-safe)
  Callback run in pd kernel thread
  Can request re-triggering for this or next scheduler iteration (returning 0, 1 or 2)
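
How a time-limited idle callback runner might behave can be sketched as follows. The mapping of the return values (0 = finished, 1 = call again in this iteration if time remains, 2 = call again in the next iteration) is an assumption, since the slide does not spell it out. The queue, the function names and the simulated clock are hypothetical, and the thread-safe installation mentioned above is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed meaning of the callback return values. */
enum { IDLE_DONE = 0, IDLE_AGAIN_NOW = 1, IDLE_AGAIN_NEXT = 2 };

typedef int (*idle_fn)(void *data);
typedef double (*clock_fn)(void);        /* current time in microseconds */

typedef struct { idle_fn fn; void *data; } idle_entry;

#define MAX_IDLE 16
static idle_entry idle_queue[MAX_IDLE];
static int idle_count = 0;

int add_idle_callback(idle_fn fn, void *data)
{
    if (fn == NULL || idle_count >= MAX_IDLE)
        return 0;
    idle_queue[idle_count].fn = fn;
    idle_queue[idle_count].data = data;
    ++idle_count;
    return 1;
}

int idle_queue_length(void) { return idle_count; }

/* Run queued callbacks until the time budget is used up. */
void run_timed_idle_callbacks_sketch(double budget_us, clock_fn now)
{
    double deadline;
    int i = 0;
    assert(now != NULL);
    deadline = now() + budget_us;
    while (i < idle_count && now() < deadline) {
        int r = idle_queue[i].fn(idle_queue[i].data);
        if (r == IDLE_DONE)
            idle_queue[i] = idle_queue[--idle_count]; /* drop finished entry */
        else if (r == IDLE_AGAIN_NEXT)
            ++i;                /* keep it queued, move on to the next one */
        /* IDLE_AGAIN_NOW: stay on this entry; the loop re-checks the time */
    }
}

/* Demo pieces: a simulated clock and a job that works in chunks,
   each chunk "costing" 100 microseconds. */
static double sim_time_us = 0.0;
double sim_clock(void) { return sim_time_us; }

int chunked_work(void *data)
{
    int *remaining = data;
    if (*remaining > 0) {
        --*remaining;
        sim_time_us += 100.0;
    }
    return *remaining > 0 ? IDLE_AGAIN_NOW : IDLE_DONE;
}
```

With five chunks queued and a budget of 250 µs, the demo job runs three chunks before the deadline check stops it (the third chunk overshoots the budget, since a callback is never interrupted), and it completes on a later call with a larger budget.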

Retriggering work (using idle processing)
  [Two patch figures, each running from "start" through "pd do_some_work" to "ready"]
  With delay 1: work takes some time, depending on CPU
  With defer: defer calls for work as long as there is time

Conclusion
  Enhancements add functionality without compromising compatibility
  Discussion needed
  Provide patches for the SF patch tracker
  devel_0_39 branch ☠
  Thanks to Tim Blechmann