Introd u ction to P y D u b SP OK E N L AN G U AG E P R OC E SSIN - PowerPoint PPT Presentation

Introd u ction to P y D u b SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON Daniel Bo u rke Machine Learning Engineer / Yo u T u be Creator

Installing P y D u b $ pip install pydub If u sing � les other than .wav , install ffmpeg v ia � mpeg . org SPOKEN LANGUAGE PROCESSING IN PYTHON

P y D u b ' s main class , A u dioSegment # Import PyDub main class from pydub import AudioSegment # Import an audio file wav_file = AudioSegment.from_file(file="wav_file.wav", format="wav") # Format parameter only for readability wav_file = AudioSegment.from_file(file="wav_file.wav") type(wav_file) pydub.audio_segment.AudioSegment SPOKEN LANGUAGE PROCESSING IN PYTHON

Pla y ing an a u dio file # Install simpleaudio for wav playback $pip install simpleaudio # Import play function from pydub.playback import play # Import audio file wav_file = AudioSegment.from_file(file="wav_file.wav") # Play audio file play(wav_file) SPOKEN LANGUAGE PROCESSING IN PYTHON

A u dio parameters # Import audio files wav_file = AudioSegment.from_file(file="wav_file.wav") two_speakers = AudioSegment.from_file(file="two_speakers.wav") # Check number of channels wav_file.channels, two_speakers.channels 1, 2 wav_file.frame_rate 480000 SPOKEN LANGUAGE PROCESSING IN PYTHON

A u dio parameters # Find the number of bytes per sample wav_file.sample_width 2 # Find the max amplitude wav_file.max 8488 SPOKEN LANGUAGE PROCESSING IN PYTHON

A u dio parameters # Duration of audio file in milliseconds len(wav_file) 3284 SPOKEN LANGUAGE PROCESSING IN PYTHON

Changing a u dio parameters # Change ATTRIBUTENAME of AudioSegment to x changeed_audio_segment = audio_segment.set_ATTRIBUTENAME(x) # Change sample width to 1 wav_file_width_1 = wav_file.sample_width(1) wav_file_width_1.sample_width 1 SPOKEN LANGUAGE PROCESSING IN PYTHON

Changing a u dio parameters # Change sample rate wav_file_16k = wav_file.frame_rate(16000) wav_file_16k.frame_rate 16000 # Change number of channels wav_file_1_channel = wav_file.set_channels(1) wav_file_1_channel.channels 1 SPOKEN LANGUAGE PROCESSING IN PYTHON

Let ' s practice ! SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON

Manip u lating a u dio files w ith P y D u b SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON Daniel Bo u rke Machine Learning Engineer / Yo u T u be Creator

T u rning it do w n to 11 # Import audio file wav_file = AudioSegment.from_file("wav_file.wav") # Minus 60 dB quiet_wav_file = wav_file - 60 # Try to recognize quiet audio recognizer.recognize_google(quiet_wav_file) UnknownValueError: SPOKEN LANGUAGE PROCESSING IN PYTHON

Increasing the v ol u me # Increase the volume by 10 dB louder_wav_file = wav_file + 10 # Try to recognize recognizer.recognize_google(louder_wav_file) this is a wav file SPOKEN LANGUAGE PROCESSING IN PYTHON

This all so u nds the same # Import AudioSegment and normalize from pydub import AudioSegment from pydub.effects import normalize from pydub.playback import play # Import uneven sound audio file loud_quiet = AudioSegment.from_file("loud_quiet.wav") # Normalize the sound levels normalized_loud_quiet = normalize(loud_quiet) # Check the sound play(normalized_loud_quiet) SPOKEN LANGUAGE PROCESSING IN PYTHON

Remi x ing y o u r a u dio files # Import audio with static at start static_at_start = AudioSegment.from_file("static_at_start.wav") # Remove the static via slicing no_static_at_start = static_at_start[5000:] # Check the new sound play(no_static_at_start) SPOKEN LANGUAGE PROCESSING IN PYTHON

Remi x ing y o u r a u dio files # Import two audio files wav_file_1 = AudioSegment.from_file("wav_file_1.wav") wav_file_2 = AudioSegment.from_file("wav_file_2.wav") # Combine the two audio files wav_file_3 = wav_file_1 + wav_file_2 # Check the sound play(wav_file_3) # Combine two wav files and make the combination louder louder_wav_file_3 = wav_file_1 + wav_file_2 + 10 SPOKEN LANGUAGE PROCESSING IN PYTHON

Splitting y o u r a u dio # Import phone call audio phone_call = AudioSegment.from_file("phone_call.wav") # Find number of channels phone_call.channels 2 # Split stereo to mono phone_call_channels = phone_call.split_to_mono() phone_call_channels [<pydub.audio_segment.AudioSegment, <pydub.audio_segment.AudioSegment>] SPOKEN LANGUAGE PROCESSING IN PYTHON

Splitting y o u r a u dio # Find number of channels of first list item phone_call_channels[0].channels 1 # Recognize the first channel recognizer.recognize_google(phone_call_channel_1) the pydub library is really useful SPOKEN LANGUAGE PROCESSING IN PYTHON

Let ' s code ! SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON

Con v erting and sa v ing a u dio files w ith P y D u b SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON Daniel Bo u rke Machine Learning Engineer / Yo u T u be Creator

E x porting a u dio files from pydub import AudioSegment # Import audio file wav_file = AudioSegment.from_file("wav_file.wav") # Increase by 10 decibels louder_wav_file = wav_file + 10 # Export louder audio file louder_wav_file.export(out_f="louder_wav_file.wav", format="wav") <_io.BufferedRandom name='louder_wav_file.wav'> SPOKEN LANGUAGE PROCESSING IN PYTHON

Reformatting and e x porting m u ltiple a u dio files def make_wav(wrong_folder_path, right_folder_path): # Loop through wrongly formatted files for file in os.scandir(wrong_folder_path): # Only work with files with audio extensions we're fixing if file.path.endswith(".mp3") or file.path.endswith(".flac"): # Create the new .wav filename out_file = right_folder_path + os.path.splitext(os.path.basename(file.path))[0] + ".wav" # Read in the audio file and export it in wav format AudioSegment.from_file(file.path).export(out_file, format="wav") print(f"Creating {out_file}") SPOKEN LANGUAGE PROCESSING IN PYTHON

Reformatting and e x porting m u ltiple a u dio files # Call our new function make_wav("data/wrong_formats/", "data/right_format/") Creating data/right_types/wav_file.wav Creating data/right_types/flac_file.wav Creating data/right_types/mp3_file.wav SPOKEN LANGUAGE PROCESSING IN PYTHON

Manip u lating and e x porting def make_no_static_louder(static_quiet, louder_no_static): # Loop through files with static and quiet (already in wav format) for file in os.scandir(static_quiet_folder_path): # Create new file path out_file = louder_no_static + os.path.splitext(os.path.basename(file.path))[0] + ".wav" # Read the audio file audio_file = AudioSegment.from_file(file.path) # Remove first three seconds and add 10 decibels and export audio_file = (audio_file[3100:] + 10).export(out_file, format="wav") print(f"Creating {out_file}") SPOKEN LANGUAGE PROCESSING IN PYTHON

Manip u lating and e x porting # Remove static and make louder make_no_static_louder("data/static_quiet/", "data/louder_no_static/") Creating data/louder_no_static/speech-recognition-services.wav Creating data/louder_no_static/order-issue.wav Creating data/louder_no_static/help-with-acount.wav SPOKEN LANGUAGE PROCESSING IN PYTHON

Yo u r t u rn ! SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON

Introd u ction to P y D u b SP OK E N L AN G U AG E P R OC E SSIN - PowerPoint PPT Presentation

Introd u ction to P y D u b SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON Daniel Bo u rke Machine Learning Engineer / Yo u T u be Creator Installing P y D u b $ pip install pydub If u sing les other than .wav , install ffmpeg v ia

INTROD TRODUCT CTION TO TO PRI RIOR ORITY TY-BASED ED B BUDGET ET BUDGETI TING F FOR

Introd u ction to a u dio data in P y thon SP OK E N L AN G U AG E P R OC E SSIN G IN P YTH ON

Introd u ction IN TE R ME D IATE IN TE R AC TIVE DATA VISU AL IZATION W ITH P L OTLY IN R

Introd u ction VISU AL IZIN G G E OSPATIAL DATA IN P YTH ON Mar y v an Valkenb u rg Data

Introd u ction to signals FIN AN C IAL TR AD IN G IN R Il y a Kipnis Professional Q u antitati

Introd u ction to E x plorator y Data Anal y sis STATISTIC AL TH IN K IN G IN P YTH ON ( PAR T 1

Introd u ction to iterators P YTH ON DATA SC IE N C E TOOL BOX ( PAR T 2 ) H u go Bo w ne -

Introd u ction to EFA FAC TOR AN ALYSIS IN R Jennifer Br u sso w Ps y chometrician Ps y cho +

Introd u ction to the NASA fireball data set BU IL D IN G DASH BOAR D S W ITH SH IN YDASH BOAR

Introd u ction to Tid y Data W OR K IN G W ITH DATA IN TH E TIDYVE R SE Alison Hill

Introd u ction to machine translation MAC H IN E TR AN SL ATION IN P YTH ON Th u shan

Introd u ction to relational databases IN TR OD U C TION TO IMP OR TIN G DATA IN P YTH ON H u

Introd u ction to spaC y AD VAN C E D N L P W ITH SPAC Y Ines Montani spaC y core de v eloper

Introd u ction to statistical seismolog y C ASE STU D IE S IN STATISTIC AL TH IN K IN G J u

Introd u ction to animation IN TE R ME D IATE IN TE R AC TIVE DATA VISU AL IZATION W ITH P L

Introd u ction to APIs and JSONs IN TE R ME D IATE IMP OR TIN G DATA IN P YTH ON H u go Bo w

04832250 Computer Networks (Honor Track) A Data Communication and Device Networking

Lecture Outline Regeltechniek Previous lecture: Root locus, frequency response derivation.

Audio Engineering in a Nutshell Audio Engineering in a Nutshell - Or: The next two hours of your

Advanced Vitreous State The Physical Properties of Glass Passive Optical Properties of Glass

Multimedia Systems WS 2010/2011 08.11.2010 M. Rahamatullah Khondoker (Room # 36/410 )

Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis Kishore Prahallad Email:

Introduction to Synthetic Aperture Radar Dr. Armin Doerry Detailed contact information at

ECE- 4074 W eb Si te http: //w w w . csc. gatech. edu/~copel and/4074 Pr of . John A

Sambuz

Useful Links

Newsletter

Mail Us