From terminal
wget https://www.iids.uidaho.edu/exampledata/predict_words.ipynb
Sources of training text
FanFiction
From terminal
wget https://www.iids.uidaho.edu/exampledata/ed_regress.tar.bz2
tar -xf ed_regress.tar.bz2
https://www.xeno-canto.org
Download
Callipela_californica Catharus_guttatus Catherpes_mexicanus Haliaeetus_leucocephalus Pandion_haliaetus Passerina_amoena Piranga_ludoviciana Poecile_gambeli process_bc.sh Sitta_pygmaea Spizella_passerina Thryomanes_bewickii
All the bird call recordings are in MP3 format, but we need WAV files of a consistent length and sample rate. We could convert them one at a time with desktop software (a bad idea), or we can script the job with ffmpeg. This script goes folder by folder and processes every MP3 file it finds. It first reads each file's duration, then splits the audio into multiple WAV files of consistent length (provided minsec = mxsec).
#!/bin/bash
module load ffmpeg
if [ -z "$1" ] ; then
mxsec=15;
else
mxsec=$1
fi
if [ -z "$2" ] ; then
outdir="wavs15"
else
outdir=$2
fi
minsec=15
echo "splitting clips into $mxsec intervals and outputting to $outdir"
valdenom=10
valfile="../$outdir/validation_list.txt"
testfile="../$outdir/testing_list.txt"
curwd=$(pwd)
mkdir -p $outdir
# get list of directories, spaces in directory names will BREAK this
samplenum=1
for bcdir in $(ls -1 ); do
if [ -f "$bcdir" ] ; then continue ; fi
if [ "$bcdir" == "$outdir" ] ; then continue ; fi
bcout=$curwd/$outdir/$bcdir
echo "Directory $bcdir -> $bcout"
mkdir -p $bcout
cd $bcdir
filenum=1
# iterate over mp3 files within each directory, ok to have spaces in the file names
find ./ -iname "*.mp3" | while read bcmp3; do
echo "$bcmp3"
mp3dur=$(ffmpeg -i "$bcmp3" 2>&1 | grep -oP "(?<=Duration: )[0-9\:\.]+")
echo "Duration: $mp3dur"
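# convert the duration in HH:MM:SS.ss format to seconds
mp3secs=0
IFS=':' read -ra ADDR <<< "$mp3dur"
mp3secsf=$(echo "${ADDR[0]}*3600+${ADDR[1]}*60+${ADDR[2]}" | bc)
mp3secs=${mp3secsf%.*} # convert float to int
mp3secs=${mp3secs:-0} # bc prints values below 1 as ".50", which leaves an empty string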
echo "Seconds: $mp3secsf ( $mp3secs ) "
# don't bother with clips less than the min
if (( "$mp3secs" < "$minsec" )) ; then continue; fi
# split up long clips
if (( "$mp3secs" > "$mxsec" )) ; then
nclips=$(($mp3secs/$mxsec)) # integer division
echo "splitting into $nclips clips"
for clip in $(seq 1 $nclips); do
ffmpeg -i "$bcmp3" -ss $(( ($clip-1)*$mxsec)) -to $(($clip*$mxsec)) -ar 16000 -acodec pcm_s16le -ac 1 -loglevel fatal "$bcout/$filenum-$clip.wav" <<< " "
if (( $samplenum % $valdenom == 0)) ; then
echo "$bcdir/$filenum-$clip.wav" >> $valfile
else
echo "$bcdir/$filenum-$clip.wav" >> $testfile
fi
let "samplenum=samplenum+1"
done
else
ffmpeg -i "$bcmp3" -ar 16000 -acodec pcm_s16le -ac 1 -loglevel fatal "$bcout/$filenum.wav" <<< " "
if (( $samplenum % $valdenom == 0)) ; then
echo "$bcdir/$filenum.wav" >> $valfile
else
echo "$bcdir/$filenum.wav" >> $testfile
fi
let "samplenum=samplenum+1"
fi
let "filenum=filenum+1"
done;
cd ..
done
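The duration-parsing step can also be done without bc. This is a minimal sketch, assuming ffmpeg reports durations as HH:MM:SS.ss; duration_to_seconds is a hypothetical helper name, not part of the script above. The 10# prefix forces base 10 so that fields such as 08 are not read as invalid octal numbers.

```shell
#!/bin/bash
# Hypothetical helper: convert an ffmpeg-style duration (HH:MM:SS.ss)
# to whole seconds using only bash arithmetic.
duration_to_seconds() {
    local dur=$1 h m s
    # split the duration on ':' into hours, minutes, seconds
    IFS=':' read -r h m s <<< "$dur"
    # drop the fractional part of the seconds field
    s=${s%.*}
    # 10# forces base 10, so "08" and "09" are not treated as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

duration_to_seconds "00:01:23.45"
```

The fractional part is truncated rather than rounded, which matches the float-to-int step in the script.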
What is the ideal clip length? Something to consider.
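One way to explore the question is to count how many clips a given length would produce. This is a rough sketch, assuming minsec equals the clip length (so shorter recordings are skipped and leftover audio is discarded); clips_for_length is a hypothetical helper and the durations are made-up illustrative values, not measurements from the data set.

```shell
#!/bin/bash
# Hypothetical helper: number of fixed-length clips one recording yields,
# assuming recordings shorter than the clip length are skipped.
clips_for_length() {
    local secs=$1 len=$2
    if (( secs < len )); then
        echo 0
    else
        echo $(( secs / len ))   # integer division, remainder is discarded
    fi
}

# Illustrative recording durations in seconds (not real data)
durations=(12 47 95 180)
for len in 5 10 15 30; do
    total=0
    for secs in "${durations[@]}"; do
        total=$(( total + $(clips_for_length "$secs" "$len") ))
    done
    echo "clip length ${len}s -> $total clips"
done
```

Shorter clips give more training samples, but each one contains less of the call, so the count alone does not settle the question.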
From Terminal:
Use the TensorFlow example code
wget https://www.iids.uidaho.edu/exampledata/speech_commands.tar.bz2
tar -xf speech_commands.tar.bz2
Set up TensorBoard. From terminal, run:
jupyter tensorboard enable --user
Restart jupyter server
Train the model
python speech_commands/train.py --data_url=https://www.iids.uidaho.edu/exampledata/wavs12.tar.gz --data_dir=wavs12 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea --train_dir=tflog12 --summaries_dir=tfsum12 --background_frequency=0.0
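The --wanted_words values appear to be the species directory names in lowercase. This is a minimal sketch for building that list from a data directory, assuming one subdirectory per species and that directories starting with an underscore should be left out; wanted_words_from_dirs is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: build a comma-separated, lowercase label list
# from the subdirectories of a data directory.
wanted_words_from_dirs() {
    local datadir=$1 words=() d name
    for d in "$datadir"/*/; do
        [ -d "$d" ] || continue               # glob matched nothing
        name=$(basename "$d" | tr '[:upper:]' '[:lower:]')
        case "$name" in
            _*) continue ;;                   # assumption: skip special folders
        esac
        words+=("$name")
    done
    local IFS=,                               # join array elements with commas
    echo "${words[*]}"
}

# Usage (assumes the wavs12 directory exists):
#   python speech_commands/train.py --data_dir=wavs12 \
#       --wanted_words="$(wanted_words_from_dirs wavs12)" ...
```

This selects every species in the directory, so trim the list by hand to train on a subset.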
Training takes a while, so just download the pretrained model and summaries instead, then extract both archives with tar -xf as before (run in terminal):
wget https://www.iids.uidaho.edu/exampledata/tflog12.tar.bz2
wget https://www.iids.uidaho.edu/exampledata/tfsum12.tar.bz2
Generate a ‘frozen’ version of the model
python speech_commands/freeze.py --start_checkpoint=tflog12/conv.ckpt-18000 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea --output_file=tflog12/bc12_recognize.pb
Use it to identify a clip
ffmpeg -i poecile_gambeli_224194.mp3 -ar 16000 -acodec pcm_s16le -ac 1 poecile_gambeli_224194.wav
python speech_commands/label_wav.py --graph=tflog12/bc12_recognize.pb --labels=tflog12/conv_labels.txt --wav=poecile_gambeli_224194.wav
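The clip is converted to 16 kHz, mono, 16-bit PCM before labeling, matching the training clips. This is a minimal sketch for checking a WAV file's header, assuming a canonical layout in which the fmt chunk comes first (the byte offsets would differ otherwise); wav_format is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print sample rate, channel count and bit depth
# from a WAV header, assuming the fmt chunk starts at byte 12.
wav_format() {
    local f=$1 b
    # read bytes 22..35 as unsigned decimals: channels (2 bytes),
    # sample rate (4), byte rate (4), block align (2), bits per sample (2)
    read -r -a b < <(od -An -v -tu1 -j22 -N14 "$f")
    # fields are little-endian, so combine the bytes by hand
    local channels=$(( b[0] + b[1] * 256 ))
    local rate=$(( b[2] + b[3] * 256 + b[4] * 65536 + b[5] * 16777216 ))
    local bits=$(( b[12] + b[13] * 256 ))
    echo "$rate Hz, $channels channel(s), $bits bit"
}

# Usage (assumes the converted clip exists):
#   wav_format poecile_gambeli_224194.wav
```

For a clip produced by the ffmpeg command above, the expected report is 16000 Hz, 1 channel, 16 bit.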
Tensorboard
Train a model with more species
python speech_commands/train.py --data_url=https://www.iids.uidaho.edu/exampledata/wavs12.tar.gz --data_dir=wavs12 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea,haliaeetus_leucocephalus,pandion_haliaetus,passerina_amoena,piranga_ludoviciana,spizella_passerina --train_dir=tfloga12 --summaries_dir=tfsuma12 --background_frequency=0.0
Training takes a while, so just download the pretrained model and summaries instead, then extract both archives with tar -xf as before (run in terminal):
wget https://www.iids.uidaho.edu/exampledata/tfloga12.tar.bz2
wget https://www.iids.uidaho.edu/exampledata/tfsuma12.tar.bz2
Generate a ‘frozen’ version of the model
python speech_commands/freeze.py --start_checkpoint=tfloga12/conv.ckpt-18000 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea,haliaeetus_leucocephalus,pandion_haliaetus,passerina_amoena,piranga_ludoviciana,spizella_passerina --output_file=tfloga12/bc12_recognize.pb
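The --wanted_words list given to freeze.py should match the one used for training. This is a minimal sketch of a sanity check against the labels file, assuming conv_labels.txt holds one label per line and that special labels such as _silence_ and _unknown_ start with an underscore; check_wanted_words is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: compare a --wanted_words list with the labels
# file written during training.
check_wanted_words() {
    local labels_file=$1 wanted=$2 expected
    # drop special labels (leading underscore), join the rest with commas
    expected=$(grep -v '^_' "$labels_file" | paste -sd, -)
    if [ "$expected" = "$wanted" ]; then
        echo "match"
    else
        echo "mismatch: labels file lists $expected"
        return 1
    fi
}

# Usage (assumes the training output directory exists):
#   check_wanted_words tfloga12/conv_labels.txt "callipela_californica,..."
```

The comparison is order-sensitive, since the position of each label determines which output it names.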
Use it to identify a clip
ffmpeg -i poecile_gambeli_224194.mp3 -ar 16000 -acodec pcm_s16le -ac 1 poecile_gambeli_224194.wav
python speech_commands/label_wav.py --graph=tfloga12/bc12_recognize.pb --labels=tfloga12/conv_labels.txt --wav=poecile_gambeli_224194.wav