Machine Learning with TensorFlow

Slides

Predictive Keyboard

From terminal

wget https://www.iids.uidaho.edu/exampledata/predict_words.ipynb

Sources of training text

US president speeches

FanFiction

Ficsave

FanFiction Books

Classical Books

Friday Letters

Regression

Download data

From terminal

wget https://www.iids.uidaho.edu/exampledata/ed_regress.tar.bz2
tar -xf ed_regress.tar.bz2

CNN - Birdcall identifier

https://www.xeno-canto.org

Download

Callipela_californica Catharus_guttatus Catherpes_mexicanus Haliaeetus_leucocephalus Pandion_haliaetus Passerina_amoena Piranga_ludoviciana Poecile_gambeli process_bc.sh Sitta_pygmaea Spizella_passerina Thryomanes_bewickii

Data preprocessing

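All the bird call recordings are in MP3 format, but the model needs WAV files with a consistent length and sample rate. We could convert them one at a time with desktop software (bad idea), or we can script it with ffmpeg. The script below (process_bc.sh, included in the download) goes folder by folder and processes every MP3 file it finds. For each file it first reads the duration and skips recordings shorter than minsec seconds. Recordings longer than mxsec seconds are split into multiple 16 kHz mono WAV files of mxsec seconds each, and anything in between is converted whole, so every clip has the same length only when minsec = mxsec. Every 10th clip is listed in validation_list.txt and the rest in testing_list.txt. The script takes two optional arguments: the clip length in seconds (default 15) and the output directory (default wavs15).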
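The splitting rule described above can be sketched without ffmpeg, to check which clips a recording will produce before running the real conversion. This is a minimal sketch, not part of the original script: plan_clips is a hypothetical helper that takes a duration in whole seconds plus mxsec and minsec, and prints the start-end times that the script passes to ffmpeg's -ss and -to options.

```bash
#!/bin/bash
# Hypothetical helper: print the start-end times (seconds) of the WAV clips
# the conversion script would cut from one recording.
# Args: $1 = duration in whole seconds, $2 = mxsec, $3 = minsec
plan_clips() {
  local secs=$1 mxsec=$2 minsec=$3 clip
  if (( secs < minsec )); then
    return                                # too short: skipped entirely
  fi
  if (( secs > mxsec )); then
    local nclips=$(( secs / mxsec ))      # integer division drops the leftover tail
    for (( clip = 1; clip <= nclips; clip++ )); do
      echo "$(( (clip - 1) * mxsec ))-$(( clip * mxsec ))"
    done
  else
    echo "0-$secs"                        # converted whole (secs is truncated to an integer)
  fi
}

plan_clips 40 12 15   # prints 0-12, 12-24 and 24-36 on separate lines
```

Note that the last 4 seconds of the 40-second recording are discarded, because only whole mxsec-length pieces are kept.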

#!/bin/bash
# Convert every MP3 under each species directory into 16 kHz mono WAV clips.
# Usage: ./process_bc.sh [clip_seconds] [output_dir]   (defaults: 15, wavs15)

module load ffmpeg      # load ffmpeg via environment modules (cluster-specific)

mxsec=${1:-15}          # maximum clip length in seconds
outdir=${2:-wavs15}     # output directory
minsec=15               # recordings shorter than this are skipped

echo "splitting clips into $mxsec second intervals and outputting to $outdir"
valdenom=10             # every 10th clip goes to the validation list

curwd=$(pwd)
mkdir -p "$outdir"
valfile="$curwd/$outdir/validation_list.txt"
testfile="$curwd/$outdir/testing_list.txt"

samplenum=1
# iterate over species directories; the trailing slash matches directories only,
# and the glob (unlike $(ls)) copes with spaces in the names
for bcdir in */; do
  bcdir=${bcdir%/}
  if [ "$bcdir" == "$outdir" ] ; then continue ; fi
  bcout=$curwd/$outdir/$bcdir
  echo "Directory $bcdir -> $bcout"
  mkdir -p "$bcout"
  cd "$bcdir" || continue
  filenum=1
  # iterate over mp3 files (spaces in file names are fine); process substitution
  # instead of a pipe keeps the loop in this shell, so samplenum keeps counting
  while IFS= read -r bcmp3; do
    echo "$bcmp3"
    # -nostdin stops ffmpeg from consuming the file list on stdin
    mp3dur=$(ffmpeg -nostdin -i "$bcmp3" 2>&1 | grep -oP "(?<=Duration: )[0-9:.]+")
    echo "Duration: $mp3dur"
    # convert the duration in HH:MM:SS.ss format to seconds
    IFS=':' read -ra ADDR <<< "$mp3dur"
    mp3secsf=$(echo "${ADDR[0]}*3600+${ADDR[1]}*60+${ADDR[2]}" | bc)
    mp3secs=${mp3secsf%.*}   # convert float to int
    mp3secs=${mp3secs:-0}    # bc prints e.g. ".52" for sub-second recordings
    echo "Seconds: $mp3secsf ( $mp3secs ) "
    # don't bother with clips less than the min
    if (( mp3secs < minsec )) ; then continue; fi
    if (( mp3secs > mxsec )) ; then
      # split up long clips into nclips pieces of mxsec seconds each
      nclips=$(( mp3secs / mxsec ))   # integer division
      echo "splitting into $nclips clips"
      for clip in $(seq 1 "$nclips"); do
        wav="$filenum-$clip.wav"
        ffmpeg -nostdin -i "$bcmp3" -ss $(( (clip-1)*mxsec )) -to $(( clip*mxsec )) -ar 16000 -acodec pcm_s16le -ac 1 -loglevel fatal "$bcout/$wav"
        if (( samplenum % valdenom == 0 )) ; then
          echo "$bcdir/$wav" >> "$valfile"
        else
          echo "$bcdir/$wav" >> "$testfile"
        fi
        (( samplenum++ ))
      done
    else
      # convert the whole recording, and list the file name actually written
      wav="$filenum.wav"
      ffmpeg -nostdin -i "$bcmp3" -ar 16000 -acodec pcm_s16le -ac 1 -loglevel fatal "$bcout/$wav"
      if (( samplenum % valdenom == 0 )) ; then
        echo "$bcdir/$wav" >> "$valfile"
      else
        echo "$bcdir/$wav" >> "$testfile"
      fi
      (( samplenum++ ))
    fi
    (( filenum++ ))
  done < <(find ./ -iname "*.mp3")
  cd "$curwd"
done
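The script converts ffmpeg's "Duration:" value to seconds by piping an expression to bc. As a minimal alternative sketch (not what the script itself does), the same conversion can be done with bash integer arithmetic alone, which also makes it easy to test in isolation. duration_to_secs is a hypothetical helper name; like the script, it truncates fractional seconds.

```bash
#!/bin/bash
# Hypothetical helper: convert the "Duration:" value printed by ffmpeg
# (HH:MM:SS.ss) into whole seconds using bash integer arithmetic only.
duration_to_secs() {
  local hh mm ss
  IFS=':' read -r hh mm ss <<< "$1"
  ss=${ss%.*}   # drop the fractional part of the seconds field
  # 10# forces base 10 so fields such as "08" or "09" are not read as octal
  echo $(( 10#$hh * 3600 + 10#$mm * 60 + 10#$ss ))
}

duration_to_secs "00:01:23.45"   # prints 83
duration_to_secs "01:00:09.99"   # prints 3609
```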

What is the ideal clip length? Something to consider.
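One way to reason about it is to count how many training clips each candidate length would yield. The sketch below applies the script's splitting rule, assuming minsec = mxsec = the clip length, to a set of made-up recording durations (illustrative only, not taken from the xeno-canto data); count_clips is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: count the clips of length $1 seconds that the splitting
# rule yields from the recording durations (whole seconds) in the remaining
# arguments. Assumes minsec = mxsec = clip length.
count_clips() {
  local len=$1 total=0 d
  shift
  for d in "$@"; do
    # recordings shorter than the clip length are skipped
    if (( d >= len )); then
      total=$(( total + d / len ))   # integer division drops the leftover tail
    fi
  done
  echo "$total"
}

# Made-up durations, for illustration only
durations=(8 14 20 37 61 95)
for len in 5 10 15; do
  echo "${len}s clips: $(count_clips "$len" "${durations[@]}")"
done
```

For these durations, 5 s clips give 45 examples, 10 s give 21 and 15 s give 13. Shorter clips give more examples but less of each call per example, which is the trade-off to weigh.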

Use TensorFlow example code

From terminal:

wget https://www.iids.uidaho.edu/exampledata/speech_commands.tar.bz2
tar -xf speech_commands.tar.bz2

Set up TensorBoard. From terminal, run:

jupyter tensorboard enable --user

Restart the Jupyter server.

Train the model

python speech_commands/train.py --data_url=https://www.iids.uidaho.edu/exampledata/wavs12.tar.gz --data_dir=wavs12 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea --train_dir=tflog12 --summaries_dir=tfsum12 --background_frequency=0.0

Training takes a while, so instead download and extract the pretrained model (run in terminal):

wget https://www.iids.uidaho.edu/exampledata/tflog12.tar.bz2
wget https://www.iids.uidaho.edu/exampledata/tfsum12.tar.bz2
tar -xf tflog12.tar.bz2
tar -xf tfsum12.tar.bz2

Generate a ‘frozen’ version of the model

python speech_commands/freeze.py --start_checkpoint=tflog12/conv.ckpt-18000 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea --output_file=tflog12/bc12_recognize.pb

Use it to identify a clip

ffmpeg -i poecile_gambeli_224194.mp3 -ar 16000 -acodec pcm_s16le -ac 1 poecile_gambeli_224194.wav
python speech_commands/label_wav.py --graph=tflog12/bc12_recognize.pb --labels=tflog12/conv_labels.txt --wav=poecile_gambeli_224194.wav

TensorBoard

Train a model with more species

python speech_commands/train.py --data_url=https://www.iids.uidaho.edu/exampledata/wavs12.tar.gz --data_dir=wavs12 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea,haliaeetus_leucocephalus,pandion_haliaetus,passerina_amoena,piranga_ludoviciana,spizella_passerina --train_dir=tfloga12 --summaries_dir=tfsuma12 --background_frequency=0.0

Training takes a while, so instead download and extract the pretrained model (run in terminal):

wget https://www.iids.uidaho.edu/exampledata/tfloga12.tar.bz2
wget https://www.iids.uidaho.edu/exampledata/tfsuma12.tar.bz2
tar -xf tfloga12.tar.bz2
tar -xf tfsuma12.tar.bz2

Generate a ‘frozen’ version of the model

python speech_commands/freeze.py --start_checkpoint=tfloga12/conv.ckpt-18000 --wanted_words=callipela_californica,catharus_guttatus,catherpes_mexicanus,poecile_gambeli,sitta_pygmaea,haliaeetus_leucocephalus,pandion_haliaetus,passerina_amoena,piranga_ludoviciana,spizella_passerina --output_file=tfloga12/bc12_recognize.pb
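freeze.py builds the model's output layer from the --wanted_words list, so it needs the same species list that was used for training; a shorter list will not match the checkpoint. A minimal sketch for keeping the two commands in sync is to build the list once in a shell variable (WANTED is a hypothetical name) from the species directory names, lowercased to match the commands above. It assumes bash 4 or newer for the ${array[@],,} lowercase expansion.

```bash
#!/bin/bash
# Species directory names (as listed in the download) to include in the model
species=(Callipela_californica Catharus_guttatus Catherpes_mexicanus
         Poecile_gambeli Sitta_pygmaea Haliaeetus_leucocephalus
         Pandion_haliaetus Passerina_amoena Piranga_ludoviciana
         Spizella_passerina)

# Lowercase every element (bash 4+) and join them with commas
lower=("${species[@],,}")
WANTED=$(IFS=,; echo "${lower[*]}")
echo "$WANTED"
```

In the same terminal session, pass --wanted_words="$WANTED" to both speech_commands/train.py and speech_commands/freeze.py.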

Use it to identify a clip

ffmpeg -i poecile_gambeli_224194.mp3 -ar 16000 -acodec pcm_s16le -ac 1 poecile_gambeli_224194.wav
python speech_commands/label_wav.py --graph=tfloga12/bc12_recognize.pb --labels=tfloga12/conv_labels.txt --wav=poecile_gambeli_224194.wav