Sonar data — Mines vs Rocks

Train a model to identify whether a sonar wave bounced off a rock or a mine in the ocean.

Sonar (sound navigation and ranging) is a technique based on the reflection of ultrasonic sound waves. These waves propagate through water and are reflected when they hit the ocean bed or any object in their path.

Sonar has been widely used in submarine navigation, communication with or detection of objects on or under the water surface (like other vessels), hazard identification, etc.

There are two types of sonar technology: passive (listening to the sound emitted by vessels in the ocean) and active (emitting pulses and listening for their echoes).

It is important to note that research shows the use of active sonar can cause mass strandings of marine animals.

Implementation of the idea on cAInvas — here!

The dataset

This dataset was used in Gorman, R. P., and Sejnowski, T. J. (1988). “Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets” in Neural Networks, Vol. 1, pp. 75–89.

The CSV files contain sonar signals bounced off a metal cylinder (mine, labeled M) and a roughly cylindrical rock (labeled R) at various angles and under various conditions.

There are 60 attributes and one categorical column in the dataset.

Next, we look at the spread of the class labels in the dataset.

Spread of values

It is a fairly balanced dataset.
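A minimal sketch of how the spread can be checked with pandas, assuming the label sits in column 60 (as the later code indexes it). The data frame here is a synthetic stand-in that uses the 111/97 class counts quoted later in the article and random attribute values; it is not the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the sonar data: 60 numeric attributes (columns 0-59)
# and a label column 60 holding 'M' or 'R'. Counts match the text; values are random.
rng = np.random.default_rng(0)
labels = np.array(["M"] * 111 + ["R"] * 97)
df = pd.DataFrame(rng.random((len(labels), 60)))
df[60] = labels

# Count how many samples fall in each class
counts = df[60].value_counts()
print(counts)
```

On the real data frame, the same `value_counts()` call on the label column gives the numbers behind the plot.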

Preprocessing

Categorical features

The category column uses R and M to denote the classes. These have to be converted into numeric values.

Snapshot of the dataset after changing column values

Now that we have re-labeled the classes, we will define class names accordingly for later use.
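One way to do the relabeling and define the class names is sketched below. It assumes R maps to 0 and M maps to 1, which is consistent with the balancing step, where class 0 is the smaller class that gets upsampled to 111. The display strings in `class_names` are placeholders chosen for illustration.

```python
import pandas as pd

# Tiny illustrative frame: only the label column (60) matters here
df = pd.DataFrame({60: ["R", "M", "M", "R", "M"]})

# Map the letter labels to integers (assumed: Rock -> 0, Mine -> 1)
label_map = {"R": 0, "M": 1}
df[60] = df[60].map(label_map)

# Index i of class_names is the human-readable name for label i
class_names = ["Rock", "Mine"]

print(df[60].tolist())                    # [0, 1, 1, 0, 1]
print([class_names[i] for i in df[60]])   # ['Rock', 'Mine', 'Mine', 'Rock', 'Mine']
```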

Balancing the dataset

Even though the two classes differ by only 14 samples (111 vs 97), this difference is significant relative to the small total number of samples (208), so the dataset needs to be balanced.

There are two options for balancing the dataset:

upsampling: resample the smaller class so that its count equals that of the larger class (here, 111).
downsampling: pick n samples from each class, where n is the number of samples in the smaller class (here, 97).
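The article goes with upsampling, but for comparison, the downsampling option could look like the sketch below. It runs on a synthetic label-only frame with the same 111/97 split; `random_state` is fixed here only to make the result reproducible.

```python
import pandas as pd

# Synthetic frame with the same class counts as the text (111 of label 1, 97 of label 0)
df = pd.DataFrame({60: [1] * 111 + [0] * 97})

df0 = df[df[60] == 0]
df1 = df[df[60] == 1]

# Downsampling: draw (without replacement) as many rows from each class
# as there are in the smaller one, then stack the two classes together
n = min(len(df0), len(df1))
df_down = pd.concat([df0.sample(n, random_state=0), df1.sample(n, random_state=0)])

print(df_down[60].value_counts())
```

Downsampling discards real samples, which is costly on a dataset this small; that is one reason to prefer upsampling here.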

Here, we will be upsampling. First, we divide the whole dataset into two data frames, one for each label. The sample() function of the data frame is then used with replace = True to resample class 0 up to 111 samples, the size of class 1. Finally, the two data frames are combined into a single one.

	import pandas as pd

	# df: the sonar data frame with the label in column 60, already mapped to 0/1

	# separating into 2 dataframes, one for each class
	df0 = df[df[60] == 0]
	df1 = df[df[60] == 1]
	print("Number of samples in:")
	print("Class label 0 - ", len(df0))
	print("Class label 1 - ", len(df1))

	# Upsampling
	# replace = True enables resampling
	df0 = df0.sample(len(df1), replace = True)
	print('\nAfter resampling - ')
	print("Number of samples in:")
	print("Class label 0 - ", len(df0))
	print("Class label 1 - ", len(df1))

	# concatenate to form a single dataframe
	# (DataFrame.append was removed in pandas 2.0; pd.concat is its replacement)
	df = pd.concat([df1, df0])


Defining the input and output columns

We define the columns of the data frame to be used as input and output for the model.

	# defining the input and output columns to
	# separate the dataset in the later cells.

	input_columns = list(df.columns[:-1])
	output_columns = [df.columns[-1]]


There are 60 input columns and 1 output column.

Train-val split

The dataset is split into training and validation sets using a 90-10 ratio, and each set is then split into X (input) and y (output) arrays for further processing.

	import numpy as np
	from sklearn.model_selection import train_test_split

	# Splitting into train and val set -- 90-10 split
	train_df, val_df = train_test_split(df, test_size = 0.1, random_state = 2)

	# Splitting into X (input) and y (output)
	Xtrain, ytrain = np.array(train_df[input_columns]), np.array(train_df[output_columns])
	Xval, yval = np.array(val_df[input_columns]), np.array(val_df[output_columns])


The training set has 199 samples and the validation set has 23 samples.
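These sizes follow from how `train_test_split` handles a fractional `test_size`: the held-out count is rounded up. After upsampling there are 2 × 111 = 222 rows, so the validation set gets ceil(0.1 × 222) = 23 rows and the training set the remaining 199. A quick check on a dummy array of the same length:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 2 * 111          # both classes after upsampling
test_size = 0.1

# scikit-learn rounds a fractional test size up; the rest goes to training
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val
print(n_train, n_val)        # 199 23

# Same result from the actual splitter on a dummy array of 222 rows
dummy = np.arange(n_samples)
train_part, val_part = train_test_split(dummy, test_size=test_size, random_state=2)
print(len(train_part), len(val_part))   # 199 23
```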

Here is a peek into the distribution of samples in the training and validation sets.

Seems balanced!
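The class distribution in each split can be checked with NumPy. A minimal sketch, assuming the label arrays have the (n, 1) shape produced by the split above; `class_distribution` is a hypothetical helper and `ytrain_demo` is a made-up array, not the real split.

```python
import numpy as np

def class_distribution(y):
    """Return {label: count} for a label array of shape (n,) or (n, 1)."""
    labels, counts = np.unique(np.ravel(y), return_counts=True)
    return {int(label): int(count) for label, count in zip(labels, counts)}

# Made-up (n, 1) label array standing in for ytrain / yval
ytrain_demo = np.array([[0], [1], [1], [0], [1], [0]])
print(class_distribution(ytrain_demo))   # {0: 3, 1: 3}
```

Calling it on `ytrain` and `yval` gives the counts behind the distribution plot.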

Standardization

The attributes all lie in roughly the same range, but the small differences between them show up as shifted means.

Snapshot of df.describe() function output

The StandardScaler class from the sklearn.preprocessing module is used to scale the values to mean = 0 and variance = 1.

The scaler is fit on the training inputs only and then used to transform the validation inputs (and a test set, if one is held out).
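StandardScaler computes z = (x - mean) / std per column, with the mean and standard deviation taken from the data it was fit on. The sketch below, on small made-up arrays, checks that reasoning: the training columns end up with mean 0 and standard deviation 1, while the validation data is shifted and scaled with the training statistics, which keeps information from the validation set out of the preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: 4 training rows and 2 validation rows, 3 attributes each
Xtrain = np.array([[0.02, 0.10, 0.30],
                   [0.04, 0.20, 0.50],
                   [0.06, 0.30, 0.70],
                   [0.08, 0.40, 0.90]])
Xval = np.array([[0.05, 0.25, 0.60],
                 [0.01, 0.50, 0.20]])

scaler = StandardScaler()
Xtrain_s = scaler.fit_transform(Xtrain)   # learns per-column mean_ and scale_
Xval_s = scaler.transform(Xval)           # reuses the training statistics

# Manual equivalent: z = (x - mean) / std, using the population std (ddof=0)
mu, sigma = Xtrain.mean(axis=0), Xtrain.std(axis=0)
print(np.allclose(Xval_s, (Xval - mu) / sigma))   # True
print(np.allclose(Xtrain_s.mean(axis=0), 0.0))    # True
print(np.allclose(Xtrain_s.std(axis=0), 1.0))     # True
```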

	from sklearn.preprocessing import StandardScaler

	# Using standard scaler to standardize them
	# to values with mean = 0 and variance = 1.
	standard_scaler = StandardScaler()

	# Fit on training set alone
	Xtrain = standard_scaler.fit_transform(Xtrain)

	# Use it to transform the val input (and a test input, if there is one)
	Xval = standard_scaler.transform(Xval)
	#Xtest = standard_scaler.transform(Xtest)


The model

The model is a simple one with 4 Dense layers: the first 3 use the ReLU activation function and the last one uses the sigmoid activation function.
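For a sense of model size: each Dense layer has (inputs × units) weights plus one bias per unit. With 60 input attributes feeding the 512-256-128-1 stack, the arithmetic works out as follows; `model.summary()` would report the same per-layer totals.

```python
def dense_params(n_in, n_units):
    # weight matrix (n_in x n_units) plus one bias per unit
    return n_in * n_units + n_units

layer_sizes = [60, 512, 256, 128, 1]   # input attributes, then each Dense layer's units
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)        # [31232, 131328, 32896, 129]
print(sum(per_layer))   # 195585
```

Almost 200k parameters for about 200 training samples is a lot, which is one reason early stopping on the validation loss is useful here.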

The model is compiled with the binary cross-entropy loss because the final layer outputs a single sigmoid probability for a two-class problem. The Adam optimizer is used, and the model's accuracy is tracked over epochs.
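For reference, binary cross-entropy for a label y in {0, 1} and a sigmoid output p is -(y log p + (1 - y) log(1 - p)), averaged over the batch. Below is a small NumPy sketch of that formula, not the Keras implementation; it clips p slightly away from 0 and 1 to avoid log(0), using an assumed epsilon of 1e-7.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    # Clip predictions so log() never sees exactly 0 or 1
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# A confident correct prediction costs little; a confident wrong one costs a lot
print(binary_cross_entropy([1], [0.9]))   # ~0.105 (= -log 0.9)
print(binary_cross_entropy([1], [0.1]))   # ~2.303 (= -log 0.1)
```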

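	from tensorflow.keras.callbacks import EarlyStopping
	from tensorflow.keras.layers import Dense
	from tensorflow.keras.models import Sequential
	from tensorflow.keras.optimizers import Adam

	model = Sequential([
		Dense(512, activation = 'relu', input_shape = Xtrain[0].shape),
		Dense(256, activation = 'relu'),
		Dense(128, activation = 'relu'),
		Dense(1, activation = 'sigmoid')
	])

	# Stop when val_loss has not improved for 3 epochs; keep the best weights
	cb = [EarlyStopping(monitor = 'val_loss',
			patience = 3,
			restore_best_weights = True)]

	# Phase 1: learning rate 0.01
	model.compile(optimizer=Adam(0.01),
			loss='binary_crossentropy',
			metrics=['accuracy'])

	history1 = model.fit(Xtrain, ytrain,
			validation_data = (Xval, yval),
			epochs=16, callbacks = cb)

	# Phase 2: recompile with a lower learning rate (0.001) and keep training
	model.compile(optimizer=Adam(0.001),
			loss='binary_crossentropy',
			metrics=['accuracy'])

	history2 = model.fit(Xtrain, ytrain,
			validation_data = (Xval, yval),
			epochs=16, callbacks = cb)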


The EarlyStopping callback from the keras.callbacks module monitors the validation loss and stops training if it does not improve for 3 consecutive epochs. When training is stopped early, the restore_best_weights parameter makes the model revert to the weights from the epoch with the lowest validation loss.
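To make the patience rule concrete, here is a simplified pure-Python sketch of the idea behind `patience = 3`: count the epochs without a new lowest validation loss, stop once that count reaches 3, and remember the best epoch so its weights can be restored. The loss values are made up, and the real Keras callback has more options (such as `min_delta`), so treat this as an illustration only.

```python
def early_stopping_epochs(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch   # training halts after this epoch
    return best_epoch, len(val_losses) - 1  # ran out of epochs first

# Made-up losses: best at epoch 1, then three epochs without improvement
print(early_stopping_epochs([0.50, 0.40, 0.45, 0.42, 0.41, 0.30]))   # (1, 4)
```

Note that the lower loss at epoch 5 is never reached: with a small patience, a late improvement can be missed, which is a trade-off against wasted epochs.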

The model is first trained with a learning rate of 0.01, which is then reduced to 0.001.

The model achieved around 91% accuracy on the validation set.

The metrics
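Because training happens in two phases, the metric curves come from two History objects. Keras stores the per-epoch values in each object's `history` dict (keys such as 'loss', 'accuracy', 'val_loss', 'val_accuracy'), so one way to get a single continuous curve is to concatenate the two dicts. A small sketch, where `merge_histories` is a hypothetical helper and the dicts are made-up stand-ins for `history1.history` and `history2.history`:

```python
def merge_histories(*histories):
    """Concatenate per-epoch metric lists from several Keras-style history dicts."""
    merged = {}
    for h in histories:
        for key, values in h.items():
            merged.setdefault(key, []).extend(values)
    return merged

# Made-up stand-ins for history1.history and history2.history
h1 = {"loss": [0.69, 0.52], "val_loss": [0.60, 0.48]}
h2 = {"loss": [0.41, 0.37], "val_loss": [0.44, 0.45]}

combined = merge_histories(h1, h2)
print(combined["val_loss"])   # [0.6, 0.48, 0.44, 0.45]
```

Each merged list can then be plotted against the epoch index, for example with matplotlib's `plt.plot`.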

Predictions

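Let's perform predictions on random samples. No separate test set is created in the steps above, so the samples are drawn from the validation set (Xval, yval):

	import random
	import numpy as np

	# pick a random sample from the validation set
	x = random.randint(0, len(Xval) - 1)
	output_true = int(np.array(yval)[x][0])
	print("True: ", class_names[output_true])

	# the model expects a batch, so reshape the single sample to (1, 60)
	output = model.predict(Xval[x].reshape(1, -1))[0][0]
	pred = int(output > 0.5) # threshold the sigmoid output at 0.5
	# Picking the label from class_names based on the model output
	print("Predicted: ", class_names[pred], "(", output, "-->", pred, ")")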


Prediction on random sample

deepC

The deepC library, compiler, and inference framework are designed to run deep learning neural networks on small form-factor devices such as microcontrollers, eFPGAs, and CPUs, and on other embedded devices like the Raspberry Pi, ODROID, Arduino, SparkFun Edge, RISC-V boards, mobile phones, and x86 and ARM laptops, among others.

Compiling the model using deepC —

Head over to the cAInvas platform (link to notebook given earlier) and check out the predictions by the .exe file!

Credits: Ayisha D

