Chapter 13 Simple denoising examples
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
In this exercise:
- First, we’ll generate a time series of “measurements” where the process we are measuring behaves exactly like a sine wave. This will be our “ground truth” signal.
- Next, we’ll simulate different types of noise that could result from imperfect measurements.
- Finally, we’ll apply 2 basic denoising techniques to each of our noisy signals.
13.1 Generating the ground truth
For this demonstration, we’ll generate a signal that behaves like a sine wave. Feel free to adjust any of the parameters to see how it influences our ability to denoise the signal.
# Resolution gives the number of measurements in our signal.
<- 1000
resolution # x_range_min gives the minimum x value in our signal.
<- 0
x_range_min # x_range_max gives the maximum x value in our signal.
<- (8 * pi)
x_range_max # Generate our signal x,y:
<- (seq(from=x_range_min, to=resolution) / resolution) * x_range_max
x <- sin(x)
y # Create a data frame
<- data.frame(x=x, y_true=y) data
data$y_true
gives the ground truth signal.
We can graph our ground truth:
ggplot(data, aes(x = x, y = y_true)) +
geom_point() +
labs(
x = "X",
y = "True value",
title = "Ground truth"
)
13.2 Adding some noise
First, let’s simulate that we had a noisy sensor such that each measurement has a small amount of random noise. To do so, we’ll add a little bit of Gaussian noise to our ground truth signal.
# noise_sd controls the level of noise applied to each measurement
<- 0.1
noise_sd $y_noise <- data$y + rnorm(length(data$y), 0, 0.1) data
We can graph this noisy signal:
ggplot(data, aes(x = x, y = y_noise)) +
geom_point() +
labs(
x = "X",
y = "Noisy measurement value"
)
Next, let’s add another kind of noise that we might encounter with our sensor. What if our sensor randomly fails, returning a nonsense value (in this case, let’s say 99).
<- 0.02 # Sensor fails 2% of the time
prob_sensor_failure <- 99
failure_value # "Simulate" when our sensor fails. Use rbinom to flip a biased coin for each object in our data.
$sensor_failure <- rbinom(nrow(data), 1, prob_sensor_failure)
data$y_noise_faulty <- mapply(
datafunction(y, fail) {
if (fail) {
return(failure_value)
else {
} return (y)
}
},$y_noise,
data$sensor_failure
data )
Graph our faulty sensor data.
ggplot(data, aes(x = x, y = y_noise_faulty)) +
geom_point() +
labs(
x = "X",
y = "Faulty measurement value"
)
We have our nice signal, and then we have those sensor failure measurements throwing everything off.
13.3 Denoising using a rolling average
Let’s try denoising using a rolling average. This technique loops over each value in your sequence, setting that value to the average of its neighbors. The number of neighbors is determined by the “window size”.
# This function calculates a rolling average over the sequence parameter.
# window_width defines how far the window extends to either side of the each index.
<- function(sequence, window_width=1) {
rolling_avg <- c()
avgs for (i in seq_along(sequence)) {
<- c(i)
window for (w_i in 1:window_width) {
<- i - w_i
left <- i + w_i
right
# Check that our window doesn't walk off the edge of our sequence.
# If so, extend the other side of the window.
if (left < 1) {
<- (i + window_width) + (abs(1 - left))
left
}if (right > length(sequence)) {
<- (i - window_width) - ((right - length(sequence)) )
right
}<- c(window, left)
window <- c(window, right)
window
}<- c(avgs, mean( sequence[window] ))
avgs
}return(avgs)
}
Apply our rolling average function to y_noise
<- 10
window_w $y_noise_denoised <- rolling_avg(
data$y_noise,
datawindow_width=window_w
)ggplot(data, aes(x=x, y=y_noise_denoised)) +
geom_point()
Much smoother! The rolling average works well when you want to smooth out a small magnitude of random noise.
Apply our rolling average function to y_noise_faulty
:
<- 10
window_w $y_noise_faulty_denoised <- rolling_avg(
data$y_noise_faulty,
datawindow_width=window_w
)ggplot(data, aes(x=x, y=y_noise_faulty_denoised)) +
geom_point()
Yikes! Our signal looks worse than it did before we applied the rolling average denoising.
13.3.1 Exercises
- Generate a small sequence of 10 random values.
On paper, apply a rolling average filter to your sequence.
You decide the window size.
- What would happen if you increased your window size?
- What about if you decreased your window size?
- Is the rolling average a good denoising approach for
y_noise
? What abouty_noise_faulty
? - Did the rolling average improve how representative
y_noise_faulty
is of the ground truth signal?- Try calculating the mean squared error for each of
y_noise
andy_noise_denoised
. Does error go down? - What about for
y_noise_faulty
andy_noise_faulty_denoised
?
- Try calculating the mean squared error for each of
- What happens when you make the rolling average window smaller? Bigger?
- What happens when you add more noise to the signal?
- Identify any lines of code that you don’t understand. Use the documentation to figure out what those lines of code are doing.
13.4 Denoising using a median filter
A median filter is similar to applying a rolling average, except instead of replacing each value in our sequence with the average of its neighbors, we replace it with the median of its neighbors.
# This function calculates a rolling average over the sequence parameter.
# window_width defines how far the window extends to either side of the each index.
<- function(sequence, window_width=1) {
median_filter <- c()
avgs for (i in seq_along(sequence)) {
<- c(i)
window for (w_i in 1:window_width) {
<- i - w_i
left <- i + w_i
right
# Check that our window doesn't walk off the edge of our sequence.
# If so, extend the other side of the window.
if (left < 1) {
<- (i + window_width) + (abs(1 - left))
left
}if (right > length(sequence)) {
<- (i - window_width) - ((right - length(sequence)) )
right
}<- c(window, left)
window <- c(window, right)
window
}<- c(avgs, median( sequence[window] ))
avgs
}return(avgs)
}
Apply our rolling average function to y_noise
:
<- 10
window_w $y_noise_denoised <- median_filter(
data$y_noise,
datawindow_width=window_w
)ggplot(data, aes(x=x, y=y_noise_denoised)) +
geom_point()
Apply our rolling average function to y_noise_faulty
:
<- 10
window_w $y_noise_faulty_denoised <- median_filter(
data$y_noise_faulty,
datawindow_width=window_w
)ggplot(data, aes(x=x, y=y_noise_faulty_denoised)) +
geom_point()
13.4.1 Exercises
- Do you notice any differences in the results of applying a median filter versus applying a rolling average?
- Generate a small sequence of 10 random values.
On paper, apply a median filter to your sequence. You decide the window size.
- What happens if you increase your window size?
- What about if you decrease your window size?
- Is the median filter a good denoising approach for
y_noise
? What abouty_noise_faulty
?- Try calculating the mean squared error for each of
y_noise
andy_noise_denoised
. Does error go down? - What about for
y_noise_faulty
andy_noise_faulty_denoised
?
- Try calculating the mean squared error for each of
- Did the median filter improve how representative
y_noise_faulty
is of the ground truth signal? - What happens when you make the median filter window smaller? Bigger?
- What happens when you add more noise to the signal?
- Identify any lines of code that you don’t understand. Use the documentation to figure out what those lines of code are doing.
13.5 Further exploration
As we talked about in class, the appropriate-ness of different denoising techniques will depend on your data and your particular application domain.
The R community has implemented many denoising techniques.
For example, the signal
package includes many signal denoising functions.