A Straight Line

A point is a 0-dimensional object. It represents a position in the space.

a point

Given any two different points in the space, we can always draw one and only one line through these two points.

Two points make a line.

For a different pair of points, we can draw a different line.

Two different points make a different line.

A point is a 0-dimensional object, so its position is fixed by the point itself. A line is a 1-dimensional object, and its position is fixed by 2 points. In other words, two points determine a line.

A line is an infinite object in the sense that it extends infinitely in both directions. In practice, however, we can only draw a finite segment of it, such as the segment between the two points.

Although any two points determine a line, the line contains infinitely many points. These infinitely many points relate to each other by a particular relationship that we call linearity or a linear function.

A linear function maps an input, a number, by doing two things. First, it scales the number, that is, it multiplies the number by a fixed factor (making it bigger when the factor is larger than 1). For example, the following linear function multiplies any input by 2.

fun time2(x : Double) : Double {
    return x * 2.0
}

Calling the function, we have:

time2(1.0)
2.0

More examples.

time2(-10.0)
-20.0
time2(1e3)
2000.0

The second thing that a linear function can do to a number is to add a number, called an offset, to it.

For example, the following linear function adds 1.0 to any input.

fun add1(x : Double) : Double {
    return x + 1.0
}
add1(1.0)
2.0
add1(-2.0)
-1.0

We can combine these operations together in one linear function so that the function scales/multiplies and adds to any input. For example, the following function multiplies the input by 2 and adds 1 to it, combining the two above examples.

fun y1(x : Double) : Double {
    return 2.0 * x + 1.0
}
y1(0.0)
1.0
y1(1.0)
3.0
y1(2.0)
5.0
y1(3.0)
7.0
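
As a small sketch of how the two operations combine, the function below applies the scaling first and the offset second, and then pairs each input with its output. The name y1Composed is a hypothetical one introduced here for illustration; time2 and add1 are the functions defined above, rewritten in expression form.

```kotlin
// Scale the input by 2 (same behavior as time2 in the text).
fun time2(x: Double): Double = x * 2.0

// Shift the input by 1 (same behavior as add1 in the text).
fun add1(x: Double): Double = x + 1.0

// Hypothetical name: scale first, then shift. This reproduces 2.0 * x + 1.0.
fun y1Composed(x: Double): Double = add1(time2(x))

fun main() {
    val inputs = listOf(0.0, 1.0, 2.0, 3.0)
    // Pair each input with its output.
    val pairs = inputs.map { x -> x to y1Composed(x) }
    println(pairs) // [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
}
```

The order matters: time2(add1(x)) would add first and scale second, giving 2x + 2 rather than 2x + 1.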

If we call the input \(x\) and the output \(y\), we can collect these input-output pairs as points \((x, y)\): \((0, 1), (1, 3), (2, 5), (3, 7)\).

We can put these points in a coordinate system. A coordinate system looks like the following.

a coordinate system

A coordinate system has two axes. The x-axis is the horizontal arrow pointing to the right. The y-axis is the vertical arrow pointing upward. There are evenly spaced markers on each of the axes, e.g., 1, 2, 3, 4, 5. If we think of an axis as a street, then the markers are street numbers. A point has two markers, the x marker that we call the x-coordinate, and the y marker that we call the y-coordinate. They help locate exactly where a point is. For example, the yellow dot in the above picture has the x-coordinate equal to 1 and the y-coordinate equal to 1. The address or the coordinates of the yellow dot is therefore \((1, 1)\).

Using the coordinate system, we can plot the points corresponding to the four input-output pairs of the above function, namely \((0, 1), (1, 3), (2, 5), (3, 7)\).

four points in a coordinate system

In S2, we can plot these four points programmatically. The following code plots those four points in a coordinate system.

// plot
%use plotly

// the four points generated by the function y1
Plotly.plot {
    scatter {
        x(0, 1, 2, 3)
        y(1, 3, 5, 7)
    }
}

The points lie on a straight line, hence the name linearity.

plot the four points in a coordinate system

We can feed more inputs into the function to generate more input-output pairs. However, it is quite tedious to do this manually one by one. Instead, we can first create an array (sort of like a list) of inputs. The following code creates an array of input numbers.

val x_values = arrayOf(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)

Then the ‘map’ function applies the function y1 to each element of the array and collects the results in a list.

x_values.map{ y1(it) }

The name ‘it’ is the implicit name of the single parameter of the lambda passed to ‘map’. Here it refers to each element of the array in turn.

The output is:

[1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0, 21.0]
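
To illustrate what ‘it’ refers to, the following sketch writes the same mapping in three equivalent ways: with the implicit parameter, with an explicitly named parameter, and with a function reference. A shorter input array and the camel-case name xValues are used here only to keep the example small.

```kotlin
fun y1(x: Double): Double = 2.0 * x + 1.0

fun main() {
    val xValues = arrayOf(0.0, 1.0, 2.0, 3.0)

    // `it` is the implicit name of the lambda's single parameter.
    val implicit = xValues.map { y1(it) }
    // The same lambda with the parameter named explicitly.
    val explicit = xValues.map { x -> y1(x) }
    // A function reference instead of a lambda.
    val reference = xValues.map(::y1)

    println(implicit) // [1.0, 3.0, 5.0, 7.0]
    println(implicit == explicit && explicit == reference) // true
}
```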

It is rather difficult to see what these numbers mean. Our human brain is not designed to work with numbers. Looking at these 22 numbers (11 pairs) already gives me a headache. Reading more numbers, e.g., millions of them, is probably beyond mortals.

The most important thing to do in data analysis is VISUALIZATION. That is, we draw a picture to represent the data. Our brain is not designed to work with numbers, but it is very good at working with pictures, graphs and charts. I cannot emphasize enough that the most important step when working with a data set is to draw a picture or pictures of it so that we can “see” the data and perhaps recognize patterns. This process is called visualization.

Plotting data is one of the simplest tools that we can use for visualization. We can plot those 11 pairs or points in a coordinate system.

val x_values = arrayOf(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)
val y_values = x_values.map{ y1(it) }

Plotly.plot {
    scatter {
        x.set(x_values)
        y.set(y_values)
    }
}

The picture that we draw programmatically is:

visualization of 11 points

Although we probably cannot tell what those 11 number pairs mean just by looking at the values, drawing them out enables us to see immediately that all of them lie on a straight line. Therefore, they can be modeled by a linear function. We have just completed an exercise of obtaining data, plotting the data (visualization), finding pattern(s) in the data, and modeling the data. These are the critical steps in any (big) data analysis.
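
The modeling step can be sketched in code. Since two points determine a line, the slope and offset can be recovered from the first two data points and then checked against the rest. The helper lineThrough is a hypothetical name introduced here for illustration, and it assumes the two points have different x-coordinates.

```kotlin
import kotlin.math.abs

// Hypothetical helper: slope and offset of the line through (xa, ya) and (xb, yb).
// Assumes xa != xb, i.e., the line is not vertical.
fun lineThrough(xa: Double, ya: Double, xb: Double, yb: Double): Pair<Double, Double> {
    val slope = (yb - ya) / (xb - xa)
    val offset = ya - slope * xa
    return slope to offset
}

fun main() {
    val xs = listOf(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)
    val ys = listOf(1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0, 21.0)

    // Two points are enough to determine the line.
    val (slope, offset) = lineThrough(xs[0], ys[0], xs[1], ys[1])
    println("slope = $slope, offset = $offset") // slope = 2.0, offset = 1.0

    // Check that every data point lies on that line (up to rounding).
    val allOnLine = xs.indices.all { i -> abs(slope * xs[i] + offset - ys[i]) < 1e-9 }
    println(allOnLine) // true
}
```

The recovered slope 2.0 and offset 1.0 match the function y1 that generated the data.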

On the other hand, with the model, i.e., the linear function that we have constructed, we can generate more points by feeding more inputs into the function. A more mathematical way of saying the same thing is that we can generate more y’s by giving the function more x’s to obtain a set of coordinates \(\{ (x, y) \}\). The following code generates an array of Doubles from 0 to 50, with a step size of 0.5.

val x_values = (0..100 step 1).map { it.toDouble() / 2 }.toDoubleArray()

Here’s a breakdown of the code:

“val”: This keyword is used to declare a read-only variable.

“x_values”: This is the name of the variable we’re declaring.

“(0..100 step 1)”: This creates a range of integers from 0 to 100 with a step size of 1. The .. operator is used to create the range, and “step” is used to specify the step size.

“.map { it.toDouble() / 2 }”: This applies a mapping function to each element of the range. The mapping function takes the integer value of each element, converts it to a Double value using the “toDouble()” function, and then divides it by 2 to obtain the corresponding x-value. The resulting values are stored in a new list.

“.toDoubleArray()”: This converts the resulting list to a “DoubleArray”.

The resulting x_values array contains 101 Double values, from 0.0 to 50.0 in steps of 0.5, for the x-axis of the plot. We can check the values by typing:

x_values
[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0, 16.5, 17.0, 17.5, 18.0, 18.5, 19.0, 19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26.0, 26.5, 27.0, 27.5, 28.0, 28.5, 29.0, 29.5, 30.0, 30.5, 31.0, 31.5, 32.0, 32.5, 33.0, 33.5, 34.0, 34.5, 35.0, 35.5, 36.0, 36.5, 37.0, 37.5, 38.0, 38.5, 39.0, 39.5, 40.0, 40.5, 41.0, 41.5, 42.0, 42.5, 43.0, 43.5, 44.0, 44.5, 45.0, 45.5, 46.0, 46.5, 47.0, 47.5, 48.0, 48.5, 49.0, 49.5, 50.0]
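
As a quick sanity check on this array, the sketch below counts its elements and compares it with an alternative construction that fills a DoubleArray directly from its index. The function names gridFromRange and gridFromIndex are hypothetical and used only for this illustration.

```kotlin
// Same construction as in the text: the integers 0..100, each halved.
fun gridFromRange(): DoubleArray =
    (0..100 step 1).map { it.toDouble() / 2 }.toDoubleArray()

// Alternative sketch: build the array directly from each index i.
fun gridFromIndex(): DoubleArray = DoubleArray(101) { i -> i * 0.5 }

fun main() {
    val xValues = gridFromRange()
    println(xValues.size)    // 101
    println(xValues.first()) // 0.0
    println(xValues.last())  // 50.0
    println(xValues.contentEquals(gridFromIndex())) // true
}
```

The range 0..100 includes both end points, which is why there are 101 values rather than 100.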

We can then, as before, apply the function y1 to the array, generate the outputs, and plot the data.

val y_values = x_values.map{ y1(it) }

Plotly.plot {
    scatter {
        x.set(x_values)
        y.set(y_values)
    }
}

The output is:

a plot of 101 points on a straight line

Here is a recap of what we just did in terms of data analysis and data modeling. We start with 11 data points given to us and visualize them by plotting them in a coordinate system. We recognize that they all lie on a straight line. Therefore, we model these 11 data points using a linear function (or a straight line). With this model, we can generate more artificial data that are not in the original data set given to us. We generate 101 data points and plot them in the above picture. All of them lie on the same straight line as the original data set.
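
As a minimal sketch of using the model on inputs that were never in the original data set, the code below evaluates y1 at a few new x-values. The particular inputs chosen here are arbitrary and only for illustration.

```kotlin
fun y1(x: Double): Double = 2.0 * x + 1.0

fun main() {
    // The inputs of the original data set.
    val original = listOf(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0)
    // Arbitrary inputs that do not appear in the original data set.
    val newInputs = listOf(2.5, 12.0, 50.0)

    for (x in newInputs) {
        println("x = $x, new input: ${x !in original}, model output y = ${y1(x)}")
    }
    // x = 2.5, new input: true, model output y = 6.0
    // x = 12.0, new input: true, model output y = 25.0
    // x = 50.0, new input: true, model output y = 101.0
}
```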

This process of starting with data/observations, making a hypothesis/model to describe the data, and then using the model to make predictions and generate new data is the fundamental cycle of data analysis and Artificial Intelligence.