We need data before we can do any data analysis. Depending on what the data source is, there are different ways to read in the data.

Reading Google Sheets

Suppose we have a gsheet in our Google Drive, such as this one.

gsheet: global temperature monthly

You can access this gsheet using this URL:

https://docs.google.com/spreadsheets/d/1jRFfcjrk-qRATGwhs9K2x8tSF7O2kHMfK5_kNxrUdyc

Note that the seemingly gibberish string at the end of the URL, namely “1jRFfcjrk-qRATGwhs9K2x8tSF7O2kHMfK5_kNxrUdyc”, is in fact the ID of the gsheet. We will use that later.
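As a minimal sketch of how the ID relates to the URL, the following plain Kotlin pulls the ID out of the address and builds the CSV export URL from it. The helper names “extractGsheetId” and “csvExportUrl” are hypothetical, introduced here only for illustration; the “/gviz/tq?tqx=out:csv” suffix is the one used by the reading code later in this section.

```kotlin
// Hypothetical helper: extract the gsheet ID from a Google Sheets URL.
// Returns null when the URL does not contain "/spreadsheets/d/<ID>".
fun extractGsheetId(url: String): String? {
    val match = Regex("/spreadsheets/d/([A-Za-z0-9_-]+)").find(url)
    return match?.groupValues?.get(1)
}

// Hypothetical helper: build the CSV export URL for a given gsheet ID.
fun csvExportUrl(gsheetId: String): String =
    "https://docs.google.com/spreadsheets/d/$gsheetId/gviz/tq?tqx=out:csv"

fun main() {
    val url = "https://docs.google.com/spreadsheets/d/1jRFfcjrk-qRATGwhs9K2x8tSF7O2kHMfK5_kNxrUdyc"
    val id = extractGsheetId(url)
    if (id != null) {
        println(id)               // the gsheet ID
        println(csvExportUrl(id)) // the URL a CSV reader can consume
    }
}
```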

Before our code can access this gsheet, we need to make it public so that anyone, including our code, can read it. We do this by changing the “Share” setting.

change gsheet access

Then we make it public by sharing it with “Anyone with the link”.

gsheet public access

The Kotlin function “DataFrame.readCSV” in the package “krangl” takes the CSV export URL of the gsheet as input and creates a data frame from it. The following code reads this gsheet.

				
%use krangl

val gsheetID = "1a7H90lq7OpFjKXlLRHYSHYxgViykyVsXWpw229IVjh8" // the gsheet ID
// "tqx=out:csv" asks Google Sheets to return the sheet as CSV
val df = DataFrame.readCSV("https://docs.google.com/spreadsheets/d/${gsheetID}/gviz/tq?tqx=out:csv")
df // display the data frame
				
			

The output is:

data frame output

A data frame is not just a storage structure. It also comes with a number of utility functions for processing the data. For example, we can use “filterByRow” to keep only those rows whose “Source” is “GISTEMP”.

				
val df1 = df.filterByRow { it["Source"] == "GISTEMP" }
df1

				
			

The output is:

data frame output

We can also sort the data frame. The following code sorts the data by “Date”.
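A minimal sketch of such a sort, assuming krangl's “sortedBy” function and the same gsheet as in the reading code above. The variable name “sorted” is illustrative only, and the block needs the krangl library and network access to run.

```kotlin
%use krangl

val gsheetID = "1a7H90lq7OpFjKXlLRHYSHYxgViykyVsXWpw229IVjh8" // same gsheet ID as above
val df = DataFrame.readCSV("https://docs.google.com/spreadsheets/d/${gsheetID}/gviz/tq?tqx=out:csv")

// sort the rows by the "Date" column in ascending order
val sorted = df.sortedBy("Date")
sorted
```

For the reverse order, krangl also provides “sortedByDescending”, which takes the same column name.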

data frame output

Reading a CSV File through GitHub

Enter your data in Microsoft Excel and save the file with a .csv extension. For instance, consider the following file.

To upload the CSV file to GitHub, you need a public repository in your GitHub account. The repository can be a new one or an existing one. Open the repository, click on Add File, and then Upload Files. The image below shows an existing repository.

Now, drag and drop the CSV file and click on Commit Changes as shown below. Your file will be uploaded!

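Once the file is uploaded, open it on GitHub, click “Raw”, and copy the URL of the raw file. The URL of the regular file page (the one containing “/blob/”) returns an HTML page rather than the CSV data. Now, open the S2/Kotlin IDE and type the following code.

%use s2

// the raw URL serves the CSV content itself
val df = DataFrame.readCSV("https://raw.githubusercontent.com/nmltd/s2-data-sets/main/multiple-regression-dataset1.csv")
println(df)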
				
			

Running the above code loads your data set into a data frame, and you are ready to go!
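Note that GitHub serves a file's web page at a URL containing “/blob/” and the file content itself under “raw.githubusercontent.com”; a CSV reader needs the latter. As a rough sketch, the conversion between the two can be done in plain Kotlin. The helper name “toRawGitHubUrl” is hypothetical and not part of S2 or krangl.

```kotlin
// Hypothetical helper: turn a GitHub "blob" page URL into the raw-content URL.
// URLs that do not match the blob pattern are returned unchanged.
fun toRawGitHubUrl(blobUrl: String): String {
    val match = Regex("https://github\\.com/([^/]+)/([^/]+)/blob/(.+)").matchEntire(blobUrl)
        ?: return blobUrl
    val (owner, repo, rest) = match.destructured
    return "https://raw.githubusercontent.com/$owner/$repo/$rest"
}

fun main() {
    val pageUrl = "https://github.com/nmltd/s2-data-sets/blob/main/multiple-regression-dataset1.csv"
    println(toRawGitHubUrl(pageUrl)) // URL suitable for a CSV reader
}
```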