Lecture 3

Factors and the Function tapply for Applying a Function to Groups

A factor variable splits a data set into groups. For instance, in the mtcars data set, the number of cylinders splits the cars into groups. The function tapply(numericvector, factorvector, function) applies function to the entries of numericvector within each group defined by factorvector. Here, of course, numericvector and factorvector must have the same length.

Let’s find the median fuel economy for the groups of cars with 4, 6, and 8 cylinders.

tapply(mtcars$mpg, as.factor(mtcars$cyl), median)
##    4    6    8 
## 26.0 19.7 15.2
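
To see what tapply does group by group, here is a minimal sketch on a small made-up data set whose group means are easy to check by hand. The vectors scores and group are hypothetical and are not part of mtcars.

```r
# Hypothetical data: six scores belonging to two groups, "a" and "b"
scores = c(10, 20, 30, 40, 50, 60)
group = factor(c("a", "b", "a", "b", "a", "b"))

# Apply mean separately to the scores in each group
groupmeans = tapply(scores, group, mean)
print(groupmeans)
##  a  b 
## 30 40
```

Group "a" holds 10, 30, and 50, so its mean is 30; group "b" holds 20, 40, and 60, so its mean is 40.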

Find the median fuel economy for automatic and manual transmission.

Rule of Thumb

Never copy and paste more than twice! If you need to copy and paste more than twice, then use a loop or a function instead. The main goal today is to understand loops and functions, which save us work. Control statements help us do this.

Control Statements

if then Statements

The code for an if then statement is

if (condition) {      # condition evaluates to TRUE or FALSE
     do something
}

Notice the required parentheses around (condition), the required braces {} around do something, and the absence of the word then.

Example:

x=7
if (x>5) {
     print("x is greater than 5")
}
## [1] "x is greater than 5"

What happens in the above if \(x =5\) at the outset? Try it! Type out the above code below.

x=5

Above we said braces {} are needed around the action. Actually, the braces {} are optional when the action is a single expression written on the same line as the if statement. The following is equivalent to the first example.

x=7
if (x>5) print("x is greater than 5")
## [1] "x is greater than 5"

For testing your code you can directly put TRUE in the condition so the action is always done.

if (TRUE) {
     print("x is greater than 5")
}
## [1] "x is greater than 5"

Write an if then statement that prints “the vector x has an even number of entries” if the vector \(x\) has an even number of entries, and try it out on \(x=c(1,2,3,4)\) and \(x=c(1,2,3)\). Recall that the modulo operator (i.e. remainder) in R is %% so that the code 11 %% 2 returns one (experiment in the console).

if else Statements

The code for an if else statement is

if (condition) {      # condition evaluates to TRUE or FALSE
     do something1
} else {
     do something2
}

Example:

x=3
if (x>5) {
     print("x is greater than 5")
} else {
     print("x is not greater than 5")
}
## [1] "x is not greater than 5"

Write an if else statement that prints “the vector x has an even number of entries” if the vector \(x\) has an even number of entries, and that otherwise prints “the vector x does not have an even number of entries”.

if else if Statements

We can put two conditions together in an if else if statement to have three different actions. The code for an if else if statement is

if (condition1) {      # condition1 evaluates to TRUE or FALSE
     do something1
} else if (condition2) { # condition2 evaluates to TRUE or FALSE
     do something2
} else {
     do something3
}

For instance, we can now properly handle the three cases >, =, and <.

x=3
if (x>5) {
     print("x is greater than 5")
} else if (x==5) {
     print("x is equal to 5")
} else {
     print("x is less than 5")
}
## [1] "x is less than 5"

With more than two conditions, a chain of else if branches becomes hard to read, so we would use switch instead. See the example in the book by Wickham and Grolemund linked above.
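
As a rough illustration of switch (this is not the book's example), here is a minimal sketch. The variable fruit and the colour values are made up. Each named argument is one case, and the unnamed last argument is the fallback used when no name matches.

```r
# Hypothetical example: pick a colour based on the value of fruit
fruit = "banana"
colour = switch(fruit,
     apple  = "red",
     banana = "yellow",
     grape  = "purple",
     "unknown"            # unnamed last argument: fallback case
)
print(colour)
## [1] "yellow"
```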

for Loops

R is designed so that explicit loops can often be avoided, which makes code faster: vectorization and the apply family do the looping for us.
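
A minimal sketch of both alternatives, assuming we want the squares of the entries of a small made-up vector x:

```r
x = 1:5

# Vectorized: the ^ operator acts on every entry at once
squaresvec = x^2

# apply family: sapply calls the function on each entry in turn
squaresapply = sapply(x, function(v) v^2)

print(squaresvec)
## [1]  1  4  9 16 25
```

Both squaresvec and squaresapply contain the same values, and neither needed an explicit loop.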

Nevertheless, sometimes we want to repeat a procedure many times using a different index each time, and it can’t be handled by vectorization or the apply family. For repeating a procedure over a fixed set of indices, we use a for loop. The code for a for loop is

for (n in somevector) {
     do something involving n
}

Let’s print 1 to 5.

for (i in 1:5) {
     print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Let’s print the sequence of words in a vector.

for (n in c("I","want","Ferraris","in","the","garage",".")){
     print(n)
}
## [1] "I"
## [1] "want"
## [1] "Ferraris"
## [1] "in"
## [1] "the"
## [1] "garage"
## [1] "."

Write a for loop to make scatter plots of mpg against all variables in the mtcars data set (recall the names function). Don’t worry about the axis labels for now.

As we learned earlier, we shouldn’t make a scatterplot of a continuous variable against a categorical variable, as we just did in the previous exercise. So, improve your code in the previous exercise to plot mpg only against the other continuous variables. For this you will need to use the ? command to inspect all variables and decide which are continuous.

The Aggregation Pattern using for Loops
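
In the aggregation pattern, we create a variable before the loop to hold the running result, and then update it on every pass through the loop. Here is a minimal sketch that adds up the squares of 1 to 10. The name total is just an illustrative choice.

```r
total = 0                  # start with an empty running total
for (i in 1:10) {
     total = total + i^2   # update the total on every pass
}
print(total)
## [1] 385
```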

Write a for loop that sums the integers 1 to 100 using the aggregation pattern.

Write a for loop that finds the product of the integers 1 to 5 using the aggregation pattern.

Write a for loop that builds the vector c(1,2,3,4,5) using the aggregation pattern, appending one entry at a time.

while Loops

Use a while loop when you don’t know in advance how many iterations you will need. The loop repeats as long as condition evaluates to TRUE.

while (condition) {
    do something 
}
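
As an illustrative sketch, suppose we keep doubling a number until it exceeds 1000 and want to know how many doublings that takes. The names x and count are made up for this example.

```r
x = 1
count = 0
while (x <= 1000) {    # keep going until x exceeds 1000
     x = 2 * x
     count = count + 1
}
print(count)
## [1] 10
print(x)
## [1] 1024
```

Make sure the condition eventually becomes FALSE; otherwise the loop never stops.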

Functions

User-defined functions are incredibly useful. Remember the rule of thumb: never copy and paste more than twice! If you need to copy and paste more than twice, then use a loop or a function instead.

The code for a function of one variable is:

functionname = function(x) {
     somecomputation of a valueORobject
     return(valueORobject)
}

Here is a function that squares the input.

square = function(x){
     return(x^2)  
}
square(3)
## [1] 9
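
A function can also take several arguments, separated by commas, and an argument can be given a default value. The following is a minimal sketch; the function power and its argument p are hypothetical names chosen for illustration.

```r
# Hypothetical helper: raise x to the power p (p defaults to 2)
power = function(x, p = 2) {
     return(x^p)
}
power(3)       # uses the default p = 2
## [1] 9
power(2, 3)    # 2 to the power 3
## [1] 8
```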

Recall: from a sample, the 95% confidence interval for a population mean has left endpoint \[\overline{x} - 1.96 \cdot \sigma/\sqrt{n}\] and has right endpoint \[\overline{x} + 1.96 \cdot \sigma/\sqrt{n}.\] Write functions that compute the left and right endpoints.

leftendpoint = function(xbar,sigma,n){
     
}
#rightendpoint = 

Functions like these are useful, for instance, for computing a new column of a data frame from its existing columns.
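
Here is a hedged sketch of how a function can be used with a data frame. The helper standarderror and the data frame samples are made up for this illustration. Because the arithmetic inside the function is vectorized, one call fills in the whole column.

```r
# Hypothetical helper: standard error sigma / sqrt(n)
standarderror = function(sigma, n) {
     return(sigma / sqrt(n))
}

# Hypothetical data frame with one row per sample
samples = data.frame(sigma = c(2, 6), n = c(4, 9))

# One call computes the new column for every row
samples$se = standarderror(samples$sigma, samples$n)
print(samples$se)
## [1] 1 2
```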

Reminders about Statistical Inference and Histograms

See the other files.