Download Coding Homework 2 Rmd File and knit it. Read the knitted file, and do the problems directly in the template. Follow the submission instructions in the file.
In Matloff’s book, read 7.1 - 7.6, 10.1.3, 10.2.1, 10.2.5, 10.2.6, skipping the Extended Examples. Don’t worry about understanding everything, just focus on the essentials.
Read Wickham and Grolemund’s Chapter 19 on Functions, which also discusses conditional execution.
Read Functions, Control Statements, Conditionals by Dr. James Henderson.
tapply for Applying a Function to Groups
A factor variable splits a data set into groups. For instance, in the mtcars data set, the number of cylinders splits the cars into groups. The function tapply(numericvector, factorvector, function) applies the function to each group. Here, of course, numericvector and factorvector must have the same length.
Let’s find the median fuel economy for the groups of cars with 4, 6, and 8 cylinders.
tapply(mtcars$mpg, as.factor(mtcars$cyl), median)
## 4 6 8
## 26.0 19.7 15.2
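To see exactly how tapply pairs up its two vectors, here is a small sketch with made-up numbers (the vectors scores and group are hypothetical). Entries sharing a factor level are collected, and the function is applied to each collection.

```r
# Hypothetical toy data: four measurements and the group each one belongs to
scores = c(2, 4, 6, 10)
group = as.factor(c("a", "b", "a", "b"))

# Group "a" collects 2 and 6, group "b" collects 4 and 10;
# mean() is then applied to each group separately
group_means = tapply(scores, group, mean)
print(group_means)  # a: 4, b: 7
```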
Find the median fuel economy for automatic and manual transmission.
Never copy and paste more than twice! If you need to copy and paste more than twice, then use a loop or a function instead. The main goal of today is to understand loops and functions in order to save work. Control statements help us do this.
if then Statements
The code for an if then statement is
if (condition) { # condition evaluates to TRUE or FALSE
  do something
}
Notice the required parentheses around (condition), the required braces {} around do something, and the absence of the word then.
Example:
x=7
if (x>5) {
  print("x is greater than 5")
}
## [1] "x is greater than 5"
What happens in the above code if \(x = 5\) at the outset? Try it! Type out the above code below.
x=5
Above we said braces {} are needed around the action. Actually, the braces {} are not needed when the action is a single expression, for example when it sits on the same line as the if statement. The following is the same as the first example.
x=7
if (x>5) print("x is greater than 5")
## [1] "x is greater than 5"
For testing your code, you can put TRUE directly in the condition so that the action always runs.
if (TRUE) {
  print("x is greater than 5")
}
## [1] "x is greater than 5"
Write an if then statement that prints “the vector x has an even number of entries” if the vector \(x\) has an even number of entries, and try it out on \(x=c(1,2,3,4)\) and \(x=c(1,2,3)\). Recall that the modulo (i.e. remainder) operator in R is %%, so that the code 11 %% 2 returns 1 (experiment in the console).
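As a quick illustration of %% on its own (a different task from the exercise), this sketch checks whether a number is a multiple of 3: the remainder after dividing by 3 is zero exactly when it is. The variable names remainders and y are just for illustration.

```r
# %% gives the remainder after division, element by element
remainders = c(11, 12, 13) %% 3   # 2 0 1

y = 12
# y is a multiple of 3 exactly when its remainder on division by 3 is 0
if (y %% 3 == 0) {
  print("y is a multiple of 3")
}
```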
if else Statements
The code for an if else statement is
if (condition) { # condition evaluates to TRUE or FALSE
  do something1
} else {
  do something2
}
x=3
if (x>5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
## [1] "x is not greater than 5"
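Here is one more sketch of an if else statement, this time on real data: it compares the average fuel economy in mtcars (which ships with R) to 20 and stores a message from whichever branch runs. The names avg_mpg and msg are just for illustration.

```r
# mtcars is built into R; its average mpg is about 20.09
avg_mpg = mean(mtcars$mpg)

# Exactly one of the two branches runs, depending on the condition
if (avg_mpg > 20) {
  msg = "average mpg is above 20"
} else {
  msg = "average mpg is 20 or below"
}
print(msg)
```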
Write an if else statement that prints “the vector x has an even number of entries” if the vector \(x\) has an even number of entries, and that otherwise prints “the vector x does not have an even number of entries”.
if else if Statements
We can put two conditions together in an if else if statement to have three different actions. The code for an if else if statement is
if (condition1) { # condition1 evaluates to TRUE or FALSE
  do something1
} else if (condition2) { # condition2 evaluates to TRUE or FALSE
  do something2
} else {
  do something3
}
For instance, we can now properly handle all three cases: >, <, and =.
x=3
if (x>5) {
  print("x is greater than 5")
} else if (x==5) {
  print("x is equal to 5")
} else {
  print("x is less than 5")
}
## [1] "x is less than 5"
We wouldn’t want to chain more than two conditions this way. When the choice depends on the value of a single variable, we would use switch instead. See the example in Wickham and Grolemund’s book linked above.
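A minimal sketch of switch, assuming we want a label for each cylinder count (cyl and engine are hypothetical names). switch compares a character string against the argument names and returns the matching value; the final unnamed argument is the fallback when nothing matches.

```r
cyl = 6

# as.character() matters here: a numeric 6 would instead be treated as a
# position (the 6th alternative), not matched against the names "4", "6", "8"
engine = switch(as.character(cyl),
  "4" = "small engine",
  "6" = "medium engine",
  "8" = "large engine",
  "unknown engine size"   # fallback when no name matches
)
print(engine)  # "medium engine"
```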
for Loops
In R we usually avoid explicit loops for speed: vectorization and the apply family let us do without them.
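To make the contrast concrete, here is a small sketch computing the squares of 1 to 5 twice: once with a vectorized operation and once with an explicit loop. Both give the same result, but the vectorized version is shorter and, on large inputs, typically faster.

```r
# Vectorized: ^ is applied to every element of 1:5 at once
sq_vec = (1:5)^2

# Explicit loop: preallocate a numeric vector, then fill it one index at a time
sq_loop = numeric(5)
for (i in 1:5) {
  sq_loop[i] = i^2
}

identical(sq_vec, sq_loop)  # TRUE
```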
Nevertheless, sometimes we want to repeat a procedure many times using a different index each time, and the task can’t be handled by vectorization or the apply family. For repeating a procedure over a fixed set of indices, we use a for loop. The code for a for loop is
for (n in somevector) {
  do something involving n
}
Let’s print 1 to 5.
for (i in 1:5) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
Let’s print the sequence of words in a vector.
for (n in c("I","want","Ferraris","in","the","garage",".")) {
  print(n)
}
## [1] "I"
## [1] "want"
## [1] "Ferraris"
## [1] "in"
## [1] "the"
## [1] "garage"
## [1] "."
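Sometimes we need the position of each element as well as its value, for example to store a result at that position. A sketch using seq_along, which produces the indices 1, 2, ..., length(words) and, unlike 1:length(words), produces no indices at all for an empty vector. The vectors words and lens are just for illustration.

```r
words = c("I", "want", "Ferraris")

# Preallocate an integer vector, then store the number of characters of each word
lens = integer(length(words))
for (i in seq_along(words)) {
  lens[i] = nchar(words[i])   # i is the position, words[i] the element
}
print(lens)  # 1 4 8
```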
Write a for loop to make scatter plots of mpg against all variables in the mtcars data set (recall the names function). Don’t worry about the axis labels for now.
As we learned earlier, we shouldn’t make a scatterplot of a continuous variable against a categorical variable, as we just did in the previous exercise. So, improve your code from the previous exercise to plot mpg only against the other continuous variables. To decide which variables are continuous, use the ? command (as in ?mtcars) to read the description of every variable.
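The plotting exercises need to pull out a column whose name is stored in a variable. Here is a sketch of just that step, using cor() instead of plot() so the result is easy to check; the choice of disp, hp, and wt is only an example. Inside the loop, mtcars[[v]] looks up the column named by the string stored in v, which the $ operator does not do.

```r
vars = c("disp", "hp", "wt")
cors = numeric(length(vars))
names(cors) = vars

for (v in vars) {
  # [[v]] uses the value of v (e.g. "hp") as the column name
  cors[v] = cor(mtcars$mpg, mtcars[[v]])
}
print(round(cors, 2))
```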
Write a for loop that sums 1 to 100 using the aggregation pattern.
Write a for loop that finds the product of 1 to 5 using the aggregation pattern.
Write a for loop that makes the vector c(1,2,3,4,5) using the aggregation pattern with appending vector entries.
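Assuming the aggregation pattern means the usual accumulator idea (initialize a variable before the loop, then update it inside the loop), here is a sketch on tasks different from the exercises: summing the squares of 1 to 4, and building a vector of even numbers by appending.

```r
# Running total: start at 0, add one term per iteration
total = 0
for (k in 1:4) {
  total = total + k^2
}
print(total)  # 30

# Appending: start with an empty vector, grow it with c() each iteration
evens = c()
for (k in 1:3) {
  evens = c(evens, 2 * k)
}
print(evens)  # 2 4 6
```

Appending with c() copies the vector on every iteration, which is fine for short loops; for long loops, preallocating the vector is usually faster.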
while Loops
Use these when you don’t know in advance how many iterations you will need.
while (condition) {
  do something
}
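A small sketch of a while loop: keep doubling a number until it exceeds 100, counting the doublings along the way. We don't specify the number of iterations in advance; the condition decides when to stop. The body must eventually make the condition FALSE, otherwise the loop never ends.

```r
x = 1
steps = 0

# Repeat as long as x is at most 100
while (x <= 100) {
  x = x * 2
  steps = steps + 1
}
print(c(x, steps))  # 128 7
```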
User-defined functions are incredibly useful. Remember the rule of thumb: never copy and paste more than twice! If you need to copy and paste more than twice, then use a loop or a function instead.
The code for a function of one variable is:
functionname = function(x) {
  somecomputation of a valueORobject
  return(valueORobject)
}
Here is a function that squares the input.
square = function(x) {
  return(x^2)
}
square(3)
## [1] 9
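Functions can take several arguments, and an argument can have a default value that is used when the caller leaves it out. A sketch with a hypothetical power function that generalizes square:

```r
# power() is a hypothetical example; p defaults to 2, so power(x) squares x
power = function(x, p = 2) {
  return(x^p)
}

power(3)     # 9, uses the default p = 2
power(2, 3)  # 8
power(1:3)   # 1 4 9, because ^ is vectorized the function is too
```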
Recall: for a sample of size \(n\) with sample mean \(\overline{x}\), drawn from a population with known standard deviation \(\sigma\), the 95% confidence interval for the population mean has left endpoint \[\overline{x}-1.96\,\sigma/\sqrt{n}\] and right endpoint \[\overline{x}+1.96\,\sigma/\sqrt{n}.\] Write functions that compute the left and right endpoints.
leftendpoint = function(xbar, sigma, n) {
  # your code here
}
#rightendpoint =
Functions like these are useful, for instance, for computing new columns of a data frame.
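As an illustration of using a function on a data frame column, here is a sketch with a hypothetical helper rescale01 that rescales a numeric vector to the range 0 to 1. Because the arithmetic inside is vectorized, one call handles the whole column. The copy named cars and the new column wt01 are just for illustration.

```r
# Hypothetical helper: rescale a numeric vector so its minimum is 0 and maximum is 1
rescale01 = function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

# Work on a copy so the built-in mtcars is left unchanged
cars = mtcars
cars$wt01 = rescale01(cars$wt)   # new column, computed for every row at once
range(cars$wt01)  # 0 1
```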
See the other files.