Our first exercise entails reading data from files and summarizing and combining the records. This is very common and important step; but somewhat tricky to learn. We will use the data in a file named “Operating-Health-Facilities-with-location-2014.csv” under “data” read the data in the file into a data.frame and answer all the questions by using R code only.
Replace the ___________ with the appropriate code
Feel free to use materials of Introduction to R in: http://rspatial.org/intr/.
d <- read.csv(___________, stringsAsFactors=FALSE)
table(_________)
x <- table(_________)
names(which.min(_________))
names(x)[which.min(x)]
min(_________)
What is the average elevation of all clinics? > x <- d$ALTITUDE[________==“Clinic”] > mean(x, na.rm=_________)
Create a data.frame with four columns: ZONE, REGION, the number of facilities, and their average elevation (by REGION)
we want something like this
ZONE | REGION | elevation | n |
---|---|---|---|
1 Central | Dodoma | 1059.4 | 369 |
2 Central | Singida | 1355.4 | 210 |
3 Eastern | Dar es Salaam | 54.4 | 599 |
We can use the aggregate function. If you struggle, divide and conquer. Do what you can do first. Split the task into parts, and then combine them again later.
One approach:
a1 <- aggregate(_________, d[, c(‘ZONE’, ‘REGION’)], FUN=_________, na.rm=_________)
a2 <- aggregate(d$ALTITUDE, d[, c(‘ZONE’, ‘REGION’)], FUN=_________)
a <- merge(_________, _________, by=_________)
7b. provide appropriate column names
colnames(a)____ <- c(‘elevation’, ‘n’)
7c. sort by ZONE, REGION. See Introduction to R. Chapter 11
a <- a[order(_________, _________), ]
7d. elevation should be rounded to the nearest 10 cm
a$elevation <- _________(_________ , _________)
7e. show the first five lines of the data.frame
a[_________, ]
7f. Save “a” in a file on your disk (comma separated by values format)
write._________(_________, _________, row.names=FALSE)
w <- read.csv(_________, stringsAsFactors=FALSE)
number of water points:
wa <- aggregate(w[, _________], w[, ‘REGION’, drop=FALSE], _________)
colnames(wa)[2] <- ‘nwp’
number of people served
wb <- aggregate(w[, _________, drop=FALSE], w[, ‘REGION’, drop=FALSE], _________, na.rm=TRUE)
Get modal function from the raster package
library(raster)
wc <- aggregate(w[, _________], w[, ‘REGION’, drop=FALSE], modal, na.rm=TRUE)
or make a function using which.max()
f <- function(x) names(which.max(table(x)))
wc <- aggregate(w[, _________], w[, ‘REGION’, drop=FALSE], f)
combine:
water <- merge(_________, _________, by=‘REGION’)
water <- merge(water, _________, by=‘REGION’)
qual <- table(_________, _________)
REGION | ZONE | elevation | n | nwp | POPULATION.SERVED | WATER_QUAN | WATERPOINT | coloured | fluoride | fluoride abandoned | good | milky | salty | salty abandoned | soft | unknown |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Arusha | Northern | 1542.9 | 334 | 4564 | 1776454 | insufficient | communal standpipe | 18 | 161 | 1 | 0 | 5 | 20 | 1 | 4227 | 131 |
Dodoma | Central | 1059.4 | 369 | 3532 | 4239573 | enough | communal standpipe | 38 | 0 | 0 | 0 | 21 | 594 | 22 | 2694 | 163 |
Geita | Lake | 1721.7 | 143 | 1944 | 384 | enough | hand pump | 32 | 1 | 0 | 0 | 33 | 51 | 1 | 1561 | 265 |
Iringa | Southern Highlands | 1541.1 | 241 | 2292 | 108092 | enough | communal standpipe | 3 | 0 | 0 | 0 | 0 | 27 | 4 | 2191 | 67 |
Kagera | Lake | 1426.5 | 298 | 4531 | 572969 | enough | communal standpipe | 64 | 5 | 0 | 0 | 76 | 117 | 2 | 4066 | 201 |
Katavi | Southern Highlands | 1162.5 | 77 | 1069 | 217 | enough | hand pump | 6 | 0 | 0 | 0 | 6 | 127 | 2 | 907 | 21 |
m <- merge(a, water, by=‘REGION’)
q <- as.data.frame.matrix(qual)
result <- merge(_________, _________, by.x=‘REGION’, by.y=0)
head(result)
or
m <- merge(a, water, by=‘REGION’)
qq <- as.data.frame(qual)
qq <- reshape(, direction=‘wide’, timevar=‘Var2’, idvar=‘Var1’)
result <- merge(_________, _________, by.x=‘REGION’, by.y=1)