ex1_intro2R.knit

Reading data

Our first exercise entails reading data from files and summarizing and combining the records. This is very common and important step; but somewhat tricky to learn. We will use the data in a file named “Operating-Health-Facilities-with-location-2014.csv” under “data” read the data in the file into a data.frame and answer all the questions by using R code only.

Replace the ___________ with the appropriate code

Feel free to use materials of Introduction to R in: http://rspatial.org/intr/.

Read the file into a data.frame.

d <- read.csv(___________, stringsAsFactors=FALSE)

how many rows and columns are there?

what are the variable names?

What is the number of facilities for each zone? See data exploration

table(_________)

Which region has most facilities (and how many), and which has least (and how many)?

x <- table(_________)
names(which.min(_________))
names(x)[which.min(x)]
min(_________)

What is the average elevation of all clinics? > x <- d$ALTITUDE[________==“Clinic”] > mean(x, na.rm=_________)
Create a data.frame with four columns: ZONE, REGION, the number of facilities, and their average elevation (by REGION)

we want something like this

ZONE	REGION	elevation	n
1 Central	Dodoma	1059.4	369
2 Central	Singida	1355.4	210
3 Eastern	Dar es Salaam	54.4	599

We can use the aggregate function. If you struggle, divide and conquer. Do what you can do first. Split the task into parts, and then combine them again later.

One approach:
a1 <- aggregate(_________, d[, c(‘ZONE’, ‘REGION’)], FUN=_________, na.rm=_________)
a2 <- aggregate(d$ALTITUDE, d[, c(‘ZONE’, ‘REGION’)], FUN=_________)
a <- merge(_________, _________, by=_________)

7b. provide appropriate column names

colnames(a)____ <- c(‘elevation’, ‘n’)

7c. sort by ZONE, REGION. See Introduction to R. Chapter 11

a <- a[order(_________, _________), ]

7d. elevation should be rounded to the nearest 10 cm

a$elevation <- _________(_________ , _________)

7e. show the first five lines of the data.frame

a[_________, ]

7f. Save “a” in a file on your disk (comma separated by values format)

write._________(_________, _________, row.names=FALSE)

Also read fdile “Tanzania-All-Waterpoints-July-2015.csv”

w <- read.csv(_________, stringsAsFactors=FALSE)

For each region, compute the total number of water points, and the total number of people served (POPULATION.SERVED), and the most common quantity (“WATER_QUAN”) and type (“WATERPOINT”) (either use the “modal” function from the raster package; or use the table function

number of water points:
wa <- aggregate(w[, _________], w[, ‘REGION’, drop=FALSE], _________)
colnames(wa)[2] <- ‘nwp’
number of people served
wb <- aggregate(w[, _________, drop=FALSE], w[, ‘REGION’, drop=FALSE], _________, na.rm=TRUE)
Get modal function from the raster package

library(raster)
wc <- aggregate(w[, _________], w[, ‘REGION’, drop=FALSE], modal, na.rm=TRUE)
or make a function using which.max()
f <- function(x) names(which.max(table(x)))
wc <- aggregate(w[, _________], w[, ‘REGION’, drop=FALSE], f)
combine:

water <- merge(_________, _________, by=‘REGION’)
water <- merge(water, _________, by=‘REGION’)

Make a table that shows counts of water quality types (“WATER_QUAL”) by region

qual <- table(_________, _________)

Make a new data.frame that combines the REGION level totals/averages for the Health facilities and for the water points, and for the water quality to get something like:

REGION	ZONE	elevation	n	nwp	POPULATION.SERVED	WATER_QUAN	WATERPOINT	coloured	fluoride	fluoride abandoned	milky	salty	salty abandoned	soft	unknown
Arusha	Northern	1542.9	334	4564	1776454	insufficient	communal standpipe	18	161	1	5	20	1	4227	131
Dodoma	Central	1059.4	369	3532	4239573	enough	communal standpipe	38	0	0	21	594	22	2694	163
Geita	Lake	1721.7	143	1944	384	enough	hand pump	32	1	0	33	51	1	1561	265
Iringa	Southern Highlands	1541.1	241	2292	108092	enough	communal standpipe	3	0	0	0	27	4	2191	67
Kagera	Lake	1426.5	298	4531	572969	enough	communal standpipe	64	5	0	76	117	2	4066	201
Katavi	Southern Highlands	1162.5	77	1069	217	enough	hand pump	6	0	0	6	127	2	907	21

m <- merge(a, water, by=‘REGION’)
q <- as.data.frame.matrix(qual)
result <- merge(_________, _________, by.x=‘REGION’, by.y=0)
head(result)
or
m <- merge(a, water, by=‘REGION’)
qq <- as.data.frame(qual)
qq <- reshape(, direction=‘wide’, timevar=‘Var2’, idvar=‘Var1’)
result <- merge(_________, _________, by.x=‘REGION’, by.y=1)