The naijR package: Philosophy, Purpose and Usage (Part 2)

In the last post, we highlighted the motivation behind the creation of the naijR package as well as the expectations for how it could add value to the R ecosystem. In this article, we will demonstrate a few situations where the package could be put to use and reveal some of our future plans.

Broadly speaking, the current functionality of the package is two-fold:

  1. Manipulation of data/information on Nigeria, including administrative data.
  2. Drawing of various Nigerian maps.

Administrative data

Nigeria is a republic that is presently made up of 36 federating units called States. Each of these States has a lower governing unit called the Local Government Area (LGA); nationwide, there are 774 of these LGAs. Also, Nigeria has a Federal Capital Territory (FCT), which is not a State in the strict sense, but in most situations is classified as one – it has 6 Area Councils, which are synonymous with the LGAs.

The naijR package has functions that make it easier to manage data of this nature. To use it, we first load it into an R session:

library(naijR)

States

To create an object with the names of the country’s States, we have the states function

all_states <- states()
all_states
##  [1] "Abia"                      "Adamawa"                  
##  [3] "Akwa Ibom"                 "Anambra"                  
##  [5] "Bauchi"                    "Bayelsa"                  
##  [7] "Benue"                     "Borno"                    
##  [9] "Cross River"               "Delta"                    
## [11] "Ebonyi"                    "Edo"                      
## [13] "Ekiti"                     "Enugu"                    
## [15] "Federal Capital Territory" "Gombe"                    
## [17] "Imo"                       "Jigawa"                   
## [19] "Kaduna"                    "Kano"                     
## [21] "Katsina"                   "Kebbi"                    
## [23] "Kogi"                      "Kwara"                    
## [25] "Lagos"                     "Nasarawa"                 
## [27] "Niger"                     "Ogun"                     
## [29] "Ondo"                      "Osun"                     
## [31] "Oyo"                       "Plateau"                  
## [33] "Rivers"                    "Sokoto"                   
## [35] "Taraba"                    "Yobe"                     
## [37] "Zamfara"

One will notice that length(all_states) is 37 because the FCT is included by default. To prevent this, we can set the all argument to FALSE.

no_fct <- states(all = FALSE)
length(no_fct)
## [1] 36
"Federal Capital Territory" %in% no_fct
## [1] FALSE

It is the local convention to group the States into six (6) “geo-political zones” (GPZ), namely North-Central, North-East, North-West, South-East, South-South and South-West. The function also helps out with this:

states(gpz = 'se')
## [1] "Abia"    "Anambra" "Ebonyi"  "Enugu"   "Imo"
states(gpz = c('nc', 'sw'))
##  [1] "Benue"    "Kogi"     "Kwara"    "Nasarawa" "Niger"    "Plateau" 
##  [7] "Ekiti"    "Lagos"    "Ogun"     "Ondo"     "Osun"     "Oyo"

Local Government Areas

The function lgas_ng creates a character vector that has LGAs as its elements. Thus, to get the 774 LGAs, we can say

lgas <- lgas_ng()
tail(lgas)
## [1] "Zango"        "Zangon Kataf" "Zaria"        "Zing"         "Zurmi"       
## [6] "Zuru"

To get those in a given State, we pass its name as the ng.state argument, for example:

# LGAs in Benue State
lgas_ng("Benue")
##  [1] "Agatu"       "Apa"         "Ado"         "Buruku"      "Gboko"      
##  [6] "Guma"        "Gwer East"   "Gwer West"   "Katsina-Ala" "Konshisha"  
## [11] "Kwande"      "Logo"        "Makurdi"     "Obi"         "Ogbadibo"   
## [16] "Ohimini"     "Oju"         "Okpokwu"     "Oturkpo"     "Tarka"      
## [21] "Ukum"        "Ushongo"     "Vandeikya"

When we have more than one State for which we want to fetch the LGAs, we still use a character vector of the States to do so using the ng.state argument

my_states <- c("Kebbi", "Kwara")
states2 <- lgas_ng(my_states)
states2
## $Kebbi
##  [1] "Aleiro"       "Arewa Dandi"  "Argungu"      "Augie"        "Bagudo"      
##  [6] "Birnin Kebbi" "Bunza"        "Dandi"        "Fakai"        "Gwandu"      
## [11] "Jega"         "Kalgo"        "Koko/Besse"   "Maiyama"      "Ngaski"      
## [16] "Sakaba"       "Shanga"       "Suru"         "Wasagu/Danko" "Yauri"       
## [21] "Zuru"        
## 
## $Kwara
##  [1] "Asa"          "Baruten"      "Edu"          "Ekiti"        "Ifelodun"    
##  [6] "Ilorin East"  "Ilorin South" "Ilorin West"  "Irepodun"     "Isin"        
## [11] "Kaiama"       "Moro"         "Offa"         "Oke Ero"      "Oyun"        
## [16] "Pategi"

Observe that when ng.state has two or more elements, the function lgas_ng returns a named list, and not a vector. This eases the State-wise retrieval of the LGAs:

states2$Kebbi
##  [1] "Aleiro"       "Arewa Dandi"  "Argungu"      "Augie"        "Bagudo"      
##  [6] "Birnin Kebbi" "Bunza"        "Dandi"        "Fakai"        "Gwandu"      
## [11] "Jega"         "Kalgo"        "Koko/Besse"   "Maiyama"      "Ngaski"      
## [16] "Sakaba"       "Shanga"       "Suru"         "Wasagu/Danko" "Yauri"       
## [21] "Zuru"
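A named list like this is handy for look-ups, but for joins or tabulation a long table with one row per LGA is often easier to work with. The following is a minimal base R sketch of that conversion; the list lga_list is a hand-made, shortened stand-in for the output of lgas_ng, and the column names are our own choice, not something the package provides.

```r
# A shortened, hand-made stand-in for the named list returned by lgas_ng()
lga_list <- list(
  Kebbi = c("Aleiro", "Argungu", "Zuru"),
  Kwara = c("Asa", "Offa")
)

# stack() turns a named list of vectors into a two-column data frame:
# 'values' holds the elements and 'ind' holds the name each one came from
lga_table <- stack(lga_list)
names(lga_table) <- c("lga", "state")
lga_table$state <- as.character(lga_table$state)

# Retrieve the LGAs of one State from the long table
lga_table$lga[lga_table$state == "Kwara"]
```

The list form remains the more convenient one when all that is needed is the LGAs of a single State.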

Putting together the states and lgas_ng functions, it is easy to list the LGAs in a given GPZ and then extract them by State:

lgas_ss <- lgas_ng(states('ss'))
str(lgas_ss)
## List of 6
##  $ Akwa Ibom  : chr [1:31] "Abak" "Eastern Obolo" "Eket" "Esit Eket" ...
##  $ Bayelsa    : chr [1:8] "Brass" "Ekeremor" "Kolokuma/Opokuma" "Nembe" ...
##  $ Cross River: chr [1:18] "Abi" "Akamkpa" "Akpabuyo" "Bakassi" ...
##  $ Delta      : chr [1:25] "Aniocha North" "Aniocha South" "Bomadi" "Burutu" ...
##  $ Edo        : chr [1:18] "Akoko-Edo" "Egor" "Esan Central" "Esan North-East" ...
##  $ Rivers     : chr [1:23] "Abua/Odual" "Ahoada East" "Ahoada West" "Akuku-Toru" ...
lgas_ss$`Akwa Ibom`
##  [1] "Abak"              "Eastern Obolo"     "Eket"             
##  [4] "Esit Eket"         "Essien Udim"       "Etim Ekpo"        
##  [7] "Etinan"            "Ibeno"             "Ibesikpo Asutan"  
## [10] "Ibiono-Ibom"       "Ika"               "Ikono"            
## [13] "Ikot Abasi"        "Ikot Ekpene"       "Ini"              
## [16] "Itu"               "Mbo"               "Mkpat-Enin"       
## [19] "Nsit-Atai"         "Nsit-Ibom"         "Nsit-Ubium"       
## [22] "Obot Akara"        "Okobo"             "Onna"             
## [25] "Oron"              "Oruk Anam"         "Udung-Uko"        
## [28] "Ukanafun"          "Uruan"             "Urue-Offong/Oruko"
## [31] "Uyo"

Testing for States

One may come across a situation where one wants to test whether a given vector has Nigerian States as its element(s). For this, we can use the function is_state, which returns a logical vector as output.

is_state("Adamawa")
## [1] TRUE
is_state("Oklahoma")
## [1] FALSE

The function allows for some fine-tuning via the arguments test and allow.na. Note that the default value for allow.na is TRUE, so missing values do not by themselves cause the check to fail:

state_nw <- states('nw')
is_state(state_nw)
## [1] TRUE
is_state(state_nw, test = 'selected')
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
state_nw[4] <- NA
is_state(state_nw)
## [1] TRUE
is_state(state_nw, test = 'selected')
## [1] TRUE TRUE TRUE   NA TRUE TRUE TRUE
is_state(state_nw, test = 'selected', allow.na = FALSE)
## [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

For the details of how this function works, check ?naijR::is_state.
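To make the interplay of test and allow.na more concrete, here is a rough base R sketch that reproduces the outputs shown above. It is not the package's implementation: the function check_states, its reference argument and the assumption that the default test is called "all" are ours, introduced purely for illustration.

```r
# Hypothetical helper that mimics the behaviour seen above; NOT naijR's code
check_states <- function(x, reference, test = c("all", "selected"),
                         allow.na = TRUE) {
  test <- match.arg(test)
  ok <- x %in% reference            # NA %in% reference evaluates to FALSE
  if (allow.na) {
    ok[is.na(x)] <- NA              # keep missing values as NA, not FALSE
  }
  if (test == "all") {
    all(ok, na.rm = TRUE)           # a single answer for the whole vector
  } else {
    ok                              # one answer per element
  }
}

# The seven North-West States, with the fourth entry set to missing
nw <- c("Jigawa", "Kaduna", "Kano", "Katsina", "Kebbi", "Sokoto", "Zamfara")
x <- nw
x[4] <- NA

check_states(x, nw)
check_states(x, nw, test = "selected")
check_states(x, nw, test = "selected", allow.na = FALSE)
```

The point of the sketch is simply that a whole-vector test collapses the element-wise result into one value, while allow.na decides whether a missing entry counts as unknown or as a failure.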

One scenario where this function will prove very useful is where you have a data table and you want to check a given column for an accurate presentation of the states

state <- c("Abia", "Admawa", "Kano", "Born", "Kebbi", "Plateau", "Imo", "Oyo")
category <- sample(c("A", "B", "C"), 8, replace = TRUE)
numb <- c("123456789", "0123456789", "8000000001", "9012345678", "07098765432", "08123456789", "09064321987", "O8055577889")
dframe <- data.frame(state = state, category = category, number = numb,
                     stringsAsFactors = FALSE)
head(dframe)
##     state category      number
## 1    Abia        C   123456789
## 2  Admawa        A  0123456789
## 3    Kano        A  8000000001
## 4    Born        B  9012345678
## 5   Kebbi        B 07098765432
## 6 Plateau        B 08123456789

Note that some of the States ("Admawa" and "Born") have been misspelt. To check whether the column in its entirety is correct:

is_state(dframe$state)
## [1] FALSE

So something is wrong with this column. Sometimes, the user may want more granular details of such issues

result <- is_state(dframe$state, test = 'selected')
result
## [1]  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

One could use this to fish out the particular row where the problem exists.

dframe[which(!result), ]
##    state category     number
## 2 Admawa        A 0123456789
## 4   Born        B 9012345678

This example is somewhat contrived, but when you have a data frame with thousands of rows, this function becomes a life-saver.
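Once the faulty entries have been found, a natural next step is to propose corrections. The package does not do this at present, so what follows is only a base R sketch that uses adist (edit distance) against a short reference vector; the names reference, entered and suggestion are ours, and in practice the reference would be the full vector of State names.

```r
# Short reference vector for illustration; in practice use all the State names
reference <- c("Abia", "Adamawa", "Borno", "Imo", "Kano", "Kebbi", "Oyo", "Plateau")
entered <- c("Abia", "Admawa", "Kano", "Born")

# Edit distance between every entered name and every reference name
d <- adist(entered, reference)

# For each entered name, pick the reference name with the smallest distance
suggestion <- reference[apply(d, 1, which.min)]
distance <- apply(d, 1, min)

data.frame(entered, suggestion, distance, stringsAsFactors = FALSE)
```

A distance of zero means the entry was already correct. Suggestions with large distances deserve a manual check before they are accepted.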

Phone numbers

On many an occasion, we want to work with phone numbers. In Nigeria, that would be mostly mobile numbers. The function fix_mobile is essentially a tool for data cleaning: it adds leading zeros to phone numbers where they are missing (a common issue with data read from MS Excel spreadsheets) and it removes numbers that are either not valid Nigerian mobile numbers or have been so badly entered that they cannot be fixed. Using our data frame dframe:

new_num <- fix_mobile(dframe$number)
# compare numbers
data.frame(old = dframe$number, new = new_num, stringsAsFactors = FALSE)
##           old         new
## 1   123456789        
## 2  0123456789        
## 3  8000000001 08000000001
## 4  9012345678 09012345678
## 5 07098765432 07098765432
## 6 08123456789 08123456789
## 7 09064321987 09064321987
## 8 O8055577889        

One may notice that the numbers that were really bad have been removed and those without leading zeros have been repaired. The last number was also removed because it had an ‘O’ where there ought to be a zero, a common data entry mistake.
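The cleaning rule described here can be approximated in a few lines of base R. This sketch is not how fix_mobile is written: the function clean_mobile and its validity rule (eleven digits beginning with 07, 08 or 09) are assumptions chosen to reproduce the table above, and it marks rejected entries as NA.

```r
# Hypothetical re-creation of the cleaning rule; NOT the code of fix_mobile()
clean_mobile <- function(x) {
  x <- as.character(x)

  # Restore the leading zero on 10-digit entries that begin with 7, 8 or 9
  needs_zero <- grepl("^[789][0-9]{9}$", x)
  x[needs_zero] <- paste0("0", x[needs_zero])

  # Assumed validity rule: 11 digits, starting with 07, 08 or 09
  valid <- grepl("^0[789][0-9]{9}$", x)
  x[!valid] <- NA_character_
  x
}

numbers <- c("123456789", "0123456789", "8000000001", "9012345678",
             "07098765432", "08123456789", "09064321987", "O8055577889")
cleaned <- clean_mobile(numbers)
data.frame(old = numbers, new = cleaned, stringsAsFactors = FALSE)
```

Anchoring the patterns with ^ and $ is what rejects the entry containing the letter ‘O’, since every character must be a digit.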

Maps

The mapping functionalities of naijR are based primarily on the map_ng function. To draw a map with state boundaries shown

map_ng()

[Figure: map of the States of Nigeria]

The function can also be used to plot points as well as choropleth maps. As this blog post is now considerably long, I would suggest that the reader study the package vignette for the details on how to draw these maps.

Future plans

This package is still in its early days and a number of features are planned for future releases, such as

  • LGA and ward level maps
  • Integration with key data-related APIs
  • Generation of legacy maps beginning with the year of Independence
  • Intelligent name matching for administrative divisions
  • Distance computations
  • and many more…

Feedback and contributions to this project are welcome; please visit the GitHub repository and file an issue or make a pull request. All contributions will be duly acknowledged.

 
