R is a beautiful, elegant programming language that is widely used in data science and related fields, and it is particularly accessible to professionals who don’t have a computer science background. At least, this was my experience. When I first encountered the language, I was working on a UN-funded assessment and, though I had just 3 months of R know-how under my belt, I was able to use it to get good results and outputs. Even though I feel a little embarrassed at some of the constructs I used at the time, the bottom line is that I got the job done.
One of the strengths of statistical programming (in whatever language) is reproducibility, which in turn leads to greater productivity. In my short career as a data scientist, I repeatedly ran into coding problems that were peculiar to my country of residence—Nigeria.
In the R datasets package, which comes with the basic installation, we have some objects for obtaining the names of different jurisdictions in the United States of America. For instance, we have state.name and state.abb which, respectively, are the names of the 50 US States and their common abbreviations. We can use these to quickly create a data frame:
us.name <- state.name
us.abb <- state.abb
us.states <- data.frame(us.name, us.abb)
head(us.states)

##      us.name us.abb
## 1    Alabama     AL
## 2     Alaska     AK
## 3    Arizona     AZ
## 4   Arkansas     AR
## 5 California     CA
## 6   Colorado     CO
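Because the two built-in vectors are aligned element by element, they can also serve as a quick lookup table. The short sketch below uses only base R; the variable names full.name and abbrs are mine, chosen for illustration.

```r
# Look up the full name that corresponds to an abbreviation
full.name <- state.name[match("CA", state.abb)]
full.name

# Go the other way: from full names to abbreviations
abbrs <- state.abb[match(c("Alaska", "Colorado"), state.name)]
abbrs
```

No such ready-made vectors exist in base R for Nigeria, which is the gap discussed next.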
Conversely, if one wanted to analyse data using the States of Nigeria, one would have to write them out manually every time the analysis was run, or dig out an old script where they had been written up. A user who is not acquainted with the States of the Federation would have to look them up in an external reference. Time is money.
Similarly, drawing US maps is relatively straightforward with commonly available packages. For instance, the maps package draws a state map of the US with the call map('state'), once the package has been loaded with library(maps). This function can even draw a county-level map when the 'county' argument is supplied.
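The two calls described above can be sketched as follows. This assumes the maps package is installed; restricting the county map to Texas is an arbitrary choice of mine, made only to keep the plot readable.

```r
library(maps)

# Draw the outlines of the US states (the 'state' database)
map('state')

# Draw county boundaries, limited here to one state
# (Texas is an arbitrary choice for illustration)
map('county', 'texas')

# With plot = FALSE nothing is drawn; the polygon data are returned instead
state.polys <- map('state', plot = FALSE)
head(state.polys$names)
```

The names element of the returned object is a handy way to see which regions a given database actually contains before trying to draw them.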
The package also favours France and a few other countries by providing ready-made maps of their territories.
To be fair, other parts of the world are not left out, but drawing their maps takes a little more work (and diligent study of the documentation). To draw the map of Nigeria, we need to additionally load the mapdata package, since it provides the high-resolution world database, derived from the CIA World Data Bank II, that covers all world regions.
library(mapdata)
map('worldHires', 'Nigeria')
Okay, we now have a plain map of Nigeria, but that’s about it. We can’t draw sub-national divisions like States and Local Government Areas.
The inspiration for the naijR package came from working on actual data science projects and experiencing the significant time wasted on repetitive tasks such as manually writing and cross-checking the names of States and Local Governments, and handling poorly entered phone numbers, to mention a few. The initial releases of the package focused on these problems, and later on, mapping capabilities were included.
The motivation for developing this package is two-fold:
- Ease the work of scientists and analysts working on data that are specific to Nigeria: People that work with Nigerian datasets face the real challenge of data cleaning. For instance, one may find Local Government Areas spelt wrongly and many do not find the tedium of cross-checking them worth their while. Also, you will find several Local Government Areas from different States that have the same name. How does one deal with that? How can one tell whether a phone number is a genuine, properly formatted number belonging to one of the telcos operating in the country?
- Ease the onboarding of prospective R users working on local projects: A new R user who has to work with the average Nigerian dataset would probably have a steeper learning curve. This is because they lack the built-in objects and functions, as well as existing packages, to support their work. To the best of my knowledge, an R package that focuses on Nigeria (and many other African countries) does not exist. The more people have extensions that ease and support the use of R, the more the ecosystem grows and the more innovative solutions will emerge. Field experience has revealed a heavy reliance on tools such as spreadsheet applications; these are of limited use for promoting good data science practices and for interoperability among the reliable, time-tested platforms and technologies used for data management.
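To make the data-cleaning questions raised above a little more concrete, here is a minimal base R sketch. It is not how naijR handles these problems; the records are made up for illustration, the helper is.plausible.mobile is hypothetical, and the phone pattern (11 digits starting with 0, then 7, 8 or 9, then 0 or 1) is my simplifying assumption about local mobile numbers.

```r
# Illustrative records only; not taken from any real dataset
lga.data <- data.frame(
  state = c("Kogi", "Plateau", "Lagos", "Oyo", "Abia"),
  lga   = c("Bassa", "Bassa", "Surulere", "Surulere", "Umuahia North"),
  stringsAsFactors = FALSE
)

# Flag every row whose LGA name occurs more than once in the data
shared <- lga.data$lga %in% lga.data$lga[duplicated(lga.data$lga)]
lga.data[shared, ]

# Disambiguate by pairing each LGA with its State
lga.data$key <- paste(lga.data$lga, lga.data$state, sep = ", ")

# Hypothetical helper: checks only the shape of a number (assumed pattern),
# not whether it is actually assigned by any network operator
is.plausible.mobile <- function(x) {
  digits <- gsub("[^0-9]", "", x)  # drop spaces, dashes and other non-digits
  grepl("^0[789][01][0-9]{8}$", digits)
}

checked <- is.plausible.mobile(
  c("0803 123 4567", "8031234567", "08031234567", "0123456789")
)
checked
```

Even a sketch this small has to be rewritten, or copied from an old script, for each new project, which is the kind of repetition a dedicated package is meant to remove.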
With naijR, it is hoped that these and other related issues can receive some attention. In Part 2 of this blog post, I will provide some code demonstrations of naijR's key functionalities and how it can help in solving some day-to-day data science problems.
The package has now been published on CRAN and its current stable version is v. 0.1.0. To install it at the R console, use install.packages('naijR').
The development version can equally be obtained from GitHub:
# install.packages('remotes')
remotes::install_github('BroVic/naijR')
naijR is distributed under the terms of the GNU General Public License, Version 3, and users are free to use the source code within its confines. Bug reports and pull requests are also very welcome.
As stated earlier, I will demonstrate possible use cases for the package, but the impatient reader is encouraged to look up the package documentation:
help(package = "naijR")
vignette('nigeria-maps', package = 'naijR')
Any feedback will be appreciated.