R TUTORIAL : 3 – Import data into R
13th August 2018
The data you want to import into R can come in all sorts of formats: flat files, files from statistical software, databases and web data.
Each of these data types often calls for a different import approach. For a broader introduction to getting different types of data into R, you can check out this online Importing Data into R tutorial, this post on data importing, or this webinar by RStudio.
Flat files are typically simple text files that contain tabular data. The standard distribution of R can import these flat files as a data frame with functions such as read.table() and read.csv() from the utils package. Dedicated packages for importing flat files include readr, a fast and very easy to use package that is less verbose than utils and often several times faster, and data.table, whose fread() function quickly imports data into R and helps with munging it afterwards.
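A minimal sketch of the flat-file functions named above, assuming a small made-up CSV that the code writes to a temporary file. The base utils calls always run; the readr::read_csv() and data.table::fread() calls only run if those optional packages are installed, so treat that part as an illustration rather than a benchmark.

```r
# Create a small, made-up CSV file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,Ann,90.5",
             "2,Bob,78",
             "3,Cy,88.25"), csv_path)

# utils (part of standard R): read.csv() is read.table() with CSV-friendly defaults
df_csv   <- read.csv(csv_path, stringsAsFactors = FALSE)
df_table <- read.table(csv_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
str(df_csv)

# readr: returns a tibble; passing col_types = readr::cols() guesses types
# without printing the column specification message
if (requireNamespace("readr", quietly = TRUE)) {
  df_readr <- readr::read_csv(csv_path, col_types = readr::cols())
  print(df_readr)
}

# data.table: fread() detects the separator and header row automatically
if (requireNamespace("data.table", quietly = TRUE)) {
  df_fread <- data.table::fread(csv_path)
  print(df_fread)
}
```

Note that read.csv() and read.table() give the same data frame here; read.csv() simply saves you from spelling out header and sep.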
If you want to get your Excel files into R, it's a good idea to have a look at the readxl package. Alternatively, there is the gdata package, whose read.xls() function supports importing Excel data, and the XLConnect package. The latter acts as a real bridge between Excel and R: besides reading data, it lets you create, write and modify Excel workbooks from inside R. Read more on importing your Excel files into R.
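Here is a short readxl sketch. It assumes readxl is installed and uses the example workbook that ships with the package (via readxl_example()), so no file of your own is needed; it only runs if readxl is available.

```r
# readxl is an add-on package; this sketch only runs if it is installed
has_readxl <- requireNamespace("readxl", quietly = TRUE)

if (has_readxl) {
  # readxl bundles small example workbooks; datasets.xlsx stores a few built-in datasets
  xl_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets, then read one of them by name into a tibble
  sheets    <- readxl::excel_sheets(xl_path)
  xl_mtcars <- readxl::read_excel(xl_path, sheet = "mtcars")

  # Read only a rectangular cell range: header row A1:C1 plus 5 data rows
  xl_part <- readxl::read_excel(xl_path, sheet = "mtcars", range = "A1:C6")

  print(sheets)
  print(xl_part)
}
```

For your own files you would replace xl_path with the path to your workbook.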
Statistical software packages such as SAS, Stata and SPSS use and produce their own file types. The haven package by Hadley Wickham can import SAS, Stata and SPSS data files into R and is very easy to use. Alternatively, there is the foreign package, which can import not only SAS, Stata and SPSS files but also more exotic formats such as Systat and Weka. It can also export data to various formats.
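The sketch below shows a round trip with each package, using R's built-in mtcars data instead of real SAS, Stata or SPSS files (an assumption made so the code runs anywhere). foreign usually comes with R as a recommended package, while haven must be installed separately; each part only runs if its package is available.

```r
# foreign is a recommended package that normally ships with R; haven is an add-on
has_foreign <- requireNamespace("foreign", quietly = TRUE)
has_haven   <- requireNamespace("haven", quietly = TRUE)

if (has_foreign) {
  # Export a built-in data frame to a Stata .dta file, then import it again
  dta_path <- tempfile(fileext = ".dta")
  foreign::write.dta(mtcars, dta_path)
  from_foreign <- foreign::read.dta(dta_path)
  str(from_foreign)
}

if (has_haven) {
  # haven can both write and read SPSS .sav files (read_sav() returns a tibble)
  sav_path <- tempfile(fileext = ".sav")
  haven::write_sav(mtcars, sav_path)
  from_haven <- haven::read_sav(sav_path)
  print(from_haven)
}
```

haven also offers read_sas() and read_dta() for SAS and Stata files.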
The package you use to connect to and import from a relational database depends on the type of database. To connect to a MySQL database, for example, you need the RMySQL package; other examples are the RPostgreSQL and ROracle packages. The R functions you then use to access and manipulate the database are specified in another R package called DBI, which these driver packages implement.
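To illustrate how DBI separates the generic functions from the database driver, here is a minimal sketch. It uses RSQLite with an in-memory database (an assumption: RSQLite is not mentioned above, but it needs no database server), so it only runs if DBI and RSQLite are installed. With MySQL or PostgreSQL, only the dbConnect() call would change.

```r
# DBI defines the generic interface; RSQLite is used here only because it needs no server
has_db <- requireNamespace("DBI", quietly = TRUE) &&
  requireNamespace("RSQLite", quietly = TRUE)

if (has_db) {
  # Only the driver object is database-specific; e.g. RMySQL::MySQL() or
  # RPostgreSQL::PostgreSQL() with host/user/password arguments instead
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Copy a data frame into a table, then query it with SQL
  DBI::dbWriteTable(con, "mtcars", mtcars)
  print(DBI::dbListTables(con))
  res <- DBI::dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl ORDER BY cyl")
  print(res)

  # Always close the connection when you are done
  DBI::dbDisconnect(con)
}
```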
If you want to harvest web data using R, you need to connect R to online resources, either through APIs or by scraping web pages with packages like rvest. To get started with all of this, there is a great resource freely available on the blog of Rolf Fredheim.
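A small scraping sketch with rvest, assuming rvest 1.0.0 or later (for html_element() and html_elements()). To avoid depending on a live website, it parses a made-up HTML string; for a real page you would pass its URL to read_html() instead. The code only runs if rvest is installed.

```r
# rvest is an add-on package; html_element()/html_elements() need rvest >= 1.0.0
has_rvest <- requireNamespace("rvest", quietly = TRUE)

if (has_rvest) {
  # Parse an inline, made-up HTML string instead of downloading a live page
  page <- rvest::read_html(
    '<html><body>
       <ul id="pkgs"><li>readr</li><li>haven</li></ul>
       <table>
         <tr><th>package</th><th>format</th></tr>
         <tr><td>readxl</td><td>Excel</td></tr>
         <tr><td>haven</td><td>SAS, Stata, SPSS</td></tr>
       </table>
     </body></html>')

  # Select nodes with a CSS selector and extract their text
  items <- rvest::html_text(rvest::html_elements(page, "#pkgs li"))

  # Convert an HTML <table> into a data frame (a tibble)
  tbl <- rvest::html_table(rvest::html_element(page, "table"))
  print(items)
  print(tbl)
}
```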