- R Data Science Essentials
- Raja B. Koushik, Sharan Kumar Ravindran
- 2025-04-04 20:23:12
Reading data from different sources
Importing data into R is straightforward and can be done from multiple sources. The most common way to import data into R is through the comma-separated values (CSV) format. CSV data can be read with the read.csv function. This is the simplest way to read data, as it requires just a single command. Depending on the quality of the data, it may or may not require further processing.
data <- read.csv("c:/local-data.csv")
A function similar to read.csv is read.csv2. It also reads CSV files, but it is intended for the format common in many European countries, where the comma is used as the decimal point and the semicolon as the field separator. Data can also be read with a few more functions, such as read.table and read.delim. By default, read.delim reads tab-delimited files, while read.table can read any delimited text file when supplied with suitable parameters:
data <- read.delim("local-data.txt", header=TRUE, sep="\t")
data <- read.table("local-data.txt", header=TRUE, sep="\t")
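As a minimal sketch of how these functions differ, the following example writes two small files to a temporary location and reads them back. The file contents, column names, and variable names are made up for illustration and are not part of the book's example data.

```r
# Write two small sample files to temporary paths (hypothetical data).
csv2_path <- tempfile(fileext = ".csv")
tab_path  <- tempfile(fileext = ".txt")

# European-style CSV: semicolon separator, comma as the decimal mark.
writeLines(c("city;price", "Paris;1,5", "Berlin;2,25"), csv2_path)

# Tab-delimited file with a header row.
writeLines(c("city\tprice", "Paris\t1.5", "Berlin\t2.25"), tab_path)

# read.csv2 treats ";" as the separator and "," as the decimal mark,
# so the price column is read as numeric.
eu_data <- read.csv2(csv2_path)

# read.delim assumes a header and a tab separator by default;
# read.table needs both stated explicitly to read the same file.
delim_data <- read.delim(tab_path)
table_data <- read.table(tab_path, header = TRUE, sep = "\t")

str(eu_data)
all.equal(delim_data, table_data)
```

For a simple file like this one, read.delim and read.table should produce the same data frame; the choice between them is mostly a matter of which defaults match the file.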
All the preceding functions accept multiple parameters that describe the format of the data source. Some of these parameters are as follows:
header: A logical value indicating whether the file contains column names in its first line. When it is set to TRUE, the first line is read as the column names. The default is TRUE for read.csv, read.csv2, and read.delim; for read.table, the default is FALSE unless the first line has one fewer field than the rest of the file.
sep: Defines the field separator in the file. By default, the separator is a comma for read.csv, a semicolon for read.csv2, a tab for read.delim, and white space for read.table.
nrows: Specifies the maximum number of rows to read from the file. By default, the entire file is read.
row.names: Specifies which column should be used as the row names, given as the column's position (1 represents the first column). When it is set to NULL, the rows are numbered automatically.
fill: When set to TRUE, allows rows of unequal length to be read; blank fields are implicitly added to the shorter rows.
These are some of the common parameters used along with the functions to read the data from a file.
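The parameters above can be sketched together on one small file. The file contents and the variable names below are hypothetical and chosen only to show each parameter's effect.

```r
# A small comma-separated file whose last row is missing a field
# (hypothetical data written to a temporary path).
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "a1,Ann,90",
             "b2,Bob,85",
             "c3,Cat"), path)

# header and sep: read.table must be told about the header row and the comma.
# fill = TRUE pads the short last row with NA instead of stopping with an error.
all_rows <- read.table(path, header = TRUE, sep = ",", fill = TRUE)

# nrows: read only the first two data rows.
first_two <- read.table(path, header = TRUE, sep = ",", nrows = 2, fill = TRUE)

# row.names = 1: use the first column (id) as the row names.
by_id <- read.table(path, header = TRUE, sep = ",", row.names = 1, fill = TRUE)

print(all_rows)
print(first_two)
print(by_id)
```

Using read.table here keeps every setting explicit; read.csv would give the same result for all_rows with no extra arguments, since its defaults already include header = TRUE, sep = ",", and fill = TRUE.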
So far, we have explored reading data from delimited files. In addition, we can read data in Excel formats. This can be achieved using the xlsx or XLConnect packages. We will see how to use one of these packages to read a worksheet from a workbook:
install.packages("xlsx")
library(xlsx)
mydata <- read.xlsx("DTH AnalysisV1.xlsx", 1)
head(mydata)
In the preceding code, we first installed the xlsx package, which is required to read Excel files. We loaded the package using the library function, then used the read.xlsx function to read the Excel file, passing an additional parameter, 1, that specifies which sheet to read from the workbook.
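A worksheet can also be selected by name instead of position. The following is a rough sketch that assumes the xlsx package and the Java runtime it depends on are available; the workbook, the sheet name Sales, and the data are hypothetical and created on the fly so the example does not depend on the book's file.

```r
# Runs only if the xlsx package (and its Java dependency) can be loaded.
if (requireNamespace("xlsx", quietly = TRUE)) {
  # Create a small workbook in a temporary file (hypothetical data).
  path <- tempfile(fileext = ".xlsx")
  sales <- data.frame(region = c("North", "South"), units = c(10, 20))
  xlsx::write.xlsx(sales, path, sheetName = "Sales", row.names = FALSE)

  # Read the same worksheet by position and by name.
  by_index <- xlsx::read.xlsx(path, sheetIndex = 1)
  by_name  <- xlsx::read.xlsx(path, sheetName = "Sales")

  print(head(by_index))
  print(head(by_name))
}
```

Selecting by name tends to be more robust than selecting by position when sheets may be reordered in the workbook.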