Book title: R Data Science Essentials
Authors: Raja B. Koushik, Sharan Kumar Ravindran
Chapter word count: 530
Last updated: 2025-04-04 20:23:12

Reading data from different sources

Importing data into R is straightforward, and R can read from many kinds of sources. The most common format is comma-separated values (CSV), which is read with the read.csv function. This is the simplest way to read data: a single call returns the contents of the file as a data frame. Depending on the quality of the data, it may or may not need further processing before analysis.

data <- read.csv("c:/local-data.csv")
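The call above can be tried without an existing file. The following is a minimal sketch, assuming a small made-up data set written to a temporary file; the column names and values (id, name, score) are invented for illustration and are not from the book:

```r
# Write a small CSV to a temporary file so the example is self-contained.
# The columns and values below are hypothetical sample data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,alice,90.5",
             "2,bob,82.0"), csv_path)

# read.csv treats the first line as column names by default.
scores <- read.csv(csv_path)

# Inspect the column types to decide whether any cleanup is needed.
str(scores)
```

Checking the result with str right after the import is a quick way to spot columns that were read with an unexpected type.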

A closely related function is read.csv2. It also reads CSV files, but it is intended for the convention common in many European countries, where a comma is the decimal mark and a semicolon is the field separator. Data can also be read with a few more functions, such as read.table and read.delim. By default, read.delim reads tab-delimited files, while read.table can read almost any delimited text file when it is given suitable arguments:

data <- read.delim("local-data.txt", header=TRUE, sep="\t")
data <- read.table("local-data.txt", header=TRUE, sep="\t")
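To make the difference between read.csv and read.csv2 concrete, here is a small sketch, assuming a made-up semicolon-separated file with decimal commas; the item and price values are hypothetical. It also shows that read.table can produce the same result when the separator and decimal mark are supplied explicitly:

```r
# Hypothetical sample data in the European CSV convention:
# semicolon as field separator, comma as decimal mark.
eu_path <- tempfile(fileext = ".csv")
writeLines(c("item;price",
             "pen;1,50",
             "book;12,99"), eu_path)

# read.csv2 assumes sep = ";" and dec = "," by default.
eu_data <- read.csv2(eu_path)

# The same file read with read.table, stating the format explicitly.
eu_data2 <- read.table(eu_path, header = TRUE, sep = ";", dec = ",")

# price is parsed as a number (1.5 and 12.99), not as text.
print(eu_data)
```

Reading the same file with read.csv would instead treat each line as a single field, because no comma-separated columns would be found in the expected positions.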

All the preceding functions accept a number of parameters that describe the format of the data source. Some of these parameters are as follows:

header: This is a logical value indicating whether the first line of the file contains the column names. When it is set to TRUE, the first line is used as the column names. The default is TRUE for read.csv, read.csv2, and read.delim; for read.table, header is treated as TRUE only when the first line has one fewer field than the number of columns, so it is safer to set it explicitly.
sep: This defines the field separator in the file. By default, the separator is a comma for read.csv, a semicolon for read.csv2, a tab for read.delim, and white space for the read.table function.
nrows: This specifies the maximum number of rows to read from the file. By default, the entire file will be read.
row.names: This specifies which column should supply the row names. It accepts the column's position (1 represents the first column) or the column's name. When it is set to NULL, the rows are numbered automatically.
fill: When this parameter is set to TRUE, rows of unequal length can be read, and blank fields are implicitly added to the shorter rows.

These are some of the common parameters used along with the functions to read the data from a file.
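The parameters listed above can be combined in a single call. The following sketch assumes a made-up pipe-delimited file in which one row is missing its last field; the file contents and the names id, city, and temp are hypothetical:

```r
# Hypothetical pipe-delimited data; the row for "b" lacks the temp field.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id|city|temp",
             "a|Oslo|4.5",
             "b|Rome",
             "c|Cairo|30.1"), txt_path)

weather <- read.table(txt_path,
                      header = TRUE,   # first line holds the column names
                      sep = "|",       # custom field separator
                      nrows = 2,       # read at most two data rows
                      row.names = 1,   # use the first column (id) as row names
                      fill = TRUE)     # pad the short row with a blank field

# Two rows ("a" and "b") with columns city and temp; temp is NA for "b".
print(weather)
```

Without fill = TRUE, read.table would stop with an error on the short row, which is why the parameter is useful for ragged files.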

We have so far explored reading data from a delimited file. In addition to this, we can read data in Excel formats as well. This can be achieved using the xlsx or XLConnect packages. We will see how to use one of these packages in order to read a worksheet from a workbook:

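# Install the xlsx package once; it depends on rJava and a working Java installation
install.packages("xlsx")
# Load the package for the current session
library(xlsx)
# Read the first worksheet of the workbook into a data frame
mydata <- read.xlsx("DTH AnalysisV1.xlsx", 1)
# Preview the first few rows
head(mydata)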

In the preceding code, we first installed the xlsx package, which is required to read Excel files. We then loaded the package using the library function and called the read.xlsx function to read the Excel file. The second argument, 1, specifies which sheet of the workbook to read.
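A worksheet can be selected by name as well as by position. The sketch below assumes the xlsx package and Java are already installed, and it is skipped otherwise. It writes a small made-up workbook to a temporary file, so the sheet name "demo" and the data are hypothetical rather than taken from the book's workbook:

```r
# Run only if the xlsx package (and its Java dependency) can be loaded.
if (requireNamespace("xlsx", quietly = TRUE)) {
  # Create a small hypothetical workbook with one sheet named "demo".
  xlsx_path <- tempfile(fileext = ".xlsx")
  sample_df <- data.frame(x = c(1, 2, 3), label = c("a", "b", "c"))
  xlsx::write.xlsx(sample_df, xlsx_path, sheetName = "demo", row.names = FALSE)

  # Select the worksheet by position ...
  by_index <- xlsx::read.xlsx(xlsx_path, sheetIndex = 1)

  # ... or by name, which is more robust if sheets are reordered.
  by_name <- xlsx::read.xlsx(xlsx_path, sheetName = "demo")

  print(head(by_name))
}
```

Selecting by name tends to be the safer choice for workbooks that other people edit, since inserting a new sheet changes the positions but not the names.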