Programming/Kdb/Labs/Exploratory data analysis

In this lab we'll make sense of the following data set from the UCI Machine Learning Repository:

Name: Real estate valuation data set
Data Set Characteristics: Multivariate
Attribute Characteristics: Integer, Real
Associated Tasks: Regression
Number of Instances: 414
Number of Attributes: 7
Missing Values? N/A
Area: Business
Date Donated: 2018.08.18
Number of Web Hits: 111,613
Original Owner and Donor: Prof. I-Cheng Yeh, Department of Civil Engineering, Tamkang University, Taiwan
Relevant papers:
- Yeh, I.C., and Hsu, T.K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271.

URL: https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set

There are many data sets on UCI that are worth exploring. We picked this one because it is relatively straightforward and clean.

Let's read the data set information:

The market historical data set of real estate valuation is collected from Sindian Dist., New Taipei City, Taiwan. The real estate valuation is a regression problem. The data set was randomly split into the training data set (2/3 samples) and the testing data set (1/3 samples).

This paragraph describes how the original researchers split the data set. We will split it differently: fifty-fifty, with half of the rows, chosen at random, for training and the other half for testing.
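
A minimal q sketch of such a fifty-fifty random split follows. It assumes the data has already been loaded into a table. Here a hypothetical stand-in table `t` is used, with 414 rows (the instance count above) and a single `id` column. The names `t`, `p`, `half`, `train` and `test` are illustrative, and the lab's own loading and splitting code may differ.

```q
/ Hypothetical stand-in for the real data set: 414 rows, one id column
n:414
t:([] id:til n)

/ Random permutation of the row indices 0..n-1 (deal: negative left argument draws without repetition)
p:neg[n]?n

/ First half of the shuffled indices -> training set, the rest -> testing set
half:n div 2
train:t[p til half]
test:t[half _ p]
```

Because the indices are drawn without repetition, every row lands in exactly one of the two halves. If a particular split needs to be reproduced, the random seed can be set explicitly with the `\S` system command before the deal.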

Let's read on:

The inputs are as follows:

X1 = the transaction date
