Programming/Kdb/Labs/Exploratory data analysis
From Thalesians Wiki
Revision as of 14:06, 18 June 2021
In this lab we'll make sense of the following data set from the UCI Machine Learning Repository:
- Name: Real estate valuation data set
- Data Set Characteristics: Multivariate
- Attribute Characteristics: Integer, Real
- Associated Tasks: Regression
- Number of Instances: 414
- Number of Attributes: 7
- Missing Values? N/A
- Area: Business
- Date Donated: 2018.08.18
- Number of Web Hits: 111,613
- Original Owner and Donor: Prof. I-Cheng Yeh, Department of Civil Engineering, Tamkang University, Taiwan
- Relevant papers:
  - Yeh, I.C., and Hsu, T.K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271.
- URL: https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set
There are many data sets on UCI that are worth exploring. We picked this one because it is relatively straightforward and clean.
Let's read the data set information:
The market historical data set of real estate valuation is collected from Sindian Dist., New Taipei City, Taiwan. The real estate valuation is a regression problem. The data set was randomly split into the training data set (2/3 samples) and the testing data set (1/3 samples).
This paragraph describes how the original researchers split the data set into training and testing sets. We will split it differently: fifty-fifty.
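A random fifty-fifty split could be sketched in q roughly as follows. This is a minimal illustration, not necessarily the approach taken later in the lab: the table `t` is a hypothetical stand-in with one row per instance (414 rows, matching the instance count listed above), and the names `idx`, `half`, `train` and `test` are our own.

```q
/ hypothetical stand-in for the loaded data set: 414 rows, one per instance
n:414
t:([]no:1+til n)

/ a negative left argument to ? deals without replacement,
/ so idx is a random permutation of the row indices 0..n-1
idx:neg[n]?n

/ the first half of the shuffled indices forms the training set,
/ the remaining indices form the testing set
half:n div 2
train:t half#idx
test:t half _ idx

/ both halves should contain 207 rows
count each (train;test)
```

Because the indices are dealt without replacement, every row lands in exactly one of the two tables.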
Let's read on:
The inputs are as follows:
- X1 = the transaction date