Dataset

A Survey of Methods and Input Data Types for House Price Prediction: Literature list

General file description

This xlsx document contains the literature list that forms the basis of the paper 'A Survey of Methods and Input Data Types for House Price Prediction' by Geerts, M., vanden Broucke, S. and De Weerdt, J. The Excel document contains seven sheets, relating to the phases described in the survey.

Phase3

This sheet contains the literature list for the end of Phase 2 and the start of Phase 3. It has 590 rows and 19 columns. Each row contains the citation information of one article. The columns are ID, Authors, Title, Year, Source title, Volume, Issue, DOI, ISSN, ISBN, PubMed, Publisher, Document Type, Language, Keywords, Link, Book DOI, Algorithmic (Title) and Algorithmic (Abstract). The last two columns indicate whether, judging by the title and the abstract respectively, the article describes an algorithmic approach to predicting house prices. These two columns take the values 'Yes', 'No' and 'Maybe', and were completed during Phase 3.

Phase4

This sheet contains the literature list for the end of Phase 3 and the start of Phase 4. It has 116 rows and 20 columns. Each row contains the citation information of one article. The columns are ID, Authors, Title, Year, Source title, Volume, Issue, DOI, ISSN, ISBN, PubMed, Publisher, Document Type, Language, Keywords, Link, Book DOI, Algorithmic (Title), Algorithmic (Abstract) and Reading. All columns are the same as in the first sheet, except for the last three. The columns Algorithmic (Title) and Algorithmic (Abstract) now contain only the value 'Yes', because only the articles that describe an algorithm are retained in Phase 3. The column Reading describes the outcome of Phase 4. This column is empty if the article is retained in this phase; otherwise it gives the reason the article was not retained.

Phase4(end)

This sheet contains the literature list for the end of Phase 4. It has 94 rows and 20 columns. Each row contains the citation information of one article. The columns are ID, Authors, Title, Year, Source title, Volume, Issue, DOI, ISSN, ISBN, PubMed, Publisher, Document Type, Language, Keywords, Link, Book DOI, Algorithmic (Title), Algorithmic (Abstract) and Reading. All columns are the same as in the second sheet. The column Reading is now empty because the articles that were not retained in Phase 4 have been removed from the list.

Data table

This sheet contains a table of the literature at the end of Phase 4, indicating the input data types used in each article, the data novelty score and the cluster that the article belongs to. It has 95 rows, where each row contains the information of one article, except the last 'Total' row. It contains 21 columns:

ID: This is the same identifier as in the previous sheets.
Column1: This is a new identifier, based on an ordering on year and author.
Authors: Same as before.
Title: Same as before.
Year: Same as before.
Structural, Temporal data, Socioeconomic, Environmental, POI, Basic spatial, Location, Eucl Distances, Adv Spatial, Network Distance, Topographical data, Graphs, Images, Text: These are the different input data types. A cell contains 'X' if the corresponding article uses the input data type named in the column header.
Score: This column indicates the data novelty score, calculated as explained in the paper based on the sheet 'Rules Data novelty score'.
Cluster: This column indicates the cluster number as explained in the Discussion section of the paper.
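
As a rough illustration of how the 'X' markers in this sheet might be processed, the sketch below counts how many articles use each input data type. It assumes the rows have already been loaded into dictionaries keyed by column name (for example with an xlsx reader of your choice), and that the 'Total' row is the last row, as described above. The sample rows, the shortened column list and the helper names `article_rows`, `uses` and `usage_counts` are hypothetical and are not part of the dataset.

```python
# Hypothetical rows mirroring the layout of the 'Data table' sheet.
# Only three of the fourteen input data type columns are shown.
DATA_TYPE_COLUMNS = ["Structural", "Temporal data", "Images"]

rows = [
    {"ID": 12, "Column1": 1, "Structural": "X", "Temporal data": None, "Images": None},
    {"ID": 40, "Column1": 2, "Structural": "X", "Temporal data": "X", "Images": None},
    {"ID": 7, "Column1": 3, "Structural": None, "Temporal data": None, "Images": "X"},
    # Assumed shape of the trailing 'Total' row; its real contents may differ.
    {"ID": "Total", "Column1": None, "Structural": 2, "Temporal data": 1, "Images": 1},
]


def article_rows(rows):
    """Return every row except the last one, which is assumed to be 'Total'."""
    return rows[:-1]


def uses(row, column):
    """True if the cell holds an 'X' marker (ignoring case and whitespace)."""
    value = row.get(column)
    return isinstance(value, str) and value.strip().upper() == "X"


def usage_counts(rows, columns):
    """Count, per input data type column, the articles marked with 'X'."""
    articles = article_rows(rows)
    return {col: sum(uses(r, col) for r in articles) for col in columns}


counts = usage_counts(rows, DATA_TYPE_COLUMNS)
print(counts)
```

The same approach should carry over to the model type columns of the 'Model table' sheet, since that sheet is described with the same layout.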

Rules Data novelty score

This sheet contains two columns and 15 rows, the first of which contains the column titles. The first column lists the input data types as in the previous sheet and the second column contains the respective novelty scores.

Model table

This sheet contains a table of the literature at the end of Phase 4, indicating the model types used in each article, the model novelty score and the cluster that the article belongs to. It has 95 rows, where each row contains the information of one article, except the last 'Total' row. It contains 21 columns:

ID: Same as before.
Column1: Same as before.
Authors: Same as before.
Title: Same as before.
Year: Same as before.
MRA, Kriging, SEM, SVC, Time Series, FL, NN, DT, RF, GBT, SVM, ANN, (Other) Ensembles, DL: These are the different model types. A cell contains 'X' if the corresponding article uses the model type named in the column header.
Score: This column indicates the model novelty score, calculated as explained in the paper based on the sheet 'Rules Model novelty score'.
Cluster: This column indicates the cluster number as explained in the Discussion section of the paper.

Rules Model novelty score

This sheet contains two columns and 15 rows, the first of which contains the column titles. The first column lists the model types as in the previous sheet and the second column contains the respective novelty scores.
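
As a rough illustration of how the Phase3, Phase4 and Phase4(end) sheets relate to one another, the screening can be sketched as two filters. This is one reading of the description above, not the authors' procedure. It assumes an article moves on from Phase 3 only when both Algorithmic columns are 'Yes', and survives Phase 4 only when its Reading cell is empty. The sample rows, the placeholder reason text and the helper names `is_blank`, `screen_phase3` and `screen_phase4` are hypothetical.

```python
def is_blank(value):
    """True for None or for a value that is empty after stripping whitespace."""
    return value is None or str(value).strip() == ""


def screen_phase3(rows):
    """Keep rows judged algorithmic on both title and abstract (assumed rule)."""
    return [
        r for r in rows
        if r["Algorithmic (Title)"] == "Yes" and r["Algorithmic (Abstract)"] == "Yes"
    ]


def screen_phase4(rows):
    """Keep rows whose 'Reading' cell is empty, i.e. no exclusion reason given."""
    return [r for r in rows if is_blank(r.get("Reading"))]


# Hypothetical rows with only the columns needed for the two filters.
phase3 = [
    {"ID": 1, "Algorithmic (Title)": "Yes", "Algorithmic (Abstract)": "Yes"},
    {"ID": 2, "Algorithmic (Title)": "Maybe", "Algorithmic (Abstract)": "No"},
    {"ID": 3, "Algorithmic (Title)": "Yes", "Algorithmic (Abstract)": "Yes"},
    {"ID": 4, "Algorithmic (Title)": "No", "Algorithmic (Abstract)": "No"},
]

phase4 = screen_phase3(phase3)

# Made-up full-text reading outcomes: an empty string means 'retained'.
reading = {1: "", 3: "placeholder exclusion reason"}
phase4 = [dict(r, Reading=reading[r["ID"]]) for r in phase4]

phase4_end = screen_phase4(phase4)
print([r["ID"] for r in phase4_end])
```

Applied to the real sheets, filters of this kind would be expected to reproduce the step from 116 to 94 rows between Phase4 and Phase4(end), provided the Reading column is encoded as described.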

Year of publication: 2022

DOI: https://doi.org/10.48804/h1qclh

Accessibility: open

Publisher: KU Leuven RDR

License: CC-BY-ND-4.0

Format: xlsx

Keywords: data mining, geospatial data, house price prediction, literature review, machine learning, real estate, regression, spatial-temporal systems, survey

Creators/Contributors