Data Science Strategies for Real Estate Development
By expanding the margins of knowledge through coordination and organization, data science helps unlock buildings that are more sustainable, equitable, and healthy while remaining financially efficient and profitable. This research explores what data can be for real estate development more broadly: What real estate data is out there? Where is data science headed in helping us answer questions? And, ultimately, how can data science help us build better buildings and districts through the real estate development process? To answer these questions, this research presents insights on the characteristics of data and data management solution providers in the real estate industry, analyzes outcomes and features of the development process, and offers recommendations for real estate development stakeholders. The understanding gained from this study can help real estate stakeholders consider data science applications from the perspective of the real estate development process. Moreover, the insights on the current market dynamics of real estate data companies can help stakeholders recognize where the market may be saturated and where gaps remain.
We collected data on data science companies to understand which firms provide data or data management solutions. We studied their similarities, differences, and overall distribution patterns across data delivery method, data collection process, data audience, real estate function, real estate product type, real estate development phase, and API availability. We then applied a data science framework to the real estate development process by modeling the outcomes and features related to that process. Finally, we linked the data companies to the data science framework to see where data science architectures could be applied to the real estate development process.
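As a rough illustration of the distribution analysis described above, the sketch below tallies a small set of company records by one categorical attribute at a time. The records, field names (`function`, `delivery`, `has_api`), and helper functions are hypothetical placeholders chosen for this example; they are not the study's dataset or its actual tooling.

```python
from collections import Counter

# Hypothetical placeholder records; names and values are illustrative only,
# not data from the study.
companies = [
    {"name": "Company A", "function": "Brokerage and Sales", "delivery": "Platform", "has_api": True},
    {"name": "Company B", "function": "Brokerage and Sales", "delivery": "Report", "has_api": False},
    {"name": "Company C", "function": "Development", "delivery": "Platform", "has_api": True},
]


def distribution(records, attribute):
    """Count how many records take each value of one categorical attribute."""
    return Counter(record[attribute] for record in records)


def share(records, attribute):
    """Return each value's share of all records, as a fraction between 0 and 1."""
    counts = distribution(records, attribute)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


function_counts = distribution(companies, "function")
api_share = share(companies, "has_api")
```

The same two helpers can be pointed at any of the attributes listed above (delivery method, audience, product type, and so on), which is one simple way to compare distribution patterns across them.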
“Core skills for development, design, and planning are shifting to encompass analytics in data science and machine learning.”
– Dr. Andrea Chegut, MIT Real Estate Innovation Lab
Results from Market Analysis:
- The distribution of data and solution providers suggests that internal and external data are similarly important in making business decisions.
- The number of applicable data and solution providers increases as the development process progresses from Idea Inception to Asset Management and/or Sales.
- The majority of the companies surveyed provided data or solutions related to the Brokerage and Sales function of real estate. This observation is aligned with the current market sentiment that an app exists for every step of the home-buying and home-ownership experience.
“With proximity to consumerism, residential real estate has more data and solution providers, and many of them strive to solve operational issues. However, there is so much more to be done for data science and real estate development.”
– Sunnie (Sun Jung) Park, MIT Real Estate Innovation Lab


Distribution of Data and Data Solution Companies by Real Estate Function
Results from the Real Estate Development Process:
- Among all task categories, Physical and Design analysis has the highest number of tasks.
- Looking at the number of tasks within each real estate development phase, we find that more than half of the tasks fall under the Preconstruction and Feasibility phases.
- Project IRR (Internal Rate of Return) is the most common outcome necessary for making a decision in real estate development.
- Out of 588 unique features, Project Location was the most frequently appearing feature.
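One way to arrive at a "most frequently appearing feature" is to map each decision outcome to the features it depends on and count how many outcomes each feature appears in. The sketch below assumes that structure; the mapping, the outcome and feature names other than Project IRR and Project Location, and the `feature_frequency` helper are hypothetical and do not reproduce the study's 588 features or its method.

```python
from collections import Counter

# Hypothetical mapping from a decision outcome to the features that feed it.
# Entries are illustrative placeholders, not the study's data.
outcome_features = {
    "Project IRR": ["Project Location", "Construction Cost", "Rent Forecast"],
    "Site Selection": ["Project Location", "Zoning"],
    "Construction Schedule": ["Construction Cost", "Project Location"],
}


def feature_frequency(mapping):
    """Count in how many outcomes each feature appears (at most once per outcome)."""
    counts = Counter()
    for features in mapping.values():
        counts.update(set(features))  # set() guards against duplicates within one outcome
    return counts


freq = feature_frequency(outcome_features)
most_common_feature, appearances = freq.most_common(1)[0]
```

Counting each feature once per outcome keeps a feature that is listed twice under the same outcome from inflating its rank.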
“The next technology development I’m watching is real estate intelligence…There are AI, big data, wide data, machine learning, neurolearning, and so much more impacting other industries, and we’re going to see it actually impact the real estate business, too.”
– Steve Weikal, MIT Real Estate Innovation Lab


Distribution of Real Estate Development Tasks by Task Category and Real Estate Phase



