The information available in the "Survey on Urban Housing" (IHU) was systematized in a more structured way, in an industry-standard relational database management system (RDBMS). An SQL database was chosen because it can be accessed with tools that are more standardized and easier to use than those used in the doctoral thesis (which, for example, require knowledge of scientific programming languages such as Python).
The database is available for download in Microsoft SQL Server and Access file formats.
The complete set of information for the 279 homes was organized in the following relational schema:
This relational schema is almost canonical (in terms of the conventions of relational database design). The conventions were broken in only a few details, where redundancy was introduced (for example the HouseType field in the HouseRooms table). This was done deliberately to simplify future manipulation of the data, for example in queries to the database.
However, users of the database should take care to maintain its integrity, since the RDBMS does not enforce consistency automatically for these redundant fields.
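Because a redundant field is not protected by the RDBMS, a periodic consistency check is one way to detect values that have drifted apart. The following is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for the SQL Server or Access files. Only the HouseRooms table and its HouseType field come from the schema described here; the Houses table, the HouseID and RoomName columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database. Apart from
# HouseRooms.HouseType, every table and column name here is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Houses (HouseID INTEGER PRIMARY KEY, HouseType TEXT);
    CREATE TABLE HouseRooms (
        HouseID   INTEGER REFERENCES Houses(HouseID),
        RoomName  TEXT,
        HouseType TEXT  -- redundant copy of Houses.HouseType
    );
    INSERT INTO Houses VALUES (1, 'A'), (2, 'B');
    INSERT INTO HouseRooms VALUES
        (1, 'Kitchen', 'A'),
        (2, 'Kitchen', 'B'),
        (2, 'Bedroom', 'A');  -- illustrative inconsistency
""")


def find_housetype_mismatches(connection):
    """Return HouseRooms rows whose HouseType differs from the parent house.

    Rows where either value is NULL are not reported, because the SQL
    comparison <> does not evaluate to true when an operand is NULL.
    """
    query = """
        SELECT r.HouseID, r.RoomName, r.HouseType, h.HouseType
        FROM HouseRooms AS r
        JOIN Houses AS h ON h.HouseID = r.HouseID
        WHERE r.HouseType <> h.HouseType
        ORDER BY r.HouseID, r.RoomName
    """
    return connection.execute(query).fetchall()


mismatches = find_housetype_mismatches(conn)
for row in mismatches:
    print(row)
```

The same SELECT statement could in principle be run directly in SQL Server or Access against the real tables, once the assumed names are replaced with the actual ones. An empty result would indicate that the redundant field is still consistent.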
The information provided in the two volumes of the "Survey on Urban Housing" is not organized with the logical rigor that an RDBMS requires. Because the survey work was carried out over several years, and presumably by many participants, the records show a certain lack of consistency in their criteria.
Some heuristic assumptions therefore had to be made, although care was taken to keep inferences to a minimum, since they could bias future analysis. Indeed, if theoretical assumptions are built into the collection of facts, future analysis can only confirm those initial assumptions. We therefore tried to preserve the original level of information, and only a few obvious errors were corrected.
Some of the problems noted:
- The survey has information on which activities are carried out in each room (WHAT-WHERE). There is also information about the presence of people in each room at each time of day (WHO-WHERE-WHEN). However, these two data series are not cross-referenced. Thus, we do not have finer-grained information covering all WHO-WHAT-WHERE-WHEN situations.
- Information regarding children is not free of inconsistencies. It is not possible to determine simultaneously a child's age and sex, or their location in rooms over time. Inferences would be possible, for example relating age to whether the child sleeps with the parents or has a room of their own. However, such an assumption would mean that any future analysis of the data would inevitably conclude that the inference was right. Of course, where there is only one child, the inference that the recorded age is that child's seems indisputable. Even so, it was decided not to make any inference in this regard.
- The 'all' person type was taken to mean each and every family member, excluding the employee, although the notation used in the texts leaves many doubts.
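The first limitation in the list above can be made concrete with a small sketch. It assumes two hypothetical tables, RoomActivities (WHAT-WHERE) and RoomPresence (WHO-WHERE-WHEN), whose names, columns, and sample rows are invented for illustration and are not taken from the actual schema. Joining them on the room produces every combination of person and activity for that room, which are only candidate situations, not observed ones.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RoomActivities (HouseID INTEGER, Room TEXT, Activity TEXT);
    CREATE TABLE RoomPresence (
        HouseID INTEGER, Room TEXT, Person TEXT, Period TEXT
    );
    INSERT INTO RoomActivities VALUES
        (1, 'Kitchen', 'cooking'),
        (1, 'Kitchen', 'eating');
    INSERT INTO RoomPresence VALUES
        (1, 'Kitchen', 'mother', 'morning'),
        (1, 'Kitchen', 'father', 'evening');
""")

# The room is the only link between the two series, so the join pairs
# every person present in a room with every activity recorded for it.
candidates = conn.execute("""
    SELECT p.Person, a.Activity, p.Room, p.Period
    FROM RoomPresence AS p
    JOIN RoomActivities AS a
      ON a.HouseID = p.HouseID AND a.Room = p.Room
    ORDER BY p.Person, a.Activity
""").fetchall()

for row in candidates:
    print(row)
```

With two people and two activities in the same room, the join returns four rows, and nothing in the data indicates which of them actually occurred. Treating such rows as facts would be exactly the kind of inference that was avoided when building the database.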