Sustainability Score

Regions by sustainability score for Environmental importance weighting.

Aggregated sustainability indicators and ranking in urban areas of the Republic of Kazakhstan

This project analyzes cities across multiple factors such as air pollution, crime rate, household spending, and unemployment rate. It combines data imputation, normalization, and visualization to rank cities by their overall scores under two weighting schemes: equal weighting and Environmental importance weighting. The result is a set of aggregated sustainability indicators for evaluating quality of life in urban areas of the Republic of Kazakhstan.

Source

All of the data is originally sourced from The Bureau of National Statistics of Kazakhstan and later merged and cleaned for use in this project.

Metadata

Metadata is based on frictionlessdata's data package standard.

Data

The dataset merged_to_normalize.csv contains various metrics for different cities. In total, 16 different parameters from our previous research were included.

We have also added metadata, such as column descriptions, and packaged the data as a data package.
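As a rough illustration of what such packaging can look like, the sketch below builds a minimal Data Package descriptor in Python. The package name, the resource name, and the choice of listing only three fields are assumptions made for brevity; the project's actual descriptor is not reproduced here.

```python
import json

# Minimal Data Package descriptor (hypothetical; only three columns are listed).
descriptor = {
    "name": "sustainability-score",  # assumed package name
    "resources": [
        {
            "name": "merged-to-normalize",  # assumed resource name
            "path": "merged_to_normalize.csv",
            "schema": {
                "fields": [
                    {
                        "name": "Region",
                        "type": "string",
                        "description": "The name of the region or city.",
                    },
                    {
                        "name": "avg_salary",
                        "type": "number",
                        "description": "Average salary of the residents in the region.",
                    },
                    {
                        "name": "air_pollution_index",
                        "type": "number",
                        "description": "A metric indicating the level of air pollution.",
                    },
                ]
            },
        }
    ],
}

# Serialize the descriptor; in a real project this text would be saved as datapackage.json.
datapackage_json = json.dumps(descriptor, indent=2, ensure_ascii=False)
print(datapackage_json)
```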

Data Cleaning:

The last row of the CSV is dropped as it might contain summary or unwanted data.
Only numerical columns are selected for further processing.
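The two cleaning steps could be sketched with pandas as follows. The small DataFrame is an invented stand-in for merged_to_normalize.csv, and the helper name clean is hypothetical.

```python
import pandas as pd

# Toy stand-in for merged_to_normalize.csv; all values are invented for illustration.
raw = pd.DataFrame(
    {
        "Region": ["City A", "City B", "City C", "Total"],
        "avg_salary": [300.0, 250.0, 280.0, 830.0],
        "air_pollution_index": [5.0, 7.0, 3.0, 15.0],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the last row, then keep only the numerical columns."""
    without_last = df.iloc[:-1]
    return without_last.select_dtypes(include="number")


numeric = clean(raw)
print(numeric)
```

With the real file, raw would instead come from pd.read_csv("merged_to_normalize.csv").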

Imputation:

Missing values in the numerical columns are imputed using K-Nearest Neighbors (KNN) with n_neighbors=5. For each missing value, the method compares the region's other variables with those of the remaining regions and estimates the missing value from the 5 most similar regions.
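A minimal sketch of this step, assuming scikit-learn's KNNImputer is the implementation in use (the text names only the method and n_neighbors=5). The toy values are invented.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy numerical data with one missing value; the numbers are invented.
numeric = pd.DataFrame(
    {
        "avg_salary": [300.0, 250.0, 280.0, 310.0, 260.0, 290.0],
        "unemployment-rate": [4.8, 5.1, np.nan, 4.6, 5.0, 4.9],
    }
)

# Each missing entry is replaced by the mean of that column over the
# 5 nearest rows, with distances computed on the values that are present.
imputer = KNNImputer(n_neighbors=5)
imputed = pd.DataFrame(
    imputer.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)
print(imputed)
```

Because only five other rows exist in this toy example, all of them act as neighbors, so the filled value is simply the mean of the observed unemployment rates.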

Normalization:

Numerical data is scaled using MinMaxScaler to bring all values into the range [0, 1].
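Min-max scaling maps each column independently via x' = (x - min) / (max - min). A short sketch with scikit-learn's MinMaxScaler on invented values:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy numerical data; the numbers are invented for illustration.
numeric = pd.DataFrame(
    {
        "avg_salary": [250.0, 300.0, 350.0],
        "life_expectancy": [70.0, 72.0, 75.0],
    }
)

# Each column is rescaled separately so its minimum becomes 0 and its maximum 1.
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)
print(scaled)
```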

Negative Indicators:

Several columns represent negative indicators, where higher values mean worse conditions, and are adjusted accordingly. The following parameters were found to negatively affect a region's rating:
air_pollution_index: A metric indicating the level of air pollution.
crime_amount: Total number of crimes.
household_spending_per_month: Average household spending per month.
unemployment-rate: Unemployment rate in the city.
car-amount-rate: Number of cars per 1000 inhabitants.
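One common way to handle negative indicators after min-max scaling is to flip them with 1 - x, so that every column contributes to the score in the same direction. Whether the project applies exactly this transformation is an assumption; the sketch below only illustrates the idea, and the helper name invert_negative is hypothetical.

```python
import pandas as pd

# Negative indicators as listed above.
NEGATIVE_COLUMNS = [
    "air_pollution_index",
    "crime_amount",
    "household_spending_per_month",
    "unemployment-rate",
    "car-amount-rate",
]


def invert_negative(scaled: pd.DataFrame, negative=NEGATIVE_COLUMNS) -> pd.DataFrame:
    """Flip [0, 1]-scaled negative indicators (assumed adjustment: 1 - x)."""
    out = scaled.copy()
    present = [col for col in negative if col in out.columns]
    out[present] = 1.0 - out[present]
    return out


# Toy scaled data; the values are invented.
scaled = pd.DataFrame(
    {
        "air_pollution_index": [0.0, 0.25, 1.0],
        "avg_salary": [0.2, 0.5, 1.0],
    }
)
adjusted = invert_negative(scaled)
print(adjusted)
```

After the flip, the most polluted region in the toy data scores 0 on air_pollution_index, while positive indicators such as avg_salary are left unchanged.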

Positive Indicators:

The remaining columns are listed below. Apart from the Region identifier and the derived Total_Score, they represent positive indicators:
Region: The name of the region or city.
avg_salary: Average salary of the residents in the region.
Population: Total population of the region.
gdp_per_capita: Gross Domestic Product (GDP) per capita in Tenge.
Postgraduate_Education: Proportion of the population with postgraduate education.
Higher_Education: Proportion of the population with higher education.
life_expectancy: Average life expectancy of the residents.
Med-institution-rate: Number of medical institutions per 10000 inhabitants.
School-number-rate: Number of schools per 10000 inhabitants in the region.
public_transport_quantity: Quantity of public transportation per 10000 inhabitants.
Entertainment-places-rate: Number of entertainment venues per 10000 inhabitants.
clearance_rate: Number of solved crimes divided by the total number of crimes.
Total_Score: An overall score or index derived from various metrics to assess the region's quality of life.
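To make the two weighting schemes concrete, the sketch below computes a total score as a weighted average of indicators that are already scaled to [0, 1] and oriented so that higher is better. The actual weights behind Environmental importance weighting are not given here, so the factor 3.0 on air_pollution_index is purely a placeholder, and the function name total_score is hypothetical.

```python
from typing import Optional

import pandas as pd


def total_score(indicators: pd.DataFrame, weights: Optional[dict] = None) -> pd.Series:
    """Weighted average of the indicator columns; unlisted columns get weight 1."""
    weights = weights or {}
    w = pd.Series({col: float(weights.get(col, 1.0)) for col in indicators.columns})
    w = w / w.sum()  # normalize so the weights sum to 1
    return indicators.mul(w, axis=1).sum(axis=1)


# Toy indicators, already scaled and oriented; the values are invented.
indicators = pd.DataFrame(
    {
        "air_pollution_index": [1.0, 0.0],
        "avg_salary": [0.2, 1.0],
    },
    index=["City A", "City B"],
)

equal = total_score(indicators)  # equal weighting
# Placeholder environmental weighting: air quality counts three times as much.
environmental = total_score(indicators, {"air_pollution_index": 3.0})

# Rank 1 is the best-scoring region.
ranking = environmental.rank(ascending=False).astype(int)
print(equal, environmental, ranking, sep="\n")
```

In this toy case, raising the weight of the environmental column widens the gap between the two cities, which is the kind of ranking shift the two schemes are meant to expose.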