
Combined indicators for regions and cities

| Size | Format | Source |
| --- | --- | --- |
| 23.42 kB | csv | The Bureau of National Statistics of Kazakhstan |

Indicators that describe each of the 19 regions in the Republic of Kazakhstan


Data Files

| File | Size |
| --- | --- |
| normalized-indicators | 2.66 kB |
| final-rating | 759 B |

Data Previews

Normalized and combined indicators

Schema

| name | type | format | description |
| --- | --- | --- | --- |
| Region | string | default | Region and major cities name |
| air_pollution_index | number | default | A metric indicating the level of air pollution |
| avg_salary | number | default | Average salary of the residents in the region |
| Total_city_population | number | default | Total population of the region |
| clearance_rate | number | default | Rate at which crimes are solved divided by the number of crimes |
| crime_amount | number | default | Total number of crimes |
| gdp_per_capita_tenge | number | default | Gross Domestic Product (GDP) per capita in Tenge |
| Postgraduate_Education | number | default | Proportion of the population with postgraduate education |
| Higher_Education | number | default | Proportion of the population with higher education |
| household_spending_per_month | number | default | Average household spending per month |
| life_expectancy | number | default | Average life expectancy of the residents |
| med_institutions_amount | number | default | Number of medical institutions per 10,000 inhabitants |
| number_of_cars | number | default | Number of cars per 1,000 inhabitants |
| number_of_schools | number | default | Number of schools per 10,000 inhabitants in the region |
| public_transport_quantity | number | default | Quantity of public transportation per 10,000 inhabitants |
| unemployment_rate | number | default | Unemployment rate in the city. |
| number of entertainment places | integer | default | Number of entertainment venues per 10,000 inhabitants |

Final score for each region and city

Schema

| name | type | format | description |
| --- | --- | --- | --- |
| id | integer | default | Unique identifier |
| Region | string | default | Region and major cities name |
| Total_Score | number | default | Total score |

Aggregated sustainability indicators and ranking in urban areas of the Republic of Kazakhstan

This project analyzes cities and regions of the Republic of Kazakhstan across multiple factors, such as air pollution, crime rate, household spending, and unemployment rate. It combines data imputation, normalization, and visualization techniques to rank cities by their overall scores. Two weighting schemes, equal weighting and Environmental importance weighting, are used to develop aggregated sustainability indicators for evaluating quality of life in urban areas.

Source

All of the data is originally sourced from The Bureau of National Statistics of Kazakhstan and later merged and cleaned for use in this project.

Metadata

Metadata follows Frictionless Data's Data Package standard.

Data

The dataset merged_to_normalize.csv contains various metrics for different cities. In total, 16 different parameters from our previous research were included.

We have also added metadata, such as column descriptions, and packaged the dataset as a data package.
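As a rough illustration of what such packaging involves, the sketch below builds a minimal Data Package descriptor in Python. The resource name and field descriptions come from the final-rating schema above; the package name and file path are assumptions made for illustration and are not taken from the project's actual descriptor.

```python
import json

# Minimal Data Package descriptor (illustrative only).
descriptor = {
    "name": "kz-combined-indicators",  # hypothetical package name
    "resources": [
        {
            "name": "final-rating",
            "path": "final-rating.csv",  # assumed file path
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer", "description": "Unique identifier"},
                    {"name": "Region", "type": "string", "description": "Region and major cities name"},
                    {"name": "Total_Score", "type": "number", "description": "Total score"},
                ]
            },
        }
    ],
}

# Serialize the descriptor the way it would be stored in datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2, ensure_ascii=False)
print(descriptor_json)
```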

Data Cleaning:

  • The last row of the CSV is dropped as it might contain summary or unwanted data.
  • Only numerical columns are selected for further processing.
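
The two cleaning steps above could look roughly like this in pandas. The inline CSV is a tiny invented stand-in for merged_to_normalize.csv, so the column subset and all values are assumptions.

```python
import io

import pandas as pd

# Tiny invented stand-in for merged_to_normalize.csv.
csv_text = """Region,avg_salary,crime_amount
A,300.0,120
B,250.0,
C,410.0,95
Total,960.0,215
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Drop the last row, which may hold a summary rather than a region.
trimmed = raw.iloc[:-1]

# Keep only numerical columns for the later imputation and scaling steps.
numeric = trimmed.select_dtypes(include="number")
print(numeric)
```

The Region column is dropped here because it is not numeric; it would need to be kept aside and re-attached after processing.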

Imputation:

  • Missing values in the numerical columns are imputed using K-Nearest Neighbors (KNN) with n_neighbors=5. For each missing value, the method compares the region's remaining variables with those of the other regions and estimates the missing value from the 5 closest matches.
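
A minimal sketch of this step, assuming scikit-learn's KNNImputer is the implementation used (the text above only names KNN with n_neighbors=5). The matrix is invented; rows stand for regions and columns for indicators.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Rows are regions, columns are indicators; np.nan marks a missing value.
# All numbers are made up for illustration.
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, np.nan],
    [4.0, 40.0],
    [5.0, 50.0],
    [6.0, 60.0],
    [7.0, 70.0],
])

# Each gap is filled from the 5 rows that are closest on the observed columns.
imputer = KNNImputer(n_neighbors=5)
X_filled = imputer.fit_transform(X)
print(X_filled[2])
```

With the default settings the filled value is the unweighted mean of the neighbors' values, so it always falls within the observed range of that column.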

Normalization:

  • Numerical data is scaled using MinMaxScaler to bring all values into the range [0, 1].
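
MinMaxScaler rescales each column independently as (x - min) / (max - min). The short example below shows the effect on an invented two-column matrix.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented values: column 0 might be a salary, column 1 an unemployment rate.
X = np.array([
    [200.0, 5.0],
    [300.0, 7.0],
    [400.0, 6.0],
])

# Each column is mapped so its minimum becomes 0 and its maximum becomes 1.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```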

Negative Indicators:

  • Several columns representing negative indicators are adjusted so that higher values represent worse conditions. The following parameters were determined to negatively affect a region's rating:

  • air_pollution_index: A metric indicating the level of air pollution

  • crime_amount: Total number of crimes

  • household_spending_per_month: Average household spending per month

  • unemployment-rate: Unemployment rate in the city

  • car-amount-rate: Number of cars per 1000 inhabitants
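
The exact adjustment is not spelled out here. One common convention, shown below purely as an assumption, is to flip each normalized negative indicator with 1 - x so that every column points in the same direction before scores are summed. The data and the choice of columns are invented.

```python
import pandas as pd

# Normalized values in [0, 1]; numbers are invented for illustration.
normalized = pd.DataFrame(
    {
        "air_pollution_index": [0.2, 0.9],
        "avg_salary": [0.7, 0.4],
    },
    index=["A", "B"],
)

# One of the negative indicators listed above, used as an example.
negative_columns = ["air_pollution_index"]

# Assumed convention: flip negative indicators so a higher value is always better.
aligned = normalized.copy()
aligned[negative_columns] = 1.0 - aligned[negative_columns]
print(aligned)
```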

Positive Indicators:

  • The remaining columns, which represent positive indicators, are as follows:

  • Region: The name of the region or city.

  • avg_salary: Average salary of the residents in the region.

  • Population: Total population of the region.

  • gdp_per_capita: Gross Domestic Product (GDP) per capita in Tenge.

  • Postgraduate_Education: Proportion of the population with postgraduate education.

  • Higher_Education: Proportion of the population with higher education.

  • life_expectancy: Average life expectancy of the residents.

  • Med-institution-rate: Number of medical institutions per 10000 inhabitants.

  • School-number-rate: Number of schools per 10000 inhabitants in the region.

  • public_transport_quantity: Quantity of public transportation per 10000 inhabitants.

  • Entertainment-places-rate: Number of entertainment venues per 10000 inhabitants.

  • clearance_rate: Number of crimes solved divided by the total number of crimes.

  • Total_Score: An overall score or index derived from various metrics to assess the region's quality of life.

Aggregated Sustainability Indicators

Summary of Normalized Indicators (figure: normalized_indicators)

Correlation Matrix (figure: correlation)

Sustainability score

Regions by sustainability score for equally assigned weight (figure: equal_weight)

Regions by sustainability score for Environmental importance weighting (figure: environmental)

Regions with top 5 (green) and bottom 5 (red) sustainability score for equally assigned weight (figure: equal_map)

Regions with top 5 (green) and bottom 5 (red) sustainability score for Environmental importance weighting (figure: environmental_map)
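
The two weighting schemes can be thought of as weighted sums over the normalized indicators. The sketch below is an assumption about how such scores might be computed: the indicator subset, the values, and in particular the environmental weights (air pollution counted three times) are hypothetical, since the actual weights are not listed here.

```python
import numpy as np

indicators = ["air_pollution_index", "avg_salary", "life_expectancy"]

# Rows are regions; values are normalized and assumed to be oriented
# so that higher is better for every column.
values = np.array([
    [0.8, 0.5, 0.6],
    [0.2, 0.9, 0.7],
])


def weighted_score(values, weights):
    """Return the weighted sum per row after rescaling the weights to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    return values @ (weights / weights.sum())


# Equal weighting: every indicator contributes the same share.
equal_scores = weighted_score(values, [1, 1, 1])

# Hypothetical environmental weighting: air pollution counts three times as much.
environmental_scores = weighted_score(values, [3, 1, 1])

print(equal_scores, environmental_scores)
```

In this toy example the first region gains under the environmental weighting because its air pollution value is its strongest indicator.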

Clustering:

  • KMeans clustering is performed on the normalized data (excluding the region and total score columns) to categorize cities into clusters based on their overall characteristics.
  • PCA (Principal Component Analysis) is used to reduce the dimensions of the data for visualization purposes. This helps to visualize the clusters in a 2D space.
  • Three clusters are used, each representing different levels of life quality: average, higher, and lower.
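
A minimal sketch of the clustering and projection steps with scikit-learn. The random matrix stands in for the normalized indicators (19 regions by 16 indicators); n_init and random_state are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the normalized indicator matrix: 19 regions x 16 indicators.
X = rng.random((19, 16))

# Group the regions into three clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Project to two principal components purely for plotting.
coords = PCA(n_components=2).fit_transform(X)

print(labels)
print(coords.shape)
```

KMeans labels are arbitrary integers, so deciding which cluster corresponds to higher, average, or lower life quality requires comparing the clusters afterwards, for example by their mean total score.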

Visualization:

  • First, a horizontal bar plot is created to display the scores of each city:

[Figure: horizontal bar plot of total scores by city]

  • Then, the 'Total score' column is scaled again to represent values from 0 to 100, and a second graph is made for better clarity.

[Figure: bar plot of total scores rescaled to 0-100]

  • After that, a PCA plot is created to visualize the clusters formed by KMeans. Each city is represented in a 2D space, colored according to its cluster:

[Figure: PCA scatter plot of cities colored by KMeans cluster]

  • Finally, a bar chart is created where each city's bar is colored according to its cluster, providing a visual representation of both the total score and the cluster classification:

[Figure: bar chart of total scores colored by cluster]
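
The rescaling and the cluster-colored bar chart described in this list could be sketched with matplotlib as follows. The region names, scores, cluster labels, and colors are invented, and mapping the lowest total to 0 and the highest to 100 is an assumption about how the 0 to 100 scale is obtained.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

regions = ["A", "B", "C"]        # placeholder names
total_scores = [7.2, 9.5, 6.1]   # invented raw totals
clusters = [0, 1, 2]             # invented cluster labels

# Rescale so the lowest total maps to 0 and the highest to 100.
lo, hi = min(total_scores), max(total_scores)
scaled = [100 * (s - lo) / (hi - lo) for s in total_scores]

# One color per cluster, one horizontal bar per region.
palette = {0: "tab:blue", 1: "tab:green", 2: "tab:red"}
fig, ax = plt.subplots()
ax.barh(regions, scaled, color=[palette[c] for c in clusters])
ax.set_xlabel("Total score (0-100)")

# Write the chart to an in-memory PNG; pass a file name to savefig to keep it.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```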

Results

The script outputs a DataFrame showing the regions and their total scores, sorted in descending order. A bar plot is also generated to visually represent the rankings.
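
For reference, sorting such a table in pandas takes one call. The frame below uses the Region and Total_Score column names from the final-rating schema with invented values.

```python
import pandas as pd

# Invented scores using the column names from the final-rating schema.
ranking = pd.DataFrame(
    {
        "Region": ["A", "B", "C"],
        "Total_Score": [0.61, 0.74, 0.58],
    }
)

# Highest score first, with a fresh 0-based index.
ranking = ranking.sort_values("Total_Score", ascending=False).reset_index(drop=True)
print(ranking)
```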

License

This dataset is licensed under the Open Data Commons Public Domain Dedication and License (PDDL).

© 2025 All rights reserved. Built with DataHub Cloud
