API Access
Access dataset files directly from scripts, code, or AI agents.
Each file has a stable URL (r-link) that you can use directly in scripts, apps, or AI agents. These URLs are permanent and safe to hardcode.
Start with these files — they give you everything you need to understand and access the dataset.
1. Fetch datapackage.json to inspect the schema and resources.
2. Download the data resources listed in datapackage.json.
3. Read README.md for full context.
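The first two steps can be sketched in Python. This is a minimal illustration, not the site's official client: it assumes datapackage.json follows the usual Data Package layout (a `resources` list whose entries carry `name` and `path`), and the base URL, the resource path, and the helper name `resource_urls` are hypothetical placeholders. Substitute the real r-link of the dataset before using it.

```python
import json
from urllib.parse import urljoin


def resource_urls(descriptor, base_url):
    """Map each resource name to an absolute URL.

    Relative paths are resolved against base_url; paths that are
    already absolute URLs are returned unchanged by urljoin.
    """
    urls = {}
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if isinstance(path, list):
            # A resource may list several paths; this sketch keeps the first.
            path = path[0] if path else None
        if not path:
            continue
        urls[resource.get("name", path)] = urljoin(base_url, path)
    return urls


# Hypothetical descriptor and base URL, for illustration only.
EXAMPLE_DESCRIPTOR = json.loads("""
{
  "name": "world-cities",
  "resources": [
    {"name": "world-cities", "path": "data/world-cities.csv", "format": "csv"}
  ]
}
""")
BASE_URL = "https://example.org/world-cities/"

for name, url in resource_urls(EXAMPLE_DESCRIPTOR, BASE_URL).items():
    print(name, url)
```

In practice the descriptor would be fetched from the datapackage.json URL (for example with `urllib.request.urlopen`) rather than embedded as a string.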
Data Previews
world-cities
Schema
| name | type | description |
|---|---|---|
| name | string | English name of the city |
| country | string | Common name of the country, in English |
| subcountry | string | Name of the major administrative area |
| geonameid | integer | id from geonames |
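As a rough sketch of how this schema maps onto code, the snippet below reads rows into typed records, converting `geonameid` to an integer. It assumes the CSV header uses exactly the four field names in the table; the `City` class, the `read_cities` helper, and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    country: str
    subcountry: str
    geonameid: int


def read_cities(handle):
    """Parse a CSV file object whose header matches the schema above."""
    reader = csv.DictReader(handle)
    return [
        City(
            name=row["name"],
            country=row["country"],
            subcountry=row["subcountry"],
            geonameid=int(row["geonameid"]),  # schema type: integer
        )
        for row in reader
    ]


# Made-up sample rows; the real file would be opened with
# open(path, newline="", encoding="utf-8") instead.
SAMPLE_CSV = """name,country,subcountry,geonameid
Exampleville,Exampleland,North Province,1
"Port Sample, Upper",Exampleland,South Province,2
"""

cities = read_cities(io.StringIO(SAMPLE_CSV))
print(cities[0])
```

Using `csv.DictReader` rather than splitting on commas keeps quoted names that contain commas intact.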
Data Files
| File | Description | Size | Last modified | Download |
|---|---|---|---|---|
| world-cities | | 1.3 MB | 6 days ago | world-cities |
| Files | Size | Format | Created | Updated | License | Source |
|---|---|---|---|---|---|---|
| 1 | 1.3 MB | csv | | 6 days ago | | Geonames |
List of major cities in the world
Data
The data is extracted from geonames, a very exhaustive list of worldwide toponyms.
This datapackage only lists cities above 15,000 inhabitants. Each city is associated with its
country and subcountry to reduce the number of ambiguities. Subcountry can be the name of a state (e.g. in
the United Kingdom or the United States of America) or the major administrative section (e.g. "region" in France).
See the admin1 field on the geonames website for further info about subcountry.
Notice that:
- Some cities, like Vatican City or Singapore, are a whole state, so they don't belong to any subcountry. Therefore subcountry is N/A.
- There is no guarantee that a city has a unique name within a country and subcountry (at the time of writing, there are about 60 ambiguities). But for each city, the source data primary key geonameid is provided.
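To make the ambiguity point concrete, here is a minimal sketch that groups rows by (name, country, subcountry) and reports every group with more than one geonameid. The `ambiguous_names` helper and the sample rows are invented for illustration; rows are assumed to be dicts keyed by the schema field names, as `csv.DictReader` would produce.

```python
from collections import defaultdict


def ambiguous_names(rows):
    """Return {(name, country, subcountry): [geonameid, ...]} for
    every combination shared by more than one city."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["name"], row["country"], row["subcountry"])
        groups[key].append(int(row["geonameid"]))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Invented rows: two distinct cities share a name, country, and subcountry.
SAMPLE_ROWS = [
    {"name": "Twinville", "country": "Exampleland", "subcountry": "East", "geonameid": "101"},
    {"name": "Twinville", "country": "Exampleland", "subcountry": "East", "geonameid": "102"},
    {"name": "Soloburg", "country": "Exampleland", "subcountry": "West", "geonameid": "103"},
]

for key, ids in ambiguous_names(SAMPLE_ROWS).items():
    print(key, ids)
```

Because the name triple is not guaranteed to be unique, joins and lookups are safer when keyed on geonameid.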
Preparation
This repository uses dataflows to process and normalize the data.
You first need to install the dependencies:
pip install -r scripts/requirements.txt
Then run the script:
python scripts/process.py
License
All data is licensed under the Creative Commons Attribution License, as is the original data from geonames. This means you have to credit geonames when using the data. While not formally required, a link back or credit to Lexman and the Open Knowledge Foundation is much appreciated.
All source code is licensed under the MIT license.