Localisation

Misata generates country-accurate data automatically (names, salary distributions, national ID formats, currencies, postcodes, and company suffixes) from geographic signals in your story, such as a country, city, currency, or national ID name.

Automatic detection

import misata

# Locale detected from story — no extra flag needed
tables = misata.generate("German SaaS company in Berlin with 2k enterprise customers")
# → de_DE names, lognormal salaries with median ~€45k, 5-digit postcodes, GmbH/AG suffixes

tables = misata.generate("Brazilian fintech with R$ payments and CPF verification")
# → pt_BR names, salary median ~R$33.6k, national IDs in CPF format ###.###.###-##

tables = misata.generate("Indian startup in Bangalore with ₹ salary bands and Aadhaar KYC")
# → hi_IN names, salary median ~₹350k/yr, Aadhaar 12-digit national IDs

Explicit locale

# Override detected locale
tables = misata.generate("Ecommerce store with 10k orders", locale="ja_JP")

# Or via CLI
# misata generate --story "Ecommerce store" --locale ja_JP

15 built-in locales

| Locale | Country | Currency | Salary median | National ID |
|---|---|---|---|---|
| en_US | United States | USD / $ | $62 000 | SSN ###-##-#### |
| en_GB | United Kingdom | GBP / £ | £34 000 | NIN AA######A |
| de_DE | Germany | EUR / € | €45 000 | Steuer-IdNr |
| fr_FR | France | EUR / € | €38 000 | NIR |
| pt_BR | Brazil | BRL / R$ | R$33 600 | CPF ###.###.###-## |
| es_ES | Spain | EUR / € | €27 000 | NIE |
| hi_IN | India | INR / ₹ | ₹350 000 | Aadhaar ####-####-#### |
| ja_JP | Japan | JPY / ¥ | ¥4 400 000 | My Number |
| zh_CN | China | CNY / ¥ | ¥90 000 | Resident ID |
| ar_SA | Saudi Arabia | SAR | SAR 96 000 | National ID |
| ko_KR | South Korea | KRW / ₩ | ₩42 000 000 | RRN |
| nl_NL | Netherlands | EUR / € | €42 000 | BSN |
| it_IT | Italy | EUR / € | €29 000 | Codice Fiscale |
| pl_PL | Poland | PLN | PLN 72 000 | PESEL |
| tr_TR | Turkey | TRY | TRY 720 000 | TC Kimlik |

Salary medians are annual figures in local currency, sourced from OECD, World Bank, and ILO data (2023–24).
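
To see how a median from the table relates to a lognormal salary distribution: the median of a lognormal equals exp(mu), so a target median fixes mu = ln(median). The sketch below samples de_DE salaries on that basis using only the standard library. It is an illustration, not Misata's internal code: `sample_salaries` is a hypothetical helper, and the spread `sigma=0.5` is an assumed value, not one taken from a locale pack.

```python
import math
import random
import statistics


def sample_salaries(median, sigma=0.5, n=10_000, seed=0):
    """Draw n lognormal salaries whose theoretical median is `median`.

    sigma is an assumed spread, chosen for illustration only.
    """
    rng = random.Random(seed)
    mu = math.log(median)  # the median of a lognormal is exp(mu)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


salaries = sample_salaries(45_000)  # de_DE median from the table above
# The sample median lands near 45000; the exact value depends on the seed.
print(round(statistics.median(salaries)))
```

A larger `sigma` widens the right tail and raises the mean while leaving the median where it is.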

Inspect a locale pack

pack = misata.get_locale_pack("de_DE")

print(pack.salary_median)        # 45000
print(pack.currency_symbol)      # €
print(pack.top_cities[:3])       # ['Berlin', 'Hamburg', 'Munich']
print(pack.company_suffixes)     # ['GmbH', 'AG', 'UG', 'KG', 'e.K.']
print(pack.postcode_pattern)     # \d{5}
print(pack.national_id_label)    # Steuer-IdNr
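
The printed `postcode_pattern` looks like a regular expression, so one plausible use is validating generated values in a test. The sketch below assumes that interpretation and copies the de_DE value by hand rather than calling Misata. `is_valid_postcode` is a hypothetical helper.

```python
import re

# Value printed for de_DE above; assumed here to be a regular expression.
postcode_pattern = r"\d{5}"


def is_valid_postcode(value, pattern=postcode_pattern):
    """Return True if the whole string matches the locale's postcode pattern."""
    return re.fullmatch(pattern, value) is not None


print(is_valid_postcode("10115"))     # True
print(is_valid_postcode("1011"))      # False (too short)
print(is_valid_postcode("SW1A 1AA"))  # False (GB format, not DE)
```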

Detect from a story

locale = misata.detect_locale("South Korean company in Seoul with KRW salaries")
# → "ko_KR"

locale = misata.detect_locale("A generic SaaS company")
# → "en_US"  (default)
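
For intuition about the behaviour shown above, detection can be thought of as scoring a story against per-locale hints and falling back to a default when nothing matches. The following is a minimal sketch of that idea, not Misata's actual algorithm: the hint table, `detect_locale_sketch`, and the substring scoring are all assumptions made for illustration.

```python
# Hypothetical hint table; Misata's real detection rules may differ.
LOCALE_HINTS = {
    "ko_KR": ("south korea", "seoul", "krw", "₩"),
    "de_DE": ("german", "berlin", "gmbh"),
    "pt_BR": ("brazil", "r$", "cpf"),
    "hi_IN": ("india", "bangalore", "aadhaar", "₹"),
}
DEFAULT_LOCALE = "en_US"


def detect_locale_sketch(story):
    """Return the locale with the most hint matches, or the default if none match."""
    text = story.lower()
    scores = {
        locale: sum(hint in text for hint in hints)
        for locale, hints in LOCALE_HINTS.items()
    }
    best, score = max(scores.items(), key=lambda item: item[1])
    return best if score > 0 else DEFAULT_LOCALE


print(detect_locale_sketch("South Korean company in Seoul with KRW salaries"))  # ko_KR
print(detect_locale_sketch("A generic SaaS company"))                           # en_US
```

Plain substring matching is deliberately naive: a short hint can match inside an unrelated word, so a production detector would need word boundaries or a proper gazetteer.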

What locale affects

  • Names — Faker locale pool (de_DE Faker generates German names, ja_JP generates Japanese names)
  • Salary & age distributions — lognormal/normal priors from national statistics replace the en_US defaults
  • Postcodes — pattern-generated to match the country format (e.g. 5 digits for DE, SW1A 1AA format for GB)
  • National IDs — pattern-generated to match country format (CPF, SSN, Aadhaar, etc.)
  • Company suffixes — GmbH/AG for Germany, S.A./SARL for France, Ltd/PLC for UK
  • Phone prefixes — country dialling code prepended
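
As an illustration of what "pattern-generated" means for national IDs, the sketch below expands the pattern notation from the locale table, reading `#` as a random digit and `A` as a random uppercase letter. `fill_pattern` is a hypothetical helper and that reading of the notation is an assumption. It reproduces the format only: real identifiers such as CPF carry check digits that this sketch does not compute.

```python
import random
import string


def fill_pattern(pattern, rng):
    """Replace '#' with a random digit and 'A' with a random uppercase letter.

    All other characters (dots, dashes, spaces) are copied through unchanged.
    """
    out = []
    for ch in pattern:
        if ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "A":
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(42)  # seeded so runs are reproducible
cpf = fill_pattern("###.###.###-##", rng)  # pt_BR CPF format
nin = fill_pattern("AA######A", rng)       # en_GB NIN format
print(cpf, nin)
```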

Asset-backed vocabulary takes priority

If you have ingested Kaggle vocabulary assets for name columns, those always win over locale-based Faker names. Locale is the fallback, not the override.
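
The precedence rule can be sketched as a simple lookup: use the ingested vocabulary for a column when one exists, otherwise fall back to the locale's name pool. The function and argument names below are hypothetical and do not reflect Misata's internals.

```python
import random


def pick_name(column, asset_vocab, locale_names, rng):
    """Prefer ingested asset vocabulary for `column`; fall back to locale names."""
    pool = asset_vocab.get(column) or locale_names
    return rng.choice(pool)


rng = random.Random(0)
locale_names = ["Hans", "Greta"]  # stand-in for a locale-based Faker pool

# An asset exists for the column, so the locale pool is ignored.
print(pick_name("first_name", {"first_name": ["Asha"]}, locale_names, rng))  # Asha

# No asset for the column, so the locale pool is used.
print(pick_name("first_name", {}, ["Hans"], rng))  # Hans
```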