licence-normaliser
Robust licence normalisation with a three-level hierarchy for common licences.
licence-normaliser maps common licence representations (SPDX tokens,
URLs, prose descriptions) to a canonical three-level hierarchy.
Features
Three-level hierarchy - LicenceFamily → LicenceName → LicenceVersion.
Wide format support - SPDX tokens, URLs, and prose descriptions for supported licences.
Creative Commons support - Full CC family with versions and IGO variants.
Publisher-specific licences - Springer, Nature, Elsevier, Wiley, ACS, and more.
File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.
Pluggable parsers - Drop in a new parser class to ingest any external licence registry. Parsers implement plugin interfaces (RegistryPlugin, URLPlugin, etc.).
Strict mode - Raise LicenceNotFoundError instead of silently returning "unknown".
Caching - LRU caching for performance.
CLI - Command-line interface with --strict and --trace support.
Hierarchy
The library uses a three-level hierarchy:
LicenceFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …
LicenceName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"
LicenceVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
LicenceVersion also has optional jurisdiction (e.g., "uk",
"au") and scope (e.g., "igo") fields for CC licences.
Installation
With uv:
uv pip install licence-normaliser
Or with pip:
pip install licence-normaliser
Quick start
from licence_normaliser import normalise_licence
v = normalise_licence("CC BY-NC-ND 4.0")
assert str(v) == "cc-by-nc-nd-4.0" # ← LicenceVersion
assert str(v.licence) == "cc-by-nc-nd" # ← LicenceName
assert str(v.licence.family) == "cc" # ← LicenceFamily
# With jurisdiction and scope
v = normalise_licence("http://creativecommons.org/licenses/by-nc/2.0/uk")
assert v.jurisdiction == "uk"
assert v.scope is None
v = normalise_licence("http://creativecommons.org/licenses/by-nc/3.0/igo")
assert v.jurisdiction is None
assert v.scope == "igo"
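The relationship between the three levels, and the optional jurisdiction and scope fields, can be pictured with a few small dataclasses. This is only an illustrative sketch: the class names Family, Name, and Version are hypothetical stand-ins, not the library's own LicenceFamily, LicenceName, and LicenceVersion, whose internals are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Family:
    """Broad bucket, e.g. "cc" (hypothetical stand-in class)."""
    key: str

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Name:
    """Version-free licence, linked to its family."""
    key: str
    family: Family

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class Version:
    """Fully resolved licence, linked to its version-free name."""
    key: str
    licence: Name
    jurisdiction: Optional[str] = None  # e.g. "uk", "au"
    scope: Optional[str] = None         # e.g. "igo"

    def __str__(self) -> str:
        return self.key


# Build one chain by hand: family <- name <- version.
cc = Family("cc")
cc_by_nc_nd = Name("cc-by-nc-nd", cc)
v = Version("cc-by-nc-nd-3.0-igo", cc_by_nc_nd, scope="igo")

print(v, v.licence, v.licence.family)
```

Because each level holds a reference to the one above it, walking from a version to its family is plain attribute access, which mirrors the v.licence.family pattern in the quick start.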
Strict mode
By default, unresolvable inputs return an "unknown" result. Pass
strict=True to raise LicenceNotFoundError instead:
from licence_normaliser import normalise_licence
from licence_normaliser.exceptions import LicenceNotFoundError
# Silent fallback (default)
v = normalise_licence("some-unknown-string")
assert v.family.key == "unknown"
# Strict: raises on unresolvable input
try:
    v = normalise_licence("some-unknown-string", strict=True)
except LicenceNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup
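The difference between the two modes can be sketched without the library. The snippet below is a minimal illustration, assuming a lookup table and a whitespace-and-case cleaning step. The names KNOWN, NotFound, and resolve are hypothetical, and the library's real cleaning rules may differ.

```python
class NotFound(LookupError):
    """Hypothetical error carrying both the raw and the cleaned input."""

    def __init__(self, raw: str, cleaned: str) -> None:
        super().__init__(f"no licence matches {raw!r} (cleaned: {cleaned!r})")
        self.raw = raw
        self.cleaned = cleaned


# Toy lookup table standing in for the real data files.
KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}


def resolve(raw: str, strict: bool = False) -> str:
    # Assumed cleaning step: lowercase and collapse whitespace.
    cleaned = " ".join(raw.lower().split())
    key = KNOWN.get(cleaned)
    if key is not None:
        return key
    if strict:
        # Strict mode: surface the failure to the caller.
        raise NotFound(raw, cleaned)
    # Default mode: fall back silently.
    return "unknown"


print(resolve("  MIT "))
print(resolve("some-unknown-string"))
```

Keeping both raw and cleaned on the exception makes it easier to tell whether a failure comes from the input itself or from the cleaning step.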
Trace / Explain
Set ENABLE_LICENCE_NORMALISER_TRACE=1 or pass trace=True to get
resolution traces showing how the licence was matched:
from licence_normaliser import normalise_licence
# Via function
v = normalise_licence("cc by-nc-nd 3.0 igo", trace=True)
print(v.explain())
# Via class
from licence_normaliser import LicenceNormaliser
ln = LicenceNormaliser(trace=True)
v = ln.normalise_licence("MIT")
print(v.explain())
Output shows the resolution pipeline (alias → registry → url → prose → fallback) and which source file + line matched:
Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
[✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)
Result:
version_key: 'cc-by-nc-nd-3.0-igo'
name_key: 'cc-by-nc-nd'
family_key: 'cc'
The trace can also be accessed via v._trace for programmatic use.
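A staged pipeline of this kind can be sketched as an ordered list of lookups, each tried in turn, with every attempt recorded. This is an illustrative sketch only: the tables, the by_url and by_prose helpers, and resolve_with_trace are hypothetical, and the trace here is a plain list of (stage, matched) pairs rather than the library's LicenceTrace object.

```python
import re
from typing import Callable, List, Optional, Tuple

# Toy data standing in for the JSON data files.
ALIASES = {"cc by-nc-nd 3.0 igo": "cc-by-nc-nd-3.0-igo"}
REGISTRY = {"mit": "mit"}
URLS = {"creativecommons.org/licenses/by/4.0": "cc-by-4.0"}
PROSE = [(re.compile(r"\bmit licen[cs]e\b"), "mit")]


def by_url(text: str) -> Optional[str]:
    # Drop the scheme and any trailing slash before looking up.
    stripped = re.sub(r"^https?://", "", text).rstrip("/")
    return URLS.get(stripped)


def by_prose(text: str) -> Optional[str]:
    for pattern, key in PROSE:
        if pattern.search(text):
            return key
    return None


# Order matters: the first stage that matches wins.
STAGES: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("alias", ALIASES.get),
    ("registry", REGISTRY.get),
    ("url", by_url),
    ("prose", by_prose),
]


def resolve_with_trace(raw: str) -> Tuple[str, List[Tuple[str, bool]]]:
    cleaned = " ".join(raw.lower().split())
    trace: List[Tuple[str, bool]] = []
    for stage, lookup in STAGES:
        key = lookup(cleaned)
        trace.append((stage, key is not None))
        if key is not None:
            return key, trace
    # Nothing matched: record the fallback stage.
    trace.append(("fallback", True))
    return "unknown", trace


key, trace = resolve_with_trace("Released under the MIT License")
print(key, trace)
```

Recording failed stages as well as the successful one is what lets a trace show why an input was matched by a later, looser stage rather than an exact alias.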
Batch normalisation
from licence_normaliser import normalise_licences
results = normalise_licences(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)
# Strict batch - raises on first unresolvable
results = normalise_licences(["MIT", "Apache-2.0"], strict=True)
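Since a strict batch stops at the first unresolvable input, it can be useful to collect every failure in one pass instead. The helper below is a hedged sketch of that pattern: normalise_collecting and fake_strict are hypothetical names, and the helper takes the normalising callable as a parameter so that it runs without the library. With the library, one would presumably pass a wrapper around normalise_licence(..., strict=True) and LicenceNotFoundError as not_found. That wiring is an assumption and is not tested here.

```python
from typing import Callable, Dict, Iterable, List, Tuple, Type


def normalise_collecting(
    raws: Iterable[str],
    normalise: Callable[[str], object],
    not_found: Type[Exception] = LookupError,
) -> Tuple[Dict[str, object], List[str]]:
    """Normalise each input, collecting failures instead of stopping."""
    resolved: Dict[str, object] = {}
    failed: List[str] = []
    for raw in raws:
        try:
            resolved[raw] = normalise(raw)
        except not_found:
            failed.append(raw)
    return resolved, failed


def fake_strict(raw: str) -> str:
    """Stand-in for a strict normaliser: raises on unknown input."""
    table = {"MIT": "mit", "Apache-2.0": "apache-2.0"}
    if raw not in table:
        raise LookupError(raw)
    return table[raw]


resolved, failed = normalise_collecting(["MIT", "nope", "Apache-2.0"], fake_strict)
print(resolved)
print(failed)
```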
Custom plugins
The LicenceNormaliser class lets you inject custom plugin classes for
specialised use cases:
from licence_normaliser import LicenceNormaliser
from licence_normaliser.parsers.alias import AliasParser
from licence_normaliser.parsers.spdx import SPDXParser
# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenceNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
    cache=True,
    cache_maxsize=8192,
)
# MIT resolves via SPDX parser
assert str(ln.normalise_licence("MIT")) == "mit"
# CC BY resolves via Alias
assert str(ln.normalise_licence("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"
Note
LicenceNormaliser() automatically loads the full default set of
parsers. To use a reduced set you must explicitly pass all six
plugin lists (registry, url, alias, family, name, prose).
LicenceNormaliser caches results by wrapping its resolution method with
lru_cache. Pass cache=False to disable caching, for example when debugging:
from licence_normaliser import LicenceNormaliser
ln = LicenceNormaliser(cache=False)
result = ln.normalise_licence("MIT")
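One way an optional lru_cache wrapper can be wired up is sketched below. The CachedResolver class and its call counter are hypothetical and exist only to make the effect of the cache observable. The parameter names cache and cache_maxsize mirror the ones shown above, but the library's actual implementation may differ.

```python
from functools import lru_cache


class CachedResolver:
    """Hypothetical resolver with an optional per-instance LRU cache."""

    def __init__(self, cache: bool = True, cache_maxsize: int = 8192) -> None:
        self.calls = 0  # counts how often the real work runs
        if cache:
            # Wrap the bound method so each instance has its own cache.
            self._resolve = lru_cache(maxsize=cache_maxsize)(self._resolve_uncached)
        else:
            self._resolve = self._resolve_uncached

    def _resolve_uncached(self, raw: str) -> str:
        self.calls += 1
        return raw.strip().lower()

    def resolve(self, raw: str) -> str:
        return self._resolve(raw)


cached = CachedResolver()
cached.resolve("MIT")
cached.resolve("MIT")  # served from the cache

uncached = CachedResolver(cache=False)
uncached.resolve("MIT")
uncached.resolve("MIT")  # recomputed

print(cached.calls, uncached.calls)
```

Turning the cache off means every call goes through the full resolution path, which is the behaviour one usually wants when stepping through a lookup in a debugger.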
Update data (CLI)
licence-normaliser update-data --force
# Fetches fresh SPDX, OpenDefinition, OSI, CreativeCommons, and ScanCode JSONs
Integration tests (public API only)
All integration tests live in
src/licence_normaliser/tests/test_integration.py
and only import the public API.
CLI usage
Normalise a single licence:
licence-normaliser normalise "MIT"
# Output: mit
licence-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# Licence: cc-by
# Family: cc
licence-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error
Batch normalise:
licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
licence-normaliser batch --strict MIT "Apache-2.0"
Exceptions
from licence_normaliser.exceptions import (
    DataSourceError,            # data source loading errors
    LicenceNormaliserError,     # base class
    LicenceNotFoundError,       # raised by strict mode
    LicenceNormalisationError,  # kept for backwards compatibility
)
from licence_normaliser import (
    LicenceTrace,       # resolution trace object
    LicenceTraceStage,  # resolution stage enum
)
Testing
All tests run inside Docker:
make test
To test a specific Python version:
make test-env ENV=py312
Licence
MIT