feat: add database.dbname setting for PostgreSQL connections#1426
feat: add database.dbname setting for PostgreSQL connections#1426kushalbakshi wants to merge 3 commits intodatajoint:masterfrom
Conversation
Add database.dbname config option (env: DJ_DBNAME) to specify which PostgreSQL database to connect to. Defaults to 'postgres' if not set (existing behavior preserved). Required where the primary database has a non-default name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nit__, add tests - Extract duplicated connect kwargs construction into _build_connect_kwargs() - Add dbname as explicit keyword argument to Connection.__init__() for programmatic use (explicit arg overrides config value) - Add 5 unit tests for dbname settings (default, env var, config file, dict access, override context manager) - Bump version to 2.2.1 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| # label_prs.yaml(prep), release.yaml(bump), post_release.yaml(edit) | ||
| # manually set this version will be eventually overwritten by the above actions | ||
| __version__ = "2.2.0" | ||
| __version__ = "2.2.1" |
There was a problem hiding this comment.
This is done automatically as part of the PyPI publishing action.
dimitri-yatsenko
left a comment
There was a problem hiding this comment.
Good change overall — the _build_connect_kwargs() refactor is clean and the config plumbing is solid. A few observations:
database_prefix and dbname serve the same purpose
Both settings isolate groups of schemas:
database_prefix(existing) — namespaces schema names on MySQL, which has a flat namespacedbname(this PR) — selects a different PostgreSQL database, which is a native isolation boundary
They're two backends' answers to the same question. Worth considering whether these should be unified:
- One config setting:
database_prefix(already exists) - MySQL behavior: prepend prefix to schema names (as today)
- PostgreSQL behavior: use the prefix as the
dbnameparameter — each prefix maps to a separate PostgreSQL database
This way a single dj.config["database.database_prefix"] = "lab_a_" would isolate schemas on both backends — via naming on MySQL, via separate databases on PostgreSQL.
If they should remain separate (there are cases where you'd want both a dbname and a prefix on PostgreSQL), the docs should clarify when to use which.
Note: database_prefix is currently defined in settings but never referenced anywhere else in the codebase — it's a config slot users read manually. This PR is a good opportunity to think about the relationship between the two.
Minor items
__repr__doesn't showdbname— the format string only usesuser,host,port. Worth includingdbnamewhen set, so users can tell which database they're connected to.- MySQL silently ignores
dbname— the MySQL adapter accepts**kwargsso it'll receive and discarddbname. The config description says "for PostgreSQL connections" — should it warn if set with MySQL backend?
dimitri-yatsenko
left a comment
There was a problem hiding this comment.
Follow-up from offline discussion:
Rename dbname → name
The setting should be database.name (env var DJ_DATABASE_NAME) to avoid stutter with database.database and stay consistent with the section's naming style:
dj.config["database.name"] = "my_lab"
# or
# DJ_DATABASE_NAME=my_labThe Connection.__init__ kwarg and adapter parameter should also use database_name rather than dbname.
Deprecate database_prefix
database_prefix was a workaround for MySQL's flat schema namespace. With PostgreSQL's native database isolation, it's no longer needed going forward:
- 2.2: non-empty
database_prefixemits a deprecation warning - 2.3: non-empty
database_prefixraises an error
This PR should introduce database.name as the forward-looking setting and add the deprecation warning for database_prefix.
dimitri-yatsenko
left a comment
There was a problem hiding this comment.
based on our discussion, rename database.dbname to database.name.
dimitri-yatsenko
left a comment
There was a problem hiding this comment.
Version bump should be removed from this PR
The version.py change to 2.2.1 should be reverted. The release workflow handles version bumps automatically:
draft_release.yaml— creates a draft release via release-drafterpost_draft_release_published.yaml— when the draft is published, it extracts the version from the release name, updatesversion.py, builds, publishes to PyPI, and creates a PR back to master with the bump
If PRs bump version.py themselves, it creates conflicts: multiple PRs may target different patch versions, or a different version may be chosen at release time. Leave version.py at 2.2.0 and let the release workflow own the version number.
dimitri-yatsenko
left a comment
There was a problem hiding this comment.
MySQL should warn/error when database_name is set
Currently if a user sets dj.config["database.name"] with the MySQL backend, the value gets passed as a kwarg to the MySQL adapter's connect() method, absorbed by **kwargs, and silently ignored. No error, no warning — the user thinks they're connecting to a specific database but they're not.
The PR should handle this — either:
- Raise a warning in
Connection.__init__ifdatabase_nameis set with the MySQL backend - Or have the MySQL adapter explicitly check for and reject the
dbnamekwarg
Summary
database.dbnameconfig option (env var:DJ_DBNAME) to specify which PostgreSQL database to connect to_build_connect_kwargs()helper to eliminate duplicated connection parameter constructiondbnamekeyword argument toConnection.__init__()for programmatic useMotivation
The PostgreSQL adapter's
connect()method already accepts adbnamekeyword argument and defaults to"postgres"when not provided. However, there was no way to pass this value through the config orConnectionlayer — the kwargs passed toadapter.connect()were hardcoded inConnection.connect().This means DataJoint could only connect to a PostgreSQL database named
postgres. Any PostgreSQL deployment where the primary database has a different name (e.g., organizational naming conventions, multi-tenant setups, or hosted PostgreSQL services that use a non-default database name) required monkey-patching to work around the limitation.Changes
settings.pydbname: str | Nonefield toDatabaseSettingswithDJ_DBNAMEenv var aliasNone(preserving existing behavior — adapter defaults to"postgres")connection.pydbnamekeyword-only argument toConnection.__init__()— explicit arg overrides configdatabase.dbnamefrom config when no explicit arg is provideddbnameinconn_infodict (participates in__eq__comparison)_build_connect_kwargs()to eliminate duplicated parameter construction between the primary connect path and the SSL fallback pathdbnametoadapter.connect()only when set (non-None)version.pytests/unit/test_settings.pyTestDbnameConfiguration:NoneDJ_DBNAMEenv varConfiguration
{ "database": { "host": "my-postgres-host.example.com", "port": 5432, "user": "pipeline_user", "password": "...", "backend": "postgresql", "dbname": "my_database" } }Or via environment variable:
export DJ_DBNAME=my_databaseOr programmatically:
When
dbnameis not set, behavior is unchanged — the PostgreSQL adapter defaults to"postgres".Test plan