Skip to content

Fixes #29257: prevent endless pending-migration loop on latin1 databases#29258

Open
govardhankarne1 wants to merge 1 commit into
open-metadata:mainfrom
govardhankarne1:fix/migration-tracking-tables-utf8mb4
Open

Fixes #29257: prevent endless pending-migration loop on latin1 databases#29258
govardhankarne1 wants to merge 1 commit into
open-metadata:mainfrom
govardhankarne1:fix/migration-tracking-tables-utf8mb4

Conversation

@govardhankarne1

@govardhankarne1 govardhankarne1 commented Jun 21, 2026

Copy link
Copy Markdown

Describe your changes:

Fixes #29257

On a MySQL database whose default charset is latin1, the migration tracking tables SERVER_CHANGE_LOG / SERVER_MIGRATION_SQL_LOGS are created (flyway v000) without an explicit charset and inherit latin1. A 1.13.0 migration statement carries a non-latin1 character (U+2194) in its leading comment; Flyway keeps the comment in the statement text, latin1 can't store it, and the INSERT that records the statement into SERVER_MIGRATION_SQL_LOGS fails. That failure was swallowed silently, so the statement was never recorded and the server reported pending migrations on every boot — an endless loop that migrate --force did not fix.

Changes:

  • MigrationWorkflow: before running migrations on MySQL, normalize the two tracking tables to utf8mb4 (no-op if already utf8mb4; no-op on PostgreSQL). Self-heals existing latin1 installs.
  • flyway v000__create_server_change_log.sql: pin DEFAULT CHARSET=utf8mb4 so fresh installs create the tables correctly.
  • MigrationProcessImpl: log a warning instead of silently swallowing a failed statement-recording insert, so this is diagnosable instead of invisible.

Testing:

  • MigrationTrackingTableCharsetTest: charset decision (latin1/utf8mb3 -> convert; utf8mb4/unknown -> skip).
  • MigrationProcessImplTest.warnsButDoesNotAbortWhenRecordingMigrationStatementFails: a recording failure is logged at WARN and does not abort the migration.

Related to #28255 (same MySQL-charset-on-upgrade theme; this PR covers the migration-tracking tables and the silent recording failure rather than openmetadata_db/FK charset).

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added a test that covers the exact scenario we are fixing.

Greptile Summary

This PR fixes an endless pending-migration loop on MySQL databases whose default charset is latin1. The tracking tables (SERVER_CHANGE_LOG / SERVER_MIGRATION_SQL_LOGS) previously inherited latin1, which cannot store non-latin1 characters found in certain migration statement comments; the resulting failed INSERT was swallowed silently, leaving migrations perpetually re-detected as pending.

  • v000__create_server_change_log.sql: pins both tracking tables to ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci, consistent with every other MySQL DDL in the repo, fixing the issue for fresh installs.
  • MigrationWorkflow: adds ensureMigrationTrackingTablesUseUtf8mb4() called at the top of runMigrationWorkflows, which detects and converts existing latin1/utf8mb3 tracking tables to utf8mb4 using information_schema (no-op on PostgreSQL and already-utf8mb4 tables).
  • MigrationProcessImpl: replaces the silent catch block on upsertServerMigrationSQL failures with a LOG.warn that includes version, checksum, and the exception, making previously invisible recording failures diagnosable in logs.

Confidence Score: 4/5

The change is well-scoped and directly addresses the described bug; the self-healing ALTER TABLE is safe to run on existing installs, the DDL change is consistent with the rest of the codebase, and both new code paths have unit test coverage.

The charset-detection and conversion path in MigrationWorkflow has no exception handling: any unexpected failure — a transient DB issue, a permission gap, or an ALTER TABLE refusal — propagates out of runMigrationWorkflows and stops all migrations from executing. This could make an upgrade worse on some installs than the original bug it fixes. Everything else in the PR is clean.

openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java — specifically the ensureMigrationTrackingTablesUseUtf8mb4 method and its lack of exception handling.

Important Files Changed

Filename Overview
bootstrap/sql/migrations/flyway/com.mysql.cj.jdbc.Driver/v000__create_server_change_log.sql Pins both tracking tables to ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci, consistent with every other MySQL DDL file in the repo; fixes fresh-install latin1 inheritance. Minor: file still lacks a trailing newline.
openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java Adds ensureMigrationTrackingTablesUseUtf8mb4() called at the top of runMigrationWorkflows; self-heals latin1/utf8mb3 tables via ALTER TABLE. No exception handling around the charset check means any unexpected failure aborts all migrations.
openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationProcessImpl.java Replaces silent swallowing of upsertServerMigrationSQL failures with a LOG.warn that includes the version, checksum, and exception, making the previously invisible problem diagnosable.
openmetadata-service/src/test/java/org/openmetadata/service/migration/api/MigrationTrackingTableCharsetTest.java New unit tests covering needsUtf8mb4Conversion for latin1, utf8mb3, utf8mb4, and null inputs. All four cases are tested correctly.
openmetadata-service/src/test/java/org/openmetadata/service/migration/api/MigrationProcessImplTest.java Adds warnsButDoesNotAbortWhenRecordingMigrationStatementFails which verifies that a upsertServerMigrationSQL exception results in a WARN log and the statement is still marked SUCCESS, not aborted.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[runMigrationWorkflows] --> B{connectionType == MYSQL?}
    B -- No --> D[Skip charset check]
    B -- Yes --> C[Open JDBI Handle]
    C --> E[For each tracking table]
    E --> F[getTableCharset via information_schema]
    F --> G{charset == null?}
    G -- Yes: table doesn't exist --> H[Skip — no-op]
    G -- No --> I{startsWith 'utf8mb4'?}
    I -- Yes: already correct --> J[Skip — no-op]
    I -- No: latin1 or utf8mb3 --> K[ALTER TABLE CONVERT TO CHARACTER SET utf8mb4]
    K --> L{Exception?}
    L -- Yes --> M[Propagates — aborts all migrations]
    L -- No --> N[Charset fixed]
    D --> O[Run migrations]
    H --> O
    J --> O
    N --> O
    O --> P[performSqlExecutionAndUpdate]
    P --> Q[handle.execute SQL]
    Q --> R[upsertServerMigrationSQL]
    R --> S{Exception?}
    S -- Yes --> T[LOG.warn + continue]
    S -- No --> U[QueryStatus SUCCESS]
    T --> U
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[runMigrationWorkflows] --> B{connectionType == MYSQL?}
    B -- No --> D[Skip charset check]
    B -- Yes --> C[Open JDBI Handle]
    C --> E[For each tracking table]
    E --> F[getTableCharset via information_schema]
    F --> G{charset == null?}
    G -- Yes: table doesn't exist --> H[Skip — no-op]
    G -- No --> I{startsWith 'utf8mb4'?}
    I -- Yes: already correct --> J[Skip — no-op]
    I -- No: latin1 or utf8mb3 --> K[ALTER TABLE CONVERT TO CHARACTER SET utf8mb4]
    K --> L{Exception?}
    L -- Yes --> M[Propagates — aborts all migrations]
    L -- No --> N[Charset fixed]
    D --> O[Run migrations]
    H --> O
    J --> O
    N --> O
    O --> P[performSqlExecutionAndUpdate]
    P --> Q[handle.execute SQL]
    Q --> R[upsertServerMigrationSQL]
    R --> S{Exception?}
    S -- Yes --> T[LOG.warn + continue]
    S -- No --> U[QueryStatus SUCCESS]
    T --> U
Loading

Reviews (1): Last reviewed commit: "Fixes #29257: prevent endless pending-mi..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

…latin1 databases

On a MySQL database whose default charset is latin1, the migration tracking
tables SERVER_CHANGE_LOG / SERVER_MIGRATION_SQL_LOGS are created without an
explicit charset and inherit latin1. A 1.13.0 migration statement carries a
non-latin1 character in its leading comment; Flyway keeps the comment in the
statement text, latin1 cannot store it, and the INSERT that records the
statement into SERVER_MIGRATION_SQL_LOGS fails. That failure was swallowed
silently, so the statement was never recorded and the server reported pending
migrations on every boot - an endless loop that migrate --force did not fix.

- MigrationWorkflow: normalize the two tracking tables to utf8mb4 on MySQL
  before running migrations (no-op when already utf8mb4; no-op on PostgreSQL).
- flyway v000__create_server_change_log.sql: pin DEFAULT CHARSET=utf8mb4 so
  fresh installs create the tables correctly.
- MigrationProcessImpl: log a warning instead of silently swallowing a failed
  statement-recording insert, so the cause is diagnosable.
- Tests for the charset decision and for the non-silent recording failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

@github-actions

Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment on lines +425 to +439
private void ensureMigrationTrackingTablesUseUtf8mb4() {
if (connectionType == ConnectionType.MYSQL) {
try (Handle handle = jdbi.open()) {
for (final String tableName : MIGRATION_TRACKING_TABLES) {
convertTableToUtf8mb4IfNeeded(handle, tableName);
}
}
}
}

private void convertTableToUtf8mb4IfNeeded(final Handle handle, final String tableName) {
final String currentCharset = getTableCharset(handle, tableName);
if (needsUtf8mb4Conversion(currentCharset)) {
LOG.info(
"Converting migration table {} from charset '{}' to utf8mb4", tableName, currentCharset);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Edge Case: Charset conversion failure aborts all migrations at startup

ensureMigrationTrackingTablesUseUtf8mb4() is the first statement in runMigrationWorkflows(), and neither it nor convertTableToUtf8mb4IfNeeded() wraps the ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 in a try/catch. If the ALTER throws (the most realistic case is SERVER_MIGRATION_SQL_LOGS, whose PRIMARY KEY checksum VARCHAR(256) becomes 256*4=1024 bytes after conversion to utf8mb4 — this exceeds the 767-byte index limit on tables stored with the legacy COMPACT/REDUNDANT row format that pre-8.0 latin1 installs may still carry), the exception propagates and aborts the entire migration workflow before any migration runs. That is a regression versus the current behavior, where a latin1 install at least boots (looping on pending migrations). Since the explicit goal of this method is to self-heal existing installs, an unexpected conversion failure should be logged and the workflow should continue rather than hard-failing startup.

Catch and log conversion failures so a self-heal attempt cannot block server startup.:

private void convertTableToUtf8mb4IfNeeded(final Handle handle, final String tableName) {
  final String currentCharset = getTableCharset(handle, tableName);
  if (needsUtf8mb4Conversion(currentCharset)) {
    LOG.info(
        "Converting migration table {} from charset '{}' to utf8mb4", tableName, currentCharset);
    try {
      handle.execute(String.format("ALTER TABLE %s CONVERT TO CHARACTER SET utf8mb4", tableName));
    } catch (RuntimeException conversionFailure) {
      LOG.warn(
          "Failed to convert migration table {} to utf8mb4; migrations may still fail to record",
          tableName, conversionFailure);
    }
  }
}
  • Apply fix

Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎

Comment on lines 132 to +143
migrationDAO.upsertServerMigrationSQL(version, sql, hash(sql));
} catch (Exception logException) {
// If logging fails (table doesn't exist yet), continue - the SQL was executed
// successfully
// SERVER_MIGRATION_SQL_LOGS may legitimately not exist yet on the very first
// migration. In any other case a failure here (e.g. a table charset that cannot
// store the statement text) leaves the migration unrecorded, so the server keeps
// reporting it as pending on every boot. Surface it instead of failing silently.
LOG.warn(
"Failed to record migration statement for version {} in SERVER_MIGRATION_SQL_LOGS"
+ " (checksum {}); it may be re-detected as pending",
version,
hash(sql),
logException);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: hash(sql) recomputed in the recording-failure log path

In the catch block, hash(sql) is computed a second time (the first computation was passed to upsertServerMigrationSQL) just to include the checksum in the WARN message. hash() is a digest computation; recomputing it on the error path is wasteful and risks drift if the hashing logic ever changes. Compute the hash once before the try and reuse the local variable in both the upsert call and the log statement.

Fix:

String checksum = hash(sql);
try {
  migrationDAO.upsertServerMigrationSQL(version, sql, checksum);
} catch (Exception logException) {
  LOG.warn(
      "Failed to record migration statement for version {} in SERVER_MIGRATION_SQL_LOGS"
          + " (checksum {}); it may be re-detected as pending",
      version,
      checksum,
      logException);
}
  • Apply fix

Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎

@gitar-bot

gitar-bot Bot commented Jun 21, 2026

Copy link
Copy Markdown
Code Review ⚠️ Changes requested 0 resolved / 2 findings

Adds MySQL migration tracking table charset normalization and improved error logging for statement recording, but the current implementation causes migrations to abort if the conversion fails and redundantly recomputes the SQL hash.

⚠️ Edge Case: Charset conversion failure aborts all migrations at startup

📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java:425-439 📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java:469-470

ensureMigrationTrackingTablesUseUtf8mb4() is the first statement in runMigrationWorkflows(), and neither it nor convertTableToUtf8mb4IfNeeded() wraps the ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 in a try/catch. If the ALTER throws (the most realistic case is SERVER_MIGRATION_SQL_LOGS, whose PRIMARY KEY checksum VARCHAR(256) becomes 256*4=1024 bytes after conversion to utf8mb4 — this exceeds the 767-byte index limit on tables stored with the legacy COMPACT/REDUNDANT row format that pre-8.0 latin1 installs may still carry), the exception propagates and aborts the entire migration workflow before any migration runs. That is a regression versus the current behavior, where a latin1 install at least boots (looping on pending migrations). Since the explicit goal of this method is to self-heal existing installs, an unexpected conversion failure should be logged and the workflow should continue rather than hard-failing startup.

Catch and log conversion failures so a self-heal attempt cannot block server startup.
private void convertTableToUtf8mb4IfNeeded(final Handle handle, final String tableName) {
  final String currentCharset = getTableCharset(handle, tableName);
  if (needsUtf8mb4Conversion(currentCharset)) {
    LOG.info(
        "Converting migration table {} from charset '{}' to utf8mb4", tableName, currentCharset);
    try {
      handle.execute(String.format("ALTER TABLE %s CONVERT TO CHARACTER SET utf8mb4", tableName));
    } catch (RuntimeException conversionFailure) {
      LOG.warn(
          "Failed to convert migration table {} to utf8mb4; migrations may still fail to record",
          tableName, conversionFailure);
    }
  }
}
💡 Quality: hash(sql) recomputed in the recording-failure log path

📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationProcessImpl.java:132-143

In the catch block, hash(sql) is computed a second time (the first computation was passed to upsertServerMigrationSQL) just to include the checksum in the WARN message. hash() is a digest computation; recomputing it on the error path is wasteful and risks drift if the hashing logic ever changes. Compute the hash once before the try and reuse the local variable in both the upsert call and the log statement.

Fix
String checksum = hash(sql);
try {
  migrationDAO.upsertServerMigrationSQL(version, sql, checksum);
} catch (Exception logException) {
  LOG.warn(
      "Failed to record migration statement for version {} in SERVER_MIGRATION_SQL_LOGS"
          + " (checksum {}); it may be re-detected as pending",
      version,
      checksum,
      logException);
}
🤖 Prompt for agents
Code Review: Adds MySQL migration tracking table charset normalization and improved error logging for statement recording, but the current implementation causes migrations to abort if the conversion fails and redundantly recomputes the SQL hash.

1. ⚠️ Edge Case: Charset conversion failure aborts all migrations at startup
   Files: openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java:425-439, openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java:469-470

   `ensureMigrationTrackingTablesUseUtf8mb4()` is the first statement in `runMigrationWorkflows()`, and neither it nor `convertTableToUtf8mb4IfNeeded()` wraps the `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4` in a try/catch. If the ALTER throws (the most realistic case is `SERVER_MIGRATION_SQL_LOGS`, whose PRIMARY KEY `checksum VARCHAR(256)` becomes 256*4=1024 bytes after conversion to utf8mb4 — this exceeds the 767-byte index limit on tables stored with the legacy COMPACT/REDUNDANT row format that pre-8.0 latin1 installs may still carry), the exception propagates and aborts the entire migration workflow before any migration runs. That is a regression versus the current behavior, where a latin1 install at least boots (looping on pending migrations). Since the explicit goal of this method is to self-heal existing installs, an unexpected conversion failure should be logged and the workflow should continue rather than hard-failing startup.

   Fix (Catch and log conversion failures so a self-heal attempt cannot block server startup.):
   private void convertTableToUtf8mb4IfNeeded(final Handle handle, final String tableName) {
     final String currentCharset = getTableCharset(handle, tableName);
     if (needsUtf8mb4Conversion(currentCharset)) {
       LOG.info(
           "Converting migration table {} from charset '{}' to utf8mb4", tableName, currentCharset);
       try {
         handle.execute(String.format("ALTER TABLE %s CONVERT TO CHARACTER SET utf8mb4", tableName));
       } catch (RuntimeException conversionFailure) {
         LOG.warn(
             "Failed to convert migration table {} to utf8mb4; migrations may still fail to record",
             tableName, conversionFailure);
       }
     }
   }

2. 💡 Quality: hash(sql) recomputed in the recording-failure log path
   Files: openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationProcessImpl.java:132-143

   In the catch block, `hash(sql)` is computed a second time (the first computation was passed to `upsertServerMigrationSQL`) just to include the checksum in the WARN message. `hash()` is a digest computation; recomputing it on the error path is wasteful and risks drift if the hashing logic ever changes. Compute the hash once before the try and reuse the local variable in both the upsert call and the log statement.

   Fix:
   String checksum = hash(sql);
   try {
     migrationDAO.upsertServerMigrationSQL(version, sql, checksum);
   } catch (Exception logException) {
     LOG.warn(
         "Failed to record migration statement for version {} in SERVER_MIGRATION_SQL_LOGS"
             + " (checksum {}); it may be re-detected as pending",
         version,
         checksum,
         logException);
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Comment on lines +425 to +433
private void ensureMigrationTrackingTablesUseUtf8mb4() {
if (connectionType == ConnectionType.MYSQL) {
try (Handle handle = jdbi.open()) {
for (final String tableName : MIGRATION_TRACKING_TABLES) {
convertTableToUtf8mb4IfNeeded(handle, tableName);
}
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unhandled exception aborts all migrations on charset-check failure. ensureMigrationTrackingTablesUseUtf8mb4 has no try-catch, so any unexpected failure — e.g., a transient DB connectivity hiccup, a MySQL user that lacks SELECT on information_schema, or an ALTER TABLE the DB refuses — propagates directly out of runMigrationWorkflows and stops every migration from running. Given that this check is a self-healing guard rail, logging the error and continuing (or wrapping it so only a human-readable message surfaces) might be more resilient. If keeping it as a hard failure, a descriptive exception message would help operators diagnose the abort without reading through stack traces.

checksum VARCHAR(256) PRIMARY KEY,
executedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
); No newline at end of file
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; No newline at end of file

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The file is missing a trailing newline on the last line. This is inconsistent with standard POSIX text-file convention and causes noisy git diffs. Adding a newline after the final ; would align with the rest of the repo's SQL files.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upgrade to 1.13.0 loops on "There are pending migrations" when the database charset is latin1

1 participant