Support some of the DDLs for schema-based-sharding from any node#8523
Open
onurctirtir wants to merge 36 commits into release-14.0 from
Conversation
…AlterDistributedObjectStmtFromCoordinator()
…ate distributed-schema tables - create table - create table as - create table partition of - create table partitioned by
…r alter table set schema
… table DDLs from any node
…safe_triggers for distributed-schema tables
Even before this change, the run_command_on_master_and_workers_temp() UDF that we were using to create those views was silently failing when creating the views on workers. This is because "EXECUTE p_sql;" on the coordinator was already propagating the command to the workers. However, we were not checking "PERFORM run_command_on_workers(p_sql);" for errors, so any failure there was ignored.
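The failure mode above can be avoided by inspecting the per-node result rows that run_command_on_workers() returns; a minimal sketch, where the view definition is illustrative:

```sql
-- run_command_on_workers() returns one row per worker including a "success"
-- column; ignoring that column swallows worker-side failures silently.
-- Aggregating it surfaces any failure (the payload command is illustrative):
SELECT bool_and(success) AS all_workers_ok
FROM run_command_on_workers($cmd$CREATE VIEW my_view AS SELECT 1$cmd$);
```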
.. not when we stop syncing metadata to it.
Resetting the local group id means setting it to 0, which is the same as setting it
to COORDINATOR_GROUP_ID. But when we stop syncing metadata to a worker, we
don't want it to assume it's the coordinator until we actually remove it,
as it's still part of this cluster and can be re-added later. This mostly
doesn't cause an issue because we distinguish between the commands to be
sent to worker nodes with metadata vs without metadata, so we mostly don't
send a command to a worker without metadata if the command has to check,
e.g., whether the node is the coordinator or not.
However, it seems that in some cases we don't make such a distinction.
For example, we create sequences that a table depends on due to having a
serial column or using a column that defaults to nextval('my_sequence')
on all remote nodes, including the ones that we marked as
"hasmetadata = false", when creating prerequisite objects using
PropagatePrerequisiteObjectsForDistributedTable() before creating shards.
This actually doesn't make much sense because a non-MX worker doesn't need
sequences: we always resolve nextval() calls on the coordinator
and MX workers before preparing shard-level queries, so we never
need sequences when executing shard-level queries on non-MX workers. And
with the next commit, worker_apply_sequence_command() will need to check whether
the node it's sent to is the coordinator during execution, and doing
so would be problematic if we reset the local group id when stopping
metadata sync to a worker, as such a worker would then assume it's the
coordinator and would try to execute coordinator-specific logic in
worker_apply_sequence_command() too.
…buted-schema tables from workers
…buted-schema tables from workers A further refactor should ideally make the rest of the code-base use SetLocalEnableDDLPropagation() as well, get rid of the extra args of AlterSequenceMinMax(), and move the sequence-processing related parts of PostprocessAlterTableStmt() into a separate function.
We never meant to support this UDF from workers because EnsureReferenceTablesExistOnAllNodesExtended() always assumed that it's run from the coordinator. That might change very soon, but we still wouldn't want to support replicate_reference_tables() from workers, for scoping purposes.
During the tests where we create distributed-schema tables from a worker node, altering citus.next_shard_id / citus.next_placement_id using run_command_on_coordinator() doesn't guarantee that we'll use the values we set them to, e.g., if the citus internal connection that we use to connect to the coordinator changes. For this reason, let's always directly alter pg_dist_shardid_seq / pg_dist_placement_placementid_seq at the system level by connecting to the coordinator.
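For illustration, the system-level alternative looks roughly like the following, run while connected to the coordinator; the restart values are made up:

```sql
-- Setting citus.next_shard_id via run_command_on_coordinator() only affects
-- the backend that happens to run the command; altering the sequences pins
-- the values for every backend. The values below are illustrative.
ALTER SEQUENCE pg_catalog.pg_dist_shardid_seq RESTART WITH 1000000;
ALTER SEQUENCE pg_catalog.pg_dist_placement_placementid_seq RESTART WITH 5000000;
```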
…un from a worker Given the recent support added to create distributed-schema tables from any node; in the code-paths that can now acquire a colocation id lock from a worker, make sure to acquire the lock on the coordinator, maybe by using a remote connection that won't be closed until the end of the transaction. EnsureReferenceTablesExistOnAllNodesExtended() is also one of those code-paths, but normally it needs to release the lock at certain points, so it'll be handled in a separate commit differently. The code-paths that are handled in this commit are the ones that don't need to release the lock until the end of the transaction.
…n from a worker Given the recent support for performing some operations from any node; in the code-paths that can acquire a lock on pg_dist_node from a worker, make sure to acquire the lock on the coordinator as well, maybe by using a remote connection that won't be closed until the end of the transaction. EnsureReferenceTablesExistOnAllNodesExtended() is also one of those code-paths, but it'll be handled in a separate commit.
…on is run from a worker Given the recent support added to create distributed-schema tables from any node; in the code-paths that can now acquire a placement colocation lock from a worker, make sure to acquire the lock on the coordinator, maybe by using a remote connection that won't be closed until the end of the transaction.
…ing a distributed-schema table from a worker Acquire a transactional advisory lock and release it by committing the remote transaction, because in Postgres there is no other way to release a transactional advisory lock in a separate command.
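This relies on standard Postgres behavior: a transaction-level advisory lock has no unlock function and is released only when the holding transaction ends, e.g.:

```sql
BEGIN;
-- Take a transaction-level advisory lock; the key (42) is illustrative.
SELECT pg_advisory_xact_lock(42);
-- ... protected work ...
-- There is no pg_advisory_xact_unlock(); COMMIT (or ROLLBACK) is the only
-- way to release the lock.
COMMIT;
```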
…mand via session-level conn .. during isolation tests. See the code comments.
…node
- Add tests for reference table replication as well as a failure test for #6592.
- Make sure to test create / drop schema / table for non-super user in schema_based_sharding_from_workers_a.sql.
- Add more non-super user tests.
- Add tests for the cases when the relation name is not qualified.
- Add tests for schema / table names that need proper quoting.
- Add more tests for foreign key constraints.
- Add isolation tests.
Codecov Report
Patch coverage details and impacted files:

@@              Coverage Diff               @@
##           release-14.0    #8523    +/- ##
================================================
+ Coverage        88.72%    88.74%   +0.02%
================================================
  Files              287       287
  Lines            63266     63802     +536
  Branches          7931      8009      +78
================================================
+ Hits             56133     56622     +489
- Misses            4864      4877      +13
- Partials          2269      2303      +34
Note that all the commits are reviewed and first merged into https://github.com/citusdata/citus/tree/feature/ddl-from-any-node-schema-based-sharding, from #8461, #8464, #8478, #8486 and #8506, except:
Once this PR passes all mixed-version cluster tests and is merged to "release-14.0", I'll create another PR to merge all the changes to "main" branch as well.
One can also check individual commits / commit messages to better understand the changes.
DESCRIPTION: Support some of the DDLs for schema-based-sharding from any node.
What's supported in general?
Add support for the following from any node in general, regardless of whether
the command targets a distributed-schema / table or not:
Note that we have already been supporting the following:
What's supported for distributed-schemas / tables?
Add support for the following from any node when the command targets a
distributed-schema / table:
foreign keys, partitioning relationship and many other attributes that can be
defined at "create table" time.
into a distributed-schema from any node
relationships between distributed-schema tables within the same schema
Also remove the limitations that were inherited from regular distributed tables
around the usage of triggers with distributed-schema tables, i.e., no longer
require citus.enable_unsafe_triggers for distributed-schema tables, just as
is the case for Citus managed local tables.
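As a hedged sketch (assuming a distributed schema named my_schema already exists; all names are illustrative), a plain trigger like the one below no longer requires citus.enable_unsafe_triggers:

```sql
CREATE TABLE my_schema.orders (id bigint, total int);

CREATE FUNCTION my_schema.orders_audit() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  RETURN NEW;  -- a no-op trigger body, for illustration only
END;
$$;

-- Before this change, this would have required
-- SET citus.enable_unsafe_triggers TO on;
CREATE TRIGGER orders_audit_trigger
BEFORE INSERT ON my_schema.orders
FOR EACH ROW EXECUTE FUNCTION my_schema.orders_audit();
```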
Important changes at the high-level
relevant code-paths to support a DDL from any node.
SendCommandToRemoteNodesWithMetadata() in the code-paths where we need to send
a command to other nodes, e.g., to sync metadata changes when creating a
distributed-schema table from a worker.
fetch the next colocation id, shard id and shard placement id. With this PR,
whenever we need to consume one of those sequences, we fetch the next sequence
value from the coordinator and perform the rest of the operations on the current
node. This is because we've never synced those sequences to workers, so we
consider their coordinator versions the source of truth.
altering distributed-schema tables from workers.
is initiated from a worker.
creating a distributed-schema table from workers.
For the last three points, please also see the next sections for more details.
Significant changes for sequences
In Postgres, a column can be associated with a sequence in three ways:
Such columns can be used in Citus tables in three scenarios:
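For reference, the three ways a Postgres column can be associated with a sequence; table and sequence names are illustrative:

```sql
CREATE SEQUENCE my_seq;

CREATE TABLE t (
    a bigserial,                                -- 1) serial pseudo-type
    b bigint GENERATED BY DEFAULT AS IDENTITY,  -- 2) identity column
    c bigint DEFAULT nextval('my_seq')          -- 3) DEFAULT calling nextval()
);
```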
When Citus tables are created or altered on the coordinator, we enforce the
following:
workers based on node group ids.
Otherwise, workers are effectively prevented from using the sequence by
setting its current value to the maximum value.
See AlterSequenceMinMax().
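The idea behind the per-node ranges can be sketched as follows; the exact range arithmetic lives in AlterSequenceMinMax(), and the numbers below are purely illustrative:

```sql
-- Give each node group a disjoint slice of the bigint range so that
-- nextval() on different nodes can never produce colliding values.
-- (Illustrative bounds; the real bounds are derived from the node group id.)
ALTER SEQUENCE my_seq
    MINVALUE 281474976710657
    MAXVALUE 562949953421312
    RESTART WITH 281474976710657;
```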
rewritten to worker_nextval() to provide a clearer error when invoked on
workers.
For serial and identity columns, the default expression cannot be rewritten,
but the same sequence range restrictions still apply.
last_value and is_called state are implicitly preserved on the coordinator
after it's distributed.
And we want to preserve the same behavior when distributed-schema tables are
created or altered from workers, hence the code changes introduced in this PR:
default expression that uses nextval(), regardless of the data type.
addition to remote workers.
That way, we can use the appropriate sequence range on the local worker as
well if it's bigint-based, or we can prevent the local worker from using the
sequence otherwise, by setting the current value to the maximum value.
coordinator's sequence.
That way, we can continue using the full range starting from the worker's
current value on the coordinator after distribution, as it's the case when
creating Citus tables from the coordinator.
Finally, the UDFs that we send to other nodes to implement some parts of these
changes, namely worker_apply_sequence_command() and
citus_internal.adjust_identity_column_seq_settings(), now need to know whether
they're called on the coordinator or a worker. For these UDFs to properly make
that check, the node group id always needs to be assigned already.
For coordinator and metadata-synced workers, this is always the case. Note that,
by default, citus_add_node() always syncs metadata to the workers when adding
them into the metadata. However, although not recommended, users can later
execute stop_metadata_sync_to_node() for a worker to stop metadata syncing to
that worker.
In that case, before this PR, we were resetting the node group id to 0 for
workers with metadata sync stopped, where 0 is almost always assumed to be the
coordinator's node group id in the code-base. So with this PR, we defer
resetting the group id until the worker is actually removed from the metadata.
Ideally, we should not be sending commands to create / alter sequences to
workers without metadata, because we never use sequences in shard-level queries
and DMLs, only when querying shell tables. However, changing this would require
more work, and deferring the reset of the node group id until the worker is
removed from the metadata seems like the correct thing to do anyway.
Also, to preserve compatibility in mixed-version clusters, we keep the older
variant of worker_apply_sequence_command(), i.e., the one that takes 2
parameters rather than 4, and worker_adjust_identity_column_seq_ranges(), which
is the ancestor of citus_internal.adjust_identity_column_seq_settings(). And
when creating or altering a distributed table from the coordinator, we keep
sending these older variants to the workers. This is because the most
significant difference between the older and newer variants is that the newer
ones also send the last_value and is_called state of the sequence to the target
node. We don't actually need to do that when creating or altering a distributed
table from the coordinator, and using the newer variants from the coordinator
could cause failures in mixed-version clusters, e.g., when one of the workers
is still on the older version of Citus.
Significant changes around Citus locks
Now we acquire the following locks on the coordinator when creating a
distributed-schema table from a worker:
This is because, for several operations that we can only initiate from the
coordinator today, such as shard moves, we rely on those locks being acquired
on the coordinator if a concurrent operation that could interfere with us was
already initiated.
Note that, in the scope of these locks, in code-paths where we're sure the
operation cannot be initiated from a worker, we don't even check whether we're
on a worker; we always just acquire the lock locally. In other words, we check
whether we need to remotely acquire those locks on the coordinator only in
the code-paths that can initiate an operation from a worker, e.g., the
operations around distributed-schemas / tables that we've added support for
initiating from a worker.
Note that we don't care about "colocation default lock". This is because, even
if it appears, e.g., in CreateCitusTable(), it's never called when creating a
distributed-schema table by definition.
Support implicitly replicating reference tables from workers
We also add support for replicating reference tables to all nodes when creating
a distributed-schema table from a worker, in case the reference tables are not
yet replicated to all nodes. To do that, in addition to acquiring the necessary
locks on the coordinator as mentioned above, we make sure to connect to the
coordinator and initiate the shard-copy operations from there.
Future improvements
avoid such deadlocks.
objects from any node:
functions that the users might use with distributed-schema tables are not
supported.
creation are not supported.
configs at create table time and altering them later if / as much as
alter table allows.
inherited.
or another distributed-schema is not supported.
undistribute_table() from workers.
citus_schema_undistribute are not supported.