MachineMetadata constraints with %i interpolation don't work. #1446
Comments
It seems to affect only one of our clusters, and it is possible that this cluster was running another fork of fleet that suffers from this issue. Rebooting the machines seems to have fixed it. I'm closing this for now until I see it happen again. |
I'm reopening this bug because we updated our fleet to the HEAD of the master branch and started seeing this problem again. |
@jonboulle We think something changed between |
Confirmed.
$ fleetctl list-machines
MACHINE IP METADATA
1e1a8655... coreos3 hostname=coreos3
74dce006... coreos2 hostname=coreos2
cd29d103... coreos1 hostname=coreos1
template:
[Service]
ExecStart=/bin/bash -c "while true; do echo Hello, World %i!; sleep 1; done"
[X-Fleet]
MachineMetadata="hostname=%i"
$ fleetctl start hello@random.service
v0.11.5 just hangs when it cannot find a corresponding machine. |
Can you try with this instead?
IIRC, the way the code parses the MachineMetadata is special when there is only one condition and when there are more than one. Also, doesn't |
@daniellowtw what do you mean? I use only one condition. |
I think @daniellowtw was considering the case of AND vs OR in: https://github.com/coreos/fleet/blob/master/Documentation/unit-files-and-scheduling.md#schedule-unit-to-machine-with-specific-metadata |
@kayrus https://github.com/coreos/fleet/blob/master/job/job_test.go#L323
when there is one condition and
when there's more than one. I think it doesn't parse it if there are quotes when there is only one required metadata |
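To make the quoting question above concrete, here is a minimal Go sketch of a parser that treats a single quoted condition and several quoted conditions the same way. It is an illustration only, not fleet's actual code: `parseRequirements` is a hypothetical helper, and the assumption that conditions are whitespace-separated `key=value` pairs with optional double quotes is mine.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRequirements is a hypothetical helper (not part of fleet).
// It takes raw MachineMetadata values as they appear in a unit file
// and returns a map from metadata key to the list of required values.
// Surrounding double quotes are stripped whether a value holds one
// condition or several.
func parseRequirements(values []string) map[string][]string {
	reqs := make(map[string][]string)
	for _, v := range values {
		// Split on whitespace first so each condition is handled alone.
		for _, pair := range strings.Fields(v) {
			// Drop quotes around an individual condition.
			pair = strings.Trim(pair, `"`)
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) != 2 || kv[0] == "" {
				continue // skip anything that is not key=value
			}
			reqs[kv[0]] = append(reqs[kv[0]], kv[1])
		}
	}
	return reqs
}

func main() {
	// One quoted condition, as in the template from this thread.
	fmt.Println(parseRequirements([]string{`"hostname=%i"`}))
	// Two quoted conditions on a single line.
	fmt.Println(parseRequirements([]string{`"region=us-east-1" "diskType=SSD"`}))
}
```

With this approach the single-condition case is just the multi-condition case with one element, so there is no separate code path that could forget to strip quotes.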
I've tried to cherry-pick this to the v0.11.5 tag and reproduced the issue. Will try to investigate what is wrong. |
Looks like this loop doesn't recognize whether unit is a template or not and returns |
Weird. When you submit templated unit, etcdctl shows that two instances were created:
|
Closed by #1520 |
We're using the %i interpolation for dynamic scheduling. Our instance metadata looks as follows:
When we take a unit like:
And we submit two copies of it:
fleetctl start echo@eu1-staging-es-1.service echo@eu1-staging-es-2.service
We'd expect echo@eu1-staging-es-1.service to be scheduled on d00b1eae and echo@eu1-staging-es-2.service to be scheduled on c8648469.
However, what we're seeing is:
This happens rarely, but it does happen. We couldn't reproduce it when the constraint instancename=%i was written explicitly as "instancename=eu1-staging-es-2".
It's pretty puzzling, since we know that the only place that deals with the Units of Jobs is job.go, which is used by the engine and the agents. The interpolation is done within it: https://github.com/coreos/fleet/blob/master/job/job.go#L179
Any ideas what this could be caused by?
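For readers unfamiliar with the %i specifier, the following Go sketch shows the substitution we expect to happen: the part of the unit name between `@` and the final `.` replaces `%i` in the metadata requirement. This is a simplified illustration under my own assumptions, not the code at the link above; `splitTemplateInstance` and `interpolate` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTemplateInstance is a hypothetical helper. It splits a name such as
// "echo@eu1-staging-es-1.service" into prefix "echo", instance
// "eu1-staging-es-1" and suffix "service". ok is false when the name has
// no "@" or the instance part is empty (a bare template like "echo@.service").
func splitTemplateInstance(name string) (prefix, instance, suffix string, ok bool) {
	dot := strings.LastIndex(name, ".")
	if dot < 0 {
		return "", "", "", false
	}
	base := name[:dot]
	at := strings.Index(base, "@")
	if at < 0 || at == len(base)-1 {
		return "", "", "", false
	}
	return base[:at], base[at+1:], name[dot+1:], true
}

// interpolate replaces every %i in value with the instance name taken
// from unitName. If unitName is not an instance of a template, value is
// returned unchanged.
func interpolate(value, unitName string) string {
	_, instance, _, ok := splitTemplateInstance(unitName)
	if !ok {
		return value
	}
	return strings.ReplaceAll(value, "%i", instance)
}

func main() {
	// prints "instancename=eu1-staging-es-1"
	fmt.Println(interpolate("instancename=%i", "echo@eu1-staging-es-1.service"))
}
```

Note that in this sketch a bare template name leaves `%i` untouched, so a requirement evaluated against the template rather than the instance would never match any machine.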