MachineMetadata constraints with %i interpolation don't work. · Issue #1446 · coreos/fleet · GitHub
This repository was archived by the owner on Jan 30, 2020. It is now read-only.

Closed
daniellowtw opened this issue Feb 29, 2016 · 13 comments

@daniellowtw

We're using the %i interpolation for dynamic scheduling. Our instance metadata looks as follows:

fleetctl list-machines | grep es-                                         
94356eb9... 10.200.0.6  image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-0,instancetype=elasticsearch,region=europe-west1-d
c8648469... 10.200.0.12 image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-2,instancetype=elasticsearch,region=europe-west1-d
d00b1eae... 10.200.0.8  image=coreos-beta-899-4-0-v20160121,instancename=eu1-staging-es-1,instancetype=elasticsearch,region=europe-west1-d

When we take a unit like:

[Unit]
Description=echo Service
After=etcd-cluster-ready.service docker.service dns-resolv-update.service
Requires=etcd-cluster-ready.service docker.service dns-resolv-update.service

[Service]
Restart=always
RestartSec=60
TimeoutSec=0
ExecStartPre=-/opt/bin/echo "whatever %i"
ExecStart=/opt/bin/echo "whatever %i"
ExecStop=-/opt/bin/echo "whatever %i"

[X-Fleet]
MachineMetadata="instancetype=elasticsearch" "instancename=%i"

And we submit two copies of it:
fleetctl start echo@eu1-staging-es-1.service echo@eu1-staging-es-2.service

We'd expect the echo@eu1-staging-es-1.service to be scheduled on d00b1eae and echo@eu1-staging-es-2.service to be scheduled on c8648469.
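The placement we expect can be sketched as follows. This is only an illustration of the intended semantics, not fleet's scheduler: `instance_of` and `eligible_machines` are hypothetical helpers, and the machine IDs and metadata are copied from the listing above.

```python
import re

# Machine metadata copied from the `fleetctl list-machines` output above.
MACHINES = {
    "94356eb9": {"instancename": "eu1-staging-es-0", "instancetype": "elasticsearch"},
    "c8648469": {"instancename": "eu1-staging-es-2", "instancetype": "elasticsearch"},
    "d00b1eae": {"instancename": "eu1-staging-es-1", "instancetype": "elasticsearch"},
}

REQUIREMENTS = ["instancetype=elasticsearch", "instancename=%i"]


def instance_of(unit_name):
    """Return the text between '@' and the final '.suffix' (hypothetical helper)."""
    match = re.fullmatch(r"[^@]+@(.*)\.[^.]+", unit_name)
    return match.group(1) if match else ""


def eligible_machines(unit_name, requirements):
    """Interpolate %i in each key=value requirement, then filter the machines."""
    instance = instance_of(unit_name)
    wanted = dict(req.replace("%i", instance).split("=", 1) for req in requirements)
    return [
        machine_id
        for machine_id, metadata in MACHINES.items()
        if all(metadata.get(key) == value for key, value in wanted.items())
    ]


print(eligible_machines("echo@eu1-staging-es-1.service", REQUIREMENTS))
print(eligible_machines("echo@eu1-staging-es-2.service", REQUIREMENTS))
```

With correct interpolation each unit has exactly one eligible machine, so two units landing on the same machine means the instancename constraint was not applied as written.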

However, what we're seeing is:

Unit echo@eu1-staging-es-1.service inactive
Unit echo@eu1-staging-es-2.service inactive
Unit echo@eu1-staging-es-1.service launched on d00b1eae.../10.200.0.8
Unit echo@eu1-staging-es-2.service launched on d00b1eae.../10.200.0.8

This happens rarely, but it does happen. We couldn't reproduce it when the constraint was written out explicitly, e.g. "instancename=eu1-staging-es-2", instead of instancename=%i.

It's pretty puzzling, since we know that the only place that deals with the Units of Jobs is job.go, which is used by both the engine and the agents. The interpolation is done there: https://github.com/coreos/fleet/blob/master/job/job.go#L179

Any ideas what this could be caused by?

@daniellowtw
Author

It seems that this only affects one of our clusters, and it is possible that this cluster was running another fork of fleet that suffers from this issue. Rebooting the machines seems to have fixed it.

I'm closing this for now until I see this happening again.

@daniellowtw
Author

I'm reopening this bug because we updated our fleet to the HEAD of the master branch and started seeing this problem again.

@daniellowtw daniellowtw reopened this Mar 17, 2016
@mwitkow
Contributor

mwitkow commented Mar 17, 2016

@jonboulle
We've recently rebased the fleetd that we run onto the HEAD of master and patched it with the monitoring patches (#1415). The branch is here (https://github.com/improbable-io/fleet/tree/imp_release_20160316_INF-metrics).

We think something changed between fleetd version 0.11.5 and HEAD that caused the instancename=%i interpolation to stop working in engine scheduling. We saw similar problems when we experimented with HEAD earlier (hence the bug).

@ghost

ghost commented Mar 18, 2016

We've rolled back to a version branched off of:

3178aed Merge pull request #1263 from miekg/followonly-hack (Implement follow_only flag)

This version does not have this problem, so this issue has been introduced since then.

@jonboulle jonboulle added this to the v0.12.0 milestone Mar 18, 2016
@kayrus kayrus self-assigned this Mar 18, 2016
@kayrus
Contributor

kayrus commented Mar 18, 2016

Confirmed.

$ fleetctl list-machines
MACHINE         IP      METADATA
1e1a8655...     coreos3 hostname=coreos3
74dce006...     coreos2 hostname=coreos2
cd29d103...     coreos1 hostname=coreos1

template:

[Service]
ExecStart=/bin/bash -c "while true; do echo Hello, World %i!; sleep 1; done"

[X-Fleet]
MachineMetadata="hostname=%i"
$ fleetctl start hello@random.service

v0.11.5 just hangs when it cannot find a corresponding machine.
The master branch schedules the unit on a random machine, regardless of metadata.

@daniellowtw
Author

Can you try with this instead?

[Service]
ExecStart=/bin/bash -c "while true; do echo Hello, World %i!; sleep 1; done"

[X-Fleet]
MachineMetadata=hostname=%i

IIRC, the code parses MachineMetadata differently depending on whether there is one condition or more than one.

Also, doesn't fleetctl start hello@random.service try to schedule this on machines with hostname=random? If so, then it wouldn't be scheduled on any of the machines in the cluster.

@kayrus
Contributor

kayrus commented Mar 18, 2016

@daniellowtw what do you mean? I use only one condition.

@mwitkow
Contributor

mwitkow commented Mar 18, 2016

@daniellowtw
Author

@kayrus https://github.com/coreos/fleet/blob/master/job/job_test.go#L323
The test uses

MachineMetadata=hostname=%i

when there is one condition and

MachineMetadata="hostname=%i" "foo=bar"

when there's more than one.

I think it doesn't parse the value when it is quoted and there is only one required metadata entry,
i.e. MachineMetadata="hostname=%i"
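To make the hypothesis concrete, here is a small sketch of how a quote-unaware parser could mishandle the single quoted form while a quote-aware one handles both. Neither function is fleet's parser; `parse_naive` and `parse_quoted` are hypothetical names used only for illustration, and the hypothesis itself is unconfirmed at this point in the thread.

```python
import shlex


def parse_naive(value):
    """Split on whitespace only, ignoring quoting (hypothetical buggy parser)."""
    return dict(token.split("=", 1) for token in value.split())


def parse_quoted(value):
    """Split shell-style so that surrounding double quotes are stripped."""
    return dict(token.split("=", 1) for token in shlex.split(value))


for value in ('hostname=%i', '"hostname=%i"', '"hostname=%i" "foo=bar"'):
    print(repr(value), parse_naive(value), parse_quoted(value))
```

In the naive version the quotes end up inside the key and the value, so a later comparison against machine metadata, or a later %i substitution, would operate on the wrong strings.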

@kayrus
Contributor

kayrus commented Mar 18, 2016

I've tried to cherry-pick this to the v0.11.5 tag and reproduced the issue. Will try to investigate what is wrong.

@kayrus
Contributor

kayrus commented Mar 18, 2016

Looks like this loop doesn't recognize whether the unit is a template or not, and returns hello@.service instead of hello@nodename.service. Then https://github.com/coreos/fleet/blob/master/unit/unit.go#L233 doesn't recognize the specifiers of the "naked" template.
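The distinction between a "naked" template name and an instance name can be sketched like this. It is a simplified illustration, not the code in unit.go; `split_unit_name`, `classify` and `interpolate` are hypothetical helpers, and the choice to leave %i untouched for a naked template is an assumption of the sketch.

```python
def split_unit_name(name):
    """Split 'prefix@instance.suffix' into (prefix, instance, suffix, templated)."""
    base, _, suffix = name.rpartition(".")
    prefix, at, instance = base.partition("@")
    return prefix, instance, suffix, at == "@"


def classify(name):
    """Label a unit name as 'plain', 'template' (naked) or 'instance'."""
    _, instance, _, templated = split_unit_name(name)
    if not templated:
        return "plain"
    return "instance" if instance else "template"


def interpolate(text, name):
    """Replace %i only when the unit name carries an instance string."""
    _, instance, _, _ = split_unit_name(name)
    return text.replace("%i", instance) if instance else text


for name in ("hello@nodename.service", "hello@.service", "hello.service"):
    print(name, classify(name), interpolate("hostname=%i", name))
```

If the scheduler is handed hello@.service where it should have received hello@nodename.service, there is no instance string left to substitute, so the hostname=%i requirement cannot resolve to a real machine.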

@kayrus
Contributor

kayrus commented Mar 18, 2016

Weird. When you submit a templated unit, etcdctl shows that two instances were created:

etcdctl get /_coreos.com/fleet/unit/79699e32624ad46c366ef37de89006ae9da617cd
{"Raw":"[Service]\nExecStart=/bin/bash -c \"while true; do echo Hello, World %i!; sleep 1; done\"\n"}
etcdctl get /_coreos.com/fleet/unit/bd40daa16d1d9d8962b12c6f512f912be98a5a4f
{"Raw":"[Service]\nExecStart=/bin/bash -c \"while true; do echo Hello, World %i!; sleep 1; done\"\n\n[X-Fleet]\nMachineMetadata=\"hostname=%i\"\n"}
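A quick way to compare the two stored values is to decode them and list their MachineMetadata lines. The sketch below uses the two JSON blobs exactly as printed above. The SHA-1 digest is included only on the assumption that the etcd key is derived from the raw unit contents; the sketch does not claim that the digests equal the keys shown.

```python
import hashlib
import json

# The two values returned by etcdctl above, kept as raw strings so that the
# JSON escapes (\n, \") are decoded by json.loads rather than by Python.
STORED = [
    r'{"Raw":"[Service]\nExecStart=/bin/bash -c \"while true; do echo Hello, World %i!; sleep 1; done\"\n"}',
    r'{"Raw":"[Service]\nExecStart=/bin/bash -c \"while true; do echo Hello, World %i!; sleep 1; done\"\n\n[X-Fleet]\nMachineMetadata=\"hostname=%i\"\n"}',
]


def machine_metadata_lines(raw):
    """Return every MachineMetadata= line found in a raw unit file."""
    return [line for line in raw.splitlines() if line.startswith("MachineMetadata=")]


raws = [json.loads(blob)["Raw"] for blob in STORED]
for raw in raws:
    digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()
    print(digest, machine_metadata_lines(raw))
```

The first stored unit has no [X-Fleet] section at all, so it carries no metadata requirement; if that copy is the one being scheduled, any machine would qualify, which would match the random placement seen on master.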

kayrus added a commit to endocode/fleet that referenced this issue Mar 21, 2016
kayrus added a commit to endocode/fleet that referenced this issue Mar 21, 2016
kayrus added a commit to endocode/fleet that referenced this issue Mar 24, 2016
kayrus added a commit to endocode/fleet that referenced this issue Mar 24, 2016
kayrus added a commit to endocode/fleet that referenced this issue Mar 31, 2016
kayrus added a commit to endocode/fleet that referenced this issue Mar 31, 2016
@kayrus
Contributor

kayrus commented Mar 31, 2016

Closed by #1520

@kayrus kayrus closed this as completed Mar 31, 2016
hectorj2f pushed a commit to giantswarm/fleet that referenced this issue Apr 6, 2016
Fetched URL: https://github.com/coreos/fleet/issues/1446
