<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Server Security Maintenance Studio &#187; How to Automate ECS Container Instance Termination with Minimal Impact on Cluster Tasks</title>
	<atom:link href="https://www.fuwuqiok.com/tag/%e5%a6%82%e4%bd%95%e5%ae%9e%e7%8e%b0%e5%af%b9%e9%9b%86%e7%be%a4%e4%bb%bb%e5%8a%a1%e6%9c%80%e5%b0%8f%e5%bd%b1%e5%93%8d%e7%9a%84-ecs-%e5%ae%b9%e5%99%a8%e5%ae%9e%e4%be%8b%e8%87%aa%e5%8a%a8%e5%8c%96/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.fuwuqiok.com</link>
	<description></description>
	<lastBuildDate>Sun, 01 Mar 2020 07:28:40 +0000</lastBuildDate>
	<language>zh-CN</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.2.26</generator>
	<item>
		<title>How to Automate ECS Container Instance Termination with Minimal Impact on Cluster Tasks</title>
		<link>https://www.fuwuqiok.com/%e5%a6%82%e4%bd%95%e5%ae%9e%e7%8e%b0%e5%af%b9%e9%9b%86%e7%be%a4%e4%bb%bb%e5%8a%a1%e6%9c%80%e5%b0%8f%e5%bd%b1%e5%93%8d%e7%9a%84-ecs-%e5%ae%b9%e5%99%a8%e5%ae%9e%e4%be%8b%e8%87%aa%e5%8a%a8%e5%8c%96/</link>
		<comments>https://www.fuwuqiok.com/%e5%a6%82%e4%bd%95%e5%ae%9e%e7%8e%b0%e5%af%b9%e9%9b%86%e7%be%a4%e4%bb%bb%e5%8a%a1%e6%9c%80%e5%b0%8f%e5%bd%b1%e5%93%8d%e7%9a%84-ecs-%e5%ae%b9%e5%99%a8%e5%ae%9e%e4%be%8b%e8%87%aa%e5%8a%a8%e5%8c%96/#comments</comments>
		<pubDate>Fri, 17 Aug 2018 02:48:01 +0000</pubDate>
		<dc:creator><![CDATA[admin]]></dc:creator>
				<category><![CDATA[Amazon AWS]]></category>
		<category><![CDATA[AWS]]></category>
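		<category><![CDATA[Managed Server Maintenance]]></category>
		<category><![CDATA[Outsourced Server Maintenance]]></category>
		<category><![CDATA[Managed Server Security]]></category>
		<category><![CDATA[Server Security Configuration]]></category>
		<category><![CDATA[Server Maintenance]]></category>
		<category><![CDATA[Server Migration]]></category>
		<category><![CDATA[How to Automate ECS Container Instance Termination with Minimal Impact on Cluster Tasks]]></category>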

		<guid isPermaLink="false">https://www.fuwuqiok.com/?p=3754</guid>
		<description><![CDATA[<p>Background: Amazon ECS is a container management service that makes it easy to run, stop, and manage Docker containers on a cluster. [&#8230;]</p>
<p><a rel="nofollow" href="https://www.fuwuqiok.com/%e5%a6%82%e4%bd%95%e5%ae%9e%e7%8e%b0%e5%af%b9%e9%9b%86%e7%be%a4%e4%bb%bb%e5%8a%a1%e6%9c%80%e5%b0%8f%e5%bd%b1%e5%93%8d%e7%9a%84-ecs-%e5%ae%b9%e5%99%a8%e5%ae%9e%e4%be%8b%e8%87%aa%e5%8a%a8%e5%8c%96/">How to Automate ECS Container Instance Termination with Minimal Impact on Cluster Tasks</a> was first published by <a rel="nofollow" href="https://www.fuwuqiok.com">Server Security Maintenance Studio</a>.</p>
]]></description>
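				<content:encoded><![CDATA[<h2>Background</h2>
<p>Amazon ECS is a container management service that makes it easy to run, stop, and manage Docker containers on a cluster. When you run container tasks with ECS, they are placed on an ECS cluster. Amazon ECS pulls the specified container images from the specified image repository and runs the tasks defined by those images on the cluster's container instances.</p>
<p>My colleague Chris Barclay wrote an excellent <a href="https://amazonaws-china.com/cn/blogs/compute/how-to-automate-container-instance-draining-in-amazon-ecs/">blog post</a> on using container instance draining to automatically remove running container instances before an Auto Scaling group scales in an ECS cluster.</p>
<p>Real customer cases show that there are many important reasons to terminate instances in an Amazon ECS cluster, for example: upgrading or updating the EC2 AMI, applying critical system patches, updating core system libraries, upgrading Docker, upgrading the ECS agent, and resizing the cluster.</p>
<h2>Solution</h2>
<p>These scenarios share one goal: terminating a container instance, or removing it from the cluster, must not disrupt the tasks running in the cluster. Concretely, no new tasks are scheduled onto a container instance in the DRAINING state; if capacity is available (or new container instances are launched in advance), new tasks are placed on other container instances in the ECS cluster; and the instance is terminated only after the tasks running on it have been moved to other container instances. In practice, you can also set a container instance's state to DRAINING manually. This article shows how to automate ECS container instance termination with minimal impact on cluster tasks, using Auto Scaling group lifecycle hooks and serverless function invocation with AWS Lambda, as shown in the following diagram:</p>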
<p><a href="https://www.fuwuqiok.com/wp-content/uploads/2018/08/1-6.jpg"><img class="attachment-medium" src="https://www.fuwuqiok.com/wp-content/uploads/2018/08/1-6.jpg" alt="1-6" width="922" height="268" /></a></p>
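<p>Auto Scaling groups support lifecycle hooks that can invoke custom actions, such as a Lambda function, and let them complete before an instance launches or terminates; in this case, before it terminates. The Lambda function invoked by the lifecycle hook performs two tasks:</p>
<ol>
<li>Set the ECS container instance state to DRAINING.</li>
<li>Check whether any tasks are still running on the container instance. If so, publish a message to SNS so that the Lambda function is invoked again to repeat the check.</li>
</ol>
<p>The Lambda function repeats step 2 until no tasks are running on the container instance or the lifecycle hook heartbeat timeout expires, whichever comes first. Control then returns to the Auto Scaling lifecycle hook, and the container instance is terminated.</p>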
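<p>As a rough sketch of this loop (not the sample's actual code), the Python function below makes one pass per invocation. The callables <code>set_draining</code>, <code>count_running_tasks</code>, and <code>republish</code> are hypothetical stand-ins for the ECS and SNS calls, injected so the control flow can be exercised without AWS access.</p>

```python
def handle_termination_event(instance_id, set_draining, count_running_tasks, republish):
    """One pass of the drain loop; returns 'WAIT' or 'CONTINUE'.

    All three callables are hypothetical stand-ins for real AWS calls.
    """
    # Step 1: mark the container instance as DRAINING (safe to repeat).
    set_draining(instance_id)

    # Step 2: if tasks remain, ask to be invoked again and stop here.
    if count_running_tasks(instance_id) != 0:
        republish(instance_id)
        return "WAIT"

    # No tasks left: the caller can now complete the lifecycle hook.
    return "CONTINUE"


if __name__ == "__main__":
    remaining = [2, 1, 0]  # simulated task counts on successive invocations
    results = []
    for _ in range(3):
        results.append(handle_termination_event(
            "i-0123456789abcdef0",  # made-up instance ID
            set_draining=lambda i: None,
            count_running_tasks=lambda i: remaining.pop(0),
            republish=lambda i: None,
        ))
    print(results)
```

<p>The heartbeat timeout is not modeled here because Auto Scaling enforces it on its side: if the hook is never completed, the timeout ends the wait regardless of what the function does.</p>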
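<h2>Reference Example</h2>
<p>To implement the automated container instance termination described above, you can use the <a href="https://github.com/awslabs/ecs-cid-sample/blob/master/cform/ecs.yaml">open-source CloudFormation template</a>, upload the <a href="https://github.com/awslabs/ecs-cid-sample/blob/master/code/index.zip">Lambda deployment package</a> to an S3 bucket, and set up the resources described in this article. The template creates the following resources:</p>
<ul>
<li>A VPC and associated network elements (subnets, security groups, route tables, and so on)</li>
<li>An ECS cluster, an ECS service, and a sample ECS task definition</li>
<li>An Auto Scaling group with two EC2 instances and a termination lifecycle hook</li>
<li>A Lambda function</li>
<li>An SNS topic</li>
<li>An IAM role that can execute the Lambda function</li>
</ul>
<p>Because the trusted entity name for Auto Scaling groups in the AWS China regions differs from the one used in the global regions, refer to the changes <a href="https://github.com/aws-samples/ecs-cid-sample/pull/17">here</a> when configuring and modifying the CloudFormation template.</p>
<p>After creating the CloudFormation stack, we can trigger an instance termination event to see how this works:</p>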
<ul>
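<li>In the Amazon EC2 console, choose Auto Scaling Groups and select the name of the Auto Scaling group created by CloudFormation.</li>
<li>Choose Actions, then Edit, and update the group to reduce the number of instances by 1.</li>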
</ul>
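<p>The same scale-in can also be requested through the API instead of the console. The sketch below assumes a boto3 Auto Scaling client (for example <code>boto3.client("autoscaling")</code>) is passed in; the helper name <code>scale_in_by_one</code> is illustrative and not part of the sample.</p>

```python
def scale_in_by_one(asg_client, group_name):
    """Lower the group's desired capacity by one and return the new value.

    asg_client is expected to behave like boto3.client("autoscaling").
    """
    groups = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name])["AutoScalingGroups"]
    if not groups:
        raise ValueError("Auto Scaling group not found: {}".format(group_name))

    group = groups[0]
    new_capacity = group["DesiredCapacity"] - 1

    # Desired capacity may not drop below the group's minimum size.
    if max(new_capacity, group["MinSize"]) != new_capacity:
        raise ValueError("Group {} is already at its minimum size".format(group_name))

    asg_client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=new_capacity,
        HonorCooldown=False)
    return new_capacity
```

<p>Which instance Auto Scaling picks for termination depends on the group's termination policy, so this helper only controls how many instances leave, not which one.</p>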
<p>This triggers the termination of one instance. Select the Instances tab of the Auto Scaling group; the instance should show its lifecycle state:</p>
<p><a href="https://www.fuwuqiok.com/wp-content/uploads/2018/08/2-4.jpg"><img class="attachment-medium" src="https://www.fuwuqiok.com/wp-content/uploads/2018/08/2-4.jpg" alt="2-4" width="544" height="325" /></a></p>
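<p>At this point, the lifecycle hook is activated and publishes a message to SNS, which in turn triggers the Lambda function. The Lambda function then changes the ECS container instance state to DRAINING. The ECS service scheduler steps in, stops the tasks on the instance, and starts them on available instances.</p>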
<p><a href="https://www.fuwuqiok.com/wp-content/uploads/2018/08/3-5.jpg"><img class="attachment-medium" src="https://www.fuwuqiok.com/wp-content/uploads/2018/08/3-5.jpg" alt="3-5" width="628" height="168" /></a></p>
<p>Once this is done, the Auto Scaling group's activity history confirms that the EC2 instance has been terminated.</p>
<p><a href="https://www.fuwuqiok.com/wp-content/uploads/2018/08/4-4.jpg"><img class="attachment-medium" src="https://www.fuwuqiok.com/wp-content/uploads/2018/08/4-4.jpg" alt="4-4" width="628" height="97" /></a></p>
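<h2>A Closer Look</h2>
<p>Let's take a closer look at how the Lambda function works internally. The function first checks whether the LifecycleTransition value in the received event is autoscaling:EC2_INSTANCE_TERMINATING, which indicates that the instance has entered the terminating lifecycle hook state and has not yet been terminated.</p>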
<pre class=" language-python" data-language="Python"><code class=" language-python"># If the event received is an instance-terminating lifecycle transition...
if 'LifecycleTransition' in message:
    print("message autoscaling {}".format(message['LifecycleTransition']))
    if 'autoscaling:EC2_INSTANCE_TERMINATING' in message['LifecycleTransition']:
        # Excerpt: the draining logic from the following snippets runs here.
        pass</code></pre>
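<p>The <code>message</code> variable comes from the SNS notification that triggers the function. A minimal sketch of extracting it, assuming the standard SNS-to-Lambda event shape in which the first record carries the payload as a JSON string; the helper names <code>extract_message</code> and <code>is_terminating</code> and the sample values are illustrative only:</p>

```python
import json


def extract_message(event):
    """Return the lifecycle notification carried in an SNS-triggered Lambda event."""
    # SNS delivers the payload as a JSON string inside the first record.
    raw = event["Records"][0]["Sns"]["Message"]
    return json.loads(raw)


def is_terminating(message):
    """True when the message describes an instance-terminating lifecycle transition."""
    transition = message.get("LifecycleTransition", "")
    return "autoscaling:EC2_INSTANCE_TERMINATING" in transition


if __name__ == "__main__":
    # Hand-built sample event; the IDs and names are made up.
    payload = {
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "EC2InstanceId": "i-0123456789abcdef0",
        "LifecycleHookName": "demo-hook",
    }
    sample_event = {"Records": [{"Sns": {"Message": json.dumps(payload)}}]}
    print(is_terminating(extract_message(sample_event)))
```

<p>Using <code>get</code> with a default means messages without a LifecycleTransition key are ignored rather than raising an error.</p>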
<p>It then calls the function “checkContainerInstanceTaskStatus”, which uses the container instance ID to set the container instance state to ‘DRAINING’.</p>
<pre class=" language-python" data-language="Python"><code class=" language-python"># Get the EC2 instance ID and lifecycle hook name from the notification
Ec2InstanceId = message['EC2InstanceId']
lifecycleHookName = message['LifecycleHookName']
print("Setting lifecycle hook name {} ".format(lifecycleHookName))

# Check if there are any tasks running on the instance
tasksRunning = checkContainerInstanceTaskStatus(Ec2InstanceId)
</code></pre>
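<p>The body of that function is not reproduced in this article. As a hedged sketch of the lookup-and-drain step, the helper below finds the container instance backed by a given EC2 instance and sets it to DRAINING. The name <code>drain_container_instance</code> is hypothetical, the client is assumed to behave like <code>boto3.client("ecs")</code>, and pagination of large clusters is deliberately left out.</p>

```python
def drain_container_instance(ecs_client, cluster_name, ec2_instance_id):
    """Set the container instance backed by ec2_instance_id to DRAINING.

    Returns the container instance ARN, or None if the EC2 instance is not
    registered in the cluster. Assumes one page of results is enough.
    """
    arns = ecs_client.list_container_instances(
        cluster=cluster_name)["containerInstanceArns"]
    if not arns:
        return None

    described = ecs_client.describe_container_instances(
        cluster=cluster_name, containerInstances=arns)

    for instance in described["containerInstances"]:
        if instance["ec2InstanceId"] != ec2_instance_id:
            continue
        # Skip the update when the instance is already draining.
        if instance["status"] != "DRAINING":
            ecs_client.update_container_instances_state(
                cluster=cluster_name,
                containerInstances=[instance["containerInstanceArn"]],
                status="DRAINING")
        return instance["containerInstanceArn"]
    return None
```

<p>Checking the current status first keeps repeated invocations cheap, since the function is triggered again every time a message is republished to SNS.</p>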
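<p>Next, it checks whether any tasks are running on the instance. If tasks are still running, it publishes a message to SNS to trigger the Lambda function again, and then exits.</p>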
<pre class=" language-python" data-language="Python"><code class=" language-python"># Excerpt from inside checkContainerInstanceTaskStatus()
# Use the task ARNs to describe the tasks
descTaskResp = ecsClient.describe_tasks(cluster=clusterName, tasks=listTaskResp['taskArns'])
for key in descTaskResp['tasks']:
    print("Task status {}".format(key['lastStatus']))
    print("Container instance ARN {}".format(key['containerInstanceArn']))
    print("Task ARN {}".format(key['taskArn']))

# Check if any tasks are running
if len(descTaskResp['tasks']) &gt; 0:
    print("Tasks are still running..")
    return 1
else:
    print("NO tasks are on this instance {}..".format(Ec2InstanceId))
    return 0
</code></pre>
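<p>The republishing step itself is not shown in this article. A minimal sketch, assuming the client behaves like <code>boto3.client("sns")</code> and that the topic ARN is supplied by the caller; the helper name <code>republish_lifecycle_message</code> and the subject text are illustrative:</p>

```python
import json


def republish_lifecycle_message(sns_client, topic_arn, message):
    """Publish the lifecycle message back to SNS so the function runs again."""
    return sns_client.publish(
        TopicArn=topic_arn,
        # Re-serialize the original notification so the next run sees the same fields.
        Message=json.dumps(message),
        Subject="Re-check ECS container instance draining",
    )
```

<p>Sending the original message unchanged means the next invocation can follow exactly the same code path as the first one.</p>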
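<p>When the Lambda function runs again and finds no tasks running on the container instance, it completes the lifecycle hook, and the EC2 instance is terminated.</p>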
<pre class=" language-python" data-language="Python"><code class=" language-python"># Complete the lifecycle hook so Auto Scaling can proceed with termination.
try:
    response = asgClient.complete_lifecycle_action(
        LifecycleHookName=lifecycleHookName,
        AutoScalingGroupName=asgGroupName,
        LifecycleActionResult='CONTINUE',
        InstanceId=Ec2InstanceId)
    print("Response = {}".format(response))
    print("Completed lifecycle hook action")
except Exception as e:
    print(str(e))
</code></pre>
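<h2>Conclusion</h2>
<p>This article discussed several scenarios that require terminating ECS container instances, presented an automated termination approach that minimizes the impact on cluster tasks, and walked through a reference example and its inner workings. Building on the reference example, you can use services such as CloudFormation and Lambda to implement true rolling deployments: launch new instances first, then terminate old instances in batches, while keeping the impact on running cluster tasks to a minimum. To learn more about container instance draining, see the <a href="http://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-draining.html">Amazon ECS Developer Guide</a>.</p>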
]]></content:encoded>
			<wfw:commentRss>https://www.fuwuqiok.com/%e5%a6%82%e4%bd%95%e5%ae%9e%e7%8e%b0%e5%af%b9%e9%9b%86%e7%be%a4%e4%bb%bb%e5%8a%a1%e6%9c%80%e5%b0%8f%e5%bd%b1%e5%93%8d%e7%9a%84-ecs-%e5%ae%b9%e5%99%a8%e5%ae%9e%e4%be%8b%e8%87%aa%e5%8a%a8%e5%8c%96/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
