<h2 id="whats-new-with-athena-and-cloudtrail">What’s new with Athena and CloudTrail</h2>
<p>AWS has made great strides to make CloudTrail far more useful in the past year. Recently AWS added a point-and-click wizard in CloudTrail to set up Athena, validating the strengths of this approach, but it stops short of giving good guidance on how to use and scale the result.</p>
<h4 id="acknowledgements">Acknowledgements</h4>
<p>I want to thank Corcoran Smith <a href="https://twitter.com/corcoranCI">@corcoranCI</a> for reminding me to update this article.</p>
<h3 id="setting-up-the-tables">Setting Up the Tables</h3>
<p>AWS released the CloudTrail SerDe sometime after my last post, and I have been using it for the past six to nine months. If you look at the <a href="/articles/using-aws-athena-to-query-cloudtrail-logs">last article</a> you will notice a very complicated CREATE TABLE statement; luckily, it has been reduced to this:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">CREATE EXTERNAL TABLE my_table_name (
eventversion STRING,
userIdentity STRUCT< type:STRING,
principalid:STRING,
arn:STRING,
accountid:STRING,
invokedby:STRING,
accesskeyid:STRING,
userName:STRING,
sessioncontext:STRUCT< attributes:STRUCT< mfaauthenticated:STRING,
creationdate:STRING>,
sessionIssuer:STRUCT< type:STRING,
principalId:STRING,
arn:STRING,
accountId:STRING,
userName:STRING>>>,
eventTime STRING,
eventSource STRING,
eventName STRING,
awsRegion STRING,
sourceIpAddress STRING,
userAgent STRING,
errorCode STRING,
errorMessage STRING,
requestParameters STRING,
responseElements STRING,
additionalEventData STRING,
requestId STRING,
eventId STRING,
resources ARRAY<STRUCT< ARN:STRING,
accountId: STRING,
type:STRING>>,
eventType STRING,
apiVersion STRING,
readOnly STRING,
recipientAccountId STRING,
serviceEventDetails STRING,
sharedEventID STRING,
vpcEndpointId STRING
) PARTITIONED BY(
region string,
year string,
month string
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://my_consolidated_bucket/my-cross-account-prefix/AWSLogs/'</code></pre></figure>
<p>Some of the key changes are in how the data is parsed; there is less de-nesting of JSON, and ultimately it is much easier to query now. Also, I made sure to add partitions to the table. I’ll explain why that is important later, but add them now and I’ll show you how to automate it next.</p>
<h3 id="getting-started-with-partitions">Getting Started With Partitions</h3>
<p>What I discovered when querying data going back over time was inefficiency and, more importantly, increased cost. Partitions are the right way to solve this in Athena, but you have to add them individually to each table.</p>
<p>Given the amount of logs I have and the infrequency with which I look at some regions, I decided to partition on region, year, and month. To do that I first looked at Boto3, but unfortunately, as of this writing, there still is not a waiter for Athena queries. It is very easy to overrun the concurrent-query limits on DDL statements, so I went looking and found <a href="https://github.com/guardian/athena-cli">athena-cli</a>, a fantastic overlay on Boto3 and the CLI, which I cannot recommend highly enough.</p>
<p>To add the partitions, I loaded up a script and used the waiters native to athena-cli to ensure I didn’t overrun the limits. I added just enough concurrency to gain some speed while staying under my DDL limit.</p>
<p>For example, here is a query to add a partition to us-east-1 for April 2018 for account “999999999999”</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">ALTER TABLE my_table_name ADD PARTITION (region='us-east-1',year='2018',month='04')
location 's3://my_consolidated_bucket/my-cross-account-prefix/AWSLogs/999999999999/CloudTrail/us-east-1/2018/04/';</code></pre></figure>
<p>Also, you can pre-partition your data, so I generally load up a year’s worth of partitions at once (see the sketch below). Athena does not care whether the folder is present when you set up the partition. It is important to note that if you declare partitions in your schema but never create them, you will not see any data when you query.</p>
<p><strong>Note:</strong> When AWS presents you with the DDL from the CloudTrail screen, it does not contain partitions; I strongly encourage you to add them.</p>
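<p>Since Boto3 still has no waiter for Athena, here is a minimal sketch of that kind of loader script, polling each DDL statement by hand. It runs sequentially for clarity, and the database name, results bucket, account ID, and region list are placeholders:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import itertools
import time

import boto3

athena = boto3.client("athena")

REGIONS = ["us-east-1", "us-west-2"]  # extend to the regions you log
MONTHS = ["%02d" % m for m in range(1, 13)]
LOCATION = ("s3://my_consolidated_bucket/my-cross-account-prefix/AWSLogs/"
            "999999999999/CloudTrail/{region}/2018/{month}/")

def run_ddl(sql):
    """Start a DDL statement and poll it to completion by hand."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "my_database"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(1)  # one DDL in flight keeps us under the limits

# Pre-load a full year of partitions per region
for region, month in itertools.product(REGIONS, MONTHS):
    location = LOCATION.format(region=region, month=month)
    sql = ("ALTER TABLE my_table_name ADD IF NOT EXISTS PARTITION "
           "(region='{r}',year='2018',month='{m}') location '{l}'"
           .format(r=region, m=month, l=location))
    print(region, month, run_ddl(sql))</code></pre></figure>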
<h3 id="working-with-the-data">Working with the data</h3>
<p>Let’s look at the structure of a few records as they appear now.</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">eventversion | 1.04
useridentity | {type=IAMUser, principalid=LKUWHE3545KJ34534L65U,
arn=arn:aws:iam::999999999999:user/iam-username,
accountid=999999999999, invokedby=null,
accesskeyid=FAKEACCESSID, username=iam-username,
sessioncontext=null}
eventtime | 2018-04-04T23:55:30Z
eventsource | monitoring.amazonaws.com
eventname | DescribeAlarms
awsregion | us-east-1
sourceipaddress | 198.51.100.144
useragent | aws-cli/1.11.132
errorcode | NULL
errormessage | NULL
requestparameters | {"alarmNames":["alarmname"]}
responseelements | null
additionaleventdata | NULL
requestid | 09a5f980-13a2-48af-94d7-f27a2affbdbe
eventid | 55979b8b-494f-4c8f-9cf9-3edaadefe142
resources | NULL
eventtype | AwsApiCall
apiversion | NULL
readonly | NULL
recipientaccountid | 999999999999
serviceeventdetails | NULL
sharedeventid | NULL
vpcendpointid | NULL
region | us-east-1
year | 2018
month | 04</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">eventversion | 1.05
useridentity | {type=AssumedRole, principalid=LKUWHE3545KJ34534L65U:user@example.com,
arn=arn:aws:sts::999999999999:assumed-role/rolename/user@example.com,
accountid=999999999999, invokedby=null, accesskeyid=null,
username=null, sessioncontext=null}
eventtime | 2018-03-12T12:00:37Z
eventsource | signin.amazonaws.com
eventname | ConsoleLogin
awsregion | us-east-1
sourceipaddress | 198.51.100.144
useragent | Chrome/65.0.3325.146
errorcode | NULL
errormessage | NULL
requestparameters | null
responseelements | {"ConsoleLogin":"Success"}
additionaleventdata | {"LoginTo":"https://console.aws.amazon.com/console/home?region=us-east-1",
"MobileVersion":"No","MFAUsed":"No",
"SamlProviderArn":"arn:aws:iam::999999999999:saml-provider/MySamlIdp"}
requestid | NULL
eventid | 96b00be0-6600-4489-8f94-3f70b04c4a66
resources | NULL
eventtype | AwsConsoleSignIn
apiversion | NULL
readonly | NULL
recipientaccountid | 999999999999
serviceeventdetails | NULL
sharedeventid | NULL
vpcendpointid | NULL
region | us-east-1
year | 2018
month | 03</code></pre></figure>
<p>With this structure, the queries get simpler. Here is a query whose output would include the first record above, along with others like it:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">SELECT * FROM my_table_name
WHERE useridentity.username = 'iam-username'
AND year = '2018'
AND month = '03';</code></pre></figure>
<p>In this query you can see that useridentity supports dotted-notation addressing of its sub-fields, which enables very powerful queries through the Presto engine, including regular expressions. The other columns can be addressed normally by name, again much simpler than before.</p>
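<p>For instance, here is a sketch of a regular-expression match against a sub-field; the role-name pattern is purely illustrative:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">SELECT eventname, useridentity.arn, sourceipaddress
FROM my_table_name
WHERE regexp_like(useridentity.arn, ':assumed-role/admin-')
AND year = '2018'
AND month = '03';</code></pre></figure>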
<h3 id="query-and-performance-comparison">Query and Performance Comparison</h3>
<p>Now that the data is a bit easier to comprehend, how much easier are the queries to write? Just as important, how much faster and more efficient are they to run? For this test, I ran the following two queries against my largest account.</p>
<p>I am looking to find the 20 highest counts of a tuple of Event Names/ARN/SourceIP for March 2018 in us-east-1 only.</p>
<h4 id="previous-style">Previous Style</h4>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">SELECT record.eventName, record.userIdentity.arn, record.sourceIPAddress, COUNT(*)
FROM
(SELECT record
FROM my_table_name
CROSS JOIN UNNEST(records) AS t (record)) AS records
WHERE record.eventtime LIKE '2018-03-%' and record.awsregion = 'us-east-1'
GROUP BY record.eventName, record.userIdentity.arn, record.sourceIPAddress
ORDER BY COUNT(*) DESC
LIMIT 20;</code></pre></figure>
<h4 id="current-style">Current Style</h4>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">select eventname, useridentity.arn, sourceipaddress, count(*)
from my_table_name
where year = '2018'
and month = '03'
and region = 'us-east-1'
group by eventname, useridentity.arn, sourceipaddress
order by count(*) DESC
LIMIT 20</code></pre></figure>
<p>The old table and query format:</p>
<ul>
<li>449.72 seconds</li>
<li>Scanned 13.97GB of data</li>
<li>Cost $0.06985</li>
</ul>
<p>The new table and query format:</p>
<ul>
<li>15.03 seconds</li>
<li>Scanned 1.2GB of data</li>
<li>Cost $0.006</li>
</ul>
<h3 id="conclusion">Conclusion</h3>
<p>In summary, the new table structure and queries are much faster, cheaper, and easier. In fact, on this test they are roughly 12x cheaper and 30x faster. There really is little disadvantage to changing to the new schema.</p>
<p>Do you have any cool queries you wrote to summarize your data? I would love to hear from you.</p>
<p><a href="/articles/using-aws-athena-and-cloudtrail-revisited/">Using AWS Athena and CloudTrail Revisited</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on April 17, 2018.</p>
<h2 id="going-down-the-rabbit-hole---a-historical-look-at-aws-pricing">Going Down the Rabbit Hole - A Historical Look at AWS Pricing</h2>
<p>When a coworker asked me if AWS had a historical pricing sheet, I was astounded to find out the answer was no. I went digging into the AWS landscape for answers, and here is what I found. AWS has publicly reduced its pricing across various services 62 or 65 times, depending on whom you ask and what metric you use. With that many changes, it becomes hard to discern historical pricing trends beyond “AWS makes things cheaper as its internal modeling allows.”</p>
<h3 id="selecting-the-target">Selecting the Target</h3>
<p>I decided to deep dive on this topic with the one service which has the longest pricing history and the most consistent format: S3. For those of you who have never looked or might have forgotten, S3 was one of the first three offerings from the AWS team, with EC2 and SQS being the other two. While SQS is the eldest service, it hasn’t seen many price reductions in its lifetime. Part of this is because it is a managed offering, but more likely it is because SQS is not really tied to economies of scale. The next logical place to look would be EC2. EC2 has a long history of price reductions; however, they are hard to track with the constant enhancements of the instance families. Older instance types are hardly ever retired, but their prices are undercut by the newer generations. As expected, this is where AWS shows its scale and purchasing power, but it creates unnatural plateaus and drops within a long-term view of the service’s pricing. So that leaves the investigation to the “youngest” of the three elders’ children, S3.</p>
<h3 id="testing-methodology">Testing Methodology</h3>
<p>In selecting S3, we get several benefits to our analysis:</p>
<ol>
<li>a singular class of storage since inception providing clear pricing since 2006</li>
<li>a view of the AWS purchasing power compared against trackable consumables such as drive capacity</li>
<li>a trend which demonstrates how AWS treats smaller consumers versus the largest consumers of a service over time</li>
</ol>
<p>Looking at all the price reductions, I normalized the data from 2006 through the last reduction in March 2014. To do this I had to take the current S3 tiers and apply them retroactively to the pricing from the past, which yielded comparable prices regardless of the tiers in use at the time. This was surprisingly time consuming, as some tiers were written as absolute ranges (“0-50TB”) while others were additive in nature (“next 100 TB”).</p>
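<p>To make that normalization concrete, here is a minimal sketch that converts additive tier definitions into absolute ranges; the widths and prices are illustrative, not actual S3 price points:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def to_absolute(additive_tiers):
    """Convert additive tiers [(width_tb, price_per_gb), ...]
    into absolute ranges [((start_tb, end_tb), price_per_gb), ...]."""
    absolute, start = [], 0
    for width, price in additive_tiers:
        absolute.append(((start, start + width), price))
        start += width
    return absolute

# "first 50 TB, next 50 TB, next 400 TB" becomes 0-50, 50-100, 100-500
print(to_absolute([(50, 0.150), (50, 0.140), (400, 0.130)]))</code></pre></figure>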
<h3 id="historical-pricing">Historical Pricing</h3>
<p>In the graph below, you can see this normalized data. The tiers are arranged smallest to largest, and within each grouping the bars run oldest to newest.</p>
<p class="rightforce"><a href="/assets/images/s3-costs-overtime.png" target="_blank"><img src="/assets/images/s3-costs-overtime.png" alt="S3 Costs Overtime" /></a>
<em><sup>Click to expand</sup></em></p>
<p>What I see in that data is that AWS …</p>
<ul>
<li>has never increased prices, but has not always made it cheaper</li>
<li>decreases costs in a specific set of tiers (e.g. mid volume or high volume) most of the time</li>
<li>drove down their internal costs in 2008, 2012, and 2014 to provide the highest discounts</li>
</ul>
<h3 id="purchasing-power">Purchasing Power</h3>
<p>What surprises me in those charts is actually how steady prices are at the higher volume tiers. As a result, I went looking for a historical price record and found one at Backblaze. I want to thank <a href="https://www.backblaze.com/">Backblaze</a> for letting me use this image. I find this a particularly interesting graph because these prices reflect volume-scale purchasing; they are more representative of what AWS would pay than what I would pay at my local retailer.</p>
<p class="rightforce"><a href="/assets/images/backblaze-chart-cost-per-drive-2017.png" target="_blank"><img src="/assets/images/backblaze-chart-cost-per-drive-2017.png" alt="Costs for Hard Drives" /></a>
<em><sup>Click to expand</sup></em></p>
<p>The chart depicted does not start in 2008, but I would venture that the 2008 savings were more of an architectural update, because drive prices do not fall drastically as usage increases. When we look at 2012 and 2014, things get more interesting. In 2012, 4TB drives premiered around $0.08/GB, which is what 2TB drives cost in 2010; even more telling was the drastic price decrease in 3TB drives. As a result of these supply-chain price decreases, we see a nearly 40% drop in the price of storage over 50TB compared to the 2010 pricing.</p>
<h3 id="volume-customers">Volume Customers</h3>
<p>To see how AWS treats volume customers, I think it’s important to look at the tiers they offer. In the table below, we can see where AWS has decided to reward or penalize customers for storing too little over time, or too much too early on.</p>
<table class="mbtablestyle">
<thead>
<tr>
<th style="text-align: center">Tier</th>
<th style="text-align: center">2006</th>
<th style="text-align: center">2008</th>
<th style="text-align: center">2009</th>
<th style="text-align: center">2010</th>
<th style="text-align: center">2012</th>
<th style="text-align: center">2014</th>
<th style="text-align: center">2016</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">None</td>
<td style="text-align: center">X</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: center">0-1 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"><strong>N</strong></td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
</tr>
<tr>
<td style="text-align: center">0-50 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"><strong>N</strong></td>
<td style="text-align: center">X</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: center">1 - 50 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"><strong>N</strong></td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
</tr>
<tr>
<td style="text-align: center">50-100 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"><strong>N</strong></td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: center">100-500 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"><strong>N</strong></td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: center">50 - 500 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
</tr>
<tr>
<td style="text-align: center">500 - 1000 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"><strong>N</strong></td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
</tr>
<tr>
<td style="text-align: center">1000-5000 TB</td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"><strong>N</strong></td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
<td style="text-align: center">X</td>
</tr>
</tbody>
</table>
<p>Legend: <strong>N</strong> = Newly Added, X = Pre-existing & carried forward</p>
<p>What you can see in this data is that AWS clearly rewards, or more correctly incentivizes, Data Gravity. In 2009 they cut the prices for the top three buckets while leaving the rest the same. In the next two reductions they dropped their prices for the middle, and again in 2014 and 2016 they drastically cut prices on the higher end. Additionally, over time, they increased the “width” of the middle buckets as well. All of this indicates that AWS wants to host all of your data, which it uses as a lever to move users into services beyond pure storage, surprising no one.</p>
<h3 id="conclusion">Conclusion</h3>
<p>I hope this sheds some light on how AWS, or any cloud provider for that matter, is reevaluating their internal costs and adjusting to provide the best value to their users. If you would like the raw data, please do not hesitate to reach out to me.</p>
<p><a href="/articles/investigating-AWS-pricing-over-time/">Investigating AWS Pricing over Time</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on April 17, 2018.</p>
<h2 id="multi-cloud-a-myth-or-a-practical-reality">Multi-Cloud: A myth or a practical reality?</h2>
<p>For many businesses, being on one, two, or even three Cloud vendors is attractive as a way to mitigate financial risk from untenable price increases and business risk around the availability of critical systems. Other businesses want to provide global reach to their customers, where some providers excel in certain regions while others do not perform within acceptable parameters.</p>
<h3 id="what-is-a-cloud">What is a Cloud?</h3>
<p>The more pressing question, and frankly the harder one to answer, is “What is a Cloud?”. The Cloud can be many things to many people, starting with the simple systems administrator’s version: “someone else’s computers in someone else’s data center”. If you ask a business analyst, they may tell you that Salesforce is a Cloud, or possibly Google’s gSuite. You would be remiss to argue that either of those points of view is invalid or even misguided.</p>
<p>So, for brevity, I will be talking about IaaS vendors going forward. Some of this will also apply to PaaS vendors, but we will not be focusing on SaaS vendors at all.</p>
<h3 id="examining-the-risks">Examining the Risks</h3>
<p>The problem with most enterprises is that they have largely decided to build systems in the IaaS Clouds much as they do on-premises. In some vendor offerings, such as vCloud Air or VMware on AWS, if you have the overlay networks and virtual SANs in place on-premises today, that largely works. The problems start when you move to offerings which are not like-for-like with what you did in the past.</p>
<h4 id="financial-risk">Financial Risk</h4>
<p>This risk is by far the least understood by most technologists but one of the most important to the business. As part of the Cloud transition, costs largely move from Capital Expenditures (CapEx) to Operational Expenditures (OpEx). This is important to understand because CapEx can be seen as a transfer of assets from cash to equipment under the Generally Accepted Accounting Principles (GAAP). As such, the company is not expending money; it is transferring money into non-monetary assets which lose value over time, a process known as depreciation. For example, a million-dollar mainframe depreciated over five years hits the books at roughly $200,000 a year, so on day one it does not show up as a million dollars of cash gone out the door.</p>
<p>This all changes in the Cloud, with some exceptions such as reservations. When companies spend money in the Cloud, it is generally treated as OpEx under GAAP, which means a dollar spent is a dollar gone from the company’s books.</p>
<h2 id="what-does-it-mean-to-be-multi-cloud">What does it mean to be Multi-Cloud</h2>
<p>With the risks well understood, the question is now what does it actually mean to be Multi-Cloud. There are several camps of thought around this topic and I would summarize them as:</p>
<ol>
<li>Syncing your static backups to another provider</li>
<li>Using vendor agnostic provisioning systems (e.g. Terraform, Puppet, Ansible, etc.) and having a copy of warm data in another provider</li>
<li>Actively running your app(s) across multiple Cloud vendors at once</li>
</ol>
<p>Clearly these are cumulative; you cannot do #3 without #1 and #2. They are also presented in increasing order of difficulty and complexity.</p>
<p>I would argue that about 70% of companies should just do #1 and keep a runbook on how to use the second provider by hand if required, mitigating the financial risk of rising single-provider costs. For most of that 70%, the correct way to reduce business risk and ensure application durability is to use multiple regions within the primary provider.</p>
<p>In part 2 of this series, I will investigate what you need, when you should do it, what to avoid, and what the first step is.</p>
<p><a href="/articles/so-lets-talk-about-multi-cloud.1/">So Let's Talk About Multi Cloud</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on January 08, 2018.</p>
<h2 id="pushing-the-cloudformation-bleeding-edge-native-modular-templates">Pushing the CloudFormation Bleeding Edge: Native Modular Templates</h2>
<p>When the YAML format for CloudFormation launched in September 2016, many users knew it was only a matter of time until the commonly used pattern of including multiple YAML files in a single file made its way into CloudFormation. On March 28, 2017, AWS did exactly that by launching the AWS::Include Transform, albeit with a surprising lack of fanfare.</p>
<p>While YAML was not a prerequisite for this feature, it made it infinitely easier to leverage as an end-user. There are several important things I have discovered as I integrated AWS::Include into my daily work; some of these are documented fully, others partially, and others not at all.</p>
<h3 id="terms-of-use">Terms of Use:</h3>
<ul>
<li>Partials - The snippets of CloudFormation stored in S3</li>
<li>Master - The template executed by the end-user</li>
<li>Includes - The AWS::Include Transform</li>
</ul>
<h3 id="key-points">Key Points</h3>
<ul>
<li>Your Partials may be in either JSON or YAML</li>
<li>Partials must use only the long form of a function call (e.g. Fn::Sub, not !Sub)</li>
<li>Change sets are required for use of Includes</li>
<li>Partials must be accessible by the end-user’s STS assumption for CloudFormation
<ul>
<li>Public ACL Read is not required if you have a good bucket policy</li>
</ul>
</li>
<li>Partials are included into the Master <strong>before</strong> evaluation of functions
<ul>
<li>Prevents using Fn::Sub in a Location directive for dev/prod s3 path of the Partials</li>
</ul>
</li>
<li>Errors which occur in Partials create unusual errors on evaluation</li>
<li>Understanding scope is very important</li>
<li>Nested Includes calls are not supported (i.e. your Partial cannot include another Partial)</li>
</ul>
<p>Now I will dive into some of these key points in detail, covering use cases for AWS::Include and lessons learned from living on the bleeding edge.</p>
<p>In a future post, I will detail use cases where Includes is ideal for your business, such as creating predictable IAM Roles or a multi-engine RDS template.</p>
<h3 id="key-points-in-details">Key Points in Details</h3>
<h5 id="execution-model">Execution Model</h5>
<p>The execution model when using Includes is through CloudFormation Change Sets, which is a great way to enforce a known checkpoint but brings difficulties for people who don’t use CloudFormation daily. When you use an Include and you want to make a new stack, you are left with two options:</p>
<ol>
<li>Create the stack within the AWS Console - the console automatically creates a blank stack, change set, and prompts for CAPABILITY_NAMED_IAM</li>
<li>Create a “blank” stack (e.g. just a wait handle, as sketched below) and then create a change set against that stack with CAPABILITY_NAMED_IAM in the create-change-set call</li>
</ol>
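<p>For reference, here is what such a minimal “blank” stack might look like; it contains nothing but a wait condition handle, so it creates instantly and gives you a target for change sets (the logical name Blank is just an illustration):</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">---
AWSTemplateFormatVersion: 2010-09-09
Description: Placeholder stack used only as a target for change sets
Resources:
  Blank:
    Type: AWS::CloudFormation::WaitConditionHandle</code></pre></figure>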
<h5 id="mixing-json-and-yaml">Mixing JSON and YAML</h5>
<p>This adventure into Includes was my first significant effort at using YAML for all my CloudFormation; up until this point, the benefits of YAML (specifically inline comments) were not worth the time it would take to rewrite what we had been using to date.</p>
<p>One of the use cases I leveraged Includes for is deploying IAM Policies and Roles for Federated Login, as it requires predictable Role naming. I have found in practice that it is easier to write the policies in JSON, but since I was using YAML now, I decided to keep everything in pure YAML for readability.</p>
<p>AWS has, thankfully, provided the ability to continue keeping your Partials in either language regardless of the Master’s language. You may choose to keep your JSON templates for things like IAM Policies and use YAML for the Master.</p>
<p>What I did leverage from time to time was cfn-flip to ensure my YAML syntax was in line with the JSON evaluation. As the Partials are included they are converted to JSON, so this is a reasonable checkpoint for yourself.</p>
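<p>cfn-flip also ships as a Python package, so the checkpoint can be scripted; a minimal sketch, assuming the cfn_flip package is installed and my_resources.yaml is one of the Partials from later in this post:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Round-trip a Partial through cfn-flip's YAML-to-JSON converter;
# a YAML mistake surfaces here instead of inside a failed change set.
from cfn_flip import to_json

with open("my_resources.yaml") as f:
    print(to_json(f.read()))</code></pre></figure>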
<h5 id="evaluation-logic-and-order">Evaluation Logic and Order</h5>
<p>As I said above, you cannot have any other function evaluated before the Include happens. This means that you cannot do something like this:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">Parameters:
PartialsEnv:
Type: String
Default: prod
AllowedValues:
- prod
- dev
...
Resources:
MyTestResource:
Fn::Transform:
Name: AWS::Include
Parameters:
Location:
Fn::Sub: s3://my-partials-bucket/${PartialsEnv}/resources/test_resource.yaml</code></pre></figure>
<p>This will throw an error that you must provide a valid S3 URI/Object. I have raised this with support, and an RFE has been created to allow this, or something like it, to be accepted.</p>
<p>While the order of operations is not specific to Includes, I was bitten by the unwritten order more than once. For your reference, I have compiled an incomplete list of the evaluation precedence steps here:</p>
<ol>
<li>Mappings</li>
<li>Reference Lookups</li>
<li>Conditional Statements</li>
<li>Substitutions</li>
</ol>
<p>I will try to obtain the information on a complete list of these in the future and make a separate post on that.</p>
<p>The reason I included these is that you are likely to end up trying to use Mappings inside Conditionals via Reference Lookups, and that will fail with unpredictable results.</p>
<h5 id="errors--debugging">Errors & Debugging</h5>
<p>I have compiled a list of the most common issues I have run into with Includes and YAML in general:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Common Errors</th>
<th style="text-align: center">Causes</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">Circular Dependancies</td>
<td style="text-align: center">Incorrect/Invalid Reference in the Partial</td>
</tr>
<tr>
<td style="text-align: center">Invalid Policy Syntax for IAM on execution</td>
<td style="text-align: center">A <code>*</code> IAM Policy Resource was not <code>"*"</code></td>
</tr>
<tr>
<td style="text-align: center">Invalid MajorEngineVersion ##.# for SqlServer Option Group</td>
<td style="text-align: center">SqlServer Option Groups are ##.## and during the YAML to JSON conversion, it drops the superfluous 4th digit, quoting such as <code>"13.00"</code> fixes this</td>
</tr>
<tr>
<td style="text-align: center">Must Provide valid S3 URI or S3 Object</td>
<td style="text-align: center">1. You are referencing a private S3 Object with no Bucket Policy</td>
</tr>
<tr>
<td style="text-align: center"> </td>
<td style="text-align: center">2. You are trying to do a Function within the S3 Location call</td>
</tr>
<tr>
<td style="text-align: center"> </td>
<td style="text-align: center">3. You are trying to reference a S3 Object which has no valid CloudFormation items</td>
</tr>
</tbody>
</table>
<h2 id="scope-and-use-in-practice">Scope and Use In Practice</h2>
<p>When using Includes, it’s very important to pay attention to how scope is in use. You cannot have two Includes calls at a single scope level. I have detailed interesting use cases around scope below, and it’s important that all of these items are <strong>mutually exclusive</strong> within a single template (#1) or within sections (#2/#3).</p>
<h5 id="1---use-the-transform-section-of-the-template">1 - Use the transform section of the template</h5>
<p>In this example I show how you can replace an entire template with an Include, except Parameters, which cannot be part of an Include.</p>
<p><em>Master Template</em></p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">---
AWSTemplateFormatVersion: 2010-09-09
Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/my_stack.yaml
# Parameters cannot be in an Includes
Parameters:
MyParam:
Type: String</code></pre></figure>
<p><em>Partials Template</em></p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">Metadata:
... # Your Metadata
Conditionals:
... # Your Conditionals
Mappings:
... # Your Mappings
Resources:
... # Your Resources
Outputs:
... # Your Outputs</code></pre></figure>
<h5 id="2---a-section-level">2 - A section level</h5>
<p>In this example I show how you can take all of the Resources, Outputs, etc. of a template and put them into Partials.</p>
<p><em>Master Template</em></p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">Mappings:
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/mappings/my_mappings.yaml
# Parameters cannot be in an Includes
Parameters:
MyParam:
Type: String
...
Resources:
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/resources/my_resources.yaml
Outputs:
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/outputs/my_outputs.yaml</code></pre></figure>
<p><em>Partials Templates</em></p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># my_mappings.yaml
AWS::CloudFormation::Interface:
ParameterGroups:
-
Label:
default: Global Account Information
Parameters:
- MyParam
ParameterLabels:
# These values must be quoted to add white space
MyParam:
default: 'My Parameter: '</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># my_resources.yaml
MyLogicalKMSResourceName:
Type: AWS::KMS::Key
Properties:
Description: |
My KMS Example Resource
Enabled: true
...
MyLogicalWaitResourceName:
Type: AWS::CloudFormation::WaitConditionHandle</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># my_outputs.yaml
MyLogicalKMSResourceOutput:
Description: |
KMS ARN Example
Value:
Ref: MyLogicalKMSResourceName
Export:
Name:
Fn::Sub: ${AWS::StackName}-MyLogicalKMSResourceOutput</code></pre></figure>
<h5 id="3---multiple-resources">3 - Multiple Resources</h5>
<p>In this example I show how you can use Includes to abstract the body of each resource and output into its own Partial.</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">...
Parameters:
MyParam:
Type: String
...
Resources:
MyLogicalKMSResourceName:
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/resources/kms_key.yaml
MyLogicalWaitResourceName:
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/resources/wait_handle.yaml
...
Outputs:
MyLogicalKMSResourceOutput:
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/outputs/kms_key.yaml
...</code></pre></figure>
<p><em>Partials Templates</em></p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">#resources/kms_key.yaml
Type: AWS::KMS::Key
Properties:
Description: |
My KMS Example Resource
Enabled: true
...</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">#resources/kms_key.yaml
Type: AWS::CloudFormation::WaitConditionHandle</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">#outputs/kms_key.yaml
Description: |
KMS ARN Example
Value:
Ref: MyLogicalKMSResourceName
Export:
Name:
Fn::Sub: ${AWS::StackName}-MyLogicalKMSResourceOutput</code></pre></figure>
<h5 id="4---within-a-resource">4 - Within a resource</h5>
<p>This example shows how to use Includes to give the end-user some modularity while keeping common attributes in the Partial. With this method, as the warning below notes, scope is very tricky and you should take care. In the example I create an RDSDBParameterGroup which allows the Master Template to specify which Parameters are in use for this RDS instance. Additionally, I show a second method of advanced scoping which allows you to place an Include at the end of a section to provide any “generic” items; because of scoping, any use of this must be the last item in the section or it will be overridden. I also demonstrate that regardless of how a resource is declared (e.g. RDSDBParameterGroup is “within a resource”), the Logical ID persists in the template after compilation (e.g. the Output for RDSDBParameterGroup is in the generic Outputs Include).</p>
<p><strong>Note: This method is considered advanced and requires significant testing</strong></p>
<p><em>Master Template</em></p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">Mappings:
...
# Parameters cannot be in an Includes
Parameters:
MyParam:
Type: String
...
Resources:
RDSDBParameterGroup:
Type: AWS::RDS::DBParameterGroup
Properties:
# Common items for all Parameter Groups
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/resources/rds_parameter_group.yaml
# Custom Parameters per RDS which are set by the "stack owner"
Parameters:
sql_mode: IGNORE_SPACE
timezone: UTC
# Must be the last item in the section
# Includes a series of generic resources
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/resources/general_resouces.yaml
Outputs:
MyLogicalKMSResourceOutput:
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/outputs/kms_key.yaml
# Must be the last item in the section
# Includes a series of generic outputs
Fn::Transform:
Name: AWS::Include
Parameters:
Location: s3://my-partials-bucket/outputs/general_outputs.yaml</code></pre></figure>
<p><em>Partials Templates</em></p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># resources/rds_parameter_group.yaml
Description: RDS DB parameter group
Family: rdsengine-12.3
Tags:
-
Key: Name
Value: rds-engine-12.3-parameter-group</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># resources/general_resouces.yaml
MyLogicalKMSResourceName:
Type: AWS::KMS::Key
Properties:
Description: |
My KMS Example Resource
Enabled: true
...
MyLogicalWaitResourceName:
Type: AWS::CloudFormation::WaitConditionHandle</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># outputs/kms_key.yaml
Description: |
KMS ARN Example
Value:
Ref: MyLogicalKMSResourceName
Export:
Name:
Fn::Sub: ${AWS::StackName}-MyLogicalKMSResourceOutput</code></pre></figure>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># outputs/general_outputs.yaml
RDSDBParameterGroupOutput:
Description: |
A RDS Parameter Group Example
Value:
Ref: RDSDBParameterGroup
Export:
Name:
Fn::Sub: ${AWS::StackName}-RDSDBParameterGroup</code></pre></figure>
<h2 id="lessons-learned">Lessons Learned</h2>
<ul>
<li>Numbers are not maintained with full precision on conversion (13.00 becomes 13.0)</li>
<li>When debugging, make a “fat” template if you run into issues. Many “good errors” are hidden by Change Sets and Includes, so if you hit weird issues, make a template with everything in it first and then break it out once it’s working.</li>
<li>Make a test blank stack (see above) when developing, as a failed change set rolls back to the blank stack, preventing continual stack create/delete cycles</li>
<li>Validate all your YAML first, and not just through a CloudFormation-aware linter; the stricter the YAML linter, the better</li>
<li>When in doubt, use cfn-flip and see if it still works</li>
<li>Develop IAM policies in IAM first, then use a JSON to YAML converter to embed into your templates</li>
<li>New features have new bugs; you may call support and want to hit your head on your desk when it’s something “simple”, but those occurrences are outweighed by the times it’s a true bug</li>
</ul>
<h3 id="references">References</h3>
<p><em>Announcement: https://aws.amazon.com/about-aws/whats-new/2017/03/aws-cloudformation-supports-authoring-templates-with-code-references-and-amazon-vpc-peering/</em></p>
<p><em>AWS Documentation: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/create-reusable-transform-function-snippets-and-add-to-your-template-with-aws-include-transform.html</em></p>
<p><a href="/articles/making-modular-cloudformation-with-includes/">Making Modular CloudFormation with Includes</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on May 08, 2017.</p>
<h2 id="athena-and-cloudtrail-a-marriage-made-in-the-cloud">Athena and CloudTrail: A Marriage made in the Cloud</h2>
<p>One of the first things which came to mind when AWS announced Athena at re:Invent 2016 was querying CloudTrail logs. Over the course of the past month I had intended to set this up, but current needs dictated I do it quickly. When I went looking at JSON imports for Hive/Presto, I was quite confused. Of course, as a trusty technologist, I went to Google. Much to my surprise, no one had published an article about using Athena to do this; I was only able to locate EMR-based posts which used a custom SerDe to support the nested CloudTrail format.</p>
<p>I had mild success at first, but thanks to some Athena gurus, I was able to get the magic piece in place.</p>
<p>I have to provide credit to AWS for their help with a few issues and amazing documentation on the event types.</p>
<p>I have provided references at the end of each field section and the end of the post with specific and broader details for the event fields and their uses.</p>
<p>To set all of this up, you first must have your CloudTrail logs in a single S3 bucket. This will work with a single account or many; I purposely set up delivery to a single bucket and created a table per source in Athena under a common database.</p>
<p>This is an example create table statement which shows the table/field syntax formats I used in the tables below.</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">CREATE EXTERNAL TABLE my_table_name (
Records ARRAY< STRUCT< eventName: STRING,
requestParameters: STRUCT< instancesSet: STRUCT< items: ARRAY< STRUCT< instanceId: STRING >>>,
volumeSet: STRUCT< items: ARRAY< STRUCT< volumeId: STRING > > > >,
eventType: STRING,
eventSource: STRING,
sourceIPAddress: STRING,
userIdentity: STRUCT< arn: STRING,
principalId: STRING,
accountId: STRING,
invokedBy: STRING,
TYPE: STRING,
sessionContext: STRUCT< sessionIssuer: STRUCT< arn: STRING,
principalId: STRING,
accountId: STRING,
TYPE: STRING,
userName: STRING >,
attributes: STRUCT< creationDate: STRING,
mfaAuthenticated: STRING > > >,
eventVersion: STRING,
responseElements: STRUCT< credentials: STRUCT< accessKeyId: STRING,
expiration: STRING,
sessionToken: STRING >,
assumedRoleUser: STRUCT< arn: STRING,
assumedRoleId: STRING > >,
userAgent: STRING,
eventID: STRING,
awsRegion: STRING,
sharedEventID: STRING,
eventTime: STRING,
resources: ARRAY< STRUCT< accountId: STRING,
TYPE: STRING,
ARN: STRING > >,
requestID: STRING,
recipientAccountId: STRING >>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH serdeproperties( 'ignore.malformed.json' = 'true' )
LOCATION 's3://my_consolidated_bucket/my-cross-account-prefix/AWSLogs/'</code></pre></figure>
<h2 id="cloudtrail-record-query-columns">CloudTrail Record Query Columns</h2>
<p>These are the columns you can reference in your queries, grouped by purpose. This is not a full list of all CloudTrail fields, so if you need others such as vpcEndpointId, you should add them to the schema.</p>
<p><strong>Event ID Fields</strong></p>
<table>
<tr>
<td>record.eventID</td>
<td>GUID generated by CloudTrail to uniquely identify each event</td>
</tr>
<tr>
<td>record.sharedEventID</td>
<td>GUID generated by CloudTrail to uniquely identify CloudTrail events from the same AWS action that is sent to different AWS accounts</td>
</tr>
</table>
<p><strong>Event Details</strong></p>
<table>
<tr>
<td>record.eventName</td>
<td>The requested action, which is one of the actions in the API for that service. (example: DescribeLoadBalancers)</td>
</tr>
<tr>
<td>record.eventSource</td>
<td>The service that the request was made to (e.g. ec2.amazonaws.com)</td>
</tr>
<tr>
<td>record.eventTime</td>
<td>The date and time the request was made, in coordinated universal time (UTC)</td>
</tr>
<tr>
<td>record.eventType</td>
<td>Identifies the type of event that generated the event record: one of AwsApiCall, AwsConsoleSignIn, or AwsServiceEvent (an event generated by AWS itself; this can occur when another account makes a call with a resource that you own)</td>
</tr>
<tr>
<td> record.eventVersion</td>
<td>The version of the log event format</td>
</tr>
<tr>
<td> record.sourceIPAddress</td>
<td>The IP address that the request was made from, when console is used, it will report console.amazonaws.com</td>
</tr>
</table>
<p><strong>Request Details</strong></p>
<table>
<tr>
<td>record.requestId</td>
<td>The value that identifies the request, generated by the service being called</td>
</tr>
<tr>
<td>record.requestParameters</td>
<td>The parameters, if any, that were sent with the request</td>
</tr>
</table>
<p><strong>Resource Details</strong></p>
<table>
<tr>
<td>record.resources</td>
<td>An array of the resources accessed in the event, used most often by STS or KMS
</td>
</tr>
<tr>
<td>record.resources.accountId</td>
<td> The account ID of the impacted element</td>
</tr>
</table>
<p><strong>Response Details</strong></p>
<table>
<tr>
<td>record.responseElements.assumedRoleUser.arn</td>
<td>The arn of the assumed role for the unique session</td>
</tr>
<tr>
<td>record.responseElements.assumedRoleUser.assumedRoleId</td>
<td>The ID of the assumed role for the unique session</td>
</tr>
<tr>
<td>record.responseElements.credentials.accessKeyId</td>
<td>The access key of the caller
</td>
</tr>
<tr>
<td>record.responseElements.credentials.expiration</td>
<td>The expiration of the current session</td>
</tr>
<tr>
<td>record.responseElements.credentials.sessionToken</td>
<td>The active token for the session</td>
</tr>
</table>
<p><em>References</em> <br />
<em>http://docs.aws.amazon.com/IAM/latest/UserGuide/cloudtrail-integration.html#stscloudtrailexample</em>
<em>http://docs.aws.amazon.com/kms/latest/developerguide/logging-using-cloudtrail.html</em></p>
<p><strong>Miscellaneous</strong></p>
<table>
<tr>
<td>record.userAgent</td>
<td>The agent through which the request was made
</td>
</tr>
<tr>
<td>record.recipientAccountId</td>
<td>Represents the account ID that received this event, may differ from the calling account if cross-account access occurred and will differ on the "remote" end</td>
</tr>
</table>
<p><strong>User Identity</strong></p>
<table>
<tr>
<td>record.userIdentity.accountId</td>
<td>The account that owns the entity that granted permissions for the request</td>
</tr>
<tr>
<td>record.userIdentity.arn</td>
<td>The Amazon Resource Name (ARN) of the principal that made the call</td>
</tr>
<tr>
<td>record.userIdentity.invokedBy</td>
<td>The name of the AWS service that made the request, if the request was made by a service</td>
</td>
</tr>
<tr>
<td>record.userIdentity.principalId</td>
<td>A unique identifier for the entity that made the call. For requests made with temporary security credentials, this value includes the session name that is passed to the AssumeRole, AssumeRoleWithWebIdentity, or GetFederationToken API call</td>
</tr>
<tr>
<td>record.userIdentity.sessionContext.attributes.creationDate</td>
<td>The date and time when the temporary security credentials were issued</td>
</tr>
<tr>
<td>record.userIdentity.sessionContext.attributes.mfaAuthenticated</td>
<td>The value is true if the root user or IAM user whose credentials were used for the request also was authenticated with an MFA device; otherwise, false</td>
</tr>
<tr>
<td> record.userIdentity.sessionContext.sessionIssuer.accountId</td>
<td>The account that owns the entity that was used to get credentials
</td>
</tr>
<tr>
<td>record.userIdentity.sessionContext.sessionIssuer.arn</td>
<td>The ARN of the entity that was used to get credentials</td>
</td>
</tr>
<tr>
<td>record.userIdentity.sessionContext.sessionIssuer.type</td>
<td>The source of the temporary security credentials, such as Root, IAMUser, or Role</td>
</tr>
<tr>
<td>record.userIdentity.sessionContext.sessionIssuer.userName</td>
<td>The friendly name of the user or role that issued the session. The value that appears depends on the sessionIssuer identity type. See reference material for more information</td>
</tr>
<tr>
<td>record.userIdentity.type</td>
<td>The type of the identity, which is one of: Root, IAMUser, AssumedRole, FederatedUser, AWSAccount (cross-account access), or AWSService (access performed by an AWS service such as Elastic Beanstalk)</td>
</tr>
</table>
<p><em>Reference: http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-user-identity.html#cloudtrail-event-reference-user-identity-fields</em></p>
<p><strong>Example Queries</strong></p>
<p>Find the most frequent event name/ARN/source IP tuples and count them, highest totals first</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">SELECT record.eventName, record.userIdentity.arn, record.sourceIPAddress, COUNT(*)
FROM
(SELECT record
FROM my_table_name
CROSS JOIN UNNEST(records) AS t (record)) AS records
GROUP BY record.eventName, record.userIdentity.arn, record.sourceIPAddress
ORDER BY COUNT(*) DESC
LIMIT 20;</code></pre></figure>
<p>Find all events where cross-account access occurred, group them by the source and the ARN and count the totals</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">SELECT record.eventName, record.eventSource, record.userIdentity.arn, COUNT(*)
FROM
(SELECT record
FROM my_table_name
CROSS JOIN UNNEST(records) AS t (record)) AS records
WHERE record.recipientAccountId <> record.userIdentity.accountId
GROUP BY record.eventName, record.eventSource, record.userIdentity.arn
ORDER BY COUNT(*) DESC
LIMIT 20;</code></pre></figure>
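<p>As one more sketch against this schema, here is a query that surfaces temporary-credential activity where MFA was not used:</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql">SELECT record.userIdentity.arn, COUNT(*)
FROM
(SELECT record
FROM my_table_name
CROSS JOIN UNNEST(records) AS t (record)) AS records
WHERE record.userIdentity.sessionContext.attributes.mfaAuthenticated = 'false'
GROUP BY record.userIdentity.arn
ORDER BY COUNT(*) DESC
LIMIT 20;</code></pre></figure>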
<p><em>Document Reference: http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference.html</em></p>
<p><a href="/articles/using-aws-athena-to-query-cloudtrail-logs/">Using AWS Athena to Query CloudTrail Logs</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on January 26, 2017.</p>
<p>In /var/log/ntpstats/peerstats you see lines like (I put them in table form for readability - normally space delimited)</p>
<table class="table table-striped">
<thead>
<tr>
<th>Day</th>
<th>Seconds</th>
<th>Peer IP</th>
<th>Peer Status Word</th>
<th>Offset</th>
<th>Delay</th>
<th>Dispersion</th>
<th>Skew (variance)</th>
</tr>
</thead>
<tbody>
<tr>
<td>56791</td>
<td>36043.625</td>
<td>10.39.32.12</td>
<td>8023</td>
<td>-0.000106166</td>
<td>0.000316335</td>
<td>7.946282622</td>
<td>0.000000119</td>
</tr>
<tr>
<td>56791</td>
<td>36824.626</td>
<td>10.39.32.11</td>
<td>9034</td>
<td>0.000068454</td>
<td>0.000453367</td>
<td>7.937500123</td>
<td>0.000000119</td>
</tr>
<tr>
<td>56791</td>
<td>36839.626</td>
<td>10.39.32.12</td>
<td>9034</td>
<td>-0.000027949</td>
<td>0.000240638</td>
<td>7.937500121</td>
<td>0.000000119</td>
</tr>
<tr>
<td>56791</td>
<td>37082.626</td>
<td>10.39.32.12</td>
<td>9034</td>
<td>-0.000047201</td>
<td>0.000307433</td>
<td>3.938467683</td>
<td>0.000115655</td>
</tr>
<tr>
<td>56791</td>
<td>37108.626</td>
<td>10.39.32.11</td>
<td>8023</td>
<td>-0.000128532</td>
<td>0.000392425</td>
<td>7.937500122</td>
<td>0.000000119</td>
</tr>
<tr>
<td>56791</td>
<td>37110.626</td>
<td>10.39.32.12</td>
<td>9034</td>
<td>-0.000071405</td>
<td>0.000344577</td>
<td>3.937507683</td>
<td>0.000057128</td>
</tr>
<tr>
<td>56791</td>
<td>37112.626</td>
<td>10.39.32.11</td>
<td>8023</td>
<td>-0.000142907</td>
<td>0.000267320</td>
<td>1.937515213</td>
<td>0.000051571</td>
</tr>
<tr>
<td>56791</td>
<td>38177.626</td>
<td>10.39.32.12</td>
<td>964a</td>
<td>-0.000114107</td>
<td>0.000233899</td>
<td>0.007741648</td>
<td>0.000038685</td>
</tr>
</tbody>
</table>
<h3>Decoding Peer Status Word</h3>
<p>The Peer Status Word is a packed hex value which encodes several things for NTP. Normally <code>ntpq -c "associations"</code> shows you this in English, but to debug you need to look at the peerstats file, which does not decode it for you.</p>
<h4>Decoding the first digit</h4>
<p>Using the table below you can decode the value. For instance, a 9XXX status means 80+10 (configured in ntp.conf and reachable). 8XXX means it is configured (80) but not reachable.</p>
<table class="table table-striped">
<thead>
<tr>
<th>Code</th>
<th>Message</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>08</td>
<td>bcst</td>
<td>broadcast association</td>
</tr>
<tr>
<td>10</td>
<td>reach</td>
<td>host reachable</td>
</tr>
<tr>
<td>20</td>
<td>authenb</td>
<td>authentication enabled</td>
</tr>
<tr>
<td>40</td>
<td>auth</td>
<td>authentication ok</td>
</tr>
<tr>
<td>80</td>
<td>config</td>
<td>persistent association</td>
</tr>
</tbody>
</table>
<h4>Decoding the second digit</h4>
<p>Given 96XX, the host is reachable and configured (as seen above), and the second digit (6) from the table below means it is the system peer (you will also see a * next to it in ntpq -p).</p>
<p>Another example from the sample above is "8<strong>0</strong>23": this means it is configured but NOT reachable, and it was discarded (see the bolded 0, meaning sel_reject).</p>
<table class="table table-striped">
<thead>
<tr>
<th>Code</th>
<th>Message</th>
<th>T</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>sel_reject</td>
<td> </td>
<td>discarded as not valid (TEST10-TEST13)</td>
</tr>
<tr>
<td>1</td>
<td>sel_falsetick</td>
<td>x</td>
<td>discarded by intersection algorithm</td>
</tr>
<tr>
<td>2</td>
<td>sel_excess</td>
<td>.</td>
<td>discarded by table overflow (not used)</td>
</tr>
<tr>
<td>3</td>
<td>sel_outlyer</td>
<td>-</td>
<td>discarded by the cluster algorithm</td>
</tr>
<tr>
<td>4</td>
<td>sel_candidate</td>
<td>+</td>
<td>included by the combine algorithm</td>
</tr>
<tr>
<td>5</td>
<td>sel_backup</td>
<td>#</td>
<td>backup (more than tos maxclock sources)</td>
</tr>
<tr>
<td>6</td>
<td>sel_sys.peer</td>
<td>*</td>
<td>system peer</td>
</tr>
<tr>
<td>7</td>
<td>sel_pps.peer</td>
<td>o</td>
<td>PPS peer (when the prefer peer is valid)</td>
</tr>
</tbody>
</table>
<h4>Decoding the third and fourth digits</h4>
<p>The third digit is the count of occurrences of the event code in the fourth digit, and yes, that ordering seems backwards to normal thought.</p>
<p>The fourth digit, as seen below, is the event code which the third digit counts.</p>
<p>Given the example of "964a" from above, we know it is configured and reachable, it is the system peer, and there have been four occurrences of becoming the system peer (4 times of a).</p>
<p>Another example is "8023" from above: we know it is configured but unreachable, and there have been two occurrences of it being unreachable (2 times of 3). A small decoder covering all four digits follows the table below.</p>
<table class="table table-striped">
<thead>
<tr>
<th>Code</th>
<th>Message</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>01</td>
<td>mobilize</td>
<td>association mobilized</td>
</tr>
<tr>
<td>02</td>
<td>demobilize</td>
<td>association demobilized</td>
</tr>
<tr>
<td>03</td>
<td>unreachable</td>
<td>server unreachable</td>
</tr>
<tr>
<td>04</td>
<td>reachable</td>
<td>server reachable</td>
</tr>
<tr>
<td>05</td>
<td>restart</td>
<td>association restart</td>
</tr>
<tr>
<td>06</td>
<td>no_reply</td>
<td>no server found (ntpdate mode)</td>
</tr>
<tr>
<td>07</td>
<td>rate_exceeded</td>
<td>rate exceeded (kiss code RATE)</td>
</tr>
<tr>
<td>08</td>
<td>access_denied</td>
<td>access denied (kiss code DENY)</td>
</tr>
<tr>
<td>09</td>
<td>leap_armed</td>
<td>leap armed from server LI code</td>
</tr>
<tr>
<td>0a</td>
<td>sys_peer</td>
<td>become system peer</td>
</tr>
<tr>
<td>0b</td>
<td>clock_event</td>
<td>see clock status word</td>
</tr>
<tr>
<td>0c</td>
<td>bad_auth</td>
<td>authentication failure</td>
</tr>
<tr>
<td>0d</td>
<td>popcorn</td>
<td>popcorn spike suppressor</td>
</tr>
<tr>
<td>0e</td>
<td>interleave_mode</td>
<td>entering interleave mode</td>
</tr>
<tr>
<td>0f</td>
<td>interleave_error</td>
<td>interleave error (recovered)</td>
</tr>
</tbody>
</table>
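<p>Putting the three tables together, here is a minimal sketch of a decoder for the status word as it appears in peerstats, following the field layout described above:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Decode an NTP peer status word such as "964a" or "8023".
FLAGS = {0x80: "config", 0x40: "auth", 0x20: "authenb",
         0x10: "reach", 0x08: "bcst"}
SELECT = ["sel_reject", "sel_falsetick", "sel_excess", "sel_outlyer",
          "sel_candidate", "sel_backup", "sel_sys.peer", "sel_pps.peer"]
EVENTS = {0x1: "mobilize", 0x2: "demobilize", 0x3: "unreachable",
          0x4: "reachable", 0x5: "restart", 0x6: "no_reply",
          0x7: "rate_exceeded", 0x8: "access_denied", 0x9: "leap_armed",
          0xa: "sys_peer", 0xb: "clock_event", 0xc: "bad_auth",
          0xd: "popcorn", 0xe: "interleave_mode", 0xf: "interleave_error"}

def decode(word):
    w = int(word, 16)
    high = w >> 8                     # flag bits plus the select field
    return {"flags": [n for bit, n in FLAGS.items() if high & bit],
            "select": SELECT[high & 0x07],
            "count": (w >> 4) & 0xf,  # occurrences of the event code
            "event": EVENTS.get(w & 0xf, "none")}

print(decode("964a"))
# {'flags': ['config', 'reach'], 'select': 'sel_sys.peer',
#  'count': 4, 'event': 'sys_peer'}</code></pre></figure>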
<p><a href="/articles/ntp-peerstats-status-word-secret-decoder-ring/">NTP Peerstats Status Word Secret Decoder Ring</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on May 16, 2014.</p>
<p>Long the exclusive property of Apple Remote Desktop Enterprise, the ability to remote into your computer without leaving the screen on for all to see has finally shown up, but it’s in 10.7 only.</p>
<p>Open Terminal and run <code>open vnc://yourcomputername</code></p>
<div>After that is open, click "View" in the menu bar and then click "Switch to Virtual Display". This turns off your iMac’s screen and allows you to work without anyone watching or playing pranks.</div>
<div>In ARD it was called "Curtain Viewing", I'm very happy to see this finally become a much needed item for everyone</div>
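<p>The whole flow, as a quick sketch (the <code>.local</code> hostname is an example; substitute your machine's name):</p>
<pre><code># connect to the remote Mac's screen
open vnc://yourcomputername.local
# then: View menu -> "Switch to Virtual Display"
</code></pre>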
<p><a href="/articles/using-your-macs-screen-remotely-without-people-watching/">Using Your Mac's Screen Remotely Without People Watching</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on April 03, 2012.</p>
/articles/new-site2012-03-08T05:00:00+00:002012-03-08T05:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>Yup, it is about that time again: I got bored and redesigned my site. Hopefully this will get me to write more posts. You can expect to see posts about AWS, mobile, general tech items, or anything else I think is important enough to put into the permanence of cyber-space. Stay tuned...</p>
<p><a href="/articles/new-site/">New Site</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on March 08, 2012.</p>
/articles/adding-vmnets-in-vmware-fusion-42011-09-14T04:00:00+00:002011-09-14T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>With the release of VMWare Fusion 4 (and its CONTINUED lack of a GUI for the network manager), I bring you the instructions on how to add networks to VMWare Fusion 4 (now that I can write about it).</p>
<p>In good news, you no longer have to fully restart the network stack via boot.sh; just restarting Fusion will dynamically pick up the changes.</p>
<p>All network configuration files are now found in <code>/Library/Preferences/VMware\ Fusion</code>.</p>
<p>The networking file contains information about the VMNETs and is where you will do most of your configuration.</p>
<p>For example, if you want to create a VMNET4 with no DHCP and host-only networking, you would append the following to the networking file:</p>
<pre><code>answer VNET_4_DHCP no
answer VNET_4_HOSTONLY_NETMASK 255.255.255.0
answer VNET_4_HOSTONLY_SUBNET 172.16.128.0
answer VNET_4_VIRTUAL_ADAPTER yes
</code></pre>
<p>Now you HAVE to edit the .vmx of your VM directly.</p>
<p>Something along the lines of:</p>
<pre><code>ethernet0.connectionType = "custom"
ethernet0.vnet = "vmnet4"
ethernet0.bsdName = "vmnet4"
ethernet0.displayName = "Custom Host Only VMnet4"
</code></pre>
<p>Special things to note: do NOT use the GUI network selector after you do this. Your network will always show as grey; don't worry.</p>
<p>If you want to create a NAT'ed network with no DHCP you would do the same as the above; however, the easiest way to set it up is to copy vmnet8/ to vmnet#/ and remove the dhcpd.conf and dhcpd.conf.bak. Edit the nat.conf to have the appropriate subnet and vmnet (at the bottom you can also create port forwarding if you would like). Also edit the nat.mac, keeping the first three groupings of the MAC address and changing the last three to something unused on your system.</p>
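<p>A rough sketch of those steps (the vmnet number here is just an example):</p>
<pre><code># create a NAT'ed network with no DHCP by cloning vmnet8
cd /Library/Preferences/VMware\ Fusion
sudo cp -R vmnet8 vmnet5
sudo rm vmnet5/dhcpd.conf vmnet5/dhcpd.conf.bak
# then edit vmnet5/nat.conf (subnet, vmnet, optional port forwards)
# and vmnet5/nat.mac (keep the first three MAC groupings, change the last three)
</code></pre>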
<p><a href="/articles/adding-vmnets-in-vmware-fusion-4/">Adding VMNet's in VMWare Fusion 4</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on September 14, 2011.</p>
/articles/security-in-the-cloud2011-09-13T04:00:00+00:002011-09-13T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>As many of you know I am a very big proponent of using the cloud with high automation. At my job we do this in a big way. However, one question always comes to mind: if your servers share physical machines with other tenants, how can one guarantee security?</p>
<p>In short you can't, but there are things you can do.</p>
<p>When I say you "can't" guarantee security, I mean that anyone could find a "local" exploit in the hypervisor; just look <a href="http://goo.gl/xtWkd" target="_blank">here</a> if you think hypervisors are secure.</p>
<p>Then you must be thinking to yourself: I am basically screwed and will never be able to deploy in the cloud. You would be wrong. The high automation I prefer (e.g. puppet) reports on changes to files you control, and you control which servers you automate via the PKI system. For very important systems, things like OSSEC are great tools.</p>
<p>One thing people should be very aware of is what Amazon calls "Security Groups". They are akin to a virtual firewall and even sit between servers of the same layer. So by nature, if you have two servers in one security group, they will be unable to SSH to one another or even ping unless you explicitly allow it. That was a fantastic decision by Amazon and really helps security engineers facilitate a more secure environment.</p>
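<p>As an illustration with the modern AWS CLI (not the tooling of the era, and the group name is hypothetical), explicitly allowing SSH between members of the same group looks something like this:</p>
<pre><code># allow SSH only between members of the "web-tier" security group
aws ec2 authorize-security-group-ingress \
    --group-name web-tier \
    --protocol tcp --port 22 \
    --source-group web-tier
</code></pre>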
<p>Amazon in particular, in my opinion, has done an excellent job looking at security from the worst-case scenario. You can go from as locked down as a VPC cluster to as open as a wide-open security group; it is your choice how good or bad at security you are. Also, since the Elastic Load Balancer product belongs to a security group, you get an extra layer of firewalls by only letting those load balancers directly access the web servers (removing some external DDoS attacks). Amazon Web Services has also been verified as a PCI Level 1 Service Provider. I can say from experience that it is a very difficult thing to do and an extremely big commitment for Amazon to make on such a massive scale.</p>
<p>In future posts I will write about how to best architect a cloud system for minimal failures, and how to put your worst fears onto paper in the very important Recovery Time Objective/Recovery Point Objective Disaster Recovery document from a cloud setting.</p>
<p><a href="/articles/security-in-the-cloud/">Security in the 'Cloud'</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on September 13, 2011.</p>
/articles/ipv6-redux2011-04-14T04:00:00+00:002011-04-14T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>I have 6to6 capability at my house, yet I noticed the linchpin of fast 6to6 browsing is local DNS resolution. I have an Airport Extreme, but it refuses to hand out the public DNS servers which I put into it, and it runs its own version of the DNS caching daemon built into OS X. This daemon works as intended in OS X, but on the ABES, 6to6 resolution can take over 15 seconds. Doing lookups directly, by forcing external native v4 and v6 DNS servers in my Mac's DNS server configuration (the AirPort interface in this case), eliminates the problem.</p>
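<p>For reference, forcing the resolvers on the Mac side can be done with something like this (the service name and addresses are examples; substitute your own):</p>
<pre><code>networksetup -setdnsservers AirPort 192.0.2.1 2001:db8::1
</code></pre>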
<p>All the Googling I have done shows there is no way to force the ABES to give out designated DNS servers via DHCP while NOT handling DHCP itself (but still doing 4to6 and 4to4 NAT). This is a pity and something Apple should address, or it should make its resolver more IPv6-savvy.</p>
<p><a href="/articles/ipv6-redux/">IPv6 Redux</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on April 14, 2011.</p>
/articles/ipv62011-02-19T05:00:00+00:002011-02-19T05:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>We all have heard quite a bit about IANA running out of IPv4 addresses. While it will be a while until the effects are fully felt, I am doing my part and adding AAAA records to my website. IPv6 is the future whether we are ready or not; there is no time like the present to start thinking about it.</p>
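<p>For anyone curious, the change is as small as a pair of zone file entries like these (a sketch using documentation addresses, not my real ones):</p>
<pre><code>; IPv4 and IPv6 records side by side
www  IN  A     192.0.2.10
www  IN  AAAA  2001:db8::10
</code></pre>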
<p><a href="/articles/ipv6/">IPv6</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on February 19, 2011.</p>
/articles/puppet-continuous-integration2010-08-13T04:00:00+00:002010-08-13T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>Puppet is an amazing configuration management system as I have previously written, but one downfall is that no system exists where you check in code, it runs, and if it fails, it alerts. Continuous Integration is a very important thing to have. It saves dev and production environments from being destroyed or otherwise screwed up. After searching all over the web, I was unable to find anyone who has done a full CI system for puppet, so I developed my own.<br />
My CI system consists of 3 parts: The Foreman, XenServer, and Git. I have a cron job which runs every 5 minutes and pulls down the latest code from the "central" git server. I have multiple people merging into this server from multiple companies, so CI was a must-have for us.</p>
<p>If the code on the server is newer than the code on the client, I update the code and rsync it from its staging directory into /etc/puppet on the puppetmaster.</p>
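<p>A minimal sketch of that cron-driven sync (the paths and script name are illustrative, not my actual setup):</p>
<pre><code># crontab entry: poll the central git server every 5 minutes
*/5 * * * * /usr/local/bin/puppet-ci-sync.sh

# puppet-ci-sync.sh, roughly:
#   cd /srv/puppet-staging && git pull origin master
#   rsync -a --delete /srv/puppet-staging/ /etc/puppet/
</code></pre>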
<p>The biggest problem with puppet is that you can't just run syntax checks against the system. Puppet is a stateful system which requires catalogs to be run against the same types of servers that exist in your target environments. My answer? Virtualization.</p>
<p>I have 3 VMs running (one per "role") on my XenServer. Each one simulates how the systems are designed, named, and used in production.</p>
<p>After the run is kicked off and the CI sees new changes and applies them, you need to be able to figure out what worked, what didn't, and get alerted to the breakage. This is where The Foreman comes into play. <a href="http://theforeman.org">The Foreman</a>, for those who don't know, is a Web UI for Puppet reports. It can perform many other functions, like complete unattended kickstart installs, but that is not what I needed it for. The Foreman collects and analyzes runtime metrics as well as states. If for some reason the puppet run fails on the client, it will immediately email the failure to a mailing list I have set up.</p>
<p>The system has been tested, and I have only encountered one problem, for which I have opened a bug with The Foreman team: it will not detect puppetmaster catalog compile errors.</p>
<p>All in all, this system allows multiple sysadmins to commit and work on various modules at the same time while the code is validated in an automated fashion. Still, as good a practice as it is, a CI system should never fully replace a second set of human eyes.</p>
<p><a href="/articles/puppet-continuous-integration/">Puppet Continuous Integration</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on August 13, 2010.</p>
/articles/cron-job-to-ensure-your-puppet-clients-stay-happy2010-07-22T04:00:00+00:002010-07-22T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>I wrote a Perl script which is used in combination with cron to make sure that puppet clients don't stray too far from their master. The script can be found <a href="http://files.thomasvachon.com/share/puppet-last-parse">here</a> and is available under the GPL v3.</p>
<p><a href="/articles/cron-job-to-ensure-your-puppet-clients-stay-happy/">Cron Job to Ensure Your Puppet Clients Stay Happy</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on July 22, 2010.</p>
/articles/adding-vmnets-in-vmware-fusion-32010-02-03T05:00:00+00:002010-02-03T05:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>This problem has come up a couple of times and I figured out how to do it. It isn't a pretty thing to do, but it works.<br />
First, open your terminal and go to <code>/Library/Application\ Support/VMware\ Fusion/</code></p>
<p>Run <code>sudo ./vmnet-apps.sh --stop</code></p>
<p>If you want a host-only net, <code>cp -R</code> the vmnet1 folder; if you want a NAT network, <code>cp -R</code> the vmnet8 folder. Name the new folder vmnetX where X is your new network number.</p>
<p>Edit the files inside. There is a dhcpd.conf which must be changed to suit your needs; if it is a NAT network there are also a nat.conf and a nat.mac. Change these to match the network changes you made in dhcpd.conf.</p>
<p>Now edit the networking file. If you want a host-only network, copy the VNET1 entries; if you want NAT, copy the VNET8 entries. Paste and modify the entries to match your vmnet folder's number.</p>
<p>Now delete the VNET_X_DHCP_CFG_HASH line; it will auto-regenerate.</p>
<p>Edit the lines to match the network, etc. If you want the Mac to NOT have a connection (i.e. a self-contained VM network), set VNET_X_VIRTUAL_ADAPTER to no. The sketch below shows roughly what the finished entries look like.</p>
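<p>Roughly what the pasted-and-edited entries end up as (the vmnet number and subnet are examples, in the same format I show in my Fusion 4 post):</p>
<pre><code>answer VNET_4_DHCP yes
answer VNET_4_HOSTONLY_NETMASK 255.255.255.0
answer VNET_4_HOSTONLY_SUBNET 172.16.128.0
answer VNET_4_VIRTUAL_ADAPTER yes
</code></pre>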
<p>Now run <code>sudo ./vmnet-apps.sh --start</code></p>
<p>Do an ifconfig and make sure your new vmnet is up and correctly configured.</p>
<p>Now go into <code>~/Documents/Virtual\ Machines.localized/</code></p>
<p>cd into the VM you want to mess with (note: ADD THE ADAPTERS IN THE UI FIRST)</p>
<p>Modify the .vmx of the guest</p>
<p>From <a href="http://sanbarrow.com/vmx/vmx-network.html">source</a>:</p>
<p>For a VM with VMware tools installed:</p>
<pre><code>ethernet0.present = "true"
ethernet0.startConnected = "true"
ethernet0.virtualDev = "vmxnet"
ethernet0.connectionType = "custom"
ethernet0.vnet = "vmnetX"
</code></pre>
<p>For a VM without VMware tools:</p>
<pre><code>ethernet0.present = "true"
ethernet0.startConnected = "true"
ethernet0.virtualDev = "e1000"
ethernet0.connectionType = "custom"
ethernet0.vnet = "vmnetX"
</code></pre>
<p><a href="/articles/adding-vmnets-in-vmware-fusion-3/">Adding VMNet's in VMWare Fusion 3</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on February 03, 2010.</p>
8.0]]>/articles/how-to-upgrade-a-cisco-pix-515-with-serial-failover-from-6-3-8-02009-10-03T04:00:00+00:002009-10-03T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>Well it sounds simple, doesn't it? Cisco says you reload the OS, you make a couple of changes and voila, you have a working Pix 515 running the latest and greatest code (which, by the way, is the same code run by those far more expensive ASAs). Well, not so fast.</p>
<p>First of all, make sure you meet the requirements for running anything over 6.3. This means a 515 or higher Pix (I recommend the 515e as the minimum, not the 515, as the newer code is much heavier and the Pentium II in the 515 is slower). Also, you need to have enough room on your flash; if you don't use the god-forsaken Pix Device Manager (which by all accounts no one ever should) you are fine. Finally, you need RAM. Luckily, as long as you are not covered by SmartNet, feel free to crack open your Pix to reveal its true nature: it runs an Intel motherboard and PC-100 RAM. It supports a maximum of 256 MB (2x128 MB) and RAM is cheap, so go for it and upgrade it to the max. One caveat is that you MUST run an unrestricted license to support 256 MB of RAM. I was able to upgrade a restricted version (as 128 MB is the minimum), but I soon found its flash chip was fried and bought a replacement 515e off the used market.</p>
<p>OK, so you pass the prereqs. Now what to do? Well, you need 2 separate OS images: 7.2 and 8.0 or greater. They are available on Cisco's website for registered users. Also, if you use stateful failover, you need to make sure you have a free Ethernet interface or sub-interface for replication (which I haven't done yet).</p>
<p>Now on to the procedure. Cisco's website is a little fuzzy on how to do this on a pair of failover 515s, so this is where this guide will be of the most use to you. This is certainly a maintenance-window activity, as doing it incorrectly will cause ARP poisoning and other awfulness.</p>
<p>First, BACK UP YOUR CONFIG! (not that this has to be said). Then disconnect the serial cable between the two Pixes. Start the upgrade on the Primary Pix (the one with the Primary side of the serial cable). Upgrade from 6.3 to 7.2 via <code>copy tftp: flash:</code>. The Pix will start complaining about re-writing rules; this is OK right now. Once you are at the prompt, write your config and reboot again. From here you can now go to 8.0 via <code>copy tftp: flash:image.bin</code>.</p>
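<p>In other words, the image copies on the primary look roughly like this (the filenames are examples; use the images you downloaded):</p>
<pre><code>! on the primary Pix, running 6.3
copy tftp: flash:pix722.bin
! write the config and reload, then, from 7.2:
copy tftp: flash:pix804.bin
</code></pre>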
<p>Reboot the Pix again and you will be in 8.0. You may get some warnings about stateful failover (how to solve that is hopefully coming later). Any other warnings should be looked at and confirmed as OK, or fixed; errors must be fixed at this point as well. Now comes the tricky part: for every interface which has a standby IP associated, re-input the ip address line without the standby IP. Also make sure ALL failover lines are gone. Save your config, and now it's time to move to the second Pix.</p>
<p>This time the upgrade starts off a bit differently. Make sure the serial cable is disconnected (as it already should be) and write erase; you want a blank config for this. Reload and do the same 6.3 -> 7.2 (don't bother saving the config this time) and then 7.2 -> 8.0. At this point, write erase again to be sure it's a clean Pix. Power off the Secondary Pix and connect the serial cable on both ends.</p>
<p>Now put your additions back on your ip address lines (yes, you have to type it all out) and wr your config. Now do a show fail; it should report that the partner is powered off, which is correct. Finally, in configure mode, type "failover" on the Primary Pix. Boot up your Secondary Pix, go into configure mode, and type "failover". Magically, "show fail" should pair up and start replicating the config over the serial link to the blank standby unit.</p>
<p>Once everything is up and good, you have upgraded from 6.3 -> 8.0 and now have almost all the features of an ASA. This is a very worthwhile activity, as it gives you a huge bump in features and ease of use. Once I get stateful failover working on a subinterface/trunk, I will post how to finish off the job. However, do heed Cisco's warnings: doing stateful failover using a data-bearing interface is NOT supported. It will not NAT, and it will blow away your ACLs and every reference to that interface; just don't try it.</p>
<p>I hope this helps your upgrade go smoother than ours did (it's only a mild concussion, the doctor says, from hitting our heads against the wall so much).</p>
<p><a href="/articles/how-to-upgrade-a-cisco-pix-515-with-serial-failover-from-6-3-8-0/">How to Upgrade a Cisco Pix 515 With Serial Failover From 6.3 -> 8.0</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on October 03, 2009.</p>
/articles/dance-puppets-dance2009-05-04T04:00:00+00:002009-05-04T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>What an odd title, right? What do puppets have to do with system administration? Well, in fact, there is a program called Puppet. What is Puppet? Puppet is a client/server software system put out by Reductive Labs which allows for simple management of *nix systems (OS X included).</p>
<p>"So it manages things, but I can do that with some custom scripts." Well that is a sentiment I have run into in my current position, but it was quickly overcome when shown how puppet differs from other in-house scripts. Most in-house systems are a mix of OSS software combined into a single usable instance. For instance, using rsync to sync up scripts to a list of servers, and ssh to execute and read back any output. While that works on a homogenized environment where all conventions are followed, but what happens if someone spins up a new site where they didn't or more likely could not follow such conventions. This is the problem I found myself in. I inherited a EC2 environment and due to the way EC2 servers are built and the OS version we had to run, none of our pre-made scripts would work. Thus begins my adventures in configuration management.</p>
<p>My first experiment was similar to our in-house system: a combination of rsync and SSH which synced the root file system to a copy of it on the "admin" server. It would then execute any necessary commands via SSH. The "system" did have package management, using a combination of dpkg --get-selections and SSH commands, but it was far from easily manageable and required a run per server type. Overall, while the system worked, it was far from scalable.</p>
<p>Thus began my adventure of looking at configuration management solutions. The three that stood out were cfengine, Puppet, and Bcfg2. While cfengine and Puppet are the most closely related, there are some significant differences in design philosophy which set them apart; most importantly, you have to explicitly define different OSes in cfengine, which Puppet just handles. More info can be found <a href="http://reductivelabs.com/trac/puppet/wiki/CfengineVsPuppet">here</a>. Bcfg2 was not selected for a variety of reasons. It lacks a good way to bundle servers into classes, though some workarounds have been developed. More importantly, its configuration language is not easily understandable by multiple people: it is written in XML and would require extensive comments to make it clear to a multi-person operations team.</p>
<p>This left me with Puppet. Puppet comes highly recommended by several sources, including Digg and Google, both of whose recommendations are not easy to come by. Digg likes to take OSS software and build upon it; they did not need to do this for Puppet, which is a testament to its features and flexibility. Google uses Puppet to manage all their Linux desktops and will be expanding it in the near term to whole data centers. So what is so great about Puppet? Some of the best things are its multiplatform abilities, its code re-usability, and its plain readability and codeability.</p>
<p>OK, so you've heard me rant about Puppet; what's the big deal? I can do this type of stuff in my sleep. Sure you can, but can you install a package on hundreds of servers in a matter of minutes? If you have scripts pre-written, no problem. But what if you want to install a new server, make its package versions match EXACTLY every other server, and you only have 1 hour to do it? You are pretty certainly screwed, unless you have a Puppet system set up. A timed install of a server using apache2, rails, mod_rails, and about 15 other gems takes my Puppet install all of 10 minutes. This is the beauty of Puppet.</p>
<p>So what does the configuration look like? Well, it's very similar to cfengine, as Puppet is an outgrowth of it, but Puppet is also written in Ruby, so you have the power of ERB templating at your fingertips. Let's start with a simple manifest to ensure SSH is installed and running on boot, and that its configuration files are in place.</p>
<pre><code># File: ssh.pp
class sshd {
  file { "/etc/ssh/sshd_config":
    owner  => root,
    group  => root,
    mode   => 0444,
    source => "puppet:///files/sshd_config",
    notify => Service["ssh"],
  }
  file { "/etc/ssh/ssh_config":
    owner  => root,
    group  => root,
    mode   => 0444,
    source => "puppet:///files/ssh_config",
    notify => Service["ssh"],
  }
  service { "ssh":
    ensure => running,
  }
}
</code></pre>
<p>Wow, that's kinda cool, right? But what does it all mean? Well, Puppet is broken down into 3 major building blocks: the node, which is the server entry in a file called nodes.pp; the class, which is a container for a bunch of stuff; and finally the resource, which is the meat and potatoes of the system. The resources, as seen in this example, are the "file" resource and the "service" resource.</p>
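<p>For completeness, a node entry tying a server to the class above would look something like this (the hostname is an example):</p>
<pre><code># nodes.pp
node "web01.example.com" {
  include sshd
}
</code></pre>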
<p>The file resource can take a bunch of options; the most interesting one here is source. The puppet:/// prefix tells the client (which parses these manifests) to look at an embedded WEBrick server (which the puppetmaster runs) and grab the file from there. It then places the file at the path specified in the first line of the resource. The notify line says: if this file is updated, restart SSH.</p>
<p>The service (which has to be in the same class as any notify declarations that reference it) in this case says that SSH has to be running on boot. Puppet is intelligent enough to know that if something has to be running on boot, it also needs to be installed. Note that the service name HAS to match the name your OS uses. Now you say, well, how is THAT portable? Well, watch this: change your service declaration to match:</p>
<pre><code>service { "ssh":
  name => $operatingsystem ? {
    'debian' => "ssh",
    'centos' => "sshd",
  },
  ensure => running,
}
</code></pre>
<p>This queries the underlying OS and uses the appropriate service name. You can do similar things with the "path" attribute if you need the SSH configurations in a different spot.</p>
<p>So that is a quick overview of how puppet works. I will be going more in-depth on how I have chosen to deploy it in a later post.</p>
<p><a href="/articles/dance-puppets-dance/">Dance Puppets, DANCE!</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on May 04, 2009.</p>
/articles/where-has-the-time-gone2009-04-12T04:00:00+00:002009-04-12T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>Wow, I certainly have not updated this in forever. I should get back to this, most likely with a better theme soon. I need to update the resume as well. Stay tuned...</p>
<p><a href="/articles/where-has-the-time-gone/">Where Has the Time Gone...</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on April 12, 2009.</p>
/articles/why-linux2008-05-08T04:00:00+00:002008-05-08T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>Linux is now rapidly becoming the operating system of choice in many core areas of business. It is transforming information technology in many exciting ways, being used in products ranging from cell phones and PDAs to cars and mainframe computers. In addition to being cost-effective, it is constantly being updated and refined with the latest technologies. As Linux gains greater acceptance in today's information and communication technology, more and more companies are supporting Linux with both application and hardware compatibility.</p>
<p>Like its many uses, Linux has a variety of printed and electronic guides to show you what to do. The specialist guides are highly detailed, focusing on narrow areas of excellence; the encyclopedic guides for beginners focus on Linux fundamentals and only then introduce more specialized topics. Everyone, from beginners to those with the confidence of an expert, can start learning this spectacular and versatile operating system.</p>
<p>Some popular Linux distributions:</p>
<p>PCLinuxOS (Desktop Linux)<br />
Home Page: <a href="http://www.pclinuxos.com/">http://www.pclinuxos.com/</a><br />
Mailing Lists: <a href="http://docs.mypclinuxos.com/Mailing-list">http://docs.mypclinuxos.com/Mailing-list</a><br />
Documentation: <a href="http://docs.pclinuxos.com/">http://docs.pclinuxos.com/</a></p>
<p>Ubuntu (Desktop/Server Linux)<br />
Home Page: <a href="http://www.ubuntu.com/">http://www.ubuntu.com/</a><br />
Mailing Lists: <a href="http://lists.ubuntu.com/mailman/listinfo/">http://lists.ubuntu.com/mailman/listinfo/</a><br />
Documentation: <a href="https://wiki.ubuntu.com/UserDocumentation">https://wiki.ubuntu.com/UserDocumentation</a></p>
<p>openSUSE (Desktop Linux)<br />
Home Page: <a href="http://www.opensuse.org/">http://www.opensuse.org/</a><br />
Mailing Lists: <a href="http://en.opensuse.org/Communicate/Mailinglists">http://en.opensuse.org/Communicate/Mailinglists</a><br />
Documentation: <a href="http://en.opensuse.org/Documentation">http://en.opensuse.org/Documentation</a></p>
<p>Fedora Project (Desktop Linux)<br />
Home Page: <a href="http://fedoraproject.org/">http://fedoraproject.org/</a><br />
Mailing Lists: <a href="http://fedoraproject.org/wiki/Communicate">http://fedoraproject.org/wiki/Communicate</a><br />
Documentation: <a href="http://docs.fedoraproject.org/">http://docs.fedoraproject.org/</a> and <a href="http://fedoraproject.org/wiki/Docs">http://fedoraproject.org/wiki/Docs</a></p>
<p>Debian GNU/Linux (Desktop/Server Linux)<br />
Home Page: <a href="http://www.debian.org/">http://www.debian.org/</a><br />
Mailing Lists: <a href="http://lists.debian.org/">http://lists.debian.org/</a><br />
Documentation: <a href="http://www.debian.org/doc/">http://www.debian.org/doc/</a></p>
<p>Mandriva Linux (Desktop Linux)<br />
Home Page: <a href="http://www.mandrivalinux.com/">http://www.mandrivalinux.com/</a><br />
Mailing Lists: <a href="http://www.mandriva.com/en/mailing_lists">http://www.mandriva.com/en/mailing_lists</a><br />
Documentation: <a href="http://www.mandriva.com/en/community/users/documentation">http://www.mandriva.com/en/community/users/documentation</a></p>
<p>CentOS (Server Linux)<br />
Home Page: <a href="http://www.centos.org/">http://www.centos.org/</a><br />
Mailing Lists: <a href="http://www.centos.org/modules/tinycontent/">http://www.centos.org/modules/tinycontent/</a><br />
Documentation: <a href="http://www.centos.org/docs/">http://www.centos.org/docs/</a></p>
<p>KNOPPIX (Desktop Linux)<br />
Home Page: <a href="http://www.knoppix.com/">http://www.knoppix.com/</a><br />
Mailing Lists: <a href="http://lists.debian.org/debian-knoppix/">http://lists.debian.org/debian-knoppix/</a><br />
Documentation: <a href="http://www.knoppix.net/docs/">http://www.knoppix.net/docs/</a></p>
<p><a href="/articles/why-linux/">Why Linux?</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on May 08, 2008.</p>
/articles/restricting-login-in-linux2008-05-08T04:00:00+00:002008-05-08T04:00:00+00:00Thomas Vachoncontactme@thomasvachon.com
<p>When we talk about forcing a user to log off, what we're really talking about is time restrictions on certain accounts' system access and services. The easiest way I've found to implement time restrictions is by using software called Linux-PAM.</p>
<p>Pluggable Authentication Modules (PAM) is a mechanism for authenticating users. Specifically, we're going to use the pam_time module to control timed access for users to services.</p>
<p>Using the pam_time module, we can set access restrictions to a system and/or specific applications at various times of the day, as well as on specific days. Depending on the configuration, you can use this module to deny access to individual users based on their name, the time of day, the day of the week, the service they're applying for, and the terminal from which they're making the request.</p>
<p>When using pam_time, you must terminate each rule in the /etc/security/time.conf file with a newline.</p>
<p>Always remember that the pound sign [#] is a comment, and the system will ignore any text following it on that line.</p>
<p>This is an example configuration for the pam_time module. The syntax of each line is as follows:</p>
<p><code>services;ttys;users;times</code></p>
<ol type="1">
<li>The first field, services, is a list of PAM service names.</li>
<li>The second field, ttys, is a logic list of terminal names.</li>
<li>The third field, users, is a logic list of users or a netgroup of users.</li>
<li>The fourth field, times, indicates the applicable times.</li>
</ol>
<span>Here's an example of a typical set of rules:</span></p>
<p><span>login ; \* ; !root ; 0800-2000</span></p>
<p>http ; \* ; !root; 0800-2000</p>
<p><span>ftp ; \* ; !root; 0800-2000</span></p>
<p><span><br />
</span></p>
<p>These rules restrict all users except root to logging on between the hours of 0800 and 2000. They likewise restrict http and ftp access to those hours.</p>
<p>Root would be able to log on at any time and browse the Internet at all times as well.</p>
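<p>One last note: pam_time only takes effect for services whose PAM stacks load it. A typical line to add to, for example, /etc/pam.d/login (exact placement varies by distribution, so treat this as a sketch) is:</p>
<pre><code>account  required  pam_time.so
</code></pre>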
<p><a href="/articles/restricting-login-in-linux/">Restricting Login in Linux</a> was originally published by Thomas Vachon at <a href="">Thomas Vachon</a> on May 08, 2008.</p>