I'm really, really, happy about this. I've been complaining about the lack of cloud servers with exposed performance counters to any cloud vendor that'll listen (though of course nothing ever came of that). Kudos AWS, this is really cool.
Thanks! Would love to hear more about the counters that you're interested in. We've exposed more in C5 than in previous instance types and we are trying to make more available over time in a safe way.
- General performance analysis. For this more counters is generally incrementally better.
- Running https://github.com/mozilla/rr. This requires the retired-branch-counter to be available (and accurate - sometimes virtualization messes that up)
The second one I actually care more about, because I've pretty much stopped trying to debug software when rr is not available, too painful ;). Feel free to email me (email is in my profile) for gory details.
For the benefit of anyone reading this, KVM and VMware virtualization generally work. Xen has problems because of a stupid Xen workaround for a stupid Intel hardware bug from a decade ago. I can provide more details about that via email (in my profile) if desired.
Seconding paulie_a, We're running a Xen stack right now and I haven't heard of this. We've worked around a few nasty bugs with Xen and linux doms already, but I'm wondering if we have this problem you're referring to and don't even know it.
One of the things the performance monitoring unit (PMU) is capable of doing is triggering an interrupt (the PMI) when a counter overflows. When combined with the ability to write to the counters, this lets you program the PMU to interrupt after a certain number of counted events. Nehalem supposedly had a bug where the PMI fires not on overflow but instead whenever the counter is zero. Xen added a workaround to set the value to 1 whenever it would instead be 0. Later this was observed on microarchitectures other than Nehalem and Xen broadened the workaround to run on every x86 CPU. Intel never provided any help in narrowing it down and there don't seem to be official errata for this behavior either.
This behavior is ok for statistically profiling frequent events but if you depend on exact counts (as rr does) or are profiling infrequent events it can mess up your day.
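To make the off-by-one concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not Xen's or Intel's actual logic: the 48-bit width, the `ToyCounter` class, and the `zero_workaround` flag are all assumptions made up for illustration.

```python
COUNTER_BITS = 48                    # assumed counter width for this toy model
MASK = (1 << COUNTER_BITS) - 1


class ToyCounter:
    """Hypothetical up-counting PMU counter that interrupts when it wraps to 0."""

    def __init__(self, zero_workaround=False):
        self.zero_workaround = zero_workaround
        self.value = 0
        self.interrupts = 0

    def write(self, value):
        value &= MASK
        # Modeled workaround: never let the counter hold 0, store 1 instead.
        if self.zero_workaround and value == 0:
            value = 1
        self.value = value

    def count(self, events=1):
        for _ in range(events):
            self.value = (self.value + 1) & MASK
            if self.value == 0:      # wrapped around: overflow interrupt (PMI)
                self.interrupts += 1


def sample_after(counter, n):
    """Program the counter to overflow after n events, then deliver n events."""
    counter.write(-n)                # -n & MASK == 2**COUNTER_BITS - n
    counter.count(n)
    return counter.interrupts


def exact_count(counter, events):
    """Reset to 0, run `events` events, and read the counter back."""
    counter.write(0)
    counter.count(events)
    return counter.value


print(sample_after(ToyCounter(), 1000))                      # 1
print(exact_count(ToyCounter(), 1000))                       # 1000
print(exact_count(ToyCounter(zero_workaround=True), 1000))   # 1001
```

In this model the sampling path is unaffected, since a sample period never writes 0, but anything that resets the counter to 0 and later reads an exact count comes back one too high. That is the kind of error a statistical profiler shrugs off and a record-and-replay tool cannot.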
rr works fine on multithreaded (and multiprocess) applications. It does emulate a single core machine though, so depending on your workload and how much parallelism your application actually has it might be painful.
Even though they are billed hourly, the deployment times (hours, last time I checked) mean they are not a real replacement for cloud servers. Scaleway servers deploy in seconds and packet.net in minutes.
I am a customer of packet's, along with other virtual and dedicated hosting providers. I don't use aws ec2.
I've been pleased with Packet, and their offerings are much more diverse than this initial offering from aws.
I just now took a look at Packet's web site and their data center locations. They categorize each location as either "core" or "edge", but I couldn't find anything to indicate what those terms mean in this context. Are you familiar with that distinction?
The location nearest me is an "edge", not a "core". I wonder what I would be missing out on, if it's not "core".
Also a happy Packet customer. We use their small instances for things like service monitoring (where VM pauses cause false positives) and for routing infrastructure where bare metal is required to achieve VoIP-acceptable jitter. They’re also one of the few cloud hosting providers to support BGP.
Scaleway.com also offers baremetal servers at a really attractive price. The CLI is just awesome, it's great to see other cloud providers joining the game.
You end up having to build your own moving parts...
Building a high-availability metadata store is not easy. And ensuring that incoming request IPs aren't spoofed is a little non-trivial to reason about.
UserData is a good way to provide a one-time token that can be used to fetch data...
Using SSH for provisioning is just plain dirty... and almost impossible to do reliably... You'll need global locks and timeouts to recover in case one of your masters crashes... Plus some garbage collection to clean up things that were not fully provisioned.
This is a LOT of unreliable state to manage. And a ton of corner cases. Having the right architecture matters for reliable automation.
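The one-time token idea mentioned above could look roughly like this. It is a minimal in-memory sketch, assuming a single process; `OneTimeTokenStore`, `issue`, and `redeem` are hypothetical names, and a real deployment would still need the highly available storage discussed above.

```python
import secrets
import time


class OneTimeTokenStore:
    """Hypothetical store: each token can be redeemed once, before it expires."""

    def __init__(self, ttl_seconds=600.0):
        self.ttl_seconds = ttl_seconds
        self._pending = {}           # token -> (payload, expiry time)

    def issue(self, payload, now=None):
        """Create a token to hand to a new machine through its UserData."""
        now = time.monotonic() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[token] = (payload, now + self.ttl_seconds)
        return token

    def redeem(self, token, now=None):
        """Return the payload on first use; None if unknown, reused, or expired."""
        now = time.monotonic() if now is None else now
        entry = self._pending.pop(token, None)   # pop makes the token single-use
        if entry is None:
            return None
        payload, expires_at = entry
        if now > expires_at:
            return None
        return payload


store = OneTimeTokenStore(ttl_seconds=600)
token = store.issue({"role": "worker"})
print(store.redeem(token))   # {'role': 'worker'}
print(store.redeem(token))   # None
```

Because the token is removed on the first redeem, a replayed request gets nothing, and the expiry doubles as garbage collection for machines that never finished booting.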
Impressive hardware but I wonder what will be the cost considering even the regular VMs of EC2 are generally more expensive than dedicated offerings of other providers.
Then it needs to be rewritten, because it is impossible to tell what the machine specs are (none of them have GPU specs listed) and there is no documentation on which tests are run under the GPU and which are not. The repos contain vague information on GPU options, but there is no information on what was used in the tests.
There is nothing in this article that has any information on GPUs. It doesn't even list the actual machine instances used (would not the AWS tier name be useful here, for example?).
What differentiates these from dedicated boxes in a server rack? Is their dedicated "cloud" hardware somehow managing access to RAM/storage/etc?
On another tangent - how do Google Cloud and EC2 attach GPUs to instances - given that you can choose CPU and RAM the GPUs must somehow be modularized away from a dedicated server?
is there any information about nitro or ENA (assuming this is the "hardware accelerators" that are mentioned in tfa) that is publicly available? it seems like the most nifty little thing
> how do Google Cloud and EC2 attach GPUs to instances - given that you can choose CPU and RAM the GPUs must somehow be modularized away from a dedicated server?
Rack A of servers has a base_server_x. Rack B of servers is base_server_x + GPU_Y.
You ask for no GPU, you get a server from rack A. You ask for a GPU, you get a server from rack B.
With them leaning bare-metal and low cost, I wonder if services like these could be used to bootstrap clouds in VAR form for niche OS's. Might be useful at the least for getting bugs out of the virtualization software using diverse workloads. If costs are kept minimal, it might even be profitable if the niche OS has enough users.
It's exactly the same as with the i3.16xlarge instance type. There are eight 1900 GB drives. In an i3.16xlarge, those eight drives are passed through to the instance with PCIe passthrough but for the i3.metal instance, you avoid going through a hypervisor and IOMMU and have direct access.
- If one of those drives fails, will Amazon hotswap them out, or do you need to migrate to a new instance (moving TBs of data to a new box without causing outages can be painful.)
- Is there a hardware RAID controller for those drives, or is it software only?
- Can anyone with access to one of these boxes produce some IO performance stats on them? Bonus points for stats on single drive vs concurrent across all drives (i.e., is there any throttling). More points for RAID10 performance across the whole 8.
The local NVMe storage for i3.metal is the same as i3.16xlarge. There are 8 NVMe PCI devices. For i3.16xlarge those PCI devices are assigned to the instance running under the Xen hypervisor. When running i3.metal, there simply isn't a hypervisor and the PCI devices are accessed directly.
- There is no hot swap for the NVMe storage.
- The 8 NVMe devices are discrete, there is no hardware RAID controller
- Anyone can get I/O performance stats on i3.16xlarge as a baseline. Intel VT-d can introduce some overhead from the handling (and caching) of DMA remapping requests in the IOMMU and interrupt delivery so I/O performance may be a bit higher on i3.metal, with a few microseconds lower latency.
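Since there is no hardware RAID controller, any RAID across the eight devices would have to be done in software. As a quick back-of-the-envelope sketch, using the eight 1900 GB drives quoted earlier in the thread, nominal capacities work out as follows (the function name is made up, and filesystem and metadata overhead are ignored):

```python
DRIVE_COUNT = 8      # from the thread: eight local NVMe devices
DRIVE_GB = 1900      # from the thread: 1900 GB each


def nominal_capacity_gb(drive_count=DRIVE_COUNT, drive_gb=DRIVE_GB):
    """Nominal usable capacity for a few software RAID layouts."""
    raw = drive_count * drive_gb
    return {
        "raw": raw,
        "raid0": raw,          # striping only, no redundancy
        "raid10": raw // 2,    # striped two-way mirrors: half is redundancy
    }


print(nominal_capacity_gb())
# {'raw': 15200, 'raid0': 15200, 'raid10': 7600}
```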
For all this progress, AWS billing makes it so damn confusing to figure out whether some machine has been left on unused that I won’t use AWS again. GCE and Azure are miles ahead here.
Most servers have some sort of "lights out" management, which gives KVM + remote imaging and bios control.
With amazon, they have complete control over the network in and out, so cutting you off and re-imaging a server is pretty trivial.
To be fair, it's not that hard to do even if you're not amazon.
Most of the big server vendors' out-of-band interfaces have an API, so telling a server to reboot from a network image is pretty trivial. Providing a netboot infrastructure to install images with a 'userdata' script is also not that difficult.
You'll need a DHCP server, tftp to serve the boot image, and usually an NFS server to pull the rest of the image over. With some engineering work that could be made to use HTTP.
It's a bit harder if you host something like this for the general public to use (vs administrating machines in your private DC). Normal setups aren't really hardened against someone flashing firmware, messing with UEFI, ..., all of which mean you can't entirely trust a machine coming back from customer control. I wouldn't be surprised if Amazon took this seriously and invested effort in stopping such things. At their scale, they probably can customize the hardware enough.
Everyone who sells bare metal as a service takes this seriously. As AWS build their own hardware, especially in these newer machines, I would guess that it's not possible to flash firmware from the user machine, only from the control node.
EC2 Bare Metal instances boot from an EBS volume that is accessed via a NVMe PCI device (implemented in ASICs built by Annapurna Labs), just like virtualized C5 instances.
NVMe is just how the storage is surfaced -- the hardware programming interface for the block device. Hardware iSCSI initiators (HBAs) also have a hardware programming interface, but at the end of the day you talk SCSI over that interface.
NVMe is a better match for the storage operations supported by EBS. A bonus is that by surfacing EBS over NVMe there is a common storage interface for both managed storage volumes and local NVMe storage.
These were my exact same thoughts. I suppose it's almost like a step back from the framework of "virtualize everything"... what's old is new..
addon thoughts: nonetheless, the specs on the bare metal box are ridiculous. buying something like that will cost you $50k (someone correct me?) - then you need to find a place to host it... that's not easy to do.
Because they're still virtualizing literally everything but the actual computer. You can attach NVMe backed EBS volumes, snapshot them as normal, etc. You can have this thing exist in a vpc next to your virtualized components, with a 25 Gbps dedicated link. They're virtualizing the things you shouldn't need to care about, leaving you with a free CPU and access to all the things that make aws aws
Since EC2 Bare Metal instances will use the same pricing models as all other EC2 instances (on demand, reserved instances, dedicated host, spot), the same information is relevant.
Will there be smaller instances available eventually? I'm interested in bare metal performance but I don't need an instance that huge for my current workload.
Our goal is for the majority of virtualized EC2 instances to be indistinguishable from bare metal (if not better). In most CPU and memory intensive benchmarks there is very little difference between a virtualized EC2 instance and bare metal, especially for smaller numbers of cores and memory sizes.
Not quite: this is cloud-provisioned so you can do things like supply your own image and it integrates with all the other AWS services like virtual machines do. Provisioning is automated and self-serve. Also per-second billing which you couldn't get in the olden days with hosting.
Blackhats, state actors, etc all trying to attack Amazon or colocated services. As an example (I don't know the extent of "bare metal" access, so I couldn't be sure) with the ability to run their own operating system, a client could potentially get all the way down to the NIC to form arbitrary network packets. With this they could potentially map and attack Amazon's internal network protocols (routers, etc). Any kind of vulnerability within Amazon's software stack on other servers now gets a whole lot worse. If the client did this at a very low rate, it would be difficult to detect. Firewalling off these servers only helps so much, since they could still attack colocated servers of other clients, or could potentially spoof the protocol of Amazon's own server management.
I hope they have thought this through carefully, because it potentially exposes everyone on EC2 to more, potentially worse, attacks.
The NIC that is used by EC2 Bare Metal instances is an Elastic Network Adapter (ENA) PCI device that surfaces a logical VPC Elastic Network Interface. ENA is implemented in an ASIC that we design and build.
When ENA is used in virtualized instances, Intel VT-d and SR-IOV are used to bypass the hypervisor. When ENA is used in a bare metal instance, the OS simply has direct access to the PCI device. In either case the device is a controlled surface, and VPC software defined networking deals with verifying and encapsulating network traffic.
That's completely off topic. In fact, the question is so broad that I cannot think of anyplace other than the water cooler or Quora to ask it.
Career advice: Never go "foobar-only". Make an effort to learn "foobar" but understand whatever is one layer below it in the stack. Want to go "cloud-only"? Learn OpenCloud, not AWS.
It's definitely worthwhile to learn Lambda, S3 and serverless apps but all that stuff can be learned on the job. S3 is especially easy to use for most use-cases and any decent programmer can learn to use it in an hour or two.
However, I would definitely learn a SQL dialect and learn how RDBMSes such as Postgres work (especially what is meant by ACID) as most companies are based around a database. Don't believe the hype - SQL is not dead. Dynamo is a great technology but there are many problems it can't solve for you.
Finally, I personally don't know Azure or GCP so well. Only knowing AWS in-depth hasn't held me back so far. I've used a few of Azure's services but I've never built a serious app on it.
My recommendation is to not really worry about individual technologies and to focus on safely handling and working with data.
I learned React and later React-Native. Selling myself as a "mobile consultant" then worked fine, nobody cared "how" I made these mobile apps.
My idea was the same with back-end, learning some framework and start selling myself as "mobile cloud consultant" or something, with the hopes that clients also don't care "how" I create these cloud back-ends.
I know SQL, worked most of my time with RDBMSs, so this wouldn't be much of an issue. As I said I already did a few back-ends, but my focus was on front-end, usability and such.
I just mentioned DynamoDB because I had the impression that it was "the AWS DB", do they offer an SQL service besides Redshift?
It allows you to launch many common database engines, which are managed and backed up by AWS. I've been using it for a few years and for my use-case it's great.
I knew that DynamoDB is a noSQL DB, I thought with the noSQL hype and everyone doing MongoDB/RethinkDB back-ends now, they would simply say "In the cloud you have to use this and that's it"
RDS somehow sounded like the Redis service of AWS, hehe.
Learning your way around cloud services is a great idea, but I would be hesitant about starting with Lambda and Serverless, or doing only that. It's somewhat of a different paradigm, kind of back-end for front-end developers, or at least people who don't want to deal with infrastructure. While that is a great thing, I think there is value in understanding what a more traditional webserver on AWS looks like with an EC2 instance, EBS volumes, AMIs, security groups, load balancer, SSH access, etc.