v1 - ami-0185c61124653544c

Propagate .NET HPC — Supercomputing Edition AMI

Documentation & User Guide

Overview

Propagate .NET HPC is a performance-tuned Ubuntu 24.04 LTS image with the .NET 6, 8 and 10 SDKs installed side by side, ready for compute-intensive .NET workloads. The system ships pre-tuned for high-performance computing (Server GC, CPU governor, NUMA, huge pages, kernel and network tuning) and includes an optional containerized .NET app-hosting mode behind nginx-proxy with Let's Encrypt.

Security is handled by AWS Security Groups only. There is no host firewall on this AMI; open only the ports you need in your Security Group.

Requirements

Resource	Minimum	Recommended
Instance type	t3.large (2 vCPU)	Compute-optimized c6i / c7i / hpc6a
RAM	4 GB	8 GB+ (more for large datasets)
Root volume	30 GB	30 GB+ (3 SDKs use ~3 GB)
Data volume (EBS)	optional	Attach for persistent app/job data

For real HPC throughput, pick a compute-optimized or HPC instance family and attach a second EBS volume for your data.

First boot

On first launch the instance auto-configures via dotnet-hpc-firstboot.service:

No user-data → applies the throughput HPC profile, mounts any attached data volume, sets .NET 8 as default, and does not start any container. SSH in and start building immediately.

JSON user-data → applies your settings and (optionally) deploys a containerized .NET app. Example user-data:

{
  "dotnet_version": "8",
  "hpc_profile": "throughput",
  "hugepages_mb": 0,
  "enable_app": true,
  "app_image": "your-registry/your-app:latest",
  "internal_port": 8080,
  "domain": "app.example.com",
  "enable_https": true,
  "letsencrypt_email": "admin@example.com",
  "admin_user": "admin",
  "admin_password": "ChangeMe123!"
}
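For compute-only instances, the user-data document can be generated from a script rather than edited by hand. The sketch below is an illustration only: `write_user_data` is a hypothetical helper that is not part of the AMI, and it assumes the first-boot service accepts a document containing just these keys with `enable_app` set to false.

```shell
#!/usr/bin/env bash
# Sketch: emit a compute-only user-data document (no containerized app).
# write_user_data is a hypothetical helper; the keys come from the example above.
write_user_data() {
  local version="$1" profile="$2" hugepages_mb="$3"
  printf '{\n'
  printf '  "dotnet_version": "%s",\n' "$version"
  printf '  "hpc_profile": "%s",\n' "$profile"
  printf '  "hugepages_mb": %d,\n' "$hugepages_mb"
  printf '  "enable_app": false\n'
  printf '}\n'
}

# Print a document for .NET 8 with the throughput profile and no huge pages.
write_user_data 8 throughput 0
```

Redirect the output to a file (for example `user-data.json`) and pass it at launch with `aws ec2 run-instances --user-data file://user-data.json` alongside your usual launch options.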

To configure (or re-configure) interactively at any time:

sudo bash /opt/dotnet-hpc/configure-dotnet-hpc.sh

After configuring, open a new shell (or source /etc/profile.d/dotnet-hpc.sh) so the .NET environment variables load.

Using .NET

All three SDKs live under /usr/share/dotnet and dotnet is on the PATH.

dotnet --list-sdks         # show installed SDKs
dotnet --list-runtimes     # show installed runtimes
dotnet new console -o app  # uses the default SDK

Set the default SDK for new projects (writes a global.json template):

sudo dotnet-hpc default 10

.NET 6 is provided for legacy compatibility and is past Microsoft's support window. Use .NET 8 or 10 for new work.

HPC tuning

The image applies Server GC, the `performance` CPU governor, NUMA settings, raised file/socket limits and network buffer tuning. Switch profiles anytime:

sudo dotnet-hpc tune throughput   # max compute throughput (default)
sudo dotnet-hpc tune latency      # low/steady latency for services
sudo dotnet-hpc tune balanced     # general purpose

Verify the environment and run a quick parallel benchmark:

dotnet-hpc status
dotnet-hpc bench 8                # Monte Carlo Pi across all vCPUs

Optional .NET large-page GC (reserve huge pages first, e.g. 2048 MiB):

# during configure, answer the "huge pages" prompt with a MiB value

.NET Runtime Environment Variables

The .NET environment variables are set in /etc/profile.d/dotnet-hpc.sh, including:

Variable	Value	Effect
`DOTNET_gcServer`	`1`	Enables server garbage collection, which uses a dedicated heap and GC thread per logical processor and favors throughput on multi-core machines. Workstation GC, the default, is tuned for client applications instead.
`DOTNET_TieredCompilation`	`1`	Enables tiered JIT compilation. Methods are first quickly JIT-compiled (Tier 0), then recompiled with full optimizations (Tier 1) after repeated execution. Balances startup speed with steady-state performance.

Management commands

dotnet-hpc status            Show versions, tuning state and app stack
dotnet-hpc versions          List installed SDKs and runtimes
dotnet-hpc default <6|8|10>  Set default SDK
dotnet-hpc tune <profile>    Re-apply HPC profile
dotnet-hpc bench [6|8|10]    Run a compute benchmark
dotnet-hpc info              Show saved configuration
dotnet-hpc backup            Tar.gz the data volume to /root

# app stack (only if containerized hosting is enabled)
dotnet-hpc start | stop | restart | logs | update

HPC Tuning Profiles

Four pre-configured profiles are available via sudo propagate tune <profile>. Each adjusts kernel parameters for different workload types.

compute-heavy

Best for CPU-bound simulations, numerical methods, Monte Carlo, ray tracing.

Parameter	Value
vm.swappiness	1
kernel.sched_migration_cost_ns	5000000
kernel.sched_autogroup_enabled	0
CPU governor	performance

What it does: Virtually eliminates swapping, keeps processes pinned to their current CPU longer (better L1/L2 cache hit rates), and disables automatic task grouping so the scheduler treats every process individually. Best when every CPU cycle matters.

balanced

Best for mixed workloads, development, testing, web APIs.

Uses the default kernel tuning applied during provisioning. No additional changes.

What it does: Provides the HPC network tuning and memory configuration without aggressive CPU pinning. Good starting point when you're not sure which profile to use.

memory-heavy

Best for large dataset processing, in-memory databases, genomics, bioinformatics.

Parameter	Value
vm.swappiness	5
vm.overcommit_memory	1
vm.max_map_count	1048576

What it does: Allows memory overcommit (the kernel won't refuse allocations based on available RAM), increases the maximum number of memory-mapped regions (important for memory-mapped files and databases like MongoDB), and keeps swappiness very low.

mpi-cluster

Best for distributed computing across multiple EC2 instances.

Parameter	Value
net.core.rmem_max	33554432 (32 MB)
net.core.wmem_max	33554432 (32 MB)
net.ipv4.tcp_rmem	4096 1048576 33554432
net.ipv4.tcp_wmem	4096 1048576 33554432

What it does: Doubles the network buffer sizes from the default HPC configuration. Designed for high-throughput MPI message passing between nodes where large messages need to be buffered in-kernel.


SIMD Capabilities

This AMI runs on x86_64 EC2 instances and automatically enables all available SIMD instruction sets. The actual capabilities depend on your instance type:

Instruction Set	Vector Width	Elements per Instruction	Supported Instance Families
SSE4.2	128-bit	4 floats / 2 doubles	All x86_64
AVX2	256-bit	8 floats / 4 doubles	All current-gen (c5+, m5+, r5+)
AVX-512	512-bit	16 floats / 8 doubles	c5.metal, m5zn, c6i, r6i
FMA	256-bit	8 fused multiply-adds	All current-gen

.NET's System.Numerics.Vector<T> and System.Runtime.Intrinsics.X86 namespaces automatically use the best available instruction set. No code changes are required: the JIT compiler detects CPU capabilities at runtime.

Verify your instance's SIMD support:

propagate benchmark    # Runs the built-in SIMD capability check
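Independently of the bundled tooling, the SIMD flags the Linux kernel reports for the CPU can be read from /proc/cpuinfo. This is a minimal sketch; `simd_flags` is a hypothetical helper name, and it only checks the four flags that correspond to the instruction sets discussed in this section.

```shell
#!/usr/bin/env bash
# Sketch: list which SIMD-related CPU flags the kernel reports.
# simd_flags is a hypothetical helper; it reads /proc/cpuinfo unless given a file.
simd_flags() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  # Take the first "flags" line, split it into one flag per line,
  # keep only the flags of interest, and join them with spaces.
  grep -m1 '^flags' "$cpuinfo" | tr ' ' '\n' |
    grep -x -E 'sse4_2|avx2|avx512f|fma' | sort -u | paste -sd ' ' -
}

if [ -r /proc/cpuinfo ]; then
  echo "SIMD flags: $(simd_flags)"
fi
```

An instance without AVX-512 simply omits `avx512f` from the list; the .NET JIT makes the same kind of capability check at startup.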


Propagate Management CLI

The propagate command is available system-wide and provides all management operations.

Commands

Command	Requires sudo	Description
propagate status	No	Full system overview: instance info, CPU, memory, .NET versions, OpenMPI, load, storage, HPC profile
propagate benchmark	No	Run the built-in benchmark suite: vector/SIMD throughput, parallel math, memory bandwidth, MPI test
propagate dotnet-list	No	List all installed .NET SDKs, runtimes, and global tools
propagate dotnet-default <ver>	Yes	Set the default .NET SDK version (6.0, 8.0, or 10.0)
propagate tune <profile>	Yes	Apply an HPC tuning profile (compute-heavy, balanced, memory-heavy, mpi-cluster)
propagate mpi-test [n]	No	Run an MPI hello world test with n processes (default: 4)
propagate profile <PID>	Yes	Profile a running .NET process using dotnet-trace, with perf fallback
propagate hugepages [count]	Yes (to set)	Show current huge pages status, or set a new count
propagate numa	No	Display NUMA topology and CPU affinity map
propagate ebs	No	Show EBS volume status and mount info
propagate logs	No	View provisioning and setup logs
propagate setup	Yes	Re-run the interactive setup wizard


Multi-Node MPI Setup

For distributed computing across multiple EC2 instances:

Prerequisites


Launch all instances from this AMI in the same VPC and subnet

Use a placement group (cluster strategy) for lowest latency

Configure the security group to allow TCP ports 10000-10100 between instances

Use the same SSH key pair for all instances


Configuration Steps



Set up passwordless SSH between nodes:

On the primary node, generate a key and distribute it:

ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
# Copy to each worker node:
ssh-copy-id -i ~/.ssh/id_ed25519 ubuntu@<worker-ip>



Edit the MPI hostfile:

sudo nano /opt/propagate/config/mpi_hosts

Example for three 32-vCPU nodes:

10.0.1.10 slots=32
10.0.1.11 slots=32
10.0.1.12 slots=32



Apply the MPI cluster tuning profile on all nodes:

sudo propagate tune mpi-cluster



Test connectivity:

propagate mpi-test 8



Run your application:

mpirun --hostfile /opt/propagate/config/mpi_hosts \
  -np 96 --map-by node \
  dotnet run -c Release
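The `-np` value in the command above is the sum of the `slots=` entries in the hostfile. A small sketch of that arithmetic follows; `total_slots` is a hypothetical helper, and the temporary hostfile only mirrors the three-node example.

```shell
#!/usr/bin/env bash
# Sketch: sum the slots= values in an Open MPI hostfile to get the -np value.
# total_slots is a hypothetical helper, not part of the AMI tooling.
total_slots() {
  awk '{
         for (i = 2; i <= NF; i++)
           if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); n += $i }
       }
       END { print n + 0 }' "$1"
}

# Recreate the three-node example in a temporary file.
hostfile="$(mktemp)"
printf '%s\n' '10.0.1.10 slots=32' '10.0.1.11 slots=32' '10.0.1.12 slots=32' > "$hostfile"
echo "ranks: $(total_slots "$hostfile")"   # three 32-slot nodes give 96
rm -f "$hostfile"
```

Computing the rank count from the hostfile keeps `-np` in step when nodes are added or removed.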



Recommended Instance Types for MPI

Instance	vCPUs	RAM	Network	Notes
hpc6a.48xlarge	96	384 GB	100 Gbps EFA	Purpose-built HPC, best MPI performance
c6i.32xlarge	128	256 GB	50 Gbps	High CPU count, good price/performance
c7i.48xlarge	192	384 GB	50 Gbps	Latest gen compute, highest single-node CPU count


Recommended Instance Types

Instance	vCPUs	RAM	Best For	Approx. EC2 Cost/hr
c6i.xlarge	4	8 GB	Development, testing, small simulations	$0.17
c6i.4xlarge	16	32 GB	Medium parallel workloads	$0.68
c6i.8xlarge	32	64 GB	Production compute jobs	$1.36
c7i.16xlarge	64	128 GB	Large parallel simulations	$2.86
c7i.metal-48xl	192	384 GB	Maximum single-node performance	$8.57
m5zn.6xlarge	24	96 GB	High clock speed (4.5 GHz), latency-sensitive	$1.98
r6i.8xlarge	32	256 GB	Memory-heavy scientific computing	$2.02
hpc6a.48xlarge	96	384 GB	Dedicated HPC with EFA networking	$2.88


Ports

Port	Purpose	When
22	SSH	Always (open to your IP only)
80	HTTP (nginx-proxy)	App hosting enabled
443	HTTPS (nginx-proxy)	App hosting enabled with Let's Encrypt
8080	App container (internal)	Never exposed; reached via the proxy

Open the relevant ports in your Security Group. Pure compute use needs
only port 22.
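Security Group rules can be added from the AWS CLI as well as the console. The sketch below only prints the commands so they can be reviewed before running; `sg_open_cmd` is a hypothetical helper, and the group ID and address are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the AWS CLI call that opens one TCP port to one CIDR.
# sg_open_cmd is a hypothetical helper; the group ID and CIDR are placeholders.
sg_open_cmd() {
  local group_id="$1" port="$2" cidr="$3"
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
    "$group_id" "$port" "$cidr"
}

sg_open_cmd sg-0123456789abcdef0 22 203.0.113.7/32    # SSH from a single address
sg_open_cmd sg-0123456789abcdef0 443 0.0.0.0/0        # HTTPS when app hosting is enabled
```

Restricting port 22 to a /32 matches the "open to your IP only" guidance in the table.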


HTTPS notes


HTTPS uses nginx-proxy + acme-companion (Let's Encrypt).

Let's Encrypt requires a domain name, not a bare IP. Point an A record at
the instance's public IP before enabling HTTPS.

The first certificate may take 1–2 minutes to issue after the first request.

The app is protected by HTTP basic-auth using the admin credentials you set.

Public IPs change on stop/start. Use an Elastic IP or a domain for stable
access.



Backup

sudo dotnet-hpc backup

Writes /root/dotnet-hpc-backup-<timestamp>.tar.gz of the data volume. For durable backups, copy that archive to Amazon S3, or take an EBS snapshot of the attached data volume.

File System Layout

Path	Contents
/opt/propagate/bin/	Management CLI and setup scripts
/opt/propagate/config/	Configuration files (hpc.conf, mpi_hosts, global.json)
/opt/propagate/benchmarks/	Built-in benchmark source code (SimdCheck.cs)
/opt/propagate/docs/	Documentation
/data/	Default EBS data volume mount point (configured via setup wizard)
/usr/lib/dotnet/sdk/	.NET SDK installations
/usr/share/dotnet/	.NET shared runtime components
/usr/local/share/dotnet-tools/	Global .NET diagnostic tools
/opt/FlameGraph/	Brendan Gregg's FlameGraph toolkit
/etc/sysctl.d/99-propagate-hpc.conf	HPC kernel tuning parameters
/etc/profile.d/dotnet-hpc.sh	.NET environment variables (loaded on login)
/etc/security/limits.d/99-propagate-hpc.conf	Process resource limits
/var/log/propagate-provision.log	Provisioning log


Quick Start Examples

Hello World with .NET 10

dotnet new console -n HelloHPC -f net10.0
cd HelloHPC
dotnet run

SIMD Vector Addition

using System.Numerics;

var a = new float[1024];
var b = new float[1024];
var c = new float[1024];

// Fill with data...
for (int i = 0; i <= a.Length - Vector<float>.Count; i += Vector<float>.Count)
{
    var va = new Vector<float>(a, i);
    var vb = new Vector<float>(b, i);
    (va + vb).CopyTo(c, i);
}

Console.WriteLine($"Vector width: {Vector<float>.Count} floats");
Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");

Parallel Computation

using System.Threading.Tasks;

var data = new double[10_000_000];
var results = new double[data.Length];

Parallel.For(0, data.Length, i =>
{
    results[i] = Math.Sin(data[i]) * Math.Cos(data[i]);
});

Profiling a Running Application

# Find the PID
ps aux | grep dotnet

# Collect a 30-second trace
dotnet-trace collect -p <PID> --duration 00:00:30

# Monitor real-time counters
dotnet-counters monitor -p <PID>

# Generate a flame graph with perf
sudo perf record -g -p <PID> -- sleep 30
sudo perf script | /opt/FlameGraph/stackcollapse-perf.pl | /opt/FlameGraph/flamegraph.pl > flame.svg


Troubleshooting

.NET SDK not found

If dotnet --list-sdks doesn't show all three versions, source the environment:

source /etc/profile.d/dotnet-hpc.sh
dotnet --list-sdks

Instance type shows "unknown"

The instance metadata service may require IMDSv2 with a hop limit of 2. Check with:

curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60"

If this fails, update the instance metadata options from the AWS console or CLI to allow IMDSv1 or increase the hop limit.
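The token request above is the first half of an IMDSv2 lookup; the second half sends the token back in a header. A minimal sketch of the full round trip follows. `imds_get` is a hypothetical helper, the two-second timeout is an arbitrary choice, and the function only returns data when run on an EC2 instance.

```shell
#!/usr/bin/env bash
# Sketch: IMDSv2 lookup. imds_get is a hypothetical helper, not part of the AMI.
imds_get() {
  local path="$1" token
  # Step 1: request a short-lived session token.
  token="$(curl -s --max-time 2 -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")" || return 1
  [ -n "$token" ] || return 1
  # Step 2: present the token when reading metadata.
  curl -s --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$path"
}

# Usage on an instance:
#   imds_get instance-type
```

If the token request times out from inside a container, the hop limit is the usual suspect; it can be raised with `aws ec2 modify-instance-metadata-options --instance-id <id> --http-put-response-hop-limit 2`.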

CPU governor shows "N/A"

This is normal for virtualized EC2 instances. The hypervisor manages CPU frequency directly. The governor service is included for bare metal instances (.metal types) where it does take effect.

High memory usage after boot

Huge pages are pre-allocated at boot time. 512 huge pages × 2 MB = 1 GB of reserved memory. This is intentional and reduces TLB misses during computation. Adjust with:

sudo propagate hugepages 256    # Reduce to 256 pages (512 MB)
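The reservation arithmetic is simply the page count times the 2 MiB page size. A short sketch for converting in either direction, with hypothetical helper names and assuming the default 2 MiB huge page size:

```shell
#!/usr/bin/env bash
# Sketch: convert between a MiB reservation and a count of 2 MiB huge pages.
# pages_for_mib and mib_for_pages are hypothetical helpers.
pages_for_mib() { echo $(( $1 / 2 )); }
mib_for_pages() { echo $(( $1 * 2 )); }

echo "512 pages reserve $(mib_for_pages 512) MiB"
echo "2048 MiB needs $(pages_for_mib 2048) pages"

# The kernel's own view of the reservation, where available.
if [ -r /proc/meminfo ]; then
  grep -i '^HugePages_Total' /proc/meminfo
fi
```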

MPI test fails

For single-node MPI, use --oversubscribe if you're requesting more slots than available cores:

mpirun -np 8 --oversubscribe ./your_program

For multi-node, ensure SSH connectivity between all nodes and that the security group allows TCP ports 10000-10100.

Support

Documentation: https://docs.propagate.solutions
30-day money-back guarantee.

Vendor: Propagate LLC
Product: .NET HPC — Supercomputing Edition
Base OS: Ubuntu 24.04 LTS
Architecture: x86_64 (amd64)

