The system that builds itself — and what it builds. Every project here is live or actively in development. The same autonomous infrastructure that deploys its own improvements is applying those capabilities to real industrial datasets and live markets.
The foundation: a network of specialized agents that operates, monitors, and improves itself continuously — without being asked. This is what everything else runs on top of.
This is what the infrastructure produces: real-world ML projects across manufacturing, energy, oil & gas, healthcare, and aerospace, each built, trained, evaluated, and iterated by the autonomous system. No data scientists. No notebooks. The pipeline owns the full loop from raw dataset to deployed model.
Binary fault classifier on Microsoft Azure's predictive maintenance benchmark — 100 machines, 4 component types, hourly telemetry joined with error history, maintenance records, and failure events. The model identifies which machines are about to fail before they do, enabling pre-emptive maintenance instead of reactive repair.
Binary fault classifier on power transmission grid data — identifying electrical fault events from voltage and current sensor readings across three phases. Auto-engineered features include inter-phase ratios and rolling statistics. The model separates genuine fault signatures from normal load variation and transient noise with high precision across the full test set.
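As a rough illustration of the two feature families named above, the sketch below computes inter-phase ratios for one sample and rolling statistics over a signal. The function names, the choice of current magnitudes, and the window size are assumptions made for this example, not the harness's actual feature code.

```python
from collections import deque
from statistics import mean, pstdev


def inter_phase_ratios(ia, ib, ic, eps=1e-9):
    """Ratios between the three phase current magnitudes.

    eps is an illustrative guard against division by zero.
    """
    a, b, c = abs(ia), abs(ib), abs(ic)
    return {
        "a_over_b": a / (b + eps),
        "b_over_c": b / (c + eps),
        "c_over_a": c / (a + eps),
    }


def rolling_stats(values, window):
    """Rolling (mean, population std) over the last `window` samples."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append((mean(buf), pstdev(buf)))
    return out
```

In a balanced three-phase system the ratios sit near 1, so a ratio that moves far from 1 is one plausible signal that a single phase is behaving differently from the others.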
Fault detection on real SCADA data from Kelmarsh wind farm (UK), operated by Cubico Sustainable Investments. Senvion MM92 turbines — a different manufacturer, climate, and label derivation method from the training source. The model achieves near-identical F1 on completely unseen hardware, validating that the learned fault signatures generalize beyond the original fleet.
Anomaly classifier on EV battery pack telemetry — detecting cell degradation, thermal runaway precursors, and pack-level faults from voltage, current, temperature, and SoC signals. Small dataset, high signal-to-noise: the harness auto-tunes threshold selection and engineers 71 features from raw pack readings. Perfect recall on the test set — zero missed anomalies.
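The source does not say how the harness tunes its threshold. One common approach, sketched here under that caveat, is to pick the highest score cutoff that still reaches a target recall on a held-out validation split. `pick_threshold` and its parameters are hypothetical names for this example.

```python
def pick_threshold(scores, labels, target_recall=1.0):
    """Return the highest threshold whose recall meets target_recall.

    scores: model anomaly scores; labels: 1 for anomaly, 0 for normal.
    A sample is flagged when its score is >= the threshold.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    # Recall can only grow as the threshold drops, so the first
    # candidate (scanning high to low) that meets the target is the highest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        if tp / positives >= target_recall:
            return t
    raise ValueError("target_recall must be at most 1.0")
```

Choosing the highest qualifying threshold keeps false alarms as low as the recall target allows. The cutoff should be chosen on validation data so that test-set recall remains an honest measurement.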
Autonomous fault and anomaly detection on real SCADA telemetry from Wind Farm A (EDP dataset). 54 sensors at 10-minute intervals, 1.8 million rows. The harness auto-engineers 591 features including cross-sensor ratios, rolling statistics, and lag features. Drift detection is live — the model triggers its own retraining when production distribution shifts beyond a PSI threshold.
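For readers unfamiliar with PSI (Population Stability Index), the sketch below shows the standard calculation for a single feature: bin the training-time values, bin the production values on the same edges, and sum (actual - expected) * ln(actual / expected) across bins. The bin count, the 0.2 trigger value, and the function names are assumptions for illustration; the harness's real threshold is not stated in the source.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against `expected`.

    Bins are equal-width over the range of `expected`; out-of-range
    production values are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected, actual, threshold=0.2):
    """True when drift exceeds the (assumed) PSI threshold."""
    return psi(expected, actual) > threshold
```

A PSI of 0 means the two distributions fill the bins identically. A value around 0.2 or higher is a widely used rule of thumb for a significant shift, which is why it is used as the placeholder threshold here.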
Predicts daily oil production rate (Sm³/day) from wellhead pressure, choke size, and downhole sensor readings — replacing expensive physical well tests with a data-driven virtual flow meter. Norwegian North Sea, Volve field, 2008–2016, 6 producing wells. Four root-cause fixes were applied autonomously (distribution shift encoding, split cutoff enforcement, config key correction, spurious proxy removal). Model retrained and producing results.
Classifies FDA adverse event reports to surface genuine drug-event pharmacovigilance signals from noise — separating real safety signals from reporting bias, concomitant medication confounders, and the Weber effect. Trained on full-year 2024 FAERS data (4 quarters, 984K reports). The harness's first mixed-domain healthcare project — tabular structured fields alongside derived signal features.
Predicts remaining useful life (cycles to failure) of turbofan engines from multivariate sensor time-series — the harness's first degradation modeling problem. NASA CMAPSS dataset (FD001–FD004): 234K training rows across four fault-mode subsets with variable altitude, Mach, and throttle conditions. The model learns how sensor patterns drift as fan, HPC, and HPT components wear, giving a per-cycle RUL estimate rather than a binary fault flag.
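In the CMAPSS training files each engine runs until failure, so a per-cycle RUL target can be derived as the engine's last recorded cycle minus the current cycle. The sketch below shows that labeling step. The optional `cap` reflects a common practice of clipping early-life RUL; it is an assumption here, not something the source says the harness does.

```python
from collections import defaultdict


def add_rul(rows, cap=None):
    """Compute a RUL label for each (unit_id, cycle) row.

    RUL = last observed cycle of that unit minus the current cycle.
    If cap is given, labels are clipped to at most cap (illustrative option).
    """
    last_cycle = defaultdict(int)
    for unit, cycle in rows:
        last_cycle[unit] = max(last_cycle[unit], cycle)
    ruls = [last_cycle[unit] - cycle for unit, cycle in rows]
    if cap is not None:
        ruls = [min(r, cap) for r in ruls]
    return ruls
```

For example, an engine observed for three cycles gets labels 2, 1, 0, which is the regression target that replaces a binary fault flag.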
Autonomous cryptocurrency trading running live capital on Kraken. Momentum strategies with adaptive parameters, continuous trailing stops, breakeven locks, and ratcheting stop-loss logic. The same self-improving infrastructure manages real money on a physically isolated node — no shared memory, no shared network path with the pipeline.
Continuous trailing stops activate at ≥4% profit and trail below the current price by the configured stop percentage. Breakeven lock and ratchet SL prevent winners from turning into losers. Every 15 minutes a separate verification process cross-checks all positions against live Kraken open orders. The A9 Max node has no shared memory or network path with the pipeline, so the trading system can't be modified by an autonomous deploy.
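The stop logic described above can be sketched for a long position as follows. Only the 4% activation level comes from the text; the 2% trail distance, the function name, and the exact ordering of the breakeven and ratchet steps are assumptions for illustration and may differ from TradeShadow's implementation.

```python
def update_stop(entry, price, current_stop, stop_pct=0.02, activate_pct=0.04):
    """Return the new stop-loss price for a long position.

    entry: entry price; price: latest price; current_stop: existing stop.
    stop_pct is an assumed trail distance, activate_pct is the 4% trigger.
    """
    profit = (price - entry) / entry
    if profit < activate_pct:
        return current_stop  # trailing not active yet
    candidate = price * (1 - stop_pct)  # trail below the current price
    candidate = max(candidate, entry)   # breakeven lock: never below entry
    return max(candidate, current_stop)  # ratchet: the stop only moves up
```

Because the final step takes the maximum of the new candidate and the existing stop, a pullback in price never loosens the stop, which is the property that keeps a winning position from turning into a loss.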
Every night the system reads its own activity log — deploys, failures, research findings, trade moves — and writes a narrative of what it built, fixed, and learned. Published automatically. No human writes it, edits it, or approves it. The system documents its own evolution in its own voice.
Every night at 23:45, Chronicle pulls the full epoch summary, deploy history, test results, TradeShadow trade log, ML project metrics, and Sherlock's market analysis. It synthesizes these into a narrative post — what improved, what failed and why, what the models are doing, and what the system is building next. The voice is consistent. The analysis is real.
Real estate market intelligence. Score ZIP codes by growth indicators — population trends, job growth, permit activity, price appreciation. Drill into any area for Grok-powered analysis and live listing links.
Enter any US ZIP code and Realtix scores it across population growth, employment trends, building permit activity, and price appreciation. Grok provides a plain-language market narrative. Live listing links connect directly to current inventory. Built and deployed by the pipeline.
Open Realtix →
Construction takeoff directly in the browser. Drop in a PDF plan, draw dimensions, count materials by type, and export, with no desktop software needed. Built for speed on any device.
Upload any construction plan PDF, set a scale reference, then draw lines and mark areas directly on the plan. Takeoff tallies quantities by material category in real time and exports a summary. No install, no account, no desktop app required.
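The measurement step can be sketched as simple geometry: a reference line of known length gives a units-per-pixel scale, lengths scale linearly, and areas scale with the square. The example is written in Python for consistency with the other sketches on this page; the function names are hypothetical and the app's own browser code is not shown in the source.

```python
import math


def scale_from_reference(p1, p2, real_length):
    """Real-world units per pixel, from a reference line of known length."""
    pixels = math.dist(p1, p2)
    if pixels == 0:
        raise ValueError("reference points must be distinct")
    return real_length / pixels


def line_length(p1, p2, scale):
    """Real-world length of a line drawn between two pixel points."""
    return math.dist(p1, p2) * scale


def polygon_area(points, scale):
    """Real-world area of a closed polygon given as a list of pixel vertices.

    Uses the shoelace formula; area scales with the square of the scale.
    """
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2 * scale ** 2
```

With a 100-pixel reference line marked as 10 ft, the scale is 0.1 ft per pixel, so a 200 by 100 pixel rectangle measures 20 ft by 10 ft, or 200 sq ft.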
Open Takeoff →
What comes next, both for the ML projects the harness will run and for the system capabilities being built into the infrastructure itself. These aren't aspirational slides. They're the actual roadmap, scoped and queued.
The energy transition is creating a grid stability crisis. Solar and wind generation is intermittent — output swings with weather, not demand. When a large solar farm drops offline suddenly or wind generation undershoots forecast, grid operators have minutes to dispatch backup capacity before frequency deviates enough to trigger automatic shutoffs.
The model predicts grid stability margins given current generation mix, demand curve, weather forecast, and interconnect flows. A second module handles multi-step load forecasting — predicting demand 1h, 6h, and 24h out — so operators can pre-position reserves. This builds on the wind energy work already in production, adding the demand-side and interconnect complexity of a real grid.
Drilling a single well costs $5M–$50M. A significant portion of that cost is Non-Productive Time (NPT): stuck pipe, lost circulation, well control events, equipment failure. These events don't come out of nowhere: drilling parameters and downhole sensor readings change in characteristic patterns 30–60 minutes before the event, but the patterns are subtle enough that a driller working in real time can miss them.
This project builds a real-time anomaly model on drilling logs — WOB, RPM, ECD, torque, ROP, gamma ray, pressure — that flags the precursor signature before the event materializes. It's the natural next step from volve-prod-001 (same North Sea domain, same Equinor data), and it introduces a new requirement for the harness: streaming inference, not batch retraining.
The system's intelligence layer depends on which model is running. Right now, model selection is manual — a human notices a new release, evaluates it, and decides whether to swap. This is a bottleneck. Autonomous benchmarking removes it: the system monitors model release feeds, pulls candidates to a staging partition, runs them against a fixed suite of real past pipeline tasks, and reports quality/speed/memory tradeoffs. The swap decision stays human. Everything else is automated.
The pipeline currently improves its code but doesn't learn from its own improvement patterns. Self-evaluation closes that loop. Every patch gets a quality score based on what happens downstream: did it pass Tester? Did it hold up 48 hours later? Did Artemis reference it in future tasks as a working pattern? Every research finding gets scored too: did it lead to real improvements, and how long did they take to materialize? Over time, agents tune their strategy toward what actually works.
TradeShadow is running live on six crypto pairs with the same self-improving infrastructure maintaining it. The next step is market expansion: an equities module (momentum + sector rotation) and a higher-volatility crypto module (SOL, AVAX, LINK with tighter position sizing) as independently isolated components. Each runs its own risk envelope. A bad run in equities has zero contact with the crypto module — shared infrastructure, zero shared state.
The ML harness is proving its architecture on public datasets: fault detection at F1 0.942, cross-fleet generalization at F1 0.9375, 1.8M-row SCADA processing. The next phase is private industrial data partnerships — the same harness deployed against a customer's live sensor stream. Autonomous retraining fires when drift thresholds are crossed. The customer gets a model that improves itself as their equipment ages, not one that degrades silently.