That means they're actually incentivized, at least short term, to benefit PCs becoming strong enough to do local LLMs. Which makes this play make even more sense. Though, I've been saying for a while that the local AI inflection point is the death knell for these frontier labs.
"Death knell" is a touch hyperbolic. Hardware that can only run quantized models that take up GBs in VRAM falls short of even an A100 (by almost an order of magnitude[0]), which in turn falls short of what an 8xH100 cluster can do (also by another order of magnitude[0]).
I'm an avid believer in local LLMs, but I cannot deceive myself - data center accelerators will win on power dissipation numbers alone[1], even when giving generous allowances for higher efficiency on Apple chips - and assuming the Apple-efficiency advantage persists on the same TSMC process node.
0. Based on my unscientific fine-tuning training experiments across local and rented GPUs. YMMV for inference.
1. Unless Apple surprises everyone and brings back the XServe with M7, if not, then laptop and desktop form factors simply can't dump heat fast enough to compete head-to-head, and will be designed for lower input wattage.
The frontier models are faster, and better at coding, but not so much that I’ll pay $200/month for them.
They use a technique where you only load between 1B and 4B of a 20B dense model for an entire prompt run, not token by token like a MoE, and use mostly the low power ANE instead of GPU cores.
Now, imagine if/when they scale up to 100B or more? On a chip using 2W?
If someone could splinter or fragment the models into more specific tasks, i.e. "spellchecker AI", and get these working as well as Sonnet 4.6-4.8 on those tasks on a personal laptop, you'd then question the $100 a month fee.
Bear in mind these laptops are likely to be $5000 or so because of the memory, HDD and M7 chip they likely need.
It feels to me like the beginning of the inflection point but software updates not hardware updates will be the accelerant.
"That’s where EMO comes in.
We show that EMO – a 1B-active, 14B-total-parameter (8-expert active, 128-expert total) MoE trained on 1 trillion tokens – supports selective expert use: for a given task or domain, we can use only a small subset of experts (just 12.5% of total experts) while retaining near full-model performance."
https://allenai.org/blog/emo
I want to live in this world too, but these numbers, as of today, are very aspirational and far removed from reality.
I'm no tokenmaxxer; I find my modest local setup useful, but I also know the limitations: it's slow and it sucks (relatively) at high-level and/or long-context planning, compared to frontier models. Only a minority of my prompts are max-effort - it's not all I do, but it also means frontier labs aren't dying any time soon.
I love local models - I have a machine at home that runs a few for me and it's a lot of fun - but for the time being they are not super trustworthy on tool calls and staying on script. Another year or so might change all that!
*corrected llama version to 3
The weights they “etched” into the FPGA card that’s used for the ChatJimmy demo are that of a Llama 3-something 8b model.
The actually impressive and novel thing is that Taalas’ve managed to automate that process (clearly – nobody transforms 8 billion numbers into a physical representation by hand).
So now, they can work on scaling this process up, and with low enough lead times (I’ll be convinced they have inside connections to TSMC if they can actually deliver on the promised mere 3-4 months delay), will be able to offer 30-100b+ parameter models under half a year after they’re released, at thousands of tokens per second while probably drawing less wattage (per token, not sure about overall).
Exciting times ahead, folks.
The real question is, what are 90% of people going to ask LLMs to do? I’d argue mostly it’s going to be stuff that works-now or almost-works on local models, but that’s just an opinion. It also depends on the frontier models hitting a wall of steeply diminishing returns, since they set the expectations for all of this stuff - my gut says that’s happened already, they just won’t admit it for a while - but we’ll see.
Sometimes, I need a quick throwaway bit of python. That can take 30 minutes of my time.
Apple is the only player here where it would play into their natural hardware incentive to get you to pay more for better hardware. It would make sense for them to find a way to run LLMs locally (e.g., newer architectures that others here have pointed out).
Interesting times.
Of course, these are a lot of ifs.
There is some signal that this is possible through both hardware innovation and training/data improvements.
Cloud models have their own constraints - I can’t have opus4.8 spend 4 hours on a deep research question I had in the shower without spending money. I can’t do real time video game upscaling and graphics work in the cloud, period.
A laptop is about an order of magnitude cheaper than a cloud server thanks to economies of scale, uptime requirements, and other factors.
I'm not talking local Gemma/Qwen vs cloud Opus, but against OpenRouter same Gemma/Qwen
there are reasons to run local - privacy, availability, but cost is not one of them
Now, this all brings the upfront costs way up; the solar panels are cheap, it's all the rest around them that tends to cost money.
There has been a lot of market-subsidy in AI which is starting to fade away: e.g. the copilot quotas/pricing. When VC switches from investing to wanting a return, the price equation is likely to change.
You buy a big GPU, you serve LLMs, you print money.
Could you give an example with real figures?
for me it would be about $2 per day in electricity to generate 8 mil tok of Gemma4-26B at 4 bit quantization. this is excluding how much the GPU cost (no amortization)
ignoring the fact that I could get more free tokens per day for this model from Google/OpenRouter, it would cost $4 per day on OpenRouter if paid, but they would run it at full 16 bit precision
this would be the most "profitable" model for me
for Gemma4-31B I can generate only 1 mil tok per day, and so I pay more to get less quality than OpenRouter (ignoring that this model is also free on Google)
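The figures above can be turned into a per-token comparison. This is only a sketch: the $2/day, $4/day and 8 million tokens/day numbers are the ones quoted, while the GPU price used for the break-even is a made-up placeholder, since the comment explicitly leaves amortization out.

```python
# Numbers quoted in the comment above.
LOCAL_USD_PER_DAY = 2.0    # electricity only, no GPU amortization
HOSTED_USD_PER_DAY = 4.0   # paid OpenRouter price for the same volume
MTOK_PER_DAY = 8.0         # millions of tokens generated per day

# Hypothetical placeholder, NOT from the thread: what the GPU cost up front.
HYPOTHETICAL_GPU_PRICE_USD = 2000.0


def usd_per_mtok(usd_per_day: float, mtok_per_day: float) -> float:
    """Cost per million tokens for a given daily spend and daily volume."""
    return usd_per_day / mtok_per_day


def breakeven_days(gpu_price_usd: float, hosted_usd_per_day: float,
                   local_usd_per_day: float) -> float:
    """Days of full utilization before the daily saving repays the GPU."""
    saving = hosted_usd_per_day - local_usd_per_day
    if saving <= 0:
        return float("inf")  # local never pays for itself
    return gpu_price_usd / saving


local_rate = usd_per_mtok(LOCAL_USD_PER_DAY, MTOK_PER_DAY)
hosted_rate = usd_per_mtok(HOSTED_USD_PER_DAY, MTOK_PER_DAY)
days = breakeven_days(HYPOTHETICAL_GPU_PRICE_USD,
                      HOSTED_USD_PER_DAY, LOCAL_USD_PER_DAY)

print(f"local (electricity only): ${local_rate:.2f} per Mtok")
print(f"hosted:                   ${hosted_rate:.2f} per Mtok")
print(f"break-even at full load:  {days:.0f} days")
```

The break-even only holds if the card really generates 8 million tokens every day; at lower utilization the payback stretches proportionally.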
Benchmarks maybe? Real world, no.
You just need the context otherwise. There's no way around it.
Whether such a model exists or not is a different question.
That's today's hardware.
Now suppose Apple goes to any of Samsung/Micron/Hynix and says "we'll pay you the entire cost of building another DRAM fab and in exchange we want its entire output" and then releases M7 devices with enough memory and compute to run bigger models.
> Unless Apple surprises everyone and brings back the XServe with M7, if not, then laptop and desktop form factors simply can't dump heat fast enough to compete head-to-head, and will be designed for lower input wattage.
Laptops maybe. Desktops can dissipate more heat than the amount of electricity you can draw from a typical household wall outlet.
It's revealing that they aren't doing this: no one wants to fund that gamble on the state of AI demand 12-18 months out, but they are happy to capitalize on their current product lines/capacity.
> Desktops can dissipate more heat than the amount of electricity you can draw from a typical household wall outlet
100% agree, but the data center power and cooling infra are not limited by home wiring, and go way beyond what a wall outlet can safely provide (1,440W max on a typical 15A circuit at 120V). A single H100 maxes out at 700W
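For reference, the 1,440W figure appears to come from derating a 15A, 120V circuit to 80% for continuous loads. That derating factor is my assumption about how the number was derived; the 700W H100 figure is the one quoted above.

```python
def continuous_watts(amps: int, volts: int, derate_percent: int = 80) -> int:
    """Usable continuous power on a circuit, assuming an 80% derating rule."""
    return amps * volts * derate_percent // 100


H100_MAX_WATTS = 700  # per-GPU figure quoted in the comment above

outlet_watts = continuous_watts(15, 120)
gpus_per_outlet = outlet_watts // H100_MAX_WATTS

print(f"continuous budget on a 15A/120V circuit: {outlet_watts} W")
print(f"H100-class GPUs that fit (GPU power only): {gpus_per_outlet}")
```

This ignores the CPU, memory, cooling and PSU losses, so the real headroom on a household circuit is smaller still.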
So yeah, commercially it might be a death knell. Yes there's still a market for super computers, but would you rather own Apple or Cray?
I would consider an HPE tower server with a processor on the same league as an M6 or M7 under the Cray brand.
At some point there will be diminishing returns towards the "just throw more RAM at it" approach the current frontier models are taking. Commoditization is just as inevitable as it ever was... and in doing so will enable actual leaps of what AI/ML is capable of. That's not to say there won't be a place for 99.999999% accurate vs 99.99999% but those cases will be limited and likely prime to disruption based on real innovation vs access to capital.
SOCs with unified memory have shifted this a bit forward, but they're also expensive as shit.
10TB ram in a consumer device is simply not happening in the next 10 years.
Is this true though? I don't really have time to do the research and have no dog in the race but I'm sceptical...
But then again I'm not the one making or profiting from the claims that there are billions being invested into infra...
Enjoy paying $1000 or more for a little 4 GiB cloud terminal that connects you to all your online accounts where all your actual work gets done. This is the future.
This is highly doubtful.
Rule of thumb: everything people think is exponential is actually an S curve.
What’s frontier now is prosumer in a couple years and commonplace in a couple more.
We need one of those specialized inference chip startups to succeed and a PC manufacturer willing to bet on them against Nvidia for the local AI to find mass market appeal.
For Immich, the cheapest option will either be a NAS or a used laptop depending on the amount of data you need, I wouldn't buy a mac for that.
(I just run the defaults on my CPU, works for me)
I think the decision comes primarily on how much data you would like to store for Immich, if you want to go cheaply, a 100 bucks used laptop will do the job, if you have too much data, a NAS will be more suitable (and you are certainly not going to get a mac where you can plug multiple internal hard drives for the price of a NAS)
That needs to be in (v)ram for searches.
Apple has always been the most cost effective choice for the value you get going all the way back to the Apple II, it's just that the floor of that cost has always been high. Anyone who thinks otherwise is just a fanboy one way or the other.
That's how much many developers currently spend on tokens - every day. Whatever "Apple Tax" applies to a device that can run a capable model offline will amortise itself in a blink.
Current high-end Mac Studio with 32-core M3 Ultra chip and 96 GB of memory is $6800, 96GB is not enough to run GLM 5.2 without extreme quantization or stacking HW; but for the sake of discussion let's run quantized version on a single high end Mac Studio.
GLM 5.2 Max plan costs $ 112/m, so it would take ~60 months to recover the costs assuming the machine was bought just for AI. By then the current AI landscape would have changed drastically.
I use local AI on both Linux and Mac every single day, there's freedom, privacy and peace of mind in running the model locally. But I feel cost/value of local AI is overblown.
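The payback estimate above is simple division; a small sketch with the quoted $6800 machine and $112/month plan follows. The optional running-cost parameter is my addition and the $10/month value is purely illustrative.

```python
MAC_STUDIO_USD = 6800.0       # price quoted above
PLAN_USD_PER_MONTH = 112.0    # subscription price quoted above


def payback_months(hardware_usd: float, subscription_usd_per_month: float,
                   running_cost_usd_per_month: float = 0.0) -> float:
    """Months until avoided subscription fees cover the hardware price."""
    net_saving = subscription_usd_per_month - running_cost_usd_per_month
    if net_saving <= 0:
        return float("inf")  # the hardware never pays for itself
    return hardware_usd / net_saving


base = payback_months(MAC_STUDIO_USD, PLAN_USD_PER_MONTH)
# Hypothetical $10/month of electricity, not a figure from the thread.
with_power = payback_months(MAC_STUDIO_USD, PLAN_USD_PER_MONTH, 10.0)

print(f"payback, hardware only: {base:.1f} months")
print(f"payback with $10/month running cost: {with_power:.1f} months")
```

Either way the result is on the order of five years, which is the point being made: the hardware is likely to be outdated before it has paid for itself.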
1. When Apple came out with the real Macbook Air in 2010/2011 (not the silly 2008 one), nobody could compete with it with those specs at that price and they couldn't for years. And every competitor usually sucked in some major way, most often the trackpad;
2. The Mac Mini is an outstanding piece of hardware for $600. Or was;
3. I've generally found "Apple tax" complaints levelled against the iPhone to be nothing more than Android cope;
4. The M-series silicon has been an absolute game-changer. I honestly thought the first-generation M1s would be not great but they came out swinging. And the price points for these Macbooks have all been great, much better than the last-gasp-of-Johnny-Ive touch bar butterfly keyboard series, which were objectively awful.
Apple has pretty good competition in every segment with the exception of maybe the iPad, but I'm not a tablet user.
Sure, you can use the App Store and use all the stuff that integrates with iPhone, iCloud, etc
But you can also just treat it as Linux for Laptops (that actually works), and roll with all the standard open source tools.
While they don't _prevent_ Asahi from doing what they're doing, they certainly don't go out of their way to make it easy for them.
But if you don't like it, switch. I don't see vendor lock-in.
Notes sync, Copy/Paste would be hard to give up and took zero effort
And in the rare occasions in which I have to use someone's MacBook, I'm completely lost - like some elderly person.
So maybe they were assholes.
(All the usual caveats about geekbench scores apply, but they're not nothing.)
If people can get opus4.6/gpt5.5-like models locally, labs could raise their prices and sell token speed, better reasoning, mobile-focused improvements, you name it.
Not all consumers are power users and many will be happy to pay for flexibility.
why not just say "I think that"
do you see yourself as some kind of visionary about this particular topic? literally EVERYONE is saying that, it's the most obvious fact about AI
768GB is 64 times 12GB, which is rumored to be the amount of RAM in new iPhones. Imagine what profit margin a 768GB Mac Studio is gonna need in order to justify making one instead of 64 iPhones.
Apple is the company that is okay with selling a microfiber cloth for $100 and wheels for $700. Imagine how bad the price hike for M3 Ultra 256GB / 512GB had to be in order for them to just discontinue them instead of getting free money out of desperate local AI folks.
So yeah the only way I see them selling it is the usual "call us" enterprise price tag.
But since it's not what Apple usually does, it's easier to sell 4x Mac Studio 256GB RAM boxes with interconnect for let's say $12,000 - $15,000 each.
And the reason people rarely use that for AI is that the enterprise GPUs from AMD and Nvidia are only moderately more expensive but are significantly faster because they use HBM instead of DDR5.
> a M5 studio would probably beat that performance for around half the cost.
A barebones 2S system with no CPUs or memory is ~$2000, a pair of 16 core CPUs another ~$1000 each, and then however much memory you want. The price seems pretty comparable. The "problem" with doing this is actually that 128GB is too little memory, because you want to populate all the channels, but even using 16GB sticks, 24x16GB is already 384GB.
> You also get to use the GPU/NPU cores of the mac vs CPU only on the servers.
You only need enough cores to make sure the bottleneck is memory bandwidth.
> A barebones 2S system with no CPUs or memory is ~$2000, a pair of 16 core CPUs another ~$1000 each, and then however much memory you want.
As you say, the thing is it's not 'however much memory you want' it's 24 sticks which at $300 a stick for 16GB is $7200, then you also need at least one NVME disk so you're looking at what $13,000?
Question: you need 16GB sticks because they're the smallest doublesided ones, which you need for maximum BW, right? Otherwise why not 8G?
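Adding up the parts quoted in this exchange gives a rough bill of materials. All prices are the ones mentioned above; the remainder attributed to storage and everything else is simply what is left to reach the ~$13,000 estimate, not a real quote.

```python
# Prices and counts as quoted in the two comments above.
BAREBONES_2S_USD = 2000
CPU_USD_EACH = 1000
CPU_COUNT = 2
DIMM_USD_EACH = 300
DIMM_GB_EACH = 16
DIMM_COUNT = 24            # one stick per memory channel slot, all populated
QUOTED_TOTAL_USD = 13000   # the "what, $13,000?" estimate

memory_usd = DIMM_COUNT * DIMM_USD_EACH
memory_gb = DIMM_COUNT * DIMM_GB_EACH
subtotal_usd = BAREBONES_2S_USD + CPU_COUNT * CPU_USD_EACH + memory_usd
# Whatever is left over would have to cover NVMe storage and extras.
left_for_storage_usd = QUOTED_TOTAL_USD - subtotal_usd

print(f"memory: {memory_gb} GB for ${memory_usd}")
print(f"chassis + CPUs + memory: ${subtotal_usd}")
print(f"left for storage etc. in the $13k estimate: ${left_for_storage_usd}")
```

Memory is roughly two thirds of the subtotal, which is why the stick price dominates the comparison with a Mac Studio.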
People said that about the M1 Ultra Mac Pro, a few years before it was discontinued. I don't think there are many HPC customers looking at Apple hardware.
There is a clear difference between $25k and $100k.
64 iPhones at retail price is already around $64k. For something at 768GB to be profitable on Apple's terms, it has to retail at $100k. That was the OP's point.
If you're talking about chaining together multiple GPUs you're talking about a different game -- I suspect, anyway. Seems like a high-spec Mac would be good for development and testing. Arrays of GPUs, better aimed at production use.
iPhones/tablets drive app sales/Apple subscription services, if they force a user to move to Android they may never return.
Why do you think they sell the iPhone 17e/se? They need to maximise their user base as it's their ongoing recurring income stream.
Once/if the ram shortage ends, they will continue increasing the ram caps as they were already doing, because then selling ram-heavy macs will not interfere with the rest of their products.
What’s really happening is that the effort of securing additional stock isn’t worth it because the price is so high that there aren’t enough buyers.
If ground beef were to suddenly cost $50/pound, McDonald’s doesn’t raise the price of the Big Mac to $25 and hope people buy it, because it makes zero sense for their business model. Sure, some fancy restaurant will still be selling hamburgers, but not your chain of thousands of working class fast food restaurants. McDonald’s would find some other alternative item to sell.
The truth is that nobody’s going to be buying Mac Studios that cost $25,000. Not even enterprises.
Apple failed to predict the demand for Mac Studios. Many other companies in its supply chain likely failed to predict that Apple would come back asking for more. There is no excess stock for some key components or the spare capacity to make them on demand. Apple would have to scour them from the market, likely paying much higher prices than it will pay for scheduled deliveries.
Their target market is composed of people that would pay for it nonetheless. They should have tightened the screws a bit more.
However, I think without the very high end machines Apple is also ceding a lot of the professional middle market.
If the choice is between, say, a Framework desktop vs nothing from Apple I'll obviously pick the Framework.
If I get used to a Framework desktop running Linux then I'd probably stop buying MacBook Pros.
Right now Apple has a chance at capturing local AI but that opportunity won't last forever.
Initially when it happened everyone expected they did it because they planned to announce M5 Ultra shortly, but it doesn't look like this is happening.
Now IMHO it indicates they simply ran out of RAM supply.
Mea culpa, dyslexic moment.
Of course Apple is massive, but if they announce inference boxes that everyone wants they need to make 10,000s or even 100,000s of them.
And it's very likely they'd rather sell 600,000 - 10,000,000 more iPhones or MacBook Neos. End users bring Apple money with every OpenAI / Claude subscription sold through their platform.
And an inference box is just a one-off sale of hardware that will bring no further income.
They are assuming that they are able to get ram in the future, once the AI bubble either dissipates or pops. It's far easier to build something you planned for 3 years ago, than crash build it in 3 months.
Right now RAM shortages are bad to the point where likely even Apple has to decide what products they make and what they discontinue.
There was a short time when the M3 Ultra with 256GB / 512GB was the best offer on the market because Apple lagged with price increases. Now the HN crowd expects Apple of all companies to jump into a price war with Nvidia and subsidize their inference hardware.
M1 had 70 GB/s, M1 Pro: 200, M1 Max 400, M1 Ultra 800.
Modern RTX 6000: ~1,600 or so.
If we get a 1,200-1,500 GB/s bandwidth M7 variant in late 2027 with 512GB of RAM, that will be a very interesting chip. Tracking LLM size and performance improvements, I can imagine that being a sort of inflection point for local inference. I wonder what the power budget would be in desktop format.
You're looking at about 100 tokens/s for a 1T MoE, 37B-active, 4-bit model.
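A back-of-the-envelope check on that figure, assuming decoding is purely memory-bandwidth-bound and every generated token reads each active weight exactly once. This ignores KV-cache traffic, compute limits and any caching effects, so treat it as a rough estimate rather than a benchmark.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, active_params_billion: float,
                             bits_per_weight: int) -> float:
    """Rough decode-speed ceiling when memory bandwidth is the bottleneck."""
    bytes_per_weight = bits_per_weight / 8
    gb_read_per_token = active_params_billion * bytes_per_weight
    return bandwidth_gb_s / gb_read_per_token


ACTIVE_PARAMS_B = 37.0   # active parameters of the MoE, from the comment
BITS = 4                 # 4-bit quantization, from the comment

low = decode_tokens_per_second(1200.0, ACTIVE_PARAMS_B, BITS)
high = decode_tokens_per_second(1500.0, ACTIVE_PARAMS_B, BITS)

print(f"1,200 GB/s: ~{low:.0f} tokens/s")
print(f"1,500 GB/s: ~{high:.0f} tokens/s")
```

Under these assumptions the hypothetical 1,200-1,500 GB/s chip lands at roughly 65-81 tokens/s, the same ballpark as, though a bit under, the figure quoted above.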
It'd probably cost $30k or more I'm guessing if memory prices do not come down. Even at $30k, it could still be a relative bargain since an RTX Pro 6000 Blackwell 96GB card costs $12k today. The M3 Ultra with 512GB was around $8k before Apple discontinued it. I expect an M7 Ultra to have 768GB or 1024GB.
Apple Silicon Macs were on their way to becoming cheap local LLM machines relative to professional GPUs before this memory crisis. It may still emerge as such in a few years.
Here's some interesting math: At 512GB, the RAM in an Ultra chip could make 42 Pro iPhones. Assume a 55% profit margin and a $1200 ASP, and you're looking at $28,160 in profit from making iPhones instead. No wonder Apple discontinued the M3 Ultra 512GB. If they only have a limited supply of RAM for all their products, it makes no sense to produce an $8000 M3 Ultra 512GB when you can produce 42 Pro iPhones. You can only configure an M3 Ultra up to 96GB today as of June 2026.
Apple would have to raise the price of a 512GB Ultra Mac to around $50k to match iPhone profits.
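The opportunity-cost arithmetic can be reproduced as follows. The 12GB per iPhone, 55% margin and $1200 ASP come from this thread; applying the same 55% margin to the Mac to get a matching price is my assumption about how the ~$50k figure was reached.

```python
MAC_RAM_GB = 512
IPHONE_RAM_GB = 12        # rumored per-phone RAM mentioned earlier in the thread
IPHONE_ASP_USD = 1200
PROFIT_MARGIN = 0.55      # assumed margin from the comment above

# Fractional phones: the $28,160 figure only works out without rounding down.
phones = MAC_RAM_GB / IPHONE_RAM_GB
iphone_profit_usd = phones * IPHONE_ASP_USD * PROFIT_MARGIN

# Assumption: the Mac earns the same 55% margin on its sale price.
matching_mac_price_usd = iphone_profit_usd / PROFIT_MARGIN

print(f"iPhones per 512GB of RAM: {phones:.2f}")
print(f"forgone iPhone profit: ${iphone_profit_usd:,.0f}")
print(f"Mac price with equal profit: ${matching_mac_price_usd:,.0f}")
```

With whole phones only (42), the forgone profit would be $27,720, so the quoted number implicitly treats the RAM as perfectly divisible.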
How would that work? They purchase 512GB from Samsung and then it doesn't matter if that's like 128x 4GB or 4x 128GB?
If companies keep spending half a MacBook Neo worth of subscription on AI plans monthly per person, Apple is going to have a hard time competing.
That’s indeed very hypothetical considering that Apple silicon uses on-package HBM.
The base model was $9k, that much RAM got you into $14k range.
Edit: for those of you downvoting I don’t celebrate this prospect. I’m merely realistic about where things are going given the rapid vibe shift from the administration on AI since the start of June.
The article didn't state the M5 Ultra won't be released. It will probably provide 1228GB/s of memory bandwidth this year.
(This is assuming Apple will deliver, but this area is one of the biggest ones they have in AI, and they need the developer ecosystem to exist and survive)
Maybe this strategy works, even in that world.
Remember when we all thought (were told we thought) the world was heading to 3D views of our 2D lived experience like a solid Cube of GUI we could rotate around and live inside? Well Apple took the simple 2D square pane of virtual desktops and .. made it a SONY strip. One variable: sideways.
So here we are being told AI is the future. Apple seems to be saying "yes but it will run local" which might be a safe bet if AI comes true but I wonder how many of us want the AI outcome, which is morally speaking the 3D immersive GUI cube here: what if we don't want that?
So I think Apple has the right instinct. In fact, I've had the thought multiple times that I really want a lot of workflows just running on my device. Workflows like fast vector search (already fast on the M4, but I want it more commonplace), or realtime transcription and summarization to be even faster, on device, etc.
Investors are expecting trillions of dollars to come out of this play. That's only possible through monopoly pricing essentially.
"Being a profitable business", which selling tokens could absolutely be, is not sufficient. They need to own a large chunk of the world
there isn't a future where we all just decide that nah, we don't want AI anymore. useful things don't disappear.
Can't it do both? The M1 Pro with 16gb+ is still more than nearly everyone needs.
It’s all fairly easy bets to make and correct.
So what happens? Nothing. If Apple makes M7 Max/Ultra computers with 128-768GB of RAM and nobody buys them then... nobody buys them. Apple isn't betting the entire company on AI just like Google isn't. The rest of the internals are the same Macbook, Mac Mini or Mac Studio. You're just selling something with less RAM.
Google sold (will sell?) about $70bn worth of Google shares to fund AI infrastructure build outs. It's also issued bonds (=debt; I forget the number, $30bn?) to pay for more infrastructure. Fairly sure it also has established a shadow company, a Special Purpose Vehicle (SPV) to stash away unpleasant financial things it doesn't want to show, also for the AI build out.
Amazon, Google, Meta, Oracle are overstretching at the moment. They are predicted to become cash flow negative (more money going out than coming in) if they keep going at this rate, some time in 2027 or 2028.
Now, they won't go bankrupt but it's possible they will be hit by huge restructuring waves once the dust settles.
$100 billion in equity and bond offerings is not betting the farm on AI.
Additionally, Google's data centers are notoriously efficient and they build their own networking hardware as well as their TPUs. Now their TPUs lag behind NVidia offerings obviously but they'll keep getting better.
Google is the company I'm least concerned about in the AI space.
As for Amazon, I'm not sure what their split is on in-house AI build out and usage vs AWS. I suspect the majority is for AWS and I also suspect their ROI is insanely short (ie less than 5 years) on any AI capex.
A lot of us on HN dismiss Oracle with good reason but it's also fair to say they've survived and thrived through the dot-com crash, the GFC and the pandemic until now. They're clearly doing something right.
Meta I think is the Sick Man of Tech. Their social media assets are of declining value (IMHO) and they seem completely unable to adapt to changing conditions. The whole Metaverse was a massive $70B+ boondoggle on something long before they had anything resembling a product-market fit. They seem unable to leverage their social media assets for building any competitive AI products or tech. Plus they seem unable to hire and retain key staff in this space.
> Google has a market cap over $4 trillion
For the purposes of most financial discussions, the market cap can be ignored. It tells almost nothing about the business fundamentals.
> and its 2025 financials were $130 billion profit on $400 billion revenue, which was something like 15% Y/Y growth.
Most of that growth, which has re-accelerated, is AI driven. Aka eggs in a single basket.
> $100 billion in equity and bond offerings is not betting the farm on AI.
That's on top of capex of $185bn in 2026. In 2023 it was $32bn, 2024 $52bn, 2025 $91bn.
All your financial alarm bells should start ringing.
Amazon is doing similar things.
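To put the capex series quoted above in perspective, here are the year-over-year multiples. The figures are the ones in the comment ($bn), with 2026 being the stated plan rather than a reported number.

```python
# Capex in $bn by year, as quoted in the comment above.
CAPEX_BN = {2023: 32, 2024: 52, 2025: 91, 2026: 185}


def yoy_multiples(series: dict[int, int]) -> dict[int, float]:
    """Ratio of each year's value to the previous year's value."""
    years = sorted(series)
    return {year: series[year] / series[prev]
            for prev, year in zip(years, years[1:])}


multiples = yoy_multiples(CAPEX_BN)
total_multiple = CAPEX_BN[2026] / CAPEX_BN[2023]

for year, ratio in multiples.items():
    print(f"{year}: {ratio:.2f}x the previous year")
print(f"2023 to 2026: {total_multiple:.2f}x overall")
```

The multiple itself grows each year, so spending is accelerating rather than merely rising.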
> A lot of us on HN dismiss Oracle with good reason but it's also fair to say they've survived and thrived through the dot-com crash, the GFC and the pandemic until now. They're clearly doing something right.
https://www.reuters.com/business/autos-transportation/cost-i...
Oracle is the most likely to go bankrupt out of all these companies outside of AI labs. They've been very adept at financials until now, when they've bet the farm on explosive AI revenue and especially profit growth. If those don't materialize soon, bye bye Oracle! Which would probably make a great many engineers happy :-))
If you issue $70B in stock, which was your point btw, market cap absolutely matters. $70B is less than 2% new shares issued. If it was a $100B company, it would be 70%. That's why market cap matters.
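The dilution point is easy to check. This sketch approximates new shares as the amount raised divided by market cap, using the $70B raise and the "over $4 trillion" market cap quoted in this exchange; the $100B company is the hypothetical from the comment.

```python
def dilution_percent(raise_bn: float, market_cap_bn: float) -> float:
    """New shares issued as a percentage of existing shares (approximation)."""
    return raise_bn * 100 / market_cap_bn


google_like = dilution_percent(70, 4000)   # $70B raise, ~$4T market cap
small_company = dilution_percent(70, 100)  # same raise, $100B company

print(f"$70B raise at $4T market cap:   {google_like:.2f}% dilution")
print(f"$70B raise at $100B market cap: {small_company:.0f}% dilution")
```

The same raise is a rounding error for one company and a near-doubling of the share count for the other, which is why market cap is relevant to this specific argument.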
> All your financial alarm bells should start ringing.
Only if you start with the premise of "Google is imploding" and then go looking for evidence.
Oh, so we're using CDS rates as a proxy for alarm now? Ok, Google's rates are hovering under 50 basis points [1]. For comparison, this is only slightly higher than the sovereign debt of Canada [2].
[1]: https://finance.yahoo.com/markets/stocks/articles/alphabet-v...
I think reducing the die area dedicated to AI stuff is not going to be a problem.
And in fairness Apple already has essentially AI-less hardware in the form of the MacBook Neo and it’s been an astonishing success.
I have one and it’s a very good laptop, particularly for the price I paid for it.
Do we have a choice? It's being forced upon us by folks who have the power to distort any market they want. Energy prices are rising, and the PC industry is about to be destroyed by component prices. It will be dumb clients that run the software our feudal overlords of the data centers will have the grace to grant us. And the government lets it happen because it furthers their interests.
https://bontechlabs.com/news/apple-is-reportedly-using-intel...
Given the risks involved in establishing Apple Silicon designs with a new fab, I would expect early M7 parts to be in test production right now.
The fundamental M7 design is already set in stone.
Mark Gurman's Bloomberg article does not mention fabrication partners or processes.
I haven't seen any competitor even try to address the backside power delivery of 18A. I suspect that Samsung and TSMC have something similar and don't talk about it.
The design rules for the standard cell (sort of corresponding to the die area required by a transistor) for the Intel 18A seem to target dense, high performance designs. That's not a particularly meaningful insight - of course Intel wants to have the highest performance of all the fabs.
Intel's packaging expertise used to be a generation ahead, and indeed their server chips currently use a mad mix of chiplets and through-silicon vias for direct stacking, all heaped onto a reticle-limited monster interposer die. All of this expensive complexity might be sustainable as long as Intel can keep its enterprise customers happy. That hasn't turned out too well for them.
AMD has found a mass-market winner with mainstream gaming CPU with extra level 3 cache die stacked on top. Compared to Intel servers, it's brutally simple. But extremely effective in its consumer market.
But the Intel chiplets and packaging could be a great toolbox for M7 generation of Apple Silicon. Now that the M5 Pro and Max are multi chip packages, they more resemble the Intel and AMD designs, with chiplets dedicated to I/O or GPU.
(Speculation and dreams. That's all I got, and I'm writing it in the face of an absolutely psychotic autocorrect on a tablet.)
As we all know, Intel used to be famous for their engineering and their ability to scale up a newer, smaller process with way earlier commercial viability. This all ended with the Sisyphean 10nm move that was years late and honestly Intel just don't seem to have recovered from it.
So Intel seemingly has underutilized fab capacity whereas the likes of TSMC and Samsung can probably produce every chip they make with demand to spare. Given the CHIPS Act that was passed under Biden, the Trump admin taking a stake in Intel and the environment of tariffs and a push for American manufacturing, everything seems to be lining up for someone to take advantage of Intel's physical fabs and American production and that could be Apple.
If they have Apple's designs months prior to launch, rather than after launch.
The real advantage is knowing exactly what Apple is launching months or years in advance, because that can inform strategic planning.
This has both a technical and human component.
On the human side, top x86 execs refused to see any threat coming. They must have thought Apple couldn't overcome the x86 software moat, thought the chips were for servers, consoles, or some other non-PC device, or perhaps they simply couldn't believe what their investigative teams told them.
At the same time, we're 6 years post-launch. The proof of ARM's capability is clear. x86 server marketshare is about to hit just 50% and Microsoft is pushing ARM hard as a replacement for x86. Either all the x86 engineers are completely incompetent and incapable of learning from years of ARM designs or there are aspects of x86 that make copying those designs infeasible.
> teams from Intel and AMD absolutely HAD to know the truth.
These people are professionals that acknowledge IPC is a stupid metric. If you switch your statement to SIMD throughput, now ARM NEON has the lower IPC and x86 looks like space age technology. They're optimized for different workloads.
x86 vendors recognized that they could recoup the majority of efficiency that Apple Silicon has without buying an architectural license for ARM. Intel invested early on big.LITTLE, and AMD drilled down on denser nodes for their preexisting designs. As both businesses converge on each other's ideas, their SOCs have adopted most of ARM's greatest mobile innovations. Even before that, x86 hardware was always usable - AMD was shipping faster integrated GPUs than the M1 Pro before the M1 ever hit shelves.
All of this makes sense, nothing objectively prevents the x86 architecture from being power-efficient. Arm LTD. would have gouged any of those vendors for their IP, and even with an architectural license it's not like AMD or Intel would get usable core designs from Arm. There was no reason to pivot to ARM for either company, they both saw Qualcomm and could read the writing on the wall.
> x86 server marketshare is about to hit just 50% and Microsoft is pushing ARM hard as a replacement for x86
That's Nvidia's work, no credit is due to Microsoft or Apple for reshaping the server market. Apple's early ARM hardware was outright ignored for server/HPC applications, leading to the discontinuation of the Mac Pro. Apple was entirely incapable of pivoting their mobile chipsets to the server scale, surprising nobody that had paid attention to Apple's godawful raster/GPGPU acceleration stack. The Ultra hardware looked like a dog's dinner compared to x86 arches like CDNA.
The Graviton and Grace chips that displaced x86 servers did it because they are slower, cheaper and less feature-dense. Graviton for the bare minimum of Raspberry Pi-tier web serving, and Grace for the high-end of "we need CUDA and enough bandwidth for Infiniband" that made trillions in the HPC market.
While I'm sure some level of internal leakage does take place, at least on paper the fab's planning needs to be firewalled off from their own chip roadmap.
I'm also not sure how much Apple actually cares, tbh. Yes, they currently have an edge in silicon, but it's heavily due to being willing to outspend everyone else, and their real superpower is vertical integration - which Intel isn't in a position to compete with.
Also the AI boom means NVIDIA et al. can afford to buy TSMC's best processes at scale, which means less available capacity for Apple.
I'm sure given no other forces at work, Apple would prefer to stick with what they were doing previously, buying the lion's share of TSMC's best process.
It's not simply marketing since the Pro/Max chips of a generation use the same cores as the regular version, just more of them or different combinations of performance and efficiency cores.
The claim is that M6 will be released, but the only variants will be lower end.
When they get to the M7 generation, they will make high end variants.
It's a real distinction because each generation of parts shares an architecture.
The article has an entire section speculating what the M6 parts will be, but says they'll top out around 200GB/s memory bandwidth and 12 graphics cores.
Why would it? Each generation of the M series has an architectural improvement on their chipsets. The difference between an M1 and an M1 Pro is the allocation and arrangement not the architecture. M6 to M7 presumably will have architectural changes.
Or did this announcement also add an M6 chip, and they're just skipping pro?
It's the same thing as how the Mac Studio got an M4 Max refresh, but they didn't make an M4 Ultra so if you want the 28+ core CPU or 60+ core GPU, that's still using an M3 Ultra.
This time it'll be across all the Pro, Max, and Ultra versions, if you want those they'll stay at the previous generation for the M6 cycle.
Not that weird - Apple has a huge set of chips and hardware and software products. Putting every single thing on a fixed identical update cycle together won't always make sense.
It can still be a very real, not made-up distinction, if the actual facts on the ground are that Apple designed an M6 line, but then scrapped that design and asked the team to create a new design with emphasis on AI-focused specs.
It's not the name that's important (the M7 could still come out as M6); it's them skipping a design, or a CPU "Tick-Tock model" step.
Are you thinking Apple is leaking that there will be a long wait for much more expensive chips in order to… what?
I am still skeptical that Apple intentionally leaked this because they normally are so tight-lipped, but there are reasons in favor of leaking this.
If they are pulling out all the stops to make the M7 more competitive.. guess I can wait for that?
What do you mean by 'win'?
For a normal coder/person's use cases, yes. But AI companies are becoming more specialised in different fields and these tailored models will be leagues ahead in those niches.
And if a local machine can run something like Opus 4.8, who is to say that those "specialized" models wouldn't just come at a later date, or that loading open models wouldn't be an option, with something like an "M7-verified" flag from Hugging Face that would make it extremely easy for any consumer to just play around?
As of yet there's no indication that small models that can fit in 8GB/16GB can be fully relied upon?
The problem is basically that we can't have nice things. AI chat logs themselves become another commodity to sell and to train on. We recently had a story about how Chinese firms are reselling Claude tokens [2]. The chat logs are a commodity here.
The only way to avoid this is to run LLMs locally. Even if you trust someone like Anthropic or Google, case law simply hasn't been established that the chat logs aren't discoverable.
Add to that that a sub-$5000 PC with a 5090 can already run a 31B model at reasonable inference speeds. Not amazing but good enough for many applications. Obviously that can't compete with Mythos but it doesn't have to. It also shows where the trend line is going for hardware. A $10k Nvidia GPU from 10 years ago now sells for scrap. What a consumer-level computer in 5 years can run locally will probably shock a lot of people.
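A back-of-the-envelope check on the "31B model on a 5090" claim, as a sketch only. It counts weight storage alone and uses the 32GB VRAM figure quoted elsewhere in the thread; the 4GB headroom for KV cache and runtime overhead is an assumed placeholder, not a measured number.

```python
# Weights-only memory estimate. HEADROOM_GB is a hypothetical allowance
# for KV cache and runtime overhead, chosen purely for illustration.

VRAM_GB = 32       # 5090 VRAM as quoted in the thread
HEADROOM_GB = 4    # assumption, not a measurement


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB: parameters * bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float) -> bool:
    """True if the weights plus the assumed headroom fit in VRAM."""
    return weight_footprint_gb(params_billion, bits_per_weight) + HEADROOM_GB <= VRAM_GB


for bits in (16, 8, 4):
    size = weight_footprint_gb(31, bits)
    verdict = "fits" if fits(31, bits) else "does not fit"
    print(f"31B at {bits:>2}-bit: {size:5.1f} GB of weights -> {verdict} in {VRAM_GB} GB")
```

Under these assumptions only the 4-bit quantization leaves real room on a 32GB card, which matches the "not amazing but good enough" framing.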
[1]: https://www.williamsmullen.com/insights/news/legal-news/ai-t...
The M7 Pro and M7 Max are scheduled for as early as the end of 2027, while the M7 Ultra is on track for 2028.
This means there won't be a redesigned MBP this year since there won't be M6 Pro/Max chips. People were expecting a redesigned slimmer MBP with OLED display later this year, myself included. I was holding out for one until I decided to switch from an M1 Pro 16" MBP to an M5 Air 15" due to the expected price increase. I think many M1 Pro/Max generation people were waiting to upgrade this year.
The extra ports are nice along with better speakers.
The actual laptop I want is an Air 15" with 120hz OLED screen.
They can release a redesigned MBP with the base M6 chip.
They don't want to tell the world how the new redesigned MBP is the best laptop in the world but it's slower than the older MBPs.
If they make a deal with say google to delay their own chips, could they profit more than by selling their production?
Demand is so crazy idk if this would begin to make sense
Are you upgrading from a perfectly good machine? Then wait.
I read it as the M6 being "high-end" in general, and Apple skipping the whole generation, which made no sense to me. But they are going to use the M6 after all, just not bother to create Max and Ultra versions of it.
But in terms of “noticing it” you are correct. You won’t pay attention after a day or two.
EDIT: this menu managing app will need permissions to make screen captures. So much for the privacy. Forgot to mention.
Elaborate on my problems... or help me fix them.
some kind of private-public partnership
sorry if thats already happening in some capacity, like i said - "stupid question"
but can the gov not just fast track this as a "national security" or something?
i think the usa should be the one that makes 1nm or smaller chips on demand, even if it takes 5-10 years to do.
and yes i realize i might sound dumb here but i'm the one suffering from high hardware prices!!
I wonder how much the rumored 768GB RAM version will cost.
A top of range Mac is a depreciating asset and looks exactly the same as the other models physically.
hyperscalers better all IPO in the next 8 quarters
They need to pull out of this half assed bandwagon approach.
They don't need to pull out of this approach.
Do you really think the average Apple user will use it when there’s already better AI provided by OpenAI and Anthropic which don’t require advanced local hardware?
As for how it helps: we're not talking about this year's AI ecosystem, or even next year's. This rumor, assuming it's true, is talking about two chip generations into the future — and probably at least three or four chip generations before it's a mature AI platform. What will AI be doing for us in five years from now? How does Apple plan for that future? Will concerns of privacy increase or decrease in that time?
I don’t know, but the majority of it will be running in data centers, not high end consumer grade workstations.
>Will concerns of privacy increase or decrease in that time?
People have been more concerned about privacy than ever, it hasn’t seemed to stop people from using cloud AI services.
Given that Apple missed the starting gun on AI to the point that they're using Google's also-inferior Gemini, I'm not convinced these Apple AI chips have a consumer.
Anyone savvy enough to do their own AI hosting is more likely to use Nvidia boxes, etc. Apple workstations have always been more successful in art/graphic design. That's why they cut off their server hardware long ago.
>How does Apple plan for that future?
Ideally by making attractive simple products regular users can buy. The iPhone and MacBook Neo seem a step in that direction. AI vision and Apple Vision scream 90s pre-iMac Apple.
1. Nvidia aggressively segments the market on VRAM and will continue to do so. A 5090 with 32GB of RAM, ~21k CUDA cores and 1800GB/s of memory bandwidth is $3-4k. An RTX 6000 Pro with 96GB of RAM, ~24k CUDA cores and 1800GB/s memory bandwidth is ~$11k;
2. The 5090 won't be replaced until late 2028 or even 2029. There has been no mid-cycle refresh (eg 4080 Super vs 4080), and there likely won't be one at all, or at least not for a year. If there is one in a year, it basically confirms that the 6000 series won't arrive until 2028/2029. Also, the x090 never got a mid-cycle refresh so the current consumer high-end is staying that way for years;
3. The 6090 whenever it comes will still have 32GB of VRAM unless the memory market drastically changes;
4. Many have anticipated an M5 Max/Ultra refresh of the Mac Studio line in Q3. Given that Apple chose to hike the prices on Studios rather than discontinue them, I now think this isn't going to happen. We may not see a Studio refresh for up to 2 years. Apple has done this before with the Mac Pro;
5. M7 Max/Ultra will probably go to a memory bandwidth of 1.2-1.8TB/s vs the current tops of M3 Ultra, M4 Max and M5 Max of 600-900GB/s. This simply needs to go up to boost inference speed;
6. You'll also see the number of GPU cores go up. All of this will add up to an M7 Max being 50-80%+ of the performance of a 5090. That's huge given the shared memory architecture;
7. We may see the return of Apple using its massive cash pile for vendor-financing of an exclusive memory supply. This was one of Tim Apple's [sic] big innovations.
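The bandwidth argument in points 1, 5 and 6 can be sketched with the usual rule of thumb for memory-bound decoding: each generated token streams roughly all of the weights once, so tokens/s is capped at about bandwidth divided by weight size. The bandwidth figures are the ones quoted above (the M7 numbers are the rumored range); the 15.5GB model size is an assumption (a 31B dense model at 4 bits per weight), and the result is an upper bound that ignores compute, KV cache traffic and prefill.

```python
# Upper-bound decode speed for a dense model when memory bandwidth is the limit.
# WEIGHTS_GB is an assumption: 31B parameters at 4 bits per weight.

WEIGHTS_GB = 15.5


def decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float = WEIGHTS_GB) -> float:
    """Ceiling on tokens/s if every token reads all weights from memory once."""
    return bandwidth_gb_s / weights_gb


# Bandwidth figures (GB/s) as quoted in the list above.
bandwidths = {
    "current Apple, low end (600)": 600,
    "current Apple, high end (900)": 900,
    "rumored M7, low end (1200)": 1200,
    "rumored M7, high end (1800)": 1800,
    "RTX 5090 (1800)": 1800,
}

baseline = decode_tokens_per_s(bandwidths["RTX 5090 (1800)"])
for name, bw in bandwidths.items():
    tps = decode_tokens_per_s(bw)
    print(f"{name:<30} <= {tps:6.1f} tok/s ({tps / baseline:.0%} of 5090 ceiling)")
```

On this crude model a 1.2TB/s part tops out at about two thirds of the 5090's ceiling, and a 1.8TB/s part matches it. That is consistent with the 50-80%+ guess, though real results also depend on GPU core count and software.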
I guess it should be https://www.bloomberg.com/news/articles/2026-06-25/apple-to-...
EDIT: gift link if paywalled (archive.is capture is truncated): https://www.bloomberg.com/news/articles/2026-06-25/apple-to-...
You'd probably get faster prefill speeds, as well as better drivers for accelerated transcode and gaming applications.
> What sets the A20 apart isn’t just the node shrink—it’s the revolution in packaging. Apple is transitioning to Wafer-Level Multi-Chip Module (WLCM) integration, meaning that RAM will no longer be situated beside the chip, but rather on the chip wafer itself, integrated alongside the CPU, GPU, and Neural Engine.
This shift eliminates the need for silicon interposers and substrates, thereby enhancing signal integrity, improving thermal dissipation, and facilitating faster memory access with lower latency. The benefits? Better multitasking, smoother AI processing (hello, Apple Intelligence), improved battery life, and potentially a smaller chip footprint—freeing up space for other components.
https://hwbusters.com/news/apples-a20-chip-ushers-in-a-new-e...
It's entirely possible that TSMC is ramping up more slowly than expected.
And their explanation isn't really passing the smell test for me for other reasons, for instance the fact that DRAM processes are pretty radically different than bulk logic processes, which wouldn't really let you put it all on the same wafer, much less the same die. Even back in the day when you had eDRAM blocks (like the Xbox 360's eDRAM die), that was really a DRAM process with a bit of logic cells that wouldn't be competitive if they weren't sitting right next to the DRAM blocks.
I could be wrong here though, my examples are more than a bit long in the tooth.
> CoWoS (Chip-on-Wafer-on-Substrate)
https://semiwiki.com/wikis/industry-wikis/cowos-chip-on-wafe...
It's a more advanced update from their older InFO tech.
Also, InFO-oS is a CoWoS technique.
The article compares CoWoS to InFO.
> CoWoS uses a passive silicon interposer—etched with thousands of fine interconnects—that sits between the active dies and the package substrate. This interposer provides high-density connections between chiplets and supports memory integration, most notably High Bandwidth Memory (HBM) stacks.
It also mentions that CoWoS is the tech used to build Nvidia's high end AI accelerators.
> AI Accelerators: Nvidia H100, B100 (HBM3 via CoWoS)
Apple is taking a tech that has previously only been used in very high end enterprise applications and is using it to make consumer SoCs starting with this year's iPhone.
Something's got to give here. I think it's your original article that's wrong, and it's poorly trying to describe InFO-M.
‘isn’t just / it’s also’ AI-ism should at least turn your AI radar on, then you get this weirdly formal structure that sounds like a trying-to-be-relatable press release:
“This shift eliminates A, thereby enhancing B, improving C, and facilitating D. The benefits? Better U, smoother V (hello, W), improved X, and potentially Y—freeing up Z.”
Question to self interjection, chipper ‘(hello, W)’ aside, topped off with a zero spaced emdash. 10000% AI, stylistically this isn’t text tuned to an HN audience, comments never sound like this. What’s funny is the paragraph it’s quoting has nearly the same style, the LLM probably picked that up.
Not trying to call anyone out, just pointing out the stylistic tells we should all be aware of.
As someone who wants to run effective LLMs locally for many things, their other big benefit for a while now has been the unified-memory Studios.