Photos and videos on a menu item

A diner deciding between three pasta dishes scrolls past five lines of description and stops on the photo. The photo sells the dish; a missing photo loses the order. This page is for the people who set up that gallery (owners, marketing managers, chefs running a menu refresh). It explains where the pictures come from, how the system stores them, and why the same photo ends up on three different screens at three different sizes.

There are two ways to read it. If you’re the owner, start with “Why this page exists.” If you’re the chef, jump to “How to use it” and skim down.

Why this page exists

The photo on the dish is the most expensive piece of real estate on the menu. A bad photo doesn’t just look bad — it lowers the price the diner is willing to pay, drops the click-through on the social share, and leaves the AI menu assistant guessing at what the dish actually looks like. Every photo on every dish should be deliberate.

The other reason this page exists is consistency. A kitchen with thirty dishes and thirty photos taken on three different phones in three different lights is a kitchen whose menu feels random — even if every dish is excellent. The gallery is where that gets fixed: upload everything in one place, edit captions and alt text in one place, set the hero (the photo that lands on the card in the grid and on the social share) in one place. One photo lives once; the system renders it everywhere it needs to.

The third reason — quieter but strategic — is AI discoverability. When the support AI is asked “what does the Pappardelle al cinghiale look like?” or the recommendation engine is choosing which dishes to surface, the image and its caption and its alt text feed the answer. Without them, the AI is guessing.

The rule

One photo, three sizes, every screen. The system stores three variants of every picture you upload — a small thumbnail for the grid card, a medium version for the detail page, the original for the social share and the download. The right size streams to the right screen automatically. You never pick the size; the system does.

What you can do here

The gallery lives inside the Edit menu item page, in the Media section about halfway down. Below the section header, every photo and video already on the item appears as a tile in a grid of three columns. A dotted Add media button at the bottom of the grid opens the upload modal.

Three ways to add a photo. The first is the chef’s phone — open the upload modal, the Upload tab is already selected, tap the dotted area, choose a file from the photo roll. Photos go up to 10 MB; the system accepts the usual phone formats (the iPhone’s HEIC, JPG, PNG, WebP) and converts whatever it gets into the canonical web format on the way in. The second is AI Generate — the other tab in the same modal. Pick a style curated for the dish’s category (Vintage Cookbook for pastas, Marble Studio for wines), pick an aspect ratio, hit Generate image. Ten to twenty seconds later a new tile appears with an AI generated badge. The third is AI Enhance on any photo already in the gallery — open the photo, the right-hand panel has an Enhance section that takes the source and produces a polished version in the same style language. The original stays put; the enhanced version lands beside it with an AI enhanced badge.

Video, too. The same upload area accepts a short video clip — MP4 or WebM, up to 50 MB. A 5-to-10-second muted autoplay clip is the sweet spot: steam rising off a bowl of ragù, a cocktail being shaken, a pizza coming out of the wood-fired oven. Videos play on the rich item-detail page; the grid card always uses the hero photo. Other video containers (.mov, .avi, .mkv) play unevenly across phones and the system refuses them at upload — convert to MP4 first.
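
The photo and video limits above can be summed up as one small check. This is a minimal sketch in Python, not the system’s actual code: the function name check_upload is hypothetical, treating .jpeg as JPG is an assumption, and so is reading “MB” as 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import Tuple

MB = 1024 * 1024  # assumption: the page does not say whether MB means 10^6 or 2^20 bytes

# Formats and size limits as stated on this page.
PHOTO_EXTENSIONS = {".heic", ".jpg", ".jpeg", ".png", ".webp"}  # .jpeg is an assumed alias of JPG
VIDEO_EXTENSIONS = {".mp4", ".webm"}
REFUSED_VIDEO_EXTENSIONS = {".mov", ".avi", ".mkv"}
PHOTO_LIMIT = 10 * MB
VIDEO_LIMIT = 50 * MB


def check_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (accepted, reason) for a file offered to the upload area."""
    ext = Path(filename).suffix.lower()
    if ext in PHOTO_EXTENSIONS:
        if size_bytes > PHOTO_LIMIT:
            return False, "photo is larger than 10 MB"
        return True, "photo accepted"
    if ext in VIDEO_EXTENSIONS:
        if size_bytes > VIDEO_LIMIT:
            return False, "video is larger than 50 MB"
        return True, "video accepted"
    if ext in REFUSED_VIDEO_EXTENSIONS:
        return False, "convert the video to MP4 first"
    return False, "unsupported file type"
```

For instance, check_upload("clip.mov", 5 * MB) is refused with the hint to convert to MP4, which mirrors what the upload area tells you.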

The hero, the order, the hide, the download. Every item has exactly one hero — the photo used on the grid card, the social-share image, and as the first slide of the detail-page gallery. Tap the star in any tile’s top-right corner to mark it; picking a new one un-marks the old. Drag any tile to a new position and the order saves automatically; the diner sees the gallery in this order (except the hero, which always leads). The eye icon toggles whether a tile is public — hidden tiles stay in your gallery for staging but never reach the diner. The down-arrow pulls the full-resolution original back to your device for re-use on flyers or invitations.
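
The hero, order, and hide rules can be pictured as a short piece of Python. It is a sketch under assumptions, not the system’s code: the Tile fields and the function names set_hero and diner_order are invented for illustration, and the page does not say what happens if the hero itself is hidden (here it is simply left out).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tile:
    name: str
    position: int            # order set by dragging in the admin
    is_hero: bool = False    # the star
    is_public: bool = True   # the eye icon


def set_hero(tiles: List[Tile], name: str) -> None:
    """Mark one tile as the hero and un-mark every other tile."""
    if not any(t.name == name for t in tiles):
        raise ValueError(f"no tile named {name!r}")
    for t in tiles:
        t.is_hero = (t.name == name)


def diner_order(tiles: List[Tile]) -> List[str]:
    """Names of the tiles a diner sees: public only, hero first, then drag order."""
    visible = [t for t in tiles if t.is_public]
    # False sorts before True, so "not is_hero" puts the hero in front.
    visible.sort(key=lambda t: (not t.is_hero, t.position))
    return [t.name for t in visible]


tiles = [
    Tile("phone shot", 0),
    Tile("staging", 1, is_public=False),
    Tile("cookbook", 2),
]
set_hero(tiles, "cookbook")
print(diner_order(tiles))  # ['cookbook', 'phone shot']
```

The hidden staging tile never reaches the diner, and the hero leads even though it was dragged to the last position.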

How to use it

Open the menu item and scroll to the Media section. The first photo for a new dish is almost always a phone photo from the chef. Tap Add media, the modal opens on the Upload tab, tap the dotted area, pick the file off the phone. Twenty seconds later the new tile appears with an Uploaded badge in the top-left so you can tell at a glance which source produced it.

Tap that tile to open the editor. The left shows a big preview; the right has three fields. Caption (EN) is the line shown beneath the photo on the public detail page: short, evocative, optional. For example: “Hand-rolled in our kitchen.” Caption (TH) is the Thai translation if you sell in Thai. Alt text is the one-sentence description for accessibility: what a screen reader reads aloud, and what the AI uses when it can’t actually see the photo. For example: “Plate of cacio e pepe topped with cracked pepper.” Tap Save.

For a dish that doesn’t have a photo yet, switch the modal to AI Generate. The system reads the category and offers styles a platform curator has set up for it. Pick the style, choose an aspect ratio (1:1 for grid cards, 4:3 or 16:9 for hero banners), and tap Generate image. After a short wait the tile appears with the AI generated badge; edit the caption and alt text the same way.

For a photo that’s almost right — the lighting is flat, the framing is off — open the tile, scroll the right panel to the AI Enhance card, pick a style, tap Enhance with AI. A polished version appears as a new tile. Compare the two, keep whichever you prefer, delete the other.

Once the gallery has three or four tiles, pick a hero by tapping its star and drag the rest into the order you want the diner to see. The next time the menu refreshes — on the public site and inside the admin — the new gallery is live.

What happens behind the scenes

When you upload a photo, the system does three things in the background. First, if the file is in the iPhone’s HEIC format, it converts it to a web format in the browser before the upload — older browsers can’t read HEIC, and the AI tools downstream can’t either. Second, if the photo is larger than 4000 pixels wide, the system downscales it so storage and bandwidth stay reasonable. Third, the system publishes three variants to cloud storage: a small thumbnail for the grid card (400 pixels wide), a medium version for the detail page (1000 pixels wide), and the original (up to 1600 pixels wide) that the download button serves. The page that needs the small version streams the small version; the page that needs the medium streams the medium. The diner on a slow connection sees the thumbnail crisp and fast; the diner on a fast connection sees the medium with the original ready for a tap to zoom.
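
The arithmetic behind the three variants can be sketched as follows. This is an illustration, not the system’s code. It reads the 4000-pixel downscale and the 1600-pixel cap on the stored original as two separate steps, which is one interpretation of the description above. It also assumes that photos are never enlarged and that the aspect ratio is kept. The names fit_width and plan_variants are hypothetical.

```python
from typing import Dict, Tuple

INGEST_MAX_WIDTH = 4000  # wider photos are downscaled on the way in
VARIANT_MAX_WIDTHS = {"thumbnail": 400, "medium": 1000, "original": 1600}


def fit_width(width: int, height: int, max_width: int) -> Tuple[int, int]:
    """Scale (width, height) down to max_width, keeping the aspect ratio.

    Assumption: photos narrower than max_width are left alone, never upscaled.
    """
    if width <= max_width:
        return width, height
    return max_width, max(1, round(height * max_width / width))


def plan_variants(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Pixel dimensions of the three stored variants for one uploaded photo."""
    w, h = fit_width(width, height, INGEST_MAX_WIDTH)
    return {name: fit_width(w, h, cap) for name, cap in VARIANT_MAX_WIDTHS.items()}


print(plan_variants(4032, 3024))
# {'thumbnail': (400, 300), 'medium': (1000, 750), 'original': (1600, 1200)}
```

A 4032 × 3024 phone photo therefore ends up as a 400 × 300 thumbnail, a 1000 × 750 medium version, and a 1600 × 1200 original under these assumptions.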

Video doesn’t get the three-size fan-out that photos do, because the cloud-storage transformer that handles photos doesn’t do video. The file is validated, uploaded once, and stored as a single file referenced by the gallery row. The public site renders a video tag pointing at it directly.

When the diner opens the item, the grid card uses the hero photo (or the first image if no hero is set; videos are skipped on the card). The detail page shows the full gallery as a swipeable carousel. Videos autoplay muted when they scroll into view, pause when they scroll out, and don’t autoplay at all if the diner has the system Reduce Motion setting on — accessibility comes first. Tapping a video brings up the native player controls so the diner can scrub or unmute.
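
The card fallback and the autoplay rule are both small decisions, sketched here in Python. This is not the public site’s actual code: the dictionary keys ("kind", "is_hero", "name") and the function names are assumptions for illustration.

```python
from typing import Dict, List, Optional


def card_image(media: List[Dict]) -> Optional[Dict]:
    """Pick the grid-card picture: the hero, else the first image. Videos are skipped."""
    images = [m for m in media if m["kind"] == "image"]
    for m in images:
        if m.get("is_hero"):
            return m
    return images[0] if images else None


def should_autoplay(in_view: bool, reduce_motion: bool) -> bool:
    """A video autoplays (muted) only while in view and only if Reduce Motion is off."""
    return in_view and not reduce_motion


gallery = [
    {"kind": "video", "name": "steam clip"},
    {"kind": "image", "name": "phone shot"},
    {"kind": "image", "name": "cookbook", "is_hero": True},
]
print(card_image(gallery)["name"])  # cookbook
```

With no hero set, the same gallery would fall back to the phone shot, and a gallery holding only the video would leave the card without a picture.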

The photo and its alt text are read on three surfaces. The recommendation engine on the home page reads the alt text to decide which dishes to surface. The support AI uses the photo when a diner asks “what does the carbonara look like?” and the image feeds the answer. The image schema on the public page tells search engines what’s pictured, so the dish can appear in image search.
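
To make “image schema” concrete, here is a sketch of the kind of structured data a page can publish for search engines. It assumes the schema.org ImageObject vocabulary; this page does not say which vocabulary or fields the system really emits. The function name image_json_ld and the example URL are hypothetical, and mapping the alt text to the description field is an assumption.

```python
import json


def image_json_ld(url: str, caption: str, alt_text: str) -> str:
    """Build a schema.org ImageObject description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "contentUrl": url,
        "caption": caption,
        "description": alt_text,  # assumption: alt text is mapped to description
    }
    # ensure_ascii=False keeps characters such as "ù" readable in the output.
    return json.dumps(data, ensure_ascii=False, indent=2)


print(image_json_ld(
    "https://example.com/media/pappardelle-medium.webp",  # hypothetical URL
    "Hand-rolled in our kitchen",
    "Plate of pappardelle with wild boar ragù, sprinkled with rosemary",
))
```

The point of the sketch is that the caption and alt text you type in the editor are the only words a search engine or an AI has to go on, so an empty field means an empty description.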

Worked example

Sara is the head chef at your venue. The new wild boar pasta is going on the menu tomorrow — Pappardelle al cinghiale. The dish doesn’t have a photo yet.

She plates a fresh portion at the pass at the end of lunch service (a tangle of broad pasta ribbons with the dark sugo, a sprig of rosemary on top) and takes one photo on her phone, daylight from the side window. She opens the dish in the admin right there. Edit menu item → Media → Add media → Upload → choose file → done. The tile appears with the Uploaded badge. She taps the tile, types the caption “Pappardelle hand-rolled, wild boar slow-braised in Chianti” and the alt text “Plate of pappardelle with wild boar ragù, sprinkled with rosemary,” and taps Save.

She wants something more polished than a phone shot for the hero — the dish is going on the homepage feature carousel next week and the grid card is what makes the click. She taps Add media again, switches to AI Generate, the system has auto-detected the category (Pasta), she picks Vintage Cookbook, leaves the aspect ratio at 1:1, taps Generate image. Twelve seconds later a new tile appears with the AI generated badge — same dish, warmer light, the kind of photo that looks like it walked out of an old Italian cookbook. She taps the star to mark it as the hero, drags it to the front of the grid for good measure.

The next morning her sous chef films a 6-second clip on the prep counter (wild boar braising in a copper pan, steam rolling off the surface) and uploads it. It plays on the detail page; the card still uses the AI-generated hero because cards don’t play video.

Two hours after the launch, Sara opens the public menu on her phone in a slow corner of the dining room. The grid card loads instantly because the small thumbnail is streaming first. She taps through; the medium version loads next, sharper, with the video autoplaying silently below the photo. Half an hour later the first three orders of Pappardelle al cinghiale come in. The new dish is live; the gallery is doing its job.

Related pages

Menu items — overview — where the gallery sits inside the broader menu-item editor
How the AI thinks about your menu — why the alt text and the caption matter beyond accessibility — the image schema feeds the AI’s answer
Online vs in-venue — the difference between the gallery shown on the public site and the smaller card the in-venue till uses
Where your information lives — the five rooms of the system; the gallery lives on the menu item in Vetrina (the window)
Prep area — overview — unrelated but worth knowing: the prep board has its own photos on step-by-step recipe instructions, separate from the public gallery