OpenAI Releases GPT-4 with Multimodal Capabilities
OpenAI unveils GPT-4, a large multimodal model that accepts image and text inputs and generates text outputs. GPT-4 demonstrates human-level performance on many academic and professional benchmarks, passing a simulated bar exam in the top 10% of test takers and earning the top score of 5 on several AP exams. The model exhibits advanced reasoning, reduced hallucinations, and improved safety alignment compared to GPT-3.5.
OpenAI released GPT-4 on March 14, 2023, marking a significant advance in large language model capabilities. GPT-4 accepts both text and image inputs, processes visual information alongside natural language, and generates text responses. The model improves on GPT-3.5 in reasoning, factual accuracy, and safety alignment, and produces fewer harmful outputs. Microsoft integrated GPT-4 into Bing Chat, while OpenAI made it available through ChatGPT Plus subscriptions and waitlisted API access.
Performance Benchmarks and Capabilities
GPT-4 achieved strong results on standardized tests, scoring in the 90th percentile on the simulated Uniform Bar Exam, the 93rd percentile on the SAT Evidence-Based Reading and Writing section, and the 89th percentile on SAT Math. The model earned the top score of 5 on AP exams including AP Psychology, AP Statistics, and AP U.S. History, and a 4 on AP Calculus BC. On professional exams, GPT-4 passed simulated sommelier theory examinations, and follow-up studies reported passing-level performance on CFA Level II mock exams and U.S. medical licensing (USMLE-style) questions.
The model's reasoning capabilities improved substantially over GPT-3.5. On the MMLU (Massive Multitask Language Understanding) benchmark, which tests knowledge across 57 subjects, GPT-4 scored 86.4% compared to GPT-3.5's 70.0%. The model demonstrates better contextual understanding, more nuanced instruction following, and stronger multi-step reasoning. GPT-4 supports context windows up to 32,768 tokens (roughly 25,000 words), enabling analysis of long documents.
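Token counts, not word counts, determine whether a prompt fits those limits. A minimal sketch using the tiktoken tokenizer library to check a document against the two context windows (the library is assumed to be installed, and the file name is a placeholder):

# Estimate token usage before sending a long document to GPT-4.
# Assumes the tiktoken package is installed (pip install tiktoken).
import tiktoken

GPT4_CONTEXT = 8_192        # gpt-4 context window
GPT4_32K_CONTEXT = 32_768   # gpt-4-32k context window

def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Return how many tokens the text occupies for the given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

document = open("contract.txt", encoding="utf-8").read()  # placeholder file
tokens = count_tokens(document)
print(f"{tokens} tokens; fits gpt-4: {tokens <= GPT4_CONTEXT}; "
      f"fits gpt-4-32k: {tokens <= GPT4_32K_CONTEXT}")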
Multimodal Vision Capabilities
GPT-4's image understanding capability processes photographs, diagrams, screenshots, and documents containing text and visuals. The model interprets charts and graphs, reads handwritten text, explains memes and jokes requiring visual context, and analyzes spatial relationships in images. Organizations use vision capabilities for document processing, visual question answering, accessibility tools describing images for visually impaired users, and educational applications.
OpenAI initially limited image input access to select partners including Be My Eyes (assistive technology for blind users) and academic researchers. The vision capability enables applications like analyzing architectural plans, troubleshooting technical issues from screenshots, processing invoices and receipts, and generating code from UI mockups.
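Image input was not exposed through the public API at launch, but the request shape OpenAI later adopted gives a sense of how such applications are wired up. A hedged sketch only: the gpt-4-vision-preview model name and image_url content type reflect later API releases rather than the March 2023 launch, and the screenshot URL is a placeholder.

# Hypothetical sketch: ask a vision-enabled GPT-4 model to describe a screenshot.
# Assumes the openai Python package (v1.x) and a later vision-capable model;
# image input was restricted to select partners at launch.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumption: a later vision-enabled variant
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What error is shown in this screenshot?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)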
Safety and Alignment Improvements
OpenAI implemented six months of safety training using adversarial testing and reinforcement learning from human feedback (RLHF). GPT-4 is 82% less likely to respond to requests for disallowed content compared to GPT-3.5, and 40% more likely to produce factual responses according to internal evaluations. The company engaged external experts in AI safety, cybersecurity, and adversarial testing to identify risks before release.
The model exhibits reduced hallucinations and improved calibration—knowing when it lacks information rather than fabricating plausible-sounding false information. GPT-4 incorporates rule-based reward models guiding model behavior, with fine-tuning on human preferences for helpful, harmless, and honest outputs. OpenAI published a technical report detailing safety work, limitations, and failure modes to inform responsible deployment.
Enterprise API Access and Integration
OpenAI launched GPT-4 API access through a waitlist, prioritizing developers demonstrating track records of building with GPT-3.5 and implementing safety best practices. The API offers two variants: gpt-4 (8,192 token context window) and gpt-4-32k (32,768 token context). Pricing for gpt-4 is $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens, while gpt-4-32k costs $0.06/$0.12 per 1K tokens respectively.
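Those rates make per-request costs straightforward to estimate up front. A back-of-the-envelope sketch using the launch list prices quoted above (pricing has since changed, so the figures are illustrative only):

# Estimate per-request cost from prompt and completion token counts,
# using GPT-4 launch pricing in USD per 1K tokens.
PRICING = {
    "gpt-4":     {"prompt": 0.03, "completion": 0.06},
    "gpt-4-32k": {"prompt": 0.06, "completion": 0.12},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    rates = PRICING[model]
    return (prompt_tokens / 1000) * rates["prompt"] \
         + (completion_tokens / 1000) * rates["completion"]

# Example: a 6,000-token prompt with a 1,000-token completion on gpt-4
print(f"${request_cost('gpt-4', 6_000, 1_000):.2f}")  # $0.24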
Early enterprise adopters included Duolingo (conversational language learning), Khan Academy (tutoring assistant Khanmigo), Morgan Stanley (knowledge base search), and Stripe (fraud detection and support automation). The model's improved reasoning enables more complex enterprise applications including legal document analysis, financial modeling, medical diagnosis support, and sophisticated coding assistance.
Technical Architecture and Training
While OpenAI did not disclose specific architecture details, GPT-4 is a transformer-based language model trained on diverse internet text, books, and licensed third-party datasets. The company emphasized post-training work including RLHF, rule-based reward models, and extensive safety testing. GPT-4 finished training in August 2022, with the following months dedicated to safety improvements and alignment research.
The model incorporates improved tokenization, better handling of multilingual content, and enhanced code generation. GPT-4 performs strongly across Python, JavaScript, TypeScript, and other programming languages, and can explain code, identify bugs, and suggest optimizations. OpenAI later added function calling (announced in June 2023), enabling integration with external tools, APIs, and databases.
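A minimal sketch of that tool-use pattern with the current openai Python client: the get_weather function and its schema are hypothetical illustrations, and the tools parameter reflects the API's later evolution of the original June 2023 functions interface.

# Sketch: let GPT-4 decide when to call an external tool.
# The get_weather schema is a hypothetical illustration.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Is it raining in Oslo right now?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model may also answer directly without a tool call
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)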
Limitations and Known Issues
OpenAI documented several limitations in the GPT-4 technical report. The model exhibits social biases present in training data, though less pronounced than GPT-3.5. GPT-4 still hallucinates facts and makes reasoning errors, particularly on complex mathematical proofs and edge cases. The model has a knowledge cutoff (September 2021 initially), lacking awareness of recent events.
GPT-4 can produce harmful content when users circumvent safety guardrails, necessitating ongoing safety monitoring. The model occasionally makes simple mistakes in arithmetic and logical reasoning that humans would not make. Performance degrades on adversarially constructed inputs designed to expose weaknesses. OpenAI committed to iterative deployment, collecting usage data to identify failure modes and improve safety.
Strategic Implications for Organizations
CTIOs should evaluate GPT-4 for applications requiring advanced reasoning, complex instruction following, and multimodal understanding. The model enables new use cases including visual document processing, sophisticated coding assistance, and applications requiring long context understanding. Organizations must balance capabilities against costs—GPT-4 is 15-30x more expensive than GPT-3.5, requiring careful cost optimization.
Technical teams should implement robust testing frameworks to validate GPT-4 outputs, design human-in-the-loop review processes for high-stakes decisions, and establish monitoring systems detecting hallucinations and errors. Organizations must develop prompt engineering expertise, fine-tune applications for specific domains, and implement fallback strategies when GPT-4 produces incorrect outputs. Data governance policies should address data sent to OpenAI APIs, including data residency, retention, and confidentiality requirements.
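One hedged sketch of such a fallback: run a lightweight validation check on each response and escalate anything that fails to a human reviewer. The system prompt, validation rule, and escalation path below are placeholders an organization would replace with domain-specific controls.

# Sketch: route GPT-4 outputs that fail a basic validation check to
# human review instead of returning them directly. The checks here are
# placeholders; production validators would be domain-specific.
from openai import OpenAI

client = OpenAI()

def validated_answer(question: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided policy text. "
                        "Reply INSUFFICIENT CONTEXT if you are unsure."},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    answer = response.choices[0].message.content or ""
    if "INSUFFICIENT CONTEXT" in answer or len(answer) < 20:
        # Placeholder: push to a human review queue (ticketing system, etc.)
        return {"status": "escalated", "draft": answer}
    return {"status": "auto", "answer": answer}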