Catchup

  • Ilya Sutskever’s Pivot: Why Scaling Alone Won’t Solve AGI

    This video highlights a significant shift in the AI research landscape, marking the moment Ilya Sutskever—a central figure in modern AI—aligns with the perspective that current methods are insufficient for achieving Artificial General Intelligence (AGI). While scaling laws and current paradigms will continue to yield improvements, Sutskever acknowledges that a fundamental, non-theorized breakthrough is required to bridge the gap between current models and true intelligence.

    The Disconnect Between Benchmarks and Reality

    A primary concern addressed is the paradox where AI models score exceptionally well on difficult evaluations yet fail to produce a comparable economic impact. Sutskever attributes this potentially to Reinforcement Learning (RL) training. Unlike pre-training, which consumes all available data, RL involves selective data curation. This process often inadvertently leads to models that are “taught to the test,” optimizing for specific benchmarks while lacking the robust generalization needed for real-world tasks.

    Redefining AGI: The Super-Learning Agent

    Sutskever proposes a redefined vision of AGI. Instead of a static “oracle” that knows how to perform every job immediately, he envisions a system akin to a “super-intelligent teenager.” This model would possess a superior, fundamental learning algorithm capable of mastering any domain through trial and error after deployment. The goal shifts from creating a finished product to creating a system capable of rapid, autonomous upskilling.

    The Necessity of Internal Value Functions

    To achieve this learning capability, AI must develop a mechanism for self-correction similar to humans. Sutskever uses the analogy of a teenage driver who knows when they are driving poorly without an instructor telling them. Humans possess a robust internal “value function” that guides learning; replicating this internal feedback loop is identified as a critical missing piece in current machine learning architectures.

    Timelines and The Research Landscape

    Despite the challenges, Sutskever remains optimistic, predicting the arrival of this self-learning superintelligence within 5 to 20 years. The video also touches on the current state of the industry, noting that the shift from open collaboration to closed, competitive silos among major labs (like OpenAI and Google) may actually be slowing progress due to the duplication of research efforts.

    Mentoring question

    If the definition of AGI is shifting from a ‘know-it-all’ system to a ‘learn-it-all’ system, how should you adapt your own continuous learning strategies to remain relevant alongside machines that can master new skills faster than humans? Source: https://youtube.com/watch?v=ye_HKsDcVsc&is=BeNC7xLbrKF9oZeM
  • 2025-49 From Chaos to Cathedral: Engineering Your Mind, Tech, and Life

    Welcome to this week’s Learning Capsule. As we navigate a world oscillating between rapid technological advancement and the timeless struggles of human nature, a common theme emerges: the shift from reactive survival to intentional design. Whether we are discussing the architecture of a software program or the architecture of our daily habits, the lesson is clear: relying on willpower and “vibes” is ending. It is time to build systems.

    The Mindset: Building Cathedrals, Not Just Laying Bricks

    Let’s start with the foundation of how we view our work. Are you a worker or a leader? As highlighted in Worker vs. Leader: 5 Mindset Shifts, this distinction isn’t about your job title. It’s about the difference between the bricklayer who says, “I’m laying bricks,” and the one who says, “I’m building a cathedral.” Leaders shift from asking “What must I do?” (tasks) to “Why does this matter?” (purpose). They value long-term vision over short-term survival.

    But how do we sustain that vision when we are tired? We often blame a lack of discipline, but Achieving Top 1% Success teaches us that willpower is a battery that drains. The top 1% don’t have more willpower; they have better systems. They use "forcing functions"—like public commitments or deleting apps—to make the hard work automatic. They don’t fight biology; they engineer their environment so success is the path of least resistance.

    Practical Tool: When things go wrong and you face blame, do not explain or defend. Use the strategy from Mastering High-Status Responses. Stop the momentum by asking, "Hang on, exactly what are you saying I did?" and then pivot to strategy: "What outcome are you specifically looking for?" This moves you from the accused to the architect of the solution.

    The Tech Shift: The End of “Vibe Coding”

    This need for structure is mirrored in the world of technology. The era of chaotic, experimental AI use is closing. We are witnessing The End of Vibe Coding. Organizations are moving from experimental chaos to rigorous engineering, requiring governance and "boring" reliability.

    This maturity shows up in the details. Just as a leader respects the team, a professional developer respects the file system. Top 5 File Naming Conventions reminds us that messy naming (spaces, special characters) breaks workflows. Consistency is the mark of a professional.

    Simultaneously, the "AI Arms Race" is heating up. With Gemini 3 solving handwriting recognition, we see the validation of the "Bitter Lesson": massive scale eventually beats specialized human tuning. However, raw power isn’t enough. As discussed in The Gemini 3 Reset, the battle is shifting from who has the smartest model to who owns the distribution (Apple/Google) and who can integrate into workflows most effectively. For developers, this means optimizing costs, perhaps by adopting new formats like TOON (Token Oriented Object Notation) to make communication with LLMs more efficient than standard JSON.

    The Human OS: Biology Over Force

    Just as we optimize our code, we must optimize our biology. If you struggle with focus, it might not be a character flaw; it might be your dopamine management. Mastering Your ADHD suggests protecting three critical 30-minute windows: morning activation (no phones), post-lunch movement (to fight the crash), and a pre-bed shutdown. You are debugging your own nervous system.

    This biological awareness extends to how we treat others. When a child (or an adult) acts out, Mirror Neurons dictate that if you respond with anger, you are literally teaching them to be angry. To teach respect, you must model regulation, not demand submission.

    Furthermore, understand the Running Leap. Sometimes, a lack of progress—in yourself or a child—is actually a strategic retreat to gather momentum for a bigger jump. And when dealing with adults, remember that true intelligence is the willingness to change your mind, and Upward Feedback is the ultimate test of psychological safety in a team.

    The Life Review: Connection Over Convenience

    Finally, let’s look at the quality of our lives. Are we letting algorithms dictate our tastes? Curating Music in an Algorithmic World warns against "digital excess" and encourages active discovery. Similarly, The Battle for the Soul of Skiing highlights how corporatization strips the soul from our communities. We must vote with our wallets to save the "independents."

    And for the long haul? A study on senior longevity reveals that intense exercise isn’t the magic bullet for everyone; recreational activity—gardening, socializing, playing—is often more effective. It turns out that joy and connection are the ultimate life-extenders.

    The Takeaway: Whether you are naming a file, raising a child, or leading a company, stop reacting to the noise. Pause. Design a system. And remember to build a cathedral, not just a wall.

    • In which area of your life are you currently operating like a ‘worker’ (checking boxes) rather than a ‘leader’ (building a legacy), and what one outcome could you redefine today?
    • If a new developer (or successor) inherited your current projects or workflows today, would they find a clean, professional structure, or a chaotic ‘vibe-coded’ mess?
    • Reflecting on the ‘Running Leap’ concept: Where have you interpreted a recent step backward as failure, when it might actually be necessary momentum for your next breakthrough?
    • Are you sacrificing the ‘soul’ of your hobbies (like music or skiing) for the convenience of algorithmic feeds and corporate passes?
    • When was the last time you actively chose to let go of the need to be right in a conversation to genuinely consider a contradictory perspective?
  • A Vibe Check on Vibe Coding: The Rise and Fall of an AI Trend

    “Vibe coding,” a term coined by Andrej Karpathy to describe a programming style where developers blindly trust AI agents to write code, rapidly evolved from a playful experiment into Silicon Valley’s latest obsession. Major tech companies like Microsoft and Meta quickly embraced the trend, predicting AI would soon generate the majority of codebases. However, the definition quickly blurred, conflating reckless “accept all” behaviors with general AI-assisted programming, normalizing a culture of permissiveness regarding code quality.

    When the Vibes Turned Bad

    The hype cycle hit a wall as practical issues emerged. High-profile incidents, such as Google and Replit AI agents deleting user files or hallucinating commands, highlighted the risks of unsupervised AI. Developer trust has significantly eroded; a Stack Overflow survey revealed that while usage is high, trust in accuracy has dropped to 29%. Many engineers report that fixing “almost right” AI code takes longer than writing it manually, challenging the narrative of increased productivity.

    Economic and Security Realities

    Beyond workflow frustrations, the economics of vibe coding are struggling. While revenue for tools like Cursor grew, inference costs exploded, leading to unsustainable pricing models and a subsequent 30-50% drop in user traffic by late 2024. Security audits present a grimmer picture: AI-assisted developers were found to generate ten times more security issues, including exposed credentials and architectural flaws.

    Conclusion: The Return to Human Oversight

    The article concludes that the “vibe coding” phenomenon is a classic case of tech enthusiasm outpacing reality. Even Karpathy has retreated from the methodology, revealing that he hand-coded his latest project because current agents were not reliable enough. The industry is now pivoting back to a model where AI augments human developers rather than replacing them, emphasizing the necessity of experience and strict code review.

    Mentoring question

    Considering the evidence that AI-generated code can introduce subtle bugs and significant security flaws, how would you redesign your team’s code review process to safely leverage AI tools?

    Source: https://share.google/ef9hHNmjW4EKaFEYj

  • Software development has a ‘996’ problem

    The article draws a critical parallel between the grueling ‘996’ work culture (9 a.m. to 9 p.m., 6 days a week) and the emerging trend of using AI to generate massive volumes of code. Author Matt Asay argues that just as human burnout rarely leads to innovation, using AI to brute-force code generation results in bloated, derivative, and unmanageable software.

    The High Cost of Code Churn

    Evidence from GitClear and GitHub suggests that while AI helps developers code significantly faster, it correlates with a spike in ‘code churn’—lines of code that are modified or deleted within two weeks. The data shows an increase in copy-pasted code and a decrease in refactoring. This creates a trap where the constraint on innovation is viewed as the number of characters typed rather than clarity of thought, leading to codebases that are harder to secure and maintain.

    Code is a Liability, Not an Asset

    A central argument is that software development is a decision-making process, not a typing contest. Every line of code shipped represents a liability that requires debugging and maintenance. The article emphasizes that:

    • Senior engineering is defined by knowing what code not to write.
    • AI-generated bloat creates a larger surface area for complexity and technical debt.
    • Innovation requires ‘slack’ time for deep thinking, which is lost if developers are constantly acting as janitors for AI-generated output.

    Human-Centric AI Strategy

    To avoid building a ‘996 culture on silicon,’ the author suggests using AI to handle drudgery (boilerplate, unit tests) specifically to buy back time for high-value human tasks:

    • Framing the problem: Determining if a feature is actually necessary for the customer.
    • Ruthless editing: Celebrating ‘negative code’ commits that delete complexity rather than adding to it.
    • Owning the blast radius: Ensuring engineers maintain enough system understanding to debug outages without relying on the AI that wrote the code.

    Mentoring question

    Are you using the efficiency gained from AI tools to simply ship more features faster, or are you investing that saved time into refining problem definitions and reducing the overall complexity of your codebase?

    Source: https://www.infoworld.com/article/4094801/software-development-has-a-996-problem.html

  • From Small Town to 420 Million PLN: Michał Lidzbarski’s “AI First” Business Strategy

    Michał Lidzbarski, the founder of the educational platform Web To Learn, shares his journey from a home office in a small Polish town to building a business where creators have generated over 420 million PLN in sales. Now serving 1.2 million users, Lidzbarski is pivoting the company toward an "AI First" model, launching new startups like MultiTools and debunking myths about Artificial Intelligence in the workplace.

    Building a Self-Funded Empire

    The inception of Web To Learn stemmed from Lidzbarski’s personal passion for teaching and the lack of suitable platforms to host online courses. For years, the business was entirely bootstrapped, developed after hours while he worked a full-time job. It wasn’t until the COVID-19 pandemic that the platform saw a massive surge in demand as schools and trainers moved online. The company operated without external funding until 2023, when an investor came on board to help capitalize on the emerging AI revolution.

    The “AI First” Approach

    Lidzbarski is currently restructuring the entire organization around an "AI First" philosophy. This integration occurs on multiple levels:

    • Customer Service: Chatbots handle repetitive inquiries, freeing up human staff.
    • Product Enhancement: AI assistants help students by analyzing their progress and explaining difficult concepts, acting as tutors rather than just tools.
    • New Products: The startup MultiTools allows businesses to create AI agents and automate workflows.

    Job Evolution, Not Elimination

    Contrary to the fear that AI destroys jobs, Lidzbarski cites his own company as proof of the opposite. Since integrating AI, his team has grown from a handful of people to 26 employees. He argues that AI transforms roles rather than eliminating them; for example, graphic designers now manage AI generation tools. The company actively updates its internal processes whenever a new AI breakthrough occurs, requiring constant adaptation from the team.

    Recruitment based on Passion

    Operating out of Kościerzyna (population 25,000), Lidzbarski faced challenges finding specialized talent. His solution was to hire based on passion and technological enthusiasm rather than specific keywords in a CV. The company developed robust internal training and onboarding programs (now assisted by AI) to upskill local talent, proving that tech success is possible outside major metropolitan hubs.

    Market Challenges and the Future

    While Web To Learn competes with global giants by offering affordable, "all-in-one" solutions tailored to the local market, Lidzbarski notes a significant lag in AI adoption in Poland. Only about 6% of Polish SMEs utilize AI, placing the country near the bottom of European rankings. He views AI adoption as crucial for Polish businesses to survive rising regulatory costs and taxes. The company’s future focus is entirely on developing accessible AI tools that support non-technical entrepreneurs.

    Mentoring question

    considering the article’s evidence that AI transforms rather than eliminates roles, which specific repetitive tasks in your current workflow could be automated to free up your time for higher-value creative or strategic work?

    Source: https://strefabiznesu.pl/z-malego-miasta-do-420-mln-zl-ze-sprzedazy-kursow-polak-stawia-wszystko-na-ai/ar/c3p2-28217717

  • Why Losing Weight Is Harder Than Becoming a Millionaire

    The Central Theme: Wealth vs. Health Dynamics

    The author argues that statistically, it is harder for an American man to reach 15% body fat than to become a millionaire. A key distinction is drawn between the two processes: wealth building gets easier over time due to compound interest and knowledge, whereas weight loss becomes progressively more difficult as the body adapts, metabolism slows, and hunger increases.

    The Adversarial Environment

    Two major external factors make weight loss exceptionally difficult:

    • The Food Industry: We are surrounded by cheap, ultra-processed, high-calorie foods that provide energy but destroy health.
    • Information Chaos: Unlike finance, where basic principles are generally agreed upon, nutrition is plagued by conflicting advice and tribalism, leading people to give up due to confusion.

    The Strategy: Boring Repetition and Consistency

    The author emphasizes that the secret to success is “boring repetition.” Just as buying stable stocks over years builds wealth, eating the same nutritious meals repeatedly minimizes decision fatigue and error. Novelty is often a trap set by influencers.

    Key Takeaways

    • 100% Consistency is Required: Unlike other areas where “good enough” works, getting to low body fat requires strict adherence (95-100%). One cheat meal can undo days of deficit.
    • Data over Intuition: Success required weighing food, cooking at home, and tracking everything manually (using apps like Fitatu).
    • Delayed Visual Gratification: Visual changes take a long time to appear. The author uses the analogy of draining a bathtub filled with stones: you must drain a lot of water before the stones at the bottom are revealed.

    Mentoring question

    In which area of your life are you seeking excitement and variety, when ‘boring repetition’ and strict consistency are actually what is required to achieve your goal?

    Source: https://52notatki.substack.com/p/dlaczego-trudniej-jest-schudnac-niz

  • 2025-48
  • The Nuance of Feedback: Avoiding Common Management Pitfalls

    The Crisis of Effective Feedback

    Despite the prevalence of regular feedback sessions in modern companies, statistics reveal a significant gap in their effectiveness. Research indicates that while 45% of employees receive weekly feedback, nearly half find it demotivating. Instead of fostering growth, improper feedback often leads to stress, decreased confidence, and damaged relationships. The core issue lies not in the frequency of the conversations, but in the quality, intention, and delivery of the message.

    The Critical Error: Judging the Person vs. The Behavior

    One of the most damaging mistakes managers make is criticizing a subordinate’s character rather than their specific actions. For instance, telling an employee “you are unengaged” attacks their identity, triggering defensive reactions or withdrawal. In contrast, stating “you did not deliver the report on time” focuses on objective facts. Experts argue that feedback must target behavior to maintain the employee’s psychological safety and readiness to change.

    Common Management Traps

    Beyond personal attacks, managers frequently fall into several other traps:

    • Lack of Specificity: Vague comments like “try harder” or “it could be better” leave employees confused about what actually needs to be fixed.
    • The Monologue: Effective feedback must be a dialogue. If a manager speaks without asking “how do you see this?”, they miss the chance for shared understanding and reflection.
    • Poor Timing: Feedback delivered in anger, frustration, or public settings destroys trust. Both parties need to be emotionally prepared for a constructive conversation.
    • Mixing Functions: Managers often confuse evaluation (which affects salary and status) with development (which requires openness). These should ideally be separate conversations.

    Abandoning the “Sandwich” Method

    The traditional “sandwich” method—hiding criticism between two compliments—is increasingly viewed as obsolete and manipulative. It tends to dilute the core message, leaving employees unsure of where they stand. Instead, experts recommend separating the celebration of success from corrective conversations. Corrective feedback should be task-oriented, specific, and focused on future improvement.

    The FUKO Model and Micro-Feedback

    To improve communication, the article suggests using the FUKO method (an acronym based on Polish terms, translating to Facts, Feelings, Consequences, Expectations):

    1. Facts: Describe the specific situation or behavior without judgment.
    2. Feelings: Express how this behavior affects the manager or team.
    3. Consequences: Explain the impact on the project or organization.
    4. Expectations: Clearly state what needs to change.

    Furthermore, adapting to modern communication habits, managers should utilize “micro-feedback”—short, frequent, and specific interactions that guide employees in real-time, similar to immediate reinforcement loops found in social media, but focused on competence building rather than approval seeking.

    Mentoring question

    Reflect on the last corrective conversation you had: did you clearly separate the employee’s identity from their actions, and was your primary intention to judge their past performance or to provide a specific roadmap for their future growth?

    Source: https://www.pulshr.pl/pr-wewnetrzny/drobna-roznica-w-slowach-ale-ogromna-w-skutkach-tak-nie-wolno-rozmawiac-z-pracownikami,115286.html

  • The Subtle Sign of High Intelligence: It’s Not Just About IQ

    Beyond Education and Vocabulary

    While society often associates intelligence with academic degrees, extensive vocabulary, and status, true wisdom manifests in less obvious ways. Dr. Emma Jones, a palliative care physician and burnout coach, suggests that high intelligence is better reflected by specific behavioral traits rather than just credentials.

    The Willingness to Change Your Mind

    The defining characteristic of a highly intelligent person is the ability to change their opinion. Dr. Jones explains that while most people prioritize protecting their ego and "saving face," smart individuals are comfortable saying, "I used to think that…" or "You make a good point, let me rethink this."

    Curiosity Over Defensiveness

    Instead of becoming defensive or obsessed with "winning" a debate, intelligent people remain curious. They ask questions like "What am I missing?" and view being wrong not as an insult, but as an opportunity to update their data. This aligns with Albert Einstein’s observation that the true measure of intelligence is the ability to change. Ultimately, the capacity to put the ego aside and accept new facts without shame is a strong indicator of superior intellect.

    Mentoring question

    When was the last time you actively chose to let go of the need to be right in a conversation to genuinely consider a contradictory perspective?

    Source: https://kobieta.interia.pl/psychologia/news-co-odroznia-ludzi-ponadprzecietnie-inteligentnych-ekspertka,nId,22456862

  • Upward Feedback: A Strategic Tool for Organizational Maturity and Leadership Growth

    The Central Theme: Breaking the Silence

    The article explores the concept of “upward feedback”—employees providing feedback to their supervisors. Despite its potential to transform organizations, it remains a rare practice due to fears of consequences, lack of psychological safety, and cultural misconceptions. The text argues that upward feedback is not just an act of courage but a definitive test of an organization’s maturity and a crucial “mirror” for leaders to understand their impact.

    Key Barriers and Misconceptions

    Several factors prevent employees from speaking up:

    • Fear of Retaliation: Employees worry about negative consequences or being perceived as disloyal or demanding.
    • Misinterpretation of Praise: Even positive feedback is withheld because employees fear being labeled as sycophants or manipulative.
    • Lack of Psychological Safety: As highlighted by Google’s Project Aristotle, without a safe environment, teams choose silence over improvement.
    • Cultural Norms: In some organizations, critiquing a superior is viewed as inappropriate or disrespectful.

    The Value of Feedback for Leaders

    When implemented correctly, upward feedback offers significant benefits:

    • The “Mirror” Effect: It helps managers see how their specific behaviors (e.g., micromanagement) are perceived by the team (e.g., as coldness or lack of trust).
    • Enhanced Engagement: Employees feel their voice matters, which increases agency and motivation.
    • Crisis Resilience: Organizations that listen in both directions learn faster and adapt better to challenges.

    Best Practices for Implementation

    Experts suggest that while leaders should ideally initiate the process to set an example, feedback can originate from any level. Effective delivery relies on specific guidelines:

    • Preparation and Consent: Ask if the person is ready to receive feedback.
    • Privacy and Timing: Conduct conversations one-on-one, never in public, and avoid times of high emotion.
    • Focus on Facts: Discuss specific behaviors and facts rather than attacking the person’s character.
    • Respectful Boundaries: The conversation must remain respectful. If it turns into a vent for frustration or aggression, boundaries have been crossed.

    Handling Defensive Reactions

    Not every manager is ready to accept feedback. If a leader reacts with defensiveness, anger, or dismissal (fight or flight response), it is best to pause the conversation. The article advises employees to respect their own safety and boundaries, noting that in toxic environments where trust is absent, it may be wiser to withhold feedback or use “feedforward”—focusing on future suggestions rather than past mistakes.

    Mentoring question

    If you were to ask your team today what one behavior of yours they would like you to change to help them succeed, are you prepared to listen to the answer without becoming defensive?

    Source: https://www.pulshr.pl/pr-wewnetrzny/informacja-ktora-moze-zmienic-wszystko-test-dojrzalosci-dla-firmy-lustro-dla-lidera,115364.html

  • New Study Reveals Which Activities Actually Extend Senior Life

    A recent study conducted in China, involving nearly 10,000 older adults, sheds new light on how different types of physical activity impact longevity. Contrary to popular belief, not all exercise is equally beneficial for seniors, and some forms may not provide the expected protection against aging.

    The Impact of Activity Types on Mortality

    Researchers analyzed data from the Chinese Longitudinal Health Longevity Survey to determine how specific activities influence mortality rates. They categorized physical exertion into three groups: physical labor, regular formal exercise, and recreational activity (such as gardening, housework, or playing cards). The study also accounted for participants’ Genetic Risk Score (GRS) regarding longevity.

    Key Findings: Recreation Over Intensity

    The study produced several significant findings regarding how seniors should approach movement:

    • Recreational Activity Wins: Activities like gardening and socializing provided the greatest health benefits. Seniors with high levels of recreational activity saw a 14% to 16% reduction in mortality risk, regardless of their genetic predisposition.
    • Formal Exercise Limitations: Regular, structured exercise was found to reduce mortality risk only in individuals who already possessed a high genetic potential for longevity.
    • Ineffectiveness of Hard Labor: Despite being physically intense, hard physical labor showed no significant impact on reducing mortality.

    Conclusions for Healthy Aging

    The results suggest that seniors should prioritize activities that are less physically demanding and more socially or mentally engaging. Recreational activities are accessible, low-risk, and offer benefits for both mental and social health, which are crucial for longevity. The study highlights the need for personalized exercise recommendations, suggesting that simple daily pleasures—like walking a dog or playing games—may be more effective for extending health span than intense physical exertion.

    Mentoring question

    Considering your current lifestyle, how can you shift your focus from intense exertion to consistent, enjoyable recreational activities to better support your long-term health?

    Source: https://www.onet.pl/styl-zycia/onetkobieta/jakie-cwiczenia-przedluzaja-zycie-seniora-nowe-wyniki-z-chin-zaskakuja/6hzpbsb,2b83378a

  • Gemini 3 Solves Handwriting Recognition and it’s a Bitter Lesson

    A Major Breakthrough in Handwriting Recognition

    The article reports that Gemini 3 Pro has effectively solved the long-standing problem of Handwriting Text Recognition (HTR) for English documents. Sixty years after early computer scientists first dreamed of machines reading human text, Gemini 3 has achieved performance levels comparable to expert human typists. Tests on 18th and 19th-century documents reveal that the model consistently produces trustworthy transcripts without hallucinations, fulfilling the promise of the “Golden Age of AI.”

    Performance vs. Specialized Tools

    Traditionally, HTR relied on specialized systems like Transkribus, which require fine-tuning to achieve Character Error Rates (CER) of around 3%. In contrast, Gemini 3 Pro, a generalist Large Language Model (LLM), achieved a strict CER of 1.67% and a modified CER of 0.69% (excluding minor punctuation corrections) without specific training on the test set. This performance significantly outpaces competitors like Claude Opus 4.5 and OpenAI models, making it approximately 50% better than the best fine-tuned specialized models.

    Understanding Error Patterns and Configuration

    The author highlights distinct differences between human and AI errors. While humans make mechanical typos, Gemini’s errors are predictive; it tends to “fix” historical spelling, capitalization, and punctuation based on statistical probabilities. Crucially, the tests found that the model performs best when its “thinking” or “reasoning” parameters are set to the minimum. Excessive reasoning time causes the model to over-analyze visual data, leading to poorer results.

    The “Bitter Lesson” for Historians

    The success of Gemini 3 validates Richard Sutton’s “Bitter Lesson”: that general methods leveraging massive computation eventually outperform specialized, human-designed systems. For the historical community, this signals a paradigm shift. Historians and archivists can now process vast amounts of handwritten text cheaply (approx. 1 cent per page) and accurately, moving away from complex, rule-based HTR software toward generalist AI scaling.

    Mentoring question

    As generalist AI models begin to outperform specialized tools in niche tasks like handwriting recognition, how should you adapt your current workflows to leverage these scaling capabilities rather than relying on legacy software?

    Source: https://open.substack.com/pub/generativehistory/p/gemini-3-solves-handwriting-recognition?utm_source=share&utm_medium=android&r=4ncjv