Property:Description
From AI Wiki
Type: text
Usage: 60 pages
Improper assignments: 5
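For reference, pages assign a value to this property with standard Semantic MediaWiki annotation syntax. A minimal sketch in wikitext (the description text here is illustrative, not taken from any actual page):

 <!-- In-text annotation: stores the value and renders it in place -->
 [[Description::A benchmark for abstract reasoning in language models]]

 <!-- Silent annotation via the #set parser function: stores the value without rendering it -->
 {{#set: Description=A benchmark for abstract reasoning in language models }}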
Showing 20 pages using this property.
WebGPT🤖: ChatGPT with unbiased access to the Web; it can build products using no-code playgrounds and use APIs. Powered by Web Requests.
GPT Public Directory: A directory assistant for finding and registering GPTs, with 11,000+ GPTs available.
ARC-AGI 2: A benchmark for measuring general intelligence through abstract reasoning and pattern recognition tasks
Humanity's Last Exam: Multi-modal AI benchmark testing frontier knowledge across 100+ academic subjects, designed to be the final robust academic test for large language models
MMLU-Pro: A more robust and challenging multi-task language understanding benchmark with 10-choice questions
Aider Polyglot: A challenging multi-language code generation benchmark testing LLMs on 225 difficult Exercism coding exercises across six programming languages
AIME 2024: A challenging mathematical reasoning benchmark based on the American Invitational Mathematics Examination 2024 problems, designed to evaluate AI models' ability to solve complex high school mathematics problems requiring multi-step reasoning
AIME 2025: A challenging mathematical reasoning benchmark based on the American Invitational Mathematics Examination 2025 problems, testing olympiad-level mathematical reasoning with complex multi-step problem solving
GPQA Diamond: A challenging subset of graduate-level, Google-proof science questions testing PhD-level knowledge in biology, physics, and chemistry
MMMLU: Multilingual evaluation frameworks based on the Massive Multitask Language Understanding benchmark, including translations and adaptations for 26+ languages
Tau-bench: A benchmark for evaluating AI agents' ability to complete complex tasks through realistic tool-agent-user interactions in real-world domains
AA-LCR: A benchmark evaluating long context reasoning across multiple real-world documents (~100k tokens)
Creative Writing v3: An LLM-judged creative writing benchmark using hybrid rubric and Elo scoring for enhanced discrimination
EQ-Bench 3: An LLM-judged benchmark testing emotional intelligence through challenging role-plays and analysis tasks
IFBench: A benchmark for evaluating precise instruction following with verifiable out-of-domain constraints
LiveCodeBench: A holistic and contamination-free evaluation benchmark for code LLMs with continuous updates
Longform Creative Writing: An LLM-judged benchmark evaluating extended narrative generation across 8 chapters
MMMU: A massive multi-discipline multimodal benchmark evaluating expert-level understanding and reasoning across college-level subjects
SciCode: A research coding benchmark curated by scientists for realistic scientific problem-solving
Terminal-Bench: A benchmark for evaluating AI agents' ability to complete real-world, end-to-end tasks in terminal environments
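A listing like the one above can be reproduced with a Semantic MediaWiki inline query. A minimal sketch, assuming the default result format (the limit value is illustrative):

 {{#ask: [[Description::+]]
  |?Description
  |limit=20
 }}

This selects every page that has any Description value and prints that value alongside the page name.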
Showing 5 related entities.
LiveBench
MathArena
MMLU
SimpleBench
SWE-bench