PaperTrails

Highest Rated Papers


Creating a Cooperative AI Policymaking Platform through Open Source Collaboration

Aiden Lewington, Alekhya Vittalam, Anshumaan Singh, Anuja Uppuluri, Arjun Ashok, Ashrith Mandayam Athmaram, Austin Milt, Benjamin Smith, Charlie Weinberger, Chatanya Sarin, Christoph Bergmeir, Cliff Chang, Daivik Patel, Daniel Li, David Bell, Defu Cao, Donghwa Shin, Edward Kang, Edwin Zhang, Enhui Li, Felix Chen, Gabe Smithline, Haipeng Chen, Henry Gasztowtt, Hoon Shin, Jiayun Zhang, Joshua Gray, Khai Hern Low, Kishan Patel, Lauren Hannah Cooke, Marco Burstein, Maya Kalapatapu, Mitali Mittal, Raymond Chen, Rosie Zhao, Sameen Majid, Samya Potlapalli, Shang Wang, Shrenik Patel, Shuheng Li, Siva Komaragiri, Song Lu, Sorawit Siangjaeo, Sunghoo Jung, Tianyu Zhang, Valery Mao, Vikram Krishnakumar, Vincent Zhu, Wesley Kam, Xingzhe Li, Yumeng Liu
arXiv·2024
Advances in artificial intelligence (AI) present significant risks and opportunities, requiring improved governance to mitigate societal harms and promote equitable benefits. Current incentive structures and regulatory delays may hinder responsible AI development and deployment, particularly in light of the transformative potential of large language models (LLMs). To address these challenges, we propose developing the following three contributions: (1) a large multimodal text and economic-timeseries foundation model that integrates economic and natural language policy data for enhanced forecasting and decision-making, (2) algorithmic mechanisms for eliciting diverse and representative perspectives, enabling the creation of data-driven public policy recommendations, and (3) an AI-driven web platform for supporting transparent, inclusive, and data-driven policymaking.
No ratings yet

PaperBench: Evaluating AI's Ability to Replicate AI Research

Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, Jun Shern Chan, Leon Maksin, Rachel Dias, Evan Mays, Benjamin Kinsella, Wyatt Thompson, Johannes Heidecke, Amelia Glaese, Tejal Patwardhan
arXiv·2025
We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research. Agents must replicate 20 ICML 2024 Spotlight and Oral papers from scratch, including understanding paper contributions, developing a codebase, and successfully executing experiments. For objective evaluation, we develop rubrics that hierarchically decompose each replication task into smaller sub-tasks with clear grading criteria. In total, PaperBench contains 8,316 individually gradable tasks. Rubrics are co-developed with the author(s) of each ICML paper for accuracy and realism. To enable scalable evaluation, we also develop an LLM-based judge to automatically grade replication attempts against rubrics, and assess our judge's performance by creating a separate benchmark for judges. We evaluate several frontier models on PaperBench, finding that the best-performing tested agent, Claude 3.5 Sonnet (New) with open-source scaffolding, achieves an average replication score of 21.0%. Finally, we recruit top ML PhDs to attempt a subset of PaperBench, finding that models do not yet outperform the human baseline. We open-source our code (https://github.com/openai/preparedness) to facilitate future research in understanding the AI engineering capabilities of AI agents.
No ratings yet
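To illustrate how a hierarchical rubric can turn many small pass/fail checks into a single replication score, here is a minimal sketch. The abstract states only that rubrics decompose each task into gradable sub-tasks; the weighted-average aggregation, the `RubricNode` class, and the example weights below are assumptions made for illustration, not the benchmark's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricNode:
    """Hypothetical rubric node: a leaf is graded pass/fail, an inner node aggregates."""
    name: str
    weight: float = 1.0
    passed: Optional[bool] = None  # only meaningful on leaves
    children: List["RubricNode"] = field(default_factory=list)


def rubric_score(node: RubricNode) -> float:
    """Return a score in [0, 1]: leaves give 0 or 1, parents a weighted mean of children."""
    if not node.children:
        return 1.0 if node.passed else 0.0
    total_weight = sum(child.weight for child in node.children)
    weighted = sum(child.weight * rubric_score(child) for child in node.children)
    return weighted / total_weight


# Toy rubric (assumed structure): one heavy leaf and one sub-tree with two leaves.
root = RubricNode("replicate paper", children=[
    RubricNode("codebase runs", weight=2.0, passed=True),
    RubricNode("experiments", weight=1.0, children=[
        RubricNode("experiment 1 reproduced", passed=True),
        RubricNode("experiment 2 reproduced", passed=False),
    ]),
])
root_score = rubric_score(root)
```

In this toy tree the sub-tree scores 0.5 and the root scores (2 * 1 + 1 * 0.5) / 3, so partial progress on sub-tasks still earns partial credit.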

Commutative rings with $n$-$1$-absorbing prime factorization

Abdelhaq El Khalfi, Hicham Laarabi, Suat Koç
arXiv·2025
Let $R$ be a commutative ring with $1\neq 0$ and $n$ be a fixed positive integer. A proper ideal $I$ of $R$ is said to be an \textit{$n$-OA ideal} if whenever $a_1a_2\cdots a_{n+1}\in I$ for some nonunits $a_1,a_2,\ldots,a_{n+1}\in R$, then $a_1a_2\cdots a_n\in I$ or $a_{n+1}\in I$. A commutative ring $R$ is said to be an \textit{$n$-OAF ring} if every proper ideal $I$ of $R$ is a product of finitely many $n$-OA ideals. In fact, $1$-OAF rings and $2$-OAF rings are exactly the general ZPI rings and OAF rings, respectively. In addition to giving various properties of $n$-OAF rings, we give a characterization of Noetherian von Neumann regular rings in terms of our new concept. Furthermore, we investigate the $n$-OAF property of some ring extensions, such as the polynomial ring $R[X]$, the formal power series ring $R[[X]]$, the ring $A+XB[X]$, and the trivial extension $R=A\propto E$ of an $A$-module $E$.
No ratings yet
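As a small worked illustration of the $n$-OA definition in the abstract above (this example is not taken from the paper), the following shows that every prime ideal is $n$-OA for each $n$, and that $4\mathbb{Z}$ fails the condition for $n=1$ in $\mathbb{Z}$.

```latex
% Claim: if $P$ is a prime ideal of $R$, then $P$ is an $n$-OA ideal for every $n \geq 1$.
Let $a_1, \ldots, a_{n+1} \in R$ be nonunits with $a_1 a_2 \cdots a_{n+1} \in P$.
Since $P$ is prime, $a_i \in P$ for some $i$.
\begin{itemize}
  \item If $i \leq n$, then $a_1 a_2 \cdots a_n \in P$, because $P$ is an ideal.
  \item If $i = n+1$, then $a_{n+1} \in P$.
\end{itemize}
In either case the $n$-OA condition holds.

% Non-example for $n = 1$ in $R = \mathbb{Z}$:
Take $I = 4\mathbb{Z}$ and the nonunits $a_1 = a_2 = 2$. Then
\[
  a_1 a_2 = 4 \in I, \qquad a_1 = 2 \notin I, \qquad a_2 = 2 \notin I,
\]
so $4\mathbb{Z}$ is not a $1$-OA ideal.
```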

Situational Awareness: The Decade Ahead

Leopold Aschenbrenner
situational-awareness.ai·2024
A series on the implications of AGI being within reach. On the path to superintelligence, covering topics including: AGI by 2027, the intelligence explosion, the challenges of alignment, the race between the US and China, and the project to secure superintelligence.
No ratings yet

Weighted Tensor Decompositions for Context-aware Collaborative Filtering

Joey De Pauw, Bart Goethals
arXiv·2025
Over recent years it has become well accepted that user interest is not static or immutable. There are a variety of contextual factors, such as time of day, the weather or the user's mood, that influence the current interests of the user. Modelling approaches need to take these factors into account if they want to succeed at finding the most relevant content to recommend given the situation. A popular method for context-aware recommendation is to encode context attributes as extra dimensions of the classic user-item interaction matrix, effectively turning it into a tensor, followed by applying the appropriate tensor decomposition methods to learn missing values. However, unlike with matrix factorization, where all decompositions are essentially a product of matrices, there exist many more options for decomposing tensors by combining vector, matrix and tensor products. We study the most successful decomposition methods that use weighted square loss and categorize them based on their tensor structure and regularization strategy. Additionally, we further extend the pool of methods by filling in the missing combinations. In this paper we provide an overview of the properties of the different decomposition methods, such as their complexity, scalability, and modelling capacity. These benefits are then contrasted with the performances achieved in offline experiments to gain more insight into which method to choose depending on a specific situation and constraints.
No ratings yet
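The weighted square loss mentioned in the abstract can be made concrete with a small sketch. It assumes a CP-style model (the score is a sum over factors of user, item, and context terms), binary targets, and a fixed lower weight for unobserved cells; the function names, weights, and toy factors are illustrative assumptions, and the paper compares several other tensor structures and regularization strategies.

```python
from itertools import product


def cp_score(U, V, C, u, i, c):
    """CP-style score: sum over latent factors of user * item * context terms."""
    return sum(a * b * d for a, b, d in zip(U[u], V[i], C[c]))


def weighted_square_loss(U, V, C, observed, w_obs=1.0, w_unobs=0.1, l2=0.0):
    """Weighted square loss over every (user, item, context) cell.

    Observed cells have target 1 and weight w_obs; all other cells have
    target 0 and the smaller weight w_unobs (an assumed weighting scheme).
    """
    loss = 0.0
    for u, i, c in product(range(len(U)), range(len(V)), range(len(C))):
        is_observed = (u, i, c) in observed
        target = 1.0 if is_observed else 0.0
        weight = w_obs if is_observed else w_unobs
        loss += weight * (target - cp_score(U, V, C, u, i, c)) ** 2
    # Plain L2 penalty on all factor entries.
    reg = l2 * sum(x * x for M in (U, V, C) for row in M for x in row)
    return loss + reg


# Toy factors: 2 users, 2 items, 1 context, 2 latent dimensions.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0, 1.0]]
observed = {(0, 0, 0)}
loss_value = weighted_square_loss(U, V, C, observed)
```

Here the model predicts 1 for the observed cell (no error) and also 1 for the unobserved cell (1, 1, 0), which contributes 0.1 to the loss because unobserved cells carry the smaller weight.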

Outcome-based Reinforcement Learning to Predict the Future

Benjamin Turtel, Danny Franklin, Kris Skotheim, Luke Hewitt, Philipp Schoenegger
arXiv·2025
Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that outcome-only online RL on a 14B model can match frontier-scale accuracy and surpass it in calibration and hypothetical prediction market betting by adapting two leading algorithms, Group-Relative Policy Optimisation (GRPO) and ReMax, to the forecasting setting. Our adaptations remove per-question variance scaling in GRPO, apply baseline-subtracted advantages in ReMax, hydrate training with 100k temporally consistent synthetic questions, and introduce lightweight guard-rails that penalise gibberish, non-English responses and missing rationales, enabling a single stable pass over 110k events. Scaling ReMax to 110k questions and ensembling seven predictions yields a 14B model that matches frontier baseline o1 on accuracy on our holdout set (Brier = 0.193, p = 0.23) while beating it in calibration (ECE = 0.042, p < 0.001). A simple trading rule turns this calibration edge into \$127 of hypothetical profit versus \$92 for o1 (p = 0.037). This demonstrates that refined RLVR methods can convert small-scale LLMs into potentially economically valuable forecasting tools, with implications for scaling this to larger models.
No ratings yet
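The two headline metrics in the abstract, the Brier score and the expected calibration error (ECE), can be sketched as follows. The equal-width binning with ten bins is an assumption (the abstract does not state the binning used), and the sample forecasts are made up for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """ECE with equal-width bins: weighted mean of |mean forecast - observed frequency|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_forecast = sum(p for p, _ in members) / len(members)
        observed_freq = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_forecast - observed_freq)
    return ece


# Made-up forecasts and resolved outcomes (1 = event happened).
probs = [0.9, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 0]
brier = brier_score(probs, outcomes)
ece = expected_calibration_error(probs, outcomes)
```

Lower is better for both metrics: the Brier score rewards accuracy and sharpness together, while ECE isolates how closely stated probabilities track observed frequencies.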

Why AI Safety Won't Make America Lose The Race With China

Scott Alexander
Substack·2025
...
No ratings yet

Cryptocurrency Portfolio Management with Reinforcement Learning: Soft Actor--Critic and Deep Deterministic Policy Gradient Algorithms

Kamal Paykan
arXiv·2025
This paper proposes a reinforcement learning--based framework for cryptocurrency portfolio management using the Soft Actor--Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms. Traditional portfolio optimization methods often struggle to adapt to the highly volatile and nonlinear dynamics of cryptocurrency markets. To address this, we design an agent that learns continuous trading actions directly from historical market data through interaction with a simulated trading environment. The agent optimizes portfolio weights to maximize cumulative returns while minimizing downside risk and transaction costs. Experimental evaluations on multiple cryptocurrencies demonstrate that the SAC and DDPG agents outperform baseline strategies such as equal-weighted and mean--variance portfolios. The SAC algorithm, with its entropy-regularized objective, shows greater stability and robustness in noisy market conditions compared to DDPG. These results highlight the potential of deep reinforcement learning for adaptive and data-driven portfolio management in cryptocurrency markets.
No ratings yet

Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs

Simon D Angus, Lachlan O'Neill
arXiv·2024
Detecting issue framing in text - how different perspectives approach the same topic - is valuable for social science and policy analysis, yet challenging for automated methods due to subtle linguistic differences. We introduce 'paired completion', a novel approach using LLM next-token log probabilities to detect contrasting frames using minimal examples. Through extensive evaluation across synthetic datasets and a human-labeled corpus, we demonstrate that paired completion is a cost-efficient, low-bias alternative to both prompt-based and embedding-based methods, offering a scalable solution for analyzing issue framing in large text collections, especially suited to low-resource settings.
No ratings yet
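One way to read the idea of comparing log probabilities under two contrasting frames is sketched below. The scoring rule (difference of mean conditional log probabilities), the function `paired_completion_score`, and the word-overlap stand-in `toy_logprob` are all assumptions for illustration; a real setup would obtain the log probability of the text given each primer from an LLM, and the paper's exact formulation may differ.

```python
def paired_completion_score(text, primers_a, primers_b, logprob_fn):
    """Positive score: text is more likely after frame-A primers than after frame-B primers.

    logprob_fn(context, text) is a hypothetical callable that returns the log
    probability of `text` as a continuation of `context`.
    """
    mean_a = sum(logprob_fn(p, text) for p in primers_a) / len(primers_a)
    mean_b = sum(logprob_fn(p, text) for p in primers_b) / len(primers_b)
    return mean_a - mean_b


def toy_logprob(context, text):
    """Stand-in for an LLM: penalize each word of `text` that the context has not used."""
    seen = set(context.lower().split())
    return -float(sum(1 for word in text.lower().split() if word not in seen))


# Made-up primers for two contrasting frames on the same issue.
frame_a = ["public services need funding"]
frame_b = ["taxes are theft"]
score = paired_completion_score("taxes fund public services", frame_a, frame_b, toy_logprob)
```

Because only a handful of primer sentences per frame are needed, the same scoring loop can be reused across new issues by swapping the primer lists.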