PaperTrails

Newest and Highest Rated Papers

Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
arXiv·2023
In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.
No ratings yet
View paper →
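A minimal sketch of the outcome-vs-process supervision contrast described in the abstract above. The product-of-step-probabilities solution score follows the paper's description, but the function names and numbers are illustrative, not the PRM800K training recipe:

```python
# Toy contrast between outcome and process supervision signals.
# The step probabilities below are made-up inputs standing in for a
# learned step-level reward model (PRM); nothing here reproduces the
# paper's actual training setup.

def outcome_reward(final_answer, correct_answer):
    """Outcome supervision: one label for the entire solution."""
    return 1.0 if final_answer == correct_answer else 0.0

def process_score(step_probs):
    """Process supervision: a label per intermediate step.
    Following the paper's solution-level scoring, a solution is rated
    by the probability that every step is correct (product of the
    per-step correctness probabilities)."""
    score = 1.0
    for p in step_probs:   # p = P(step is correct) from a step-level PRM
        score *= p
    return score

# Example: a 3-step solution where step 2 is shaky.
print(outcome_reward(final_answer=42, correct_answer=42))  # 1.0
print(process_score([0.99, 0.60, 0.95]))                   # ~0.56
```

The point of the toy numbers: an outcome reward gives the whole chain a full score even when an intermediate step is only weakly justified, while a step-level score surfaces exactly where the reasoning is fragile.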

Observational constraints on f(Q,T) gravity from the mass-radius relation and stability of compact stars

S. K. Maurya, Abdul Aziz, Ksh. Newton Singh, G. Mustafa, Y. Sekhmani, Saibal Ray
arXiv·2025
In this investigation we examine the astrophysical consequences of the influence of pressure anisotropy on the physical properties of observed pulsars within the background of $f(Q,T)$ gravity by choosing a specific form $f(Q, T)=ψ_1\, Q + ψ_2 T$, where $ψ_1$ and $ψ_2$ are the model parameters. Initially, we solve the modified field equations for anisotropic stellar configurations by assuming a physically valid metric potential along with an anisotropic function for the distribution of the interior matter. We test the derived gravitational model against various stability conditions to confirm the physical existence of compact stars within the $f(Q,T)$ gravity context. We analyze thoroughly the influence of anisotropy on the effective density, pressure and mass-radius relation of the stars. The present inspection of the model implies that the current gravitational models are non-singular and able to account for the occurrence of observed pulsars with masses exceeding 2 $M_{\odot}$ as well as masses falling in the mass gap regime, in particular merger events like GW190814. The predicted radii for the observed stars of different masses fall within the range [10.5 km, 14.5 km] for $ψ_1\leq 1.05$, whereas the radius of PSR J0740+6620 is predicted to fall within [13.09 km, 14.66 km], in agreement with the radius range [11.79 km, 15.01 km] found in the recent literature.
No ratings yet
View paper →
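For context on the mass-radius relations discussed above, here is a minimal sketch that integrates the standard general-relativistic Tolman-Oppenheimer-Volkoff (TOV) equations with a toy polytropic equation of state. It is the GR baseline only; the paper's modified $f(Q,T)$ field equations and anisotropy function are not implemented, and all parameter values are illustrative:

```python
import numpy as np

# GR baseline mass-radius sketch: Euler integration of the TOV
# equations with a polytrope P = K * rho^GAMMA.
# Geometrized units G = c = 1; parameters are illustrative only.

K, GAMMA = 100.0, 2.0

def rho_of_P(P):
    return (max(P, 0.0) / K) ** (1.0 / GAMMA)

def tov_mass_radius(P_c, dr=1e-3):
    r, m, P = dr, 0.0, P_c
    while P > 1e-12 * P_c:
        rho = rho_of_P(P)
        # dP/dr = -(rho + P)(m + 4 pi r^3 P) / (r (r - 2m))
        dPdr = -(rho + P) * (m + 4.0 * np.pi * r**3 * P) / (r * (r - 2.0 * m))
        dmdr = 4.0 * np.pi * r**2 * rho
        P += dPdr * dr
        m += dmdr * dr
        r += dr
    return r, m   # stellar radius and gravitational mass

for P_c in [1e-4, 5e-4, 1e-3]:
    R, M = tov_mass_radius(P_c)
    print(f"central pressure {P_c:.0e}: R = {R:.2f}, M = {M:.3f} (geometrized)")
```

Sweeping the central pressure traces out a mass-radius curve; the paper's claim is that the $f(Q,T)$ corrections shift this curve enough to support stars above 2 $M_{\odot}$.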

A Polynomial-Time Algorithm for Variational Inequalities under the Minty Condition

Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm, Brian Hu Zhang
arXiv·2025
Solving variational inequalities (SVIs) is a foundational problem at the heart of optimization, general enough to encompass a host of other problems. However, this expressivity comes at the cost of computational hardness. As a result, most research has focused on carving out specific subclasses that elude those intractability barriers. A classical property that goes back to the 1960s is the Minty condition, which postulates that the Minty VI (MVI) problem admits a solution. In this paper, we establish the first polynomial-time algorithm -- that is, with complexity growing polynomially in the dimension $d$ and $\log(1/ε)$ -- for solving $ε$-SVIs for Lipschitz continuous mappings under the Minty condition. Prior approaches either incurred an exponentially worse dependence on $1/ε$ or made restrictive assumptions. To do so, we introduce a new variant of the ellipsoid algorithm whereby separating hyperplanes are obtained after taking a gradient descent step from the center of the ellipsoid. It succeeds even though the set of SVI solutions can be nonconvex and not full-dimensional. Moreover, when our algorithm is applied to an instance with no MVI solution and fails to identify an SVI solution, it produces a succinct certificate of MVI infeasibility. We also show that deciding whether the Minty condition holds is $\mathsf{coNP}$-complete, thereby establishing that the disjunction of those two problems is polynomial-time solvable even though each problem is individually intractable. We provide several extensions and new applications of our main results. Most notably, we obtain the first polynomial-time algorithms for i) globally minimizing a (potentially nonsmooth) quasar-convex function, and ii) computing Nash equilibria in multi-player harmonic games. Finally, in two-player general-sum concave games, we give the first polynomial-time algorithm that outputs either a Nash equilibrium or a strict coarse correlated equilibrium.
No ratings yet
View paper →
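A simplified sketch of why the Minty condition makes ellipsoid-style localization possible: if $x^\star$ solves the MVI, then $\langle F(x), x - x^\star\rangle \ge 0$ for all $x$, so at any center $c$ the half-space $\{x : \langle F(c), x - c\rangle \le 0\}$ still contains $x^\star$ and $F(c)$ itself serves as a cut. The code below implements this classic central-cut scheme on a toy operator; the paper's actual algorithm additionally takes a gradient-descent step from the center before cutting, which this sketch omits:

```python
import numpy as np

# Central-cut ellipsoid localization for a VI under the Minty
# condition. Ellipsoid is {x : (x-c)^T A^{-1} (x-c) <= 1}.
# Toy instance only; NOT the paper's algorithm.

def ellipsoid_mvi(F, c, A, iters=200):
    n = len(c)
    for _ in range(iters):
        g = F(c)
        if np.linalg.norm(g) < 1e-9:          # c is (near-)stationary
            return c
        Ag = A @ g
        b = Ag / np.sqrt(g @ Ag)              # normalized cut direction
        c = c - b / (n + 1)                   # move toward the feasible side
        A = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(b, b))
    return c

# Toy monotone operator F(x) = x - x_star; its MVI solution is x_star.
x_star = np.array([0.7, -0.3])
F = lambda x: x - x_star
x = ellipsoid_mvi(F, c=np.zeros(2), A=4.0 * np.eye(2))
print(x)   # the center drifts toward x_star as the ellipsoid shrinks
```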

GPT-4o System Card

OpenAI: Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, Alexis Conneau, Ali Kamali, Allan Jabri, Allison Moyer, Allison Tam, Amadou Crookes, Amin Tootoonchian, Ananya Kumar, Andrea Vallone, Andrej Karpathy, Andrew Braunstein, Andrew Cann, Andrew Codispoti, Andrew Galu, Andrew Kondrich, Andrew Tulloch, Andrey Mishchenko, Angela Baek, Angela Jiang, Antoine Pelisse, Antonia Woodford, Anuj Gosalia, Arka Dhar, Ashley Pantuliano, Avi Nayak, Avital Oliver, Barret Zoph, Behrooz Ghorbani, Ben Leimberger, Ben Rossen, Ben Sokolowsky, Ben Wang, Benjamin Zweig, Beth Hoover, Blake Samic, Bob McGrew, Bobby Spero, Bogo Giertler, Bowen Cheng, Brad Lightcap, Brandon Walkin, Brendan Quinn, Brian Guarraci, Brian Hsu, Bright Kellogg, Brydon Eastman, Camillo Lugaresi, Carroll Wainwright, Cary Bassin, Cary Hudson, Casey Chu, Chad Nelson, Chak Li, Chan Jun Shern, Channing Conger, Charlotte Barette, Chelsea Voss, Chen Ding, Cheng Lu, Chong Zhang, Chris Beaumont, Chris Hallacy, Chris Koch, Christian Gibson, Christina Kim, Christine Choi, Christine McLeavey, Christopher Hesse, Claudia Fischer, Clemens Winter, Coley Czarnecki, Colin Jarvis, Colin Wei, Constantin Koumouzelis, Dane Sherburn, Daniel Kappler, Daniel Levin, Daniel Levy, David Carr, David Farhi, David Mely, David Robinson, David Sasaki, Denny Jin, Dev Valladares, Dimitris Tsipras, Doug Li, Duc Phong Nguyen, Duncan Findlay, Edede Oiwoh, Edmund Wong, Ehsan Asdar, Elizabeth Proehl, Elizabeth Yang, Eric Antonow, Eric Kramer, Eric Peterson, Eric Sigler, Eric Wallace, Eugene Brevdo, Evan Mays, Farzad Khorasani, Felipe Petroski Such, Filippo Raso, Francis Zhang, Fred von Lohmann, Freddie Sulit, Gabriel Goh, Gene Oden, Geoff Salmon, Giulio Starace, Greg Brockman, Hadi Salman, Haiming Bao, Haitang Hu, Hannah Wong, Haoyu Wang, Heather Schmidt, Heather Whitney, Heewoo Jun, Hendrik Kirchner, Henrique Ponde de Oliveira Pinto, Hongyu Ren, Huiwen Chang, Hyung Won Chung, Ian Kivlichan, Ian O'Connell, Ian Osband, Ian Silber, Ian Sohl, Ibrahim Okuyucu, Ikai Lan, Ilya Kostrikov, Ilya Sutskever, Ingmar Kanitscheider, Ishaan Gulrajani, Jacob Coxon, Jacob Menick, Jakub Pachocki, James Aung, James Betker, James Crooks, James Lennon, Jamie Kiros, Jan Leike, Jane Park, Jason Kwon, Jason Phang, Jason Teplitz, Jason Wei, Jason Wolfe, Jay Chen, Jeff Harris, Jenia Varavva, Jessica Gan Lee, Jessica Shieh, Ji Lin, Jiahui Yu, Jiayi Weng, Jie Tang, Jieqi Yu, Joanne Jang, Joaquin Quinonero Candela, Joe Beutler, Joe Landers, Joel Parish, Johannes Heidecke, John Schulman, Jonathan Lachman, Jonathan McKay, Jonathan Uesato, Jonathan Ward, Jong Wook Kim, Joost Huizinga, Jordan Sitkin, Jos Kraaijeveld, Josh Gross, Josh Kaplan, Josh Snyder, Joshua Achiam, Joy Jiao, Joyce Lee, Juntang Zhuang, Justyn Harriman, Kai Fricke, Kai Hayashi, Karan Singhal, Katy Shi, Kavin Karthik, Kayla Wood, Kendra Rimbach, Kenny Hsu, Kenny Nguyen, Keren Gu-Lemberg, Kevin Button, Kevin Liu, Kiel Howe, Krithika Muthukumar, Kyle Luther, Lama Ahmad, Larry Kai, Lauren Itow, Lauren Workman, Leher Pathak, Leo Chen, Li Jing, Lia Guy, Liam Fedus, Liang Zhou, Lien Mamitsuka, Lilian Weng, Lindsay McCallum, Lindsey Held, Long Ouyang, Louis Feuvrier, Lu Zhang, Lukas Kondraciuk, Lukasz Kaiser, Luke Hewitt,
Luke Metz, Lyric Doshi, Mada Aflak, Maddie Simens, Madelaine Boyd, Madeleine Thompson, Marat Dukhan, Mark Chen, Mark Gray, Mark Hudnall, Marvin Zhang, Marwan Aljubeh, Mateusz Litwin, Matthew Zeng, Max Johnson, Maya Shetty, Mayank Gupta, Meghan Shah, Mehmet Yatbaz, Meng Jia Yang, Mengchao Zhong, Mia Glaese, Mianna Chen, Michael Janner, Michael Lampe, Michael Petrov, Michael Wu, Michele Wang, Michelle Fradin, Michelle Pokrass, Miguel Castro, Miguel Oom Temudo de Castro, Mikhail Pavlov, Miles Brundage, Miles Wang, Minal Khan, Mira Murati, Mo Bavarian, Molly Lin, Murat Yesildal, Nacho Soto, Natalia Gimelshein, Natalie Cone, Natalie Staudacher, Natalie Summers, Natan LaFontaine, Neil Chowdhury, Nick Ryder, Nick Stathas, Nick Turley, Nik Tezak, Niko Felix, Nithanth Kudige, Nitish Keskar, Noah Deutsch, Noel Bundick, Nora Puckett, Ofir Nachum, Ola Okelola, Oleg Boiko, Oleg Murk, Oliver Jaffe, Olivia Watkins, Olivier Godement, Owen Campbell-Moore, Patrick Chao, Paul McMillan, Pavel Belov, Peng Su, Peter Bak, Peter Bakkum, Peter Deng, Peter Dolan, Peter Hoeschele, Peter Welinder, Phil Tillet, Philip Pronin, Philippe Tillet, Prafulla Dhariwal, Qiming Yuan, Rachel Dias, Rachel Lim, Rahul Arora, Rajan Troll, Randall Lin, Rapha Gontijo Lopes, Raul Puri, Reah Miyara, Reimar Leike, Renaud Gaubert, Reza Zamani, Ricky Wang, Rob Donnelly, Rob Honsby, Rocky Smith, Rohan Sahai, Rohit Ramchandani, Romain Huet, Rory Carmichael, Rowan Zellers, Roy Chen, Ruby Chen, Ruslan Nigmatullin, Ryan Cheu, Saachi Jain, Sam Altman, Sam Schoenholz, Sam Toizer, Samuel Miserendino, Sandhini Agarwal, Sara Culver, Scott Ethersmith, Scott Gray, Sean Grove, Sean Metzger, Shamez Hermani, Shantanu Jain, Shengjia Zhao, Sherwin Wu, Shino Jomoto, Shirong Wu, Shuaiqi Xia, Sonia Phene, Spencer Papay, Srinivas Narayanan, Steve Coffey, Steve Lee, Stewart Hall, Suchir Balaji, Tal Broda, Tal Stramer, Tao Xu, Tarun Gogineni, Taya Christianson, Ted Sanders, Tejal Patwardhan, Thomas Cunningham, Thomas Degry, Thomas Dimson, Thomas Raoux, Thomas Shadwell, Tianhao Zheng, Todd Underwood, Todor Markov, Toki Sherbakov, Tom Rubin, Tom Stasi, Tomer Kaftan, Tristan Heywood, Troy Peterson, Tyce Walters, Tyna Eloundou, Valerie Qi, Veit Moeller, Vinnie Monaco, Vishal Kuo, Vlad Fomenko, Wayne Chang, Weiyi Zheng, Wenda Zhou, Wesam Manassra, Will Sheu, Wojciech Zaremba, Yash Patil, Yilei Qian, Yongjik Kim, Youlong Cheng, Yu Zhang, Yuchen He, Yuchen Zhang, Yujia Jin, Yunxing Dai, Yury Malkov
arXiv·2024
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It is trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is markedly better at vision and audio understanding than existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, with a focus on speech-to-speech while also evaluating text and image capabilities, together with the measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments of dangerous capabilities, as well as a discussion of potential societal impacts of GPT-4o's text and vision capabilities.
No ratings yet
View paper →

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu
arXiv·2024
Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstruction and sparsity objectives and the presence of dead latents. We propose using k-sparse autoencoders [Makhzani and Frey, 2013] to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. Additionally, we find modifications that result in few dead latents, even at the largest scales we tried. Using these techniques, we find clean scaling laws with respect to autoencoder size and sparsity. We also introduce several new metrics for evaluating feature quality based on the recovery of hypothesized features, the explainability of activation patterns, and the sparsity of downstream effects. These metrics all generally improve with autoencoder size. To demonstrate the scalability of our approach, we train a 16 million latent autoencoder on GPT-4 activations for 40 billion tokens. We release training code and autoencoders for open-source models, as well as a visualizer.
No ratings yet
View paper →
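A minimal numpy sketch of the k-sparse (TopK) forward pass the abstract describes: keeping only the k largest pre-activations fixes the L0 sparsity directly, so there is no L1 penalty to tune. Shapes, initialization, and the trailing ReLU are illustrative assumptions, not the paper's training setup:

```python
import numpy as np

# TopK sparse autoencoder forward pass (illustrative dimensions).
rng = np.random.default_rng(0)
d_model, n_latents, k = 64, 512, 8

W_enc = rng.normal(0, 0.02, size=(d_model, n_latents))
W_dec = rng.normal(0, 0.02, size=(n_latents, d_model))
b_enc = np.zeros(n_latents)
b_pre = np.zeros(d_model)      # pre-encoder bias: subtracted, then re-added

def topk_sae(x):
    pre = (x - b_pre) @ W_enc + b_enc
    # zero all but the k largest pre-activations, per example
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    z = np.maximum(z, 0.0)                       # ReLU on the survivors
    x_hat = z @ W_dec + b_pre                    # reconstruction
    return z, x_hat

x = rng.normal(size=(4, d_model))                # a batch of activations
z, x_hat = topk_sae(x)
print((z != 0).sum(axis=-1))                     # at most k active latents each
print(np.mean((x - x_hat) ** 2))                 # reconstruction error
```

Because the number of active latents is pinned at k by construction, scaling studies can vary autoencoder size and sparsity independently, which is what makes the clean scaling laws in the paper measurable.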

Can Gibbons-Hawking Radiation and Inflation Arise Due to Spacetime Quanta?

Naouel Boulkaboul
arXiv·2020
In this study, we provide an alternative, reformulated interpretation of Gibbons-Hawking radiation as well as inflation. By using a spacetime quantization procedure, proposed recently by L.C. Céleri et al., in anti-de Sitter space, we show that Gibbons-Hawking radiation is an intrinsic property of the space concerned, arising due to the existence of a scalar field whose quanta "carry" a length $l$ (i.e. the radius of the hyperboloid curvature). Furthermore, within the context of the Tsallis q-framework, we propose an inflationary model that depends on the non-extensive parameter $q$. The main source of this inflation is the same scalar field mentioned above. Constrained by the observational data, the q-parameter, along with the rest of the model's parameters, has been used to estimate the time at which inflation ends as well as the reheating temperature. The latter is found to be related to the Gibbons-Hawking temperature. Thus, the present model offers an alternative perspective on the nature of the cosmic microwave background (CMB) radiation.
No ratings yet
View paper →
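For reference alongside the abstract above, the quantity the reheating temperature is said to be related to is the Gibbons-Hawking temperature; its standard textbook form for a de Sitter horizon with Hubble rate $H$ is quoted below. Note this is the familiar de Sitter expression for context only; the paper itself works in anti-de Sitter space with a quantized length scale $l$, which this formula does not capture:

```latex
T_{\mathrm{GH}} = \frac{\hbar H}{2\pi k_B}
\qquad \text{(in natural units, } \hbar = k_B = 1\text{: } T_{\mathrm{GH}} = H/2\pi\text{)}
```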

Spin precession frequencies of a test gyroscope around a naked singularity and quasi-periodic oscillations

Tehreem Zahra, Mubasher Jamil, Mustapha Azreg-Aïnou
arXiv·2025
Various studies show that the gravitational collapse of inhomogeneous matter clouds leads to naked singularity formation. We investigate here the spin precession frequency of a test gyroscope attached to a stationary observer in a rotating naked singularity spacetime. In the weak field limit, the Lense-Thirring precession for a rotating naked singularity and the geodetic precession in the asymptotic limit for a null naked singularity are found to be equal to those of a Kerr black hole and a Schwarzschild black hole, respectively. In addition, we can distinguish between a rotating naked singularity and a Kerr naked singularity for an observer in the equatorial plane using spin precession. To this end, we have found constraints on the parameters of the rotating naked singularity by employing Markov Chain Monte Carlo (MCMC) simulation and using observations from five quasi-periodic oscillation sources within the relativistic precession model. Our analysis shows that the spin parameter estimate for GRO J1655-40 is in disagreement with the value found from the continuum-fitting method, while for XTE J1859+226 and GRS 1915+105 it is inconsistent with spectral analysis results.
No ratings yet
View paper →
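For orientation, these are the standard weak-field expressions the abstract compares against: the Lense-Thirring precession of a gyroscope outside a body of angular momentum $\vec J$, and the geodetic precession rate on a circular orbit of radius $r$ around mass $M$. The paper's full formulas in the rotating naked-singularity metric are stated to reduce to these limits:

```latex
\vec{\Omega}_{\mathrm{LT}} = \frac{G}{c^{2} r^{3}}
  \left[\, 3\,(\vec{J}\cdot\hat{r})\,\hat{r} - \vec{J} \,\right],
\qquad
\Omega_{\mathrm{geo}} = \frac{3}{2}\,\frac{(GM)^{3/2}}{c^{2}\, r^{5/2}}
```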

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
arXiv·2023
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
No ratings yet
View paper →

The New AI Consciousness Paper

Scott Alexander
Substack·2025
...
No ratings yet
View paper →

Competitive Programming with Large Reasoning Models

OpenAI: Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaiev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou
arXiv·2025
We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models - OpenAI o1 and an early checkpoint of o3 - with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad in Informatics (IOI). We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed competition constraints, o1-ioi achieved a gold medal. However, when evaluating later models such as o3, we find that o3 achieves gold without hand-crafted domain-specific strategies or relaxed constraints. Our findings show that although specialized pipelines such as o1-ioi yield solid improvements, the scaled-up, general-purpose o3 model surpasses those results without relying on hand-crafted inference heuristics. Notably, o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors. Overall, these results indicate that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, offers a robust path toward state-of-the-art AI in reasoning domains, such as competitive programming.
No ratings yet
View paper →

Memory effects in Kundt wave spacetimes

Indranil Chakraborty, Sayan Kar
arXiv·2020
Memory effects in the exact Kundt wave spacetimes are shown to arise in the behaviour of geodesics in such spacetimes. The types of Kundt spacetimes we consider here are direct products of the form $H^2\times M(1,1)$ and $S^2\times M(1,1)$. Both geometries have constant scalar curvature. We consider a scenario in which the initial velocities of the transverse geodesic coordinates are set to zero (before the arrival of the pulse) in a spacetime with non-vanishing background curvature. We look for changes in the separation between pairs of geodesics caused by the pulse. Any relative change observed in the position and velocity profiles of geodesics after the burst can be solely attributed to the wave (hence, a memory effect). For constant negative curvature, we find a permanent change in the separation of geodesics after the pulse has departed. Thus, there is displacement memory, though no velocity memory is found. In the case of constant positive scalar curvature (Plebański-Hacyan spacetimes), we find both displacement and velocity memory along one direction. In the other direction, a new kind of memory (which we term the frequency memory effect) is observed, where the separation between the geodesics shows periodic oscillations once the pulse has left. We also carry out similar analyses for spacetimes with a non-constant scalar curvature, which may be positive or negative. The results here seem to qualitatively agree with those for constant scalar curvature, thereby suggesting a link between the nature of memory and curvature.
No ratings yet
View paper →
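A toy illustration of the displacement/velocity memory language used above: integrate the geodesic deviation equation $\xi'' + K(u)\,\xi = 0$ through a Gaussian curvature pulse $K(u)$ and inspect the separation and its rate after the pulse. This is a generic one-dimensional sketch, not an integration of the actual Kundt wave geodesics, and the pulse profile is an assumption:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Geodesic deviation through a transient curvature pulse.
def K(u, amplitude=1.0, width=0.5):
    return amplitude * np.exp(-(u / width) ** 2)   # the passing pulse

def rhs(u, y):
    xi, v = y                 # separation and its rate of change
    return [v, -K(u) * xi]

# Neighbouring geodesics start at unit separation and zero relative velocity.
sol = solve_ivp(rhs, [-10.0, 10.0], [1.0, 0.0], rtol=1e-9, atol=1e-12)
xi_after, v_after = sol.y[0, -1], sol.y[1, -1]
print(f"separation after pulse: {xi_after:+.4f} (was 1.0)")
print(f"velocity after pulse:   {v_after:+.4f} (was 0.0)")
# A permanent change in xi is displacement memory; a nonzero residual
# v is velocity memory. In the paper, the background curvature of the
# Kundt geometry determines which of these survives.
```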

Price manipulation schemes of new crypto-tokens in decentralized exchanges

Manuel Naviglio, Francesco Tarantelli, Fabrizio Lillo
arXiv·2025
Blockchain technology has revolutionized financial markets by enabling decentralized exchanges (DEXs) that operate without intermediaries. Uniswap V2, a leading DEX, facilitates the rapid creation and trading of new tokens, which offer high return potential but expose investors to significant risks. In this work, we analyze the financial impact of newly created tokens, assessing their market dynamics, profitability and liquidity manipulations. Our findings reveal that a significant portion of market liquidity is trapped in honeypots, reducing market efficiency and misleading investors. Applying a simple buy-and-hold strategy, we uncover major risks associated with investing in newly created tokens, including the widespread presence of rug pulls and sandwich attacks. We extract the optimal sandwich amount, revealing that the proliferation of such attacks in new tokens stems from higher profitability in low-liquidity pools. Furthermore, we analyze the fundamental differences between token price evolution in swap time and physical time. Using clustering techniques, we highlight these differences and identify typical patterns of honeypot and sellable tokens. Our study provides insights into the risks and financial dynamics of decentralized markets and the challenges they pose for investors.
No ratings yet
View paper →
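A minimal constant-product sandwich simulation in the style of Uniswap V2, illustrating why thin pools for new tokens are attractive targets. The swap formula matches Uniswap V2's 0.3% fee convention; the reserve sizes and trade amounts are illustrative, not taken from the paper's data:

```python
# Constant-product AMM (x * y = k) sandwich attack, Uniswap V2 style.
FEE = 997  # V2 applies amount_in * 997 / 1000 (0.3% fee)

def get_amount_out(amount_in, reserve_in, reserve_out):
    amount_in_with_fee = amount_in * FEE
    return (amount_in_with_fee * reserve_out) // (reserve_in * 1000 + amount_in_with_fee)

# A thin pool for a newly created token: 10 ETH vs 1,000,000 TOKEN (wei units).
eth_reserve, tok_reserve = 10 * 10**18, 1_000_000 * 10**18

victim_in = 1 * 10**18      # victim buys with 1 ETH
attacker_in = 2 * 10**18    # attacker front-runs with 2 ETH

# 1) Front-run: attacker buys first, pushing the token price up.
atk_tokens = get_amount_out(attacker_in, eth_reserve, tok_reserve)
eth_reserve += attacker_in; tok_reserve -= atk_tokens

# 2) Victim's swap executes at the worse post-front-run price.
victim_tokens = get_amount_out(victim_in, eth_reserve, tok_reserve)
eth_reserve += victim_in; tok_reserve -= victim_tokens

# 3) Back-run: attacker sells everything back into the inflated pool.
atk_eth_back = get_amount_out(atk_tokens, tok_reserve, eth_reserve)
tok_reserve += atk_tokens; eth_reserve -= atk_eth_back

profit = (atk_eth_back - attacker_in) / 1e18
print(f"attacker profit: {profit:+.4f} ETH")
```

Rerunning with larger reserves shows the profit shrinking toward the fee cost, which is the intuition behind the paper's finding that sandwich attacks concentrate in low-liquidity pools.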
Page 3 of 7