PaperTrails

Oldest Papers


Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
arXiv·2017
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost.
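The sparse gating described in this abstract can be illustrated with a minimal sketch. It assumes a plain linear gate and a softmax over the k largest gate logits. The names `top_k_gates`, `moe_layer`, and `gate_matrix` are hypothetical, and the noise term and load-balancing losses used in the paper are left out.

```python
import math

def top_k_gates(logits, k):
    """Softmax over the k largest logits; every other expert gets no weight."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_layer(x, gate_matrix, experts, k=2):
    """Combine the outputs of the k selected experts, weighted by their gates.

    gate_matrix holds one weight row per expert (an assumed linear gate);
    experts is a list of callables mapping an input vector to an output vector.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in gate_matrix]
    gates = top_k_gates(logits, k)
    output = None
    for i, g in gates.items():
        # Only the selected experts are evaluated; the rest cost nothing.
        scaled = [g * v for v in experts[i](x)]
        output = scaled if output is None else [a + b for a, b in zip(output, scaled)]
    return output, gates
```

Because unselected experts are never called, the compute per example grows with k rather than with the total number of experts, which is the property the abstract relies on.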

Curiosity-driven Exploration by Self-supervised Prediction

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
arXiv·2017
In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether. In such cases, curiosity can serve as an intrinsic reward signal to enable the agent to explore its environment and learn skills that might be useful later in its life. We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. The proposed approach is evaluated in two environments: VizDoom and Super Mario Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where curiosity allows for far fewer interactions with the environment to reach the goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios (e.g. new levels of the same game) where the knowledge gained from earlier experience helps the agent explore new places much faster than starting from scratch. Demo video and code available at https://pathak22.github.io/noreward-rl/
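As a rough sketch of the curiosity signal described in this abstract, the intrinsic reward can be computed as a scaled squared error between the predicted and the actual next-state features. The function name `curiosity_reward` and the scaling parameter `eta` are illustrative assumptions, and the feature encoder and forward model that would produce the two vectors are not shown.

```python
def curiosity_reward(predicted_features, actual_features, eta=1.0):
    """Intrinsic reward: (eta / 2) * squared prediction error in feature space.

    predicted_features: the forward model's guess for the next state's features.
    actual_features: the encoder's features of the state actually reached.
    """
    squared_error = sum(
        (p - a) ** 2 for p, a in zip(predicted_features, actual_features)
    )
    return 0.5 * eta * squared_error
```

A transition the agent predicts perfectly earns zero reward, so the signal fades for familiar parts of the environment and stays high where the agent's actions still have surprising consequences.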

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
arXiv·2017
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel
arXiv·2017
Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
arXiv·2017
The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

Nonlinear magnetically charged black holes in 4D Einstein-Gauss-Bonnet gravity

Kimet Jusufi
arXiv·2020
In this letter we present an exact spherically symmetric and magnetically charged black hole solution with an exponential model of nonlinear electrodynamics [S. Kruglov, Annals Phys. 378, 59-70 (2017)] in the context of 4D Einstein-Gauss-Bonnet (EGB) gravity. We show that our negative branch, in the limit of GB coupling coefficient $\alpha \to 0$ and nonlinear parameter $\beta \to 0$, reduces to the magnetically charged black hole of Einstein-Maxwell gravity in GR. In addition, we study the embedding diagram of the black hole geometry and thermodynamic properties such as the Hawking temperature and the heat capacity of our black hole solution.

Can Gibbons-Hawking Radiation and Inflation Arise Due to Spacetime Quanta?

Naouel Boulkaboul
arXiv·2020
In this study, we provide an alternatively reformulated interpretation of Gibbons-Hawking radiation as well as inflation. By using a spacetime quantization procedure, proposed recently by L.C. Céleri et al., in anti-de Sitter space we show that Gibbons-Hawking radiation is an intrinsic property of the concerned space, that arises due to the existence of a scalar field whose quanta "carry" a length $l$ (i.e. the radius of the hyperboloid curvature). Furthermore, within the context of Tsallis q-framework, we propose an inflationary model that depends on the non-extensive parameter $q$. The main source of such an inflation is the same scalar field mentioned before. Being constrained by the observational data, the q-parameter along with the rest of the model's parameters has been used to estimate the time at which inflation ends as well as the reheating temperature. The latter is found to be related to Gibbons-Hawking temperature. Thus, the present model offers an alternative perspective regarding the nature of the cosmic background radiation (CMB).

X-ray reflection spectroscopy with Kaluza-Klein black holes

Jiachen Zhu, Askar B. Abdikamalov, Dimitry Ayzenberg, Mustapha Azreg-Ainou, Cosimo Bambi, Mubasher Jamil, Sourabh Nampalliwar, Ashutosh Tripathi, Menglei Zhou
arXiv·2020
Kaluza-Klein theory is a popular alternative theory of gravity, with both non-rotating and rotating black hole solutions known. This allows for the possibility that the theory could be observationally tested. We present a model which calculates the reflection spectrum of a black hole accretion disk system, where the black hole is described by a rotating solution of the Kaluza-Klein theory. We also use this model to analyze X-ray data from the stellar-mass black hole in GRS 1915+105 and provide constraints on the free parameters of the Kaluza-Klein black holes.

Memory effects in Kundt wave spacetimes

Indranil Chakraborty, Sayan Kar
arXiv·2020
Memory effects in the exact Kundt wave spacetimes are shown to arise in the behaviour of geodesics in such spacetimes. The types of Kundt spacetimes we consider here are direct products of the form $H^2\times M(1,1)$ and $S^2\times M(1,1)$. Both geometries have constant scalar curvature. We consider a scenario in which initial velocities of the transverse geodesic coordinates are set to zero (before the arrival of the pulse) in a spacetime with non-vanishing background curvature. We look for changes in the separation between pairs of geodesics caused by the pulse. Any relative change observed in the position and velocity profiles of geodesics, after the burst, can be solely attributed to the wave (hence, a memory effect). For constant negative curvature, we find there is permanent change in the separation of geodesics after the pulse has departed. Thus, there is displacement memory, though no velocity memory is found. In the case of constant positive scalar curvature (Plebański-Hacyan spacetimes), we find both displacement and velocity memory along one direction. In the other direction, a new kind of memory (which we term as frequency memory effect) is observed where the separation between the geodesics shows periodic oscillations once the pulse has left. We also carry out similar analyses for spacetimes with a non-constant scalar curvature, which may be positive or negative. The results here seem to qualitatively agree with those for constant scalar curvature, thereby suggesting a link between the nature of memory and curvature.

Rotating five-dimensional electrically charged Bardeen regular black holes

Muhammed Amir, Md Sabir Ali, Sunil D. Maharaj
arXiv·2020
We derive a rotating counterpart of the five-dimensional electrically charged Bardeen regular black hole spacetime by applying the Giampieri algorithm to the static one. The associated nonlinear electrodynamics source is computed in order to justify the rotating solution. We thoroughly discuss the energy conditions and the other properties of the rotating spacetime. The black hole thermodynamics of the rotating spacetime is also presented. In particular, thermodynamic quantities such as the Hawking temperature and the heat capacity are calculated and plotted to show the thermal behavior. The Hawking temperature profile of the black hole implies that the regular black hole is thermally colder than its singular counterpart. On the other hand, we find that the heat capacity has two branches: the negative branch corresponds to the unstable phase and the positive branch corresponds to the stable phase for a suitable choice of the physical parameters characterizing the black holes.

Constraining the tidal charge of brane black holes using their shadows

Juliano C. S. Neves
arXiv·2020
A constraint on the tidal charge generated within a brane world is shown. Using the shadow of a rotating black hole in a brane context in order to describe the M87* parameters recently announced by the Event Horizon Telescope Collaboration, the deviation from circularity of the reported shadow produces an upper bound on the bulk's nonlocal effect, which is conceived of as a tidal charge in the four-dimensional brane induced by the five-dimensional bulk. Therefore, a deviation from circularity $\lesssim 10\%$ leads to an upper bound on the tidal charge $\lesssim 0.004M^2$.

Diffusion Models Beat GANs on Image Synthesis

Prafulla Dhariwal, Alex Nichol
arXiv·2021
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet 256$\times$256, and 7.72 on ImageNet 512$\times$512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256$\times$256 and 3.85 on ImageNet 512$\times$512. We release our code at https://github.com/openai/guided-diffusion
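The classifier guidance mentioned in this abstract can be sketched as a shift of each denoising step's Gaussian mean along the classifier's gradient. This is a minimal illustration that assumes a diagonal covariance. The function name `guided_mean` and its parameters are hypothetical, and the diffusion model and classifier that would supply the inputs are not shown.

```python
def guided_mean(mean, variance, grad_log_prob, scale=1.0):
    """Shift a diagonal-Gaussian mean toward higher classifier probability.

    mean, variance: per-dimension parameters of the unguided denoising step.
    grad_log_prob: gradient of log p(label | noisy sample) w.r.t. the sample.
    scale: guidance strength; larger values favor fidelity over diversity.
    """
    return [m + scale * v * g for m, v, g in zip(mean, variance, grad_log_prob)]
```

With `scale=0` the step is unchanged, so the same sampler covers both unguided and guided generation.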