Why, in Deep Learning, Non-smooth Activation Function Works Better Than Smooth Ones
Authors
Abstract
Since most dependencies in the physical world are smooth (differentiable), smooth functions were traditionally used to approximate these dependencies; in particular, neural networks used smooth activation functions such as the sigmoid. However, the successes of deep learning have shown that, in many cases, non-smooth activation functions like $$\max (0,z)$$ work much better. In this paper, we explain why non-smooth approximating functions are often better, even when the approximated dependence itself is smooth.
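A minimal sketch (not from the paper) contrasting the two activation functions the abstract mentions: the non-smooth rectified linear unit $\max(0,z)$, whose derivative jumps at $z=0$, and the smooth sigmoid, which is differentiable everywhere.

```python
import math

def relu(z: float) -> float:
    """Non-smooth activation: max(0, z). Piecewise linear, with a kink at z = 0."""
    return max(0.0, z)

def sigmoid(z: float) -> float:
    """Smooth activation: 1 / (1 + exp(-z)). Differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-z))

# ReLU's derivative jumps from 0 (for z < 0) to 1 (for z > 0),
# so it is not differentiable at z = 0; the sigmoid has no such kink.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.4f}  sigmoid={sigmoid(z):.4f}")
```

Despite its kink, the ReLU is the activation that made deep networks practical, which is exactly the phenomenon the paper sets out to explain.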
Similar resources
Why Bigger Windows Are Better Than Smaller Ones
We investigate the use of multi-term query concepts to improve the performance of text-retrieval systems that accept "natural-language" queries. A relevance feedback process is explained that massively expands an initial query with single and multi-term concepts. The multi-term concepts are modelled as a set of words appearing within windows of varying sizes. Experimental results suggest that w...
Why & When Deep Learning Works: Looking Inside Deep Learning
In recent years, Deep Learning has emerged as the leading technology for accomplishing a broad range of artificial intelligence tasks (LeCun et al. (2015); Goodfellow et al. (2016)). Deep learning is the state-of-the-art approach across many domains, including object recognition and identification, text understanding and translation, question answering, and more. In addition, it is expected to pla...
متن کاملSmooth biproximity spaces and P-smooth quasi-proximity spaces
The notion of a smooth biproximity space, where $\delta_1,\delta_2$ are gradation proximities, was defined by Ghanim et al. [10]. In this paper, we show every smooth biproximity space $(X,\delta_1,\delta_2)$ induces a supra smooth proximity space $\delta_{12}$ finer than $\delta_1$ and $\delta_2$. We study the relationship between $(X,\delta_{12})$ and the $FP^*$-separation axioms which had been introduced by...
Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an $\tilde{O}(H\sqrt{SAT})$ Bayesian regret bound for PSRL in finite-horizon episodic Markov decision ...
Ten good reasons why structured graphs can be better than flat ones
This talk presents our proposal, called ADR, for the design of reconfigurable software systems. ADR is based on hierarchical graphs with interfaces and it has been conceived in the attempt of reconciling software architectures and process calculi by means of graphical methods. We illustrate the main motivations behind ADR and the current advancements on its foundations, applications and tool su...
Journal
Journal title: Studies in Systems, Decision and Control
Year: 2023
ISSN: ['2198-4182', '2198-4190']
DOI: https://doi.org/10.1007/978-3-031-16415-6_16