Search results for: k means clustering

Number of results: 786274

Journal: :Theor. Comput. Sci. 2011
Tobias Brunsch Heiko Röglin

k-means++ is a seeding technique for the k-means method with an expected approximation ratio of O(log k), where k denotes the number of clusters. Examples are known on which the expected approximation ratio of k-means++ is Ω(log k), showing that the upper bound is asymptotically tight. However, it remained open whether k-means++ yields an O(1)-approximation with probability 1/poly(k) or even wi...

Journal: :Journal of Computer and Robotics 0
Rasool Azimi (Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran); Hedieh Sajedi (Department of Computer Science, College of Science, University of Tehran, Tehran, Iran)

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other in some sense or another. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the k-means algorithm, namely persistent k...

Journal: :Knowl.-Based Syst. 2017
Marco Capó Aritz Pérez Martínez José Antonio Lozano

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. In spite of its dependency on the initial settings and the large number of distance computations that it can require to converge, the K-means algorithm remains one of the most popular clustering methods for massive data...

2017
Olivier Bachem Mario Lucic Andreas Krause

The k-means++ algorithm is the state-of-the-art algorithm for solving k-means clustering problems, as the computed clusterings are O(log k) competitive in expectation. However, its seeding step requires k inherently sequential passes through the full data set, making it hard to scale to massive data sets. The standard remedy is to use the k-means‖ algorithm which reduces the number of sequential rou...
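
The seeding step mentioned in this abstract can be illustrated with a minimal sketch of D² sampling, written in plain Python. This is not the authors' code: the names `squared_distance` and `kmeanspp_seed` are hypothetical, points are assumed to be tuples of floats, and the sketch covers only plain k-means++ seeding, not k-means‖ or any accelerated variant. It is meant to show why each new center needs one full pass over the data.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (k-means++ seeding)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()

    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2[i].
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # One full pass over the data per new center: each pass depends on
        # the center chosen just before it, so the passes cannot overlap.
        d2 = [min(d, squared_distance(p, centers[-1]))
              for d, p in zip(d2, points)]
    return centers
```

A call such as `kmeanspp_seed(data, 3, random.Random(0))` returns three points from `data` to use as starting centers for the usual k-means iterations.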

2005
Chinatsu Arima Kazumi Hakamada Masahiro Okamoto Taizo Hanai

2016
Pyotr Bochkaryov Vasiliy Kireev

Large volumes of data are currently being actively accumulated in various information environments, such as social, corporate, scientific, and others. The intensive use of big data in various fields is stimulating heightened interest among researchers in developing methods and tools for processing and analyzing massive data of enormous volume and considerable variety. One of the prom...

Journal: :Softwaretechnik-Trends 2010
Steffen Herbold Jens Grabowski Helmut Neukirchen Stephan Waack

Software projects are usually analyzed by experts based on their previous experience, their intuition and data they gather about the project. In this work, we show an approach for a purely data-driven retrospective project analysis. We plan to build on this work to make predictions about the evolution of software projects.

2015
Deniz Kilinç Fatma Bozyigit Akin Özçift Fatih Yücalar Emin Borandag

Abstract. Software technologies are advancing rapidly and, in parallel, the number of software projects carried out in both the public and the private sector is increasing. One of the most significant outputs of software automation projects is undoubtedly the data they produce. Processing this high-dimensional, hard-to-understand data and transforming it into more meaningful and actionable data is an important nee...

2011
Xue Jiang Xianpei Han Le Sun

In this paper, we describe our work on the subtopic mining subtask of NTCIR-9 in Simplified Chinese. To find possible subtopics of a specific query, we select related queries recorded in the query log, titles of search results provided by Google and Baidu, or the catalog of the corresponding entry in Baidu encyclopedia, which are lexically similar to the original query; then we apply k-means algorith...

2008
Mark Ashton John Barnard Florence Casset Michael Charlton Geoffrey Downs Dominique Gorse John Holliday Roger Lahana Peter Willett

This paper reports a comparison of calculated molecular properties and of 2D fragment bit-strings when used for the selection of structurally diverse subsets of a file of 44295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. Investigation of the numbers of bioactive molecules in the selected...
