Path Machine: the idea of a single algorithm

    Prehistory


    About 15 years ago I learned about the existence of fundamental groups of paths, groups that can distinguish topological spaces by their connectedness. This article is not about them, but they prompted the idea of a regressor and a classifier without any optimization, based simply on memorizing the sample.

    Details below.

    Ideas and conjectures


    Since fundamental groups are built from loops based at a chosen point, together with an equivalence relation on those loops, how could one find such loops in data? And is it even necessary?

    The idea came by analogy with kNN and SVM, both "analogy-based" algorithms. What if the loop is replaced with a "tube", a cylinder running from one data point to another, and whatever falls inside the tube is assigned to the same class as the two endpoints of the path (which is no longer a fundamental one)? The tube must be wide enough to classify properly... and in the limit it becomes a straight line. For a new, unknown point we then look for the path to which its distance is minimal and assign it to the same class as that path.
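
    To make the geometric rule concrete, here is a minimal sketch (illustrative points only, not data or code from the notebook): the distance from a new point to the straight line through two labeled points decides which path, and hence which class, it belongs to.

    import numpy as np

    def distance_to_path_line(x, a, b):
        # Distance from point x to the straight line through a and b --
        # the infinitely thin "tube" in the limit.
        direction = (b - a) / np.linalg.norm(b - a)
        projection = np.dot(x - a, direction) * direction
        return np.linalg.norm((x - a) - projection)

    # Two path segments with known classes; the new point gets the class of the closer one.
    segments = [((np.array([0.0, 0.0]), np.array([1.0, 0.0])), 'A'),
                ((np.array([0.0, 2.0]), np.array([1.0, 2.0])), 'B')]
    x_new = np.array([0.5, 0.4])
    label = min(segments, key=lambda s: distance_to_path_line(x_new, *s[0]))[1]
    print(label)  # 'A' -- x_new lies closer to the lower segment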



    A regressor can be constructed by projecting a point onto the straight line between two data points and interpolating their target values in the same ratio in which the projection divides the path.

    By construction, this method memorizes the entire sample and gives exact predictions on the training set.
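
    A minimal sketch of that interpolation (my own illustration with made-up points): the projection parameter t shows where the new point lands between the two data points, and the prediction mixes their targets in the same proportion.

    import numpy as np

    a, b = np.array([0.0, 0.0]), np.array([2.0, 0.0])  # two neighboring training points on a path
    y_a, y_b = 1.0, 3.0                                 # their target values
    x_new = np.array([0.5, 0.7])                        # point to predict

    line = b - a
    t = np.dot(x_new - a, line) / np.dot(line, line)    # ratio in which the projection divides the segment
    prediction = (1 - t) * y_a + t * y_b
    print(t, prediction)  # 0.25 1.5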

    Primitive implementation


    How to build a path? I took the element with the maximum norm and started looking for the point closest to it, connecting them into a path, and so on. At the end, the path is closed back to the starting point (debatable, of course, but that's how I did it).

    Estimator
    This is the very first version of the code; see the updated notebook below.

    import numpy as np
    from sklearn.base import BaseEstimator
    from sklearn.metrics import f1_score, r2_score


    class PathMachine(BaseEstimator):
        def __init__(self, norm=np.linalg.norm, classify=False):
            self.norm = norm
            self.classify = classify

        def find_start(self, X):
            # Starting point of a path: the sample with the largest norm.
            index_max = None
            value_max = -np.inf
            for index, x in enumerate(X):
                value = self.norm(x)
                if value > value_max:
                    index_max = index
                    value_max = value
            return index_max

        def find_next(self, point, target, X, y):
            # Next point of the path: the one closest to the current point
            # (restricted to the same class when classifying).
            index_min = None
            value_min = np.inf
            for index, x in enumerate(X):
                if self.classify and (y[index] != target):
                    continue
                value = self.norm(x - point)
                if value < value_min:
                    index_min = index
                    value_min = value
            return index_min

        def fit(self, X, y):
            X = np.copy(X)
            y = np.copy(y).flatten()
            # Classification keeps a list of segments per class,
            # regression keeps a single list of segments.
            self.paths = {} if self.classify else []
            start_index = self.find_start(X)
            start_value = X[start_index]
            start_target = y[start_index]
            X = np.delete(X, start_index, axis=0)
            y = np.delete(y, start_index, axis=0)
            while len(X) > 0:
                next_index = self.find_next(start_value, start_target, X, y)
                if self.classify and next_index is None:
                    # No points of the current class left: restart from a new point.
                    start_index = self.find_start(X)
                    start_value = X[start_index]
                    start_target = y[start_index]
                    continue
                next_target = y[next_index]
                if self.classify:
                    if next_target not in self.paths:
                        self.paths[next_target] = []
                    self.paths[next_target].append({
                        'start': start_value,
                        'next': X[next_index]
                    })
                else:
                    self.paths.append({
                        'start': start_value,
                        'next': X[next_index],
                        'value': start_target,
                        'target': next_target
                    })
                start_value = X[next_index]
                start_target = y[next_index]
                X = np.delete(X, next_index, axis=0)
                y = np.delete(y, next_index, axis=0)
            # Close the path: connect the last point back to the first one.
            if self.classify:
                self.paths[start_target].append({
                    'start': start_value,
                    'next': self.paths[start_target][0]['start']
                })
            else:
                self.paths.append({
                    'start': start_value,
                    'next': self.paths[0]['start'],
                    'value': start_target,
                    'target': self.paths[0]['target']
                })
            return self

        def predict(self, X):
            result = []
            for x in X:
                if self.classify:
                    # Assign the class of the path segment closest to x.
                    predicted = None
                    min_distance = np.inf
                    for target in self.paths:
                        for path in self.paths[target]:
                            point = x - path['start']
                            line = path['next'] - path['start']
                            if np.allclose(self.norm(line), 0):
                                continue
                            direction = line / self.norm(line)
                            product = np.dot(point, direction)
                            projection = product * direction
                            distance = self.norm(projection - point)
                            if distance < min_distance:
                                predicted = target
                                min_distance = distance
                    result.append(predicted)
                else:
                    # Project x onto the closest segment and interpolate its targets.
                    predicted = None
                    min_distance = np.inf
                    for path in self.paths:
                        point = x - path['start']
                        line = path['next'] - path['start']
                        if np.allclose(self.norm(line), 0):
                            continue
                        direction = line / self.norm(line)
                        product = np.dot(point, direction)
                        projection = product * direction
                        parameter = (np.sign(product) * self.norm(projection)
                                     / self.norm(line))
                        distance = self.norm(projection - point)
                        if distance < min_distance:
                            predicted = ((1 - parameter) * path['value']
                                         + parameter * path['target'])
                            min_distance = distance
                    result.append(predicted)
            return np.array(result)

        def score(self, X, y):
            if self.classify:
                return f1_score(y.flatten(), self.predict(X), average='micro')
            else:
                return r2_score(y.flatten(), self.predict(X))
    


    Theoretically (and technically), it is also possible to implement predict_proba, for example as 1 minus the normalized distances. But I did not come up with a quick way to test it, so...
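
    A possible sketch of such a predict_proba as an extra method of the PathMachine class above (just a guess at the idea, untested; degenerate cases such as a zero total distance are not handled): take the minimal point-to-path distance per class and turn the distances into scores via 1 minus the normalized distance.

        def predict_proba(self, X):
            result = []
            for x in X:
                classes = sorted(self.paths)  # columns follow the sorted class labels
                distances = []
                for target in classes:
                    best = np.inf
                    for path in self.paths[target]:
                        point = x - path['start']
                        line = path['next'] - path['start']
                        if np.allclose(self.norm(line), 0):
                            continue
                        direction = line / self.norm(line)
                        projection = np.dot(point, direction) * direction
                        best = min(best, self.norm(projection - point))
                    distances.append(best)
                distances = np.array(distances)
                scores = 1 - distances / distances.sum()  # "1 - normalized distance"
                result.append(scores / scores.sum())      # renormalize into a probability vector
            return np.array(result)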

    Some tests


    I started with the classic Boston Housing and Iris datasets, and for the baselines I chose Ridge and LogisticRegression. Well, why not, maybe it would work.

    In general, I suggest looking at the Jupyter notebook.

    In short: the regressor did worse than the baseline, the classifier did better.
    Update: after applying StandardScaler, the regressor does better as well.
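
    A minimal sketch of how such a comparison can be run (my own illustration; the dataset, split, and baseline settings here are assumptions, not the exact notebook setup):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)
    X = StandardScaler().fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    pm = PathMachine(classify=True).fit(X_train, y_train)
    lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    print('PathMachine micro-F1:', pm.score(X_test, y_test))
    print('LogisticRegression accuracy:', lr.score(X_test, y_test))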

    I also ran it on synthetic datasets. There the results were all over the place.

    Still, a few things were noticeable:

    1. The regressor works well in low-dimensional spaces;
    2. Both the regressor and the classifier become sensitive to noise beyond a certain threshold;
    3. Unlike the linear baseline, the regressor picked up on multimodality (on Boston Housing);
    4. By construction, the classifier did well (best of all... :)) on the linearly non-separable moons dataset.

    Conclusions, pros and cons and applicability


    Pros: personally, I do not really see any in the current implementation, either technically or mathematically. Perhaps it can be refined into something more sensible. Accordingly, I do not see much applicability either. Except, perhaps, working with very small data without preprocessing...

    There are plenty of cons: the whole dataset has to be kept in memory, the generalization ability relies on extrapolation, and the main and only hyperparameter, the norm, is hard to choose.

    But, in any case, it came as a surprise to me, and I decided to share it here.
