PyDTL is a simple Python library for Decision Tree Learning, Bagging and Random Forests.
The RandomForest constructor needs a training set, represented by a pydtl.LocalTable object, and a target attribute. Local tables can be dumped from a variety of database formats, including SQLite or CSV:
    import pydtl

    db = pydtl.SQLiteDB('observations.sqlite')
    table = db.dump_table('events')
    forest = pydtl.RandomForest(table, target='frequentation')
Grow the forest with the grow_trees() method. If pygraphviz is installed, draw() displays the result; otherwise, print the forest:
    forest.grow_trees(42)
    try:
        forest.draw()
    except ImportError:
        # pygraphviz is not installed: fall back to the text form
        print(forest)
Use the predict() method for new predictions:
    square_errors = []
    samples = table.sample_rows(42)
    for inst in samples:
        y_pred = forest.predict(inst)
        y_real = inst['frequentation']
        square_errors.append((y_pred - y_real) ** 2)

    # float() guards against integer division on Python 2
    mse = sum(square_errors) / float(len(square_errors))
    print("Mean Square Error: %.3f" % mse)
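The error computation above can be wrapped in a small reusable function. The following is a minimal sketch, not part of PyDTL: the name mean_squared_error and its signature are hypothetical, and it assumes only that rows behave like dictionaries and that the predictor is a callable taking one row. The stand-in data and predictor are made up so the snippet runs without a database.

```python
def mean_squared_error(predict, rows, target):
    """Average squared difference between predict(row) and row[target].

    Hypothetical helper, not part of PyDTL. `predict` is any callable
    that takes one row; `rows` is an iterable of dict-like rows.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one row to evaluate")
    total = 0.0  # float accumulator avoids integer division on Python 2
    for row in rows:
        error = predict(row) - row[target]
        total += error ** 2
    return total / len(rows)


# Stand-in data and predictor, for illustration only.
rows = [
    {"capacity": 100, "frequentation": 80},
    {"capacity": 200, "frequentation": 150},
]


def always_100(row):
    """Toy predictor that ignores its input."""
    return 100


mse = mean_squared_error(always_100, rows, "frequentation")
print("Mean Square Error: %.3f" % mse)
```

With the forest grown earlier, the call would presumably be mean_squared_error(forest.predict, samples, 'frequentation'), assuming the sampled rows support the same inst['frequentation'] lookup used in the loop above.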