PyDTL — Decision Tree Learning in Python

PyDTL is a simple Python library for Decision Tree Learning, Bagging and Random Forests.

Examples

Random Forest Example

The RandomForest constructor takes a training set, represented as a pydtl.LocalTable object, and a target attribute. Local tables can be dumped from a variety of database formats, including SQLite and CSV:

import pydtl
 
db = pydtl.SQLiteDB('observations.sqlite')
table = db.dump_table('events')
forest = pydtl.RandomForest(table, target='frequentation')

Grow the forest with the grow_trees() method. If pygraphviz is installed, draw() renders the result; otherwise, print the forest instead:

forest.grow_trees(42)
 
try:
    forest.draw()  # requires pygraphviz
except ImportError:
    # pygraphviz is not installed: fall back to a text representation
    print(forest)
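
As background on what growing a forest involves, here is a minimal sketch of bagging for regression in plain Python. It is not PyDTL's implementation: the names bootstrap_sample, fit_mean and bagged_predict are hypothetical, the base learner is a toy that predicts a mean instead of a decision tree, and the per-split attribute sampling that distinguishes a Random Forest from plain bagging is left out. Reading the 42 in grow_trees(42) as a number of trees is also an assumption.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]


def fit_mean(rows, target):
    """Toy base learner (hypothetical): predicts the mean target of its sample."""
    mean = sum(row[target] for row in rows) / float(len(rows))
    return lambda inst: mean


def bagged_predict(models, inst):
    """Regression bagging: average the predictions of all base learners."""
    return sum(model(inst) for model in models) / float(len(models))


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
rows = [{'frequentation': v} for v in (10, 14, 12, 20, 8)]

# Train 42 base learners, each on its own bootstrap replicate of the data.
models = [fit_mean(bootstrap_sample(rows, rng), 'frequentation')
          for _ in range(42)]

print(bagged_predict(models, {}))
```

Each learner sees a slightly different resampling of the data, and averaging their outputs reduces the variance of the combined prediction.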

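Use the predict() method to make predictions. The example below evaluates the forest on rows sampled from the training table, so the error it reports measures fit to the training data rather than performance on unseen data: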

square_errors = []
samples = table.sample_rows(42)
for inst in samples:
    y_pred = forest.predict(inst)
    y_real = inst['frequentation']
    square_errors.append((y_pred - y_real)**2)
 
# float() avoids integer division under Python 2
mse = sum(square_errors) / float(len(square_errors))
print("Mean Square Error: %.3f" % mse)
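
The evaluation loop above can be wrapped in a reusable helper. The sketch below is self-contained so it runs without PyDTL: mean_squared_error and ConstantModel are hypothetical names, and ConstantModel is only a stand-in for any object that exposes a predict(inst) method, as the forest does in the example. It assumes the instances are given as a list of dict-like rows.

```python
def mean_squared_error(model, instances, target):
    """Average squared difference between model.predict(inst) and inst[target]."""
    if not instances:
        raise ValueError("need at least one instance to compute an error")
    total = 0.0
    for inst in instances:
        diff = model.predict(inst) - inst[target]
        total += diff * diff
    return total / len(instances)


class ConstantModel(object):
    """Hypothetical stand-in for a trained forest: always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, inst):
        return self.value


rows = [{'frequentation': 10}, {'frequentation': 14}, {'frequentation': 12}]
mse = mean_squared_error(ConstantModel(12), rows, 'frequentation')
print("Mean Square Error: %.3f" % mse)
```

With a real forest, the call would presumably look like mean_squared_error(forest, list(samples), 'frequentation'), provided the sampled rows support len() and key lookup.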