I am trying to reproduce the behavior of R's createDataPartition function in Python. I have a machine learning dataset with a Boolean target variable. I would like to split it into a training set (60%) and a test set (40%).
If I split it completely at random, my target variable will not be properly distributed between the two sets.
I achieve this in R using:
    inTrain <- createDataPartition(y = data$repeater, p = 0.6, list = F)
    training <- data[inTrain, ]
    testing <- data[-inTrain, ]
How can I do this in Python?
PS: I am using scikit-learn as my machine learning library, together with pandas.
In scikit-learn, you can use the train_test_split tool:

    # In scikit-learn < 0.18 this lived in sklearn.cross_validation,
    # which was removed in 0.20.
    from sklearn.model_selection import train_test_split
    from sklearn import datasets

    # Use Age and Weight to predict a value for the food someone chooses.
    # `table` is assumed to be a pandas DataFrame with these columns.
    X_train, X_test, y_train, y_test = train_test_split(
        table[['Age', 'Weight']], table['Food Choice'], test_size=0.25)

    # Another example using the sklearn pre-loaded datasets:
    iris = datasets.load_iris()
    X_iris, y_iris = iris.data, iris.target
    X, y = X_iris[:, :2], y_iris
    X_train, X_test, y_train, y_test = train_test_split(X, y)

This breaks the data into
- inputs for training
- inputs for the evaluation data
- output for the training data
- output for the evaluation data
respectively. You can also add a keyword argument, test_size=0.25, to vary the percentage of the data used for training and testing.
To split a single dataset and get 40% test data, you can use a call like this:

    >>> import numpy as np
    >>> from sklearn.model_selection import train_test_split
    >>> data = np.arange(700).reshape((100, 7))
    >>> training, testing = train_test_split(data, test_size=0.4)
    >>> print(len(data))
    100
    >>> print(len(training))
    60
    >>> print(len(testing))
    40
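A plain random split like this does not guarantee that the Boolean target keeps the same proportion in both pieces, which is what createDataPartition provides by sampling within each level of y. As a minimal sketch, assuming scikit-learn 0.18 or later, train_test_split accepts a stratify argument for this. The arrays below are made-up stand-ins for the asker's data, with y playing the role of the repeater column:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 7 features, and a Boolean target
# that is True for 30% of the rows.
X = np.arange(700).reshape((100, 7))
y = np.array([True] * 30 + [False] * 70)

# stratify=y keeps the True/False ratio the same in both pieces;
# random_state only makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

print(len(X_train), len(X_test))          # sizes of the two pieces
print(y_train.mean(), y_test.mean())      # share of True in each piece
```

With a pandas DataFrame the same call should work by passing the feature columns as X and the target column as both y and stratify.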