Equivalent of R's createDataPartition in Python


I am trying to reproduce the behavior of the `createDataPartition` function from R's caret package in Python. I have a machine learning dataset with a boolean target variable. I would like to split my dataset into a training set (60%) and a test set (40%).

If I split it completely at random, my target variable will not be properly distributed between the two sets.

This is how I do it in R:

  inTrain <- createDataPartition(y = data$repeater, p = 0.6, list = FALSE)
  training <- data[inTrain,]
  testing <- data[-inTrain,]

How can I do this in Python?

PS: I am learning machine learning in Python with scikit-learn and pandas.

In scikit-learn, you can use `train_test_split`. It takes the feature values and the labels and returns the training and test portions of each:

  from sklearn.cross_validation import train_test_split
  # in scikit-learn >= 0.18 the import is:
  # from sklearn.model_selection import train_test_split

  # split a feature table (here the age and weight columns) and a label column
  X_train, X_test, y_train, y_test = train_test_split(
      table[['age', 'weight']], table['food option'], test_size=0.25)

  # another example, using scikit-learn's pre-loaded datasets:
  from sklearn import datasets
  iris = datasets.load_iris()
  X_iris, y_iris = iris.data, iris.target
  X, y = X_iris[:, :2], y_iris
  X_train, X_test, y_train, y_test = train_test_split(X, y)

The four return values are:

  • the input data for training
  • the input data for evaluation
  • the output data for training
  • the output data for evaluation

respectively. You can also pass the keyword argument test_size=0.25 to change the proportion of data used for training and testing.

To split a single dataset, for example to get 40% test data, you can use a call like this:

  >>> data = np.arange(700).reshape((100, 7))
  >>> training, testing = train_test_split(data, test_size=0.4)
  >>> print len(data)
  100
  >>> print len(training)
  60
  >>> print len(testing)
  40
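Note that a plain `train_test_split` call like the one above samples at random and does not preserve the class proportions of the target, which was the original concern with `createDataPartition`. Newer versions of scikit-learn support a `stratify` keyword for this. A minimal sketch, assuming a hypothetical boolean target array `y` with 30% positives:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, boolean target with 30% positives
X = np.arange(200).reshape(100, 2)
y = np.array([False] * 70 + [True] * 30)

# stratify=y keeps the True/False ratio identical in both splits,
# similar to caret's createDataPartition(p = 0.6)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)

print(len(X_train), len(X_test))              # 60 40
print(int(y_train.sum()), int(y_test.sum()))  # 18 12 (30% of each split)
```

Without `stratify`, a small or imbalanced dataset can easily end up with very different class ratios in the two splits.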
