Google for Developers Korea Blog: TensorFlow Datasets 및 Estimators를 소개합니다.

Google for Developers Korea Blog

한국의 개발자들을 위한 Google for Developers 국문 블로그입니다.

TensorFlow Datasets 및 Estimators를 소개합니다.

2017년 9월 26일 화요일

여기

Datasets: 입력 파이프라인을 만드는(즉, 데이터를 프로그램으로 읽어오는) 완전히 새로운 방식입니다.

Estimators: TensorFlow 모델을 만드는 상위 수준(high-level)의 방식입니다. Estimators에는 일반 기계 학습 작업을 위해 미리 만들어둔 모델이 포함되어 있지만, 이런 모델을 사용하여 자신만의 맞춤 모델을 만들 수도 있습니다.

예제 모델여기꽃받침꽃잎

왼쪽에서 오른쪽으로: Iris setosa(제공: Radomil, CC BY-SA 3.0), Iris versicolor (제공: Dlanglois, CC BY-SA 3.0) 및 Iris virginica(제공: Frank Mayfield, CC BY-SA 2.0). float32

Datasets 소개

Datasets: 데이터세트를 생성하고 변환하는 메서드를 포함한 기본 클래스입니다. 이 클래스를 사용하여 메모리 내 데이터 또는 Python 생성기에서 데이터세트를 초기화할 수도 있습니다.

TextLineDataset: 텍스트 파일에서 줄을 읽어옵니다.

TFRecordDataset: TFRecord 파일에서 레코드를 읽어옵니다.

FixedLengthRecordDataset: 바이너리 파일에서 고정 크기 레코드를 읽어옵니다.

Iterator: 한 번에 하나의 데이터세트 요소에 액세스하는 방식을 제공합니다.

사용할 데이터세트

0 - Iris Setosa

1 - Versicolor

2 - Virginica

데이터세트 표현feature_names = [ 'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'] def input_fn(): ...<code>... return ({ 'SepalLength':[values], ..<etc>.., 'PetalWidth':[values] }, [IrisFlowerType])

첫 번째 요소는 각 입력 특징이 키이고 그 다음이 훈련 배치에 대한 값의 목록인 dict여야 합니다.

두 번째 요소는 훈련 배치에 대한 레이블 목록입니다.

input_fn

file_path: 읽어올 데이터 파일입니다.

perform_shuffle: 레코드 순서를 무작위로 할지 여부입니다.

repeat_count: 데이터세트에서 레코드에 대해 반복할 횟수입니다. 예를 들어, 1을 지정하면 각 레코드를 한 번만 읽어옵니다. None을 지정하면 영구적으로 계속 반복됩니다.

def my_input_fn(file_path, perform_shuffle=False, repeat_count=1): def decode_csv(line): parsed_line = tf.decode_csv(line, [[0.], [0.], [0.], [0.], [0]]) label = parsed_line[-1:] # Last element is the label del parsed_line[-1] # Delete last element features = parsed_line # Everything (but last element) are the features d = dict(zip(feature_names, features)), label return d dataset = (tf.contrib.data.TextLineDataset(file_path) # Read text file .skip(1) # Skip header row .map(decode_csv)) # Transform each elem by applying decode_csv fn if perform_shuffle: # Randomizes input using a window of 256 elements (read into memory) dataset = dataset.shuffle(buffer_size=256) dataset = dataset.repeat(repeat_count) # Repeats dataset this # times dataset = dataset.batch(32) # Batch size to use iterator = dataset.make_one_shot_iterator() batch_features, batch_labels = iterator.get_next() return batch_features, batch_labels

TextLineDataset: Dataset API는 파일 기반 데이터세트를 사용할 경우 많은 메모리 관리 작업을 자동으로 수행합니다. 예를 들어, 메모리보다 훨씬 큰 데이터세트 파일을 읽어오거나 목록을 인수로 지정하여 여러 파일을 읽어올 수 있습니다.

shuffle: buffer_size 레코드를 읽은 후 순서를 셔플(무작위 배정)합니다.

map: 데이터세트의 각 요소를 인수로 삼아 decode_csv 함수를 호출합니다(TextLineDataset를 사용하므로 각 요소가 CSV 텍스트의 한 줄이 됨). 그런 다음, 각 줄에 decode_csv 를 적용합니다.

decode_csv: 각 줄을 필드로 분할하고 필요한 경우 기본값을 제공합니다. 그런 다음, 필드 키 및 필드 값과 함께 dict를 반환합니다. 그러면 map 함수가 dict를 사용하여 데이터세트의 각 elem(줄)을 업데이트합니다.

next_batch = my_input_fn(FILE, True) # Will return 32 random elements # Now let's try it out, retrieving and printing one batch of data. # Although this code looks strange, you don't need to understand # the details. with tf.Session() as sess: first_batch = sess.run(next_batch) print(first_batch) # Output ({'SepalLength': array([ 5.4000001, ...<repeat to 32 elems>], dtype=float32), 'PetalWidth': array([ 0.40000001, ...<repeat to 32 elems>], dtype=float32), ... }, [array([[2], ...<repeat to 32 elems>], dtype=int32) # Labels )
Estimators 소개

미리 만든 Estimators - 미리 정의된 Estimators로, 특정 유형의 모델을 생성하려고 만든 것입니다. 이 블로그 게시물에서는 미리 만든 Estimators로서 DNNClassifier라는 것을 사용하겠습니다.

Estimator(기본 클래스) - model_fn 함수를 사용하여 모델 생성 방법을 완벽히 제어할 수 있습니다. 이 작업을 수행하는 방법은 별도의 블로그 게시물에서 설명해드릴 예정입니다.

input_fnmy_input_fn# Create the feature_columns, which specifies the input to our model. # All our input features are numeric, so use numeric_column for each one. feature_columns = [tf.feature_column.numeric_column(k) for k in feature_names] # Create a deep neural network regression classifier. # Use the DNNClassifier pre-made estimator classifier = tf.estimator.DNNClassifier( feature_columns=feature_columns, # The input features to our model hidden_units=[10, 10], # Two layers, each with 10 neurons n_classes=3, model_dir=PATH) # Path to where checkpoints etc are stored
모델 훈련# Train our model, use the previously function my_input_fn # Input to training is a file with training example # Stop training after 8 iterations of train data (epochs) classifier.train( input_fn=lambda: my_input_fn(FILE_TRAIN, True, 8))

lambda:
my_input_fn(FILE_TRAIN, True, 8)

input_fninput_fnlambdafile_path, shuffle setting,repeat_countmy_input_fn,

FILE_TRAIN - 훈련 데이터 파일입니다.

True - Estimators에 데이터를 셔플하도록 지시합니다.

8 - Estimators에 데이터세트를 8회 반복하도록 지시합니다.

훈련된 모델 평가evaluate# Evaluate our model using the examples contained in FILE_TEST # Return value will contain evaluation_metrics such as: loss & average_loss evaluate_result = estimator.evaluate( input_fn=lambda: my_input_fn(FILE_TEST, False, 4) print("Evaluation results") for key in evaluate_result: print(" {}, was: {}".format(key, evaluate_result[key]))model_dir=PATHmodel_dir=PATHDNNClassifier 훈련된 모델을 이용한 예측# Predict the type of some Iris flowers. # Let's predict the examples in FILE_TEST, repeat only once. predict_results = classifier.predict( input_fn=lambda: my_input_fn(FILE_TEST, False, 1)) print("Predictions on test file") for prediction in predict_results: # Will print the predicted class, i.e: 0, 1, or 2 if the prediction # is Iris Sentosa, Vericolor, Virginica, respectively. print prediction["class_ids"][0] 메모리 내 데이터를 기반으로 하는 예측FILE_TESTpredict# Let create a memory dataset for prediction. # We've taken the first 3 examples in FILE_TEST. prediction_input = [[5.9, 3.0, 4.2, 1.5], # -> 1, Iris Versicolor [6.9, 3.1, 5.4, 2.1], # -> 2, Iris Virginica [5.1, 3.3, 1.7, 0.5]] # -> 0, Iris Sentosa def new_input_fn(): def decode(x): x = tf.split(x, 4) # Need to split into our 4 features # When predicting, we don't need (or have) any labels return dict(zip(feature_names, x)) # Then build a dict from them # The from_tensor_slices function will use a memory structure as input dataset = tf.contrib.data.Dataset.from_tensor_slices(prediction_input) dataset = dataset.map(decode) iterator = dataset.make_one_shot_iterator() next_feature_batch = iterator.get_next() return next_feature_batch, None # In prediction, we have no labels # Predict all our prediction_input predict_results = classifier.predict(input_fn=new_input_fn) # Print results print("Predictions on memory data") for idx, prediction in enumerate(predict_results): type = prediction["class_ids"][0] # Get the predicted class (index) if type == 0: print("I think: {}, is Iris Sentosa".format(prediction_input[idx])) elif type == 1: print("I think: {}, is Iris Versicolor".format(prediction_input[idx])) else: print("I think: {}, is Iris Virginica".format(prediction_input[idx])Dataset.from_tensor_slides()TextLineDataset 덤으로 주어지는 이점# Replace PATH with the actual path passed as model_dir argument when the # DNNRegressor estimator was created. tensorboard --logdir=PATH

요약

이 블로그 게시물에 사용된 전체 소스 코드는 여기서 확인할 수 있습니다.

이 주제에 대해 Josh Gordon이 작성한 매우 유용한 Jupyter 노트북. 이 노트북을 참고하면 많은 다양한 유형의 특징(입력 값)이 있는 더욱 광범위한 예제의 실행 방법을 배울 수 있습니다. 여기서 다룬 모델을 떠올려보면 아시겠지만, 우리는 숫자 특징만 사용했습니다.

Datasets에 대해서는 프로그래머 가이드에 새로 추가된 장과 참조 문서를 살펴보세요.

Estimators에 대해서는 프로그래머 가이드에 새로 추가된 장과 참조 문서를 살펴보세요.

.blogimg img { width: 100%; border: 0; margin: 0; padding: 10px 0 10px 0; } .blogcptn { font-size: 85%; font-style: italic; text-align: center !important; } .kwd { font-weight: bold; } .com { font-style: italic; }

Contents

ML/Tensorflow
Android
Flutter
Web/Chrome
Cloud
Google Play
Community
Game
Firebase

검색

Tag

Archive

2025
- 5월
- 4월
- 3월
- 2월
- 1월

2024
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2023
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2022
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2021
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2020
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2019
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2018
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2017
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2016
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2015
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2014
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2013
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 4월
- 3월
- 2월
- 1월

2012
- 12월
- 11월
- 10월
- 9월
- 8월
- 7월
- 6월
- 5월
- 3월
- 2월
- 1월

2011
- 12월
- 11월

Feed

Google
Privacy
Terms