I tried face matching on voice actresses Aimi and Haruka Yamazaki using nupic's Spatial Pooler and OpenCV!

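This time I used Python, the SP (Spatial Pooler) layer of HTM (Hierarchical Temporal Memory, a machine learning method), and OpenCV image processing to try a simple face matching of voice actresses Aimi (Aimin) and Haruka Yamazaki (Pyonkichi)!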

Goal

Face matching using HTM, a machine learning method

Method

What I used

・HTM Spatial Pooler

・OpenCV

・Python

・Images

Training

f:id:hiro-htm877:20190208222759p:plain

Testing

f:id:hiro-htm877:20190208222925p:plain

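Experiment steps

1. Resize each image to 50 x 50 pixels.

2. Convert it to grayscale.

3. Encode it for input to the HTM.

・Encoding method

Each pixel of the 50 x 50 image takes one of 256 levels, from 0 to 255.

To keep the input small, I binarized each pixel with a threshold of 128: values below 128 become "0" and values of 128 or above become "1", so each pixel is a single bit.

Example:

Before: [46, 56, 100, 130, 136, 60, 48, 70, 140, 165, 80, 30 ]

After:  [0 , 0 , 0 , 1 , 1 , 0 , 0 , 0 , 1 , 1 , 0 , 0 ]

4. Train the HTM 50 times on one image each of Aimin and Pyonkichi.

5. Test the face matching with two more images of Aimin.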
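The thresholding in step 3 can also be written as a single NumPy comparison. The following is a minimal sketch, not the code used in this post; the helper name `binarize` and its `threshold` argument are made up for illustration, and the sample row is the example from step 3.

```python
import numpy as np


def binarize(pixels, threshold=128):
    """Return a 0/1 array: 1 where the pixel is >= threshold, otherwise 0.

    `binarize` is a hypothetical helper for illustration only.
    """
    pixels = np.asarray(pixels)
    return (pixels >= threshold).astype(np.uint8)


# The example row from step 3.
row = [46, 56, 100, 130, 136, 60, 48, 70, 140, 165, 80, 30]
print(binarize(row).tolist())
# -> [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
```

Unlike the nested loop in the script, this returns a new array and leaves the grayscale image untouched.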

Right, Let's GO run the experiment!

Source code (Python 2, which nupic requires)


import numpy as np
import sys
import cv2
from PIL import Image
from nupic.encoders.category import CategoryEncoder
from nupic.algorithms.spatial_pooler import SpatialPooler

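def resize(filename, w, h):
    # Resize with Pillow (Lanczos filter), save the result as 're' + filename,
    # then reload it with OpenCV so the caller gets a BGR numpy array.
    # filename is expected to be a bare file name, because 're' is prepended to it.
    imageName = filename
    img = Image.open(imageName)
    img_resize_lanczos = img.resize((w, h), Image.LANCZOS)
    savepath = 're'+filename
    img_resize_lanczos.save(savepath)
    img_resize = cv2.imread(savepath)
    return img_resize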


def encodeImage(img):
    # Binarize a grayscale image in place: pixels >= 128 become 1, the rest 0.
    for i, y in enumerate(img):
        for j, x in enumerate(y):
            if x >= 128:
                img[i][j] = 1
            else:
                img[i][j] = 0
    return img

imagePath = 'aimi.jpg'
imagePath2 = 'pyon.jpg'
imagePath3 = 'aimi2.jpg'
imagePath4 = 'aimi3.jpg'

img_resize = resize(imagePath, 50, 50)
img_resize2 = resize(imagePath2, 50, 50)
img_resize3 = resize(imagePath3, 50, 50)
img_resize4 = resize(imagePath4, 50, 50)

# Convert to grayscale
img_gray = cv2.cvtColor(img_resize, cv2.COLOR_BGR2GRAY)
img_gray2 = cv2.cvtColor(img_resize2, cv2.COLOR_BGR2GRAY)
img_gray3 = cv2.cvtColor(img_resize3, cv2.COLOR_BGR2GRAY)
img_gray4 = cv2.cvtColor(img_resize4, cv2.COLOR_BGR2GRAY)

encodedList = encodeImage(img_gray)
encodedList2 = encodeImage(img_gray2)
encodedList3 = encodeImage(img_gray3)
encodedList4 = encodeImage(img_gray4)


sp = SpatialPooler(inputDimensions=(50*50),
                   columnDimensions=(3),
                   potentialRadius=500,
                   numActiveColumnsPerInhArea=1,
                   globalInhibition=True,
                   synPermActiveInc=0.03,
                   potentialPct=1.0)

print "------------"*4
for column in xrange(3):
    connected = np.zeros((50*50), dtype="int")
    #print connected
    sp.getConnectedSynapses(column, connected)
    print connected

print encodedList
print encodedList2
for y in encodedList:
    print y
print "=========================-"
for y in encodedList2:
    print y
# sys.exit()

output = np.zeros((3), dtype="int")
print "------------"*4
print "Training for 50 iterations"
for _ in xrange(50):
    sp.compute(encodedList, learn=True, activeArray=output)
    sp.compute(encodedList2, learn=True, activeArray=output)

for column in xrange(3):
    connected = np.zeros((50*50), dtype="int")
    sp.getConnectedSynapses(column, connected)
    print connected.tolist()

sp.compute(encodedList, learn=False, activeArray=output)
print "output1", output
sp.compute(encodedList2, learn=False, activeArray=output)
print "output2", output
sp.compute(encodedList3, learn=False, activeArray=output)
print "output3", output
sp.compute(encodedList4, learn=False, activeArray=output)
print "output4", output

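Output

------------------------------------------------
After 50 training iterations

output1 [0 0 1]  Input: Aimin      Prediction: Aimin
output2 [1 0 0]  Input: Pyonkichi  Prediction: Pyonkichi
output3 [0 0 1]  Input: Aimin      Prediction: Aimin
output4 [0 0 1]  Input: Aimin      Prediction: Aimin

So the face matching worked properly!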
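The script itself only prints the raw output arrays; the "prediction" is read off by checking which column fired for each training image. One way to automate that reading is sketched below. This is an illustration under assumptions, not part of the original script: `winning_column`, `column_to_label`, and `predict` are hypothetical names, and the two training outputs are the ones reported above.

```python
import numpy as np


def winning_column(output):
    """Index of the single active column, or None if there is not exactly one."""
    output = np.asarray(output)
    if output.sum() != 1:
        return None
    return int(np.argmax(output))


# Outputs reported above for the two training images.
trained = {
    "Aimin": [0, 0, 1],      # output1
    "Pyonkichi": [1, 0, 0],  # output2
}

# Map each winning column index back to the person it fired for.
column_to_label = {winning_column(v): k for k, v in trained.items()}


def predict(output):
    """Label of the training image whose column matches, else 'unknown'."""
    return column_to_label.get(winning_column(output), "unknown")


print(predict([0, 0, 1]))  # output3 and output4 above -> Aimin
print(predict([0, 1, 0]))  # a column no training image activated -> unknown
```

With only three columns and one active column per input, this is just a lookup table, so it only shows that the test images land on the same column as the training image.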

The dataset was rather small this time, so next time I'd like to add more and test with at least five or so images.

The part I struggled with was the Spatial Pooler parameter settings:

sp = SpatialPooler(inputDimensions=(50*50), columnDimensions=(3), potentialRadius=500, numActiveColumnsPerInhArea=1, globalInhibition=True, synPermActiveInc=0.03, potentialPct=1.0)

In particular, unless potentialRadius=500 was set appropriately, the matching did not work!

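potentialRadius (int): This parameter determines the extent of the input that each column can potentially be connected to. This can be thought of as the input bits that are visible to each column, or a "receptive field" of the field of vision. A large enough value will result in "global coverage", meaning that each column can potentially be connected to every input bit. This parameter defines a square (or hyper-square) area: a column will have a max square potential pool with sides of length 2 * potentialRadius + 1. Default 16.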

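In short, when potentialRadius is small, each column has fewer potential synaptic connections.

That is, if the input is large and potentialRadius is small, a column cannot connect to distant input bits, so it cannot represent the input as a whole.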
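To get a rough feel for the numbers, the sketch below applies the "2 * potentialRadius + 1" side length from the parameter description to the 1-D, 2,500-bit input used here. It is a simplification under assumptions: it ignores potentialPct and the way nupic positions each column's center over the input, and `potential_pool_size` is a hypothetical helper, not a nupic function.

```python
def potential_pool_size(num_inputs, potential_radius):
    """Upper bound on the input bits one column can reach in a 1-D input.

    Simplified from the documented side length 2 * potentialRadius + 1,
    capped at the total number of input bits.
    """
    return min(num_inputs, 2 * potential_radius + 1)


NUM_INPUTS = 50 * 50  # the flattened 50 x 50 binary image

for radius in (16, 500, 1250):
    size = potential_pool_size(NUM_INPUTS, radius)
    print("radius=%d -> up to %d of %d bits (%.1f%%)"
          % (radius, size, NUM_INPUTS, 100.0 * size / NUM_INPUTS))
```

Under this simplification the default radius of 16 reaches at most 33 of the 2,500 bits, while 500 reaches up to 1,001, which fits the observation that a small radius leaves each column seeing only a narrow slice of the face.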

That's all.

f:id:hiro-htm877:20190208224053j:plain