...
...
Introduction
This guide describes the tools and provides a few key elements to get your phyBOARD-Pollux AI kit started. These include:
- creating a celebrity look-alike demo and running inference on an NPU
- quantizing the deep learning model to run on an NPU
Orientation
If you want to build our BSP, please continue reading here: Building the PHYTEC BSP
...
Note:
For further information on the phyBOARD-Pollux, go to https://www.phytec.de/produkte/system-on-modules/phycore-imx-8m-plus/#downloads/. There you will find the following documentation:
On top of these standard manuals and guides, PHYTEC also provides Product Change Notifications, Application Notes, and Technical Notes on a case-by-case basis. For more information or details regarding the phyCORE-i.MX 8M Plus / phyBOARD-Pollux, please go to our phyCORE-i.MX 8M Plus product page or contact the PHYTEC Sales department.
phyBOARD-Pollux Quickstart
For instructions on how to connect, boot up, and begin the demo on your kit, head to: L-1016e.A0 phyCORE-i.MX 8M Plus AI Kit Quickstart.
Prerequisites
Before you begin with this manual, there are a few requirements that must be met.
Software
Your PC will need:
- Python 3.6+ environment (we recommend Anaconda with a virtual environment)
- TensorFlow 2.x
- numpy
- opencv-python
- pandas (for preparation)
- tflite_runtime
- and the tf.keras-vggface model
- If you use conda, you can clone the environment with this conda environment-file: here (latest: TF2.3envfile.yml)
- To install the tflite_runtime, download this wheel file and install via pip install path_to_file. For example go to file directory where you downloaded your wheel file and run "pip install tflite_runtime-2.5.0-cp36-cp36m-linux_x86_64.whl".
- Install the MTCNN detector version 0.1.1 (instead of 0.1.0) if you want to filter your images by comparing embeddings.
Downloads
There are a few files that can be downloaded to help with running the phyBOARD-Pollux AI kit:
- The environment file to clone the environment can be found here (latest: TF2.3envfile.yml). You can easily install all required packages by first cloning the git repository and running "conda env create -f <path to yml file>".
- If you intend to use the latest environment version, make sure to comment out line 205 (keras-vggface-tf==0.7) and line 227 (tflite-runtime==2.5.0) in the yml file. Install the keras-vggface-tf model and the tflite-runtime from the links above.
- You can find the model and installation instructions here.
- The demo code, as described here, can be found here.
- The demo as preinstalled on the device, running optimized in a GUI, can be found here.
Building the PHYTEC BSP
The BSP shipped with the phyCORE-i.MX 8M Plus AI Kit is based on the standard PHYTEC BSP of the phyCORE-i.MX 8M Plus. This means building the BSP with the help of our phyLinux script is quite similar to the standard BSP.
Note:
This kit is provided with a special SD card image, which might differ from our general-purpose evaluation kit SD card image based on the standard PHYTEC BSP. So if you want to use any of the other manuals provided for this kit, you might need to download the standard PHYTEC image in order to ensure the functions described in those manuals are available.
Get the PHYTEC BSP
- Create a fresh project directory:
...
Code Block
host$ cd ~/yocto
host$ wget https://download.phytec.de/Software/Linux/Yocto/Tools/phyLinux
host$ chmod +x phyLinux
host$ MACHINE=phyboard-pollux-imx8mp-1 DISTRO=yogurt-vendor-xwayland ./phyLinux init -p topic -r PD-BSP-Yocto-CelebrityFaceMatch-i.MX8MP-v0.2
Start the Build
After you have downloaded all the metadata with phyLinux init, you have to set up the shell environment variables. This needs to be done every time you open a new shell for starting builds. We use the shell script provided by Poky in its default configuration. From the root of your project directory, type:
...
For more information and documentation of the BSP and our Yocto Distribution please take a look at the following two manuals:
- Yocto Manual: L-813e.A13 Yocto Reference Manual (kirkstone)
- i.MX 8M Plus BSP Manual: L-1017e.Ax phyCORE-i.MX 8M Plus BSP Manual (NPU section)
The Celebrity Face Match Demo
...
For more general information on the process we used, please check the Further Reading section at the end of this guide.
Preparation
As shown in the block diagram above, we are using a pre-trained network, since the task of facial recognition has already been solved very well by different research groups. We used the network from Refik Can Malli (rcmalli), which was originally trained by Q. Cao on the VGGFace2 dataset. As rcmalli's model was written with TensorFlow 1.14.0 and Keras 2.2.4, we updated it to TensorFlow version 2.2.0. You can find the updated model here. However, we are still using the weights from the original model.
What are Embeddings
To identify a human face, we need to identify specific facial features such as the length of the nose, the distance between the eyes, the angle between nose and mouth, etc.
...
More information about embeddings can be found here.
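To make this concrete, here is a minimal sketch (not part of the demo code) of how two such embedding vectors can be compared; the 2048-value length matches the ResNet50 average-pooling output used in the implementation below, and the random vectors are only placeholders:
Code Block
import numpy as np

# Two placeholder embeddings; in the demo these come from the truncated network
embedding_a = np.random.rand(2048).astype(np.float32)
embedding_b = np.random.rand(2048).astype(np.float32)

# Euclidean (L2) distance: the smaller the distance, the more similar the two faces
distance = np.linalg.norm(embedding_a - embedding_b)
print('L2 distance between the two faces:', distance)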
Creating Embeddings
The pre-trained network we are using gives us output values for 8631 classes, as it was trained on that number of classes.
...
As you can see, the information gets more detailed the deeper you go into the network. Each layer's inputs are composed of combinations of the previous layer's outputs. In the last layers (layer 170 of 176), you can see that the information has reached almost pixel-level detail.
...
With this truncated network, we can now create a library of embeddings of celebrity faces or faces of our known employees and compare them to new faces later.
Implementation
You can install the rcmalli model as described on their git. It will work the same way as the following method; however, you would have to work with a TensorFlow version < 1.15.3 and install Keras v2.2.4. We will continue with our updated version.
...
Code Block
from keras_vggface_TF.vggfaceTF import VGGFace
from keras_vggface_TF import utils

# Load the pre-trained ResNet50 backbone without the classification top; global average pooling yields the embedding vector
pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')  # pooling: None, avg or max
Gist File
Quantize your Model to int8
Tip:
If you are not planning to run your model on an embedded device, you can proceed to the Create a Database section. More details on quantization can be found in the section Quantize Your Deep Learning Model to Run on an NPU.
For our model, we are using the NPU of NXP's i.MX 8M Plus. The NPU requires the model to be a TFLite or PyTorch model that is fully quantized to int8.
...
For more details on quantization, on using different TensorFlow versions, or on which inputs and outputs are quantized, go to the section Quantize Your Deep Learning Model to Run on an NPU.
Create a Database
Note:
If you are looking to use this commercially, please read this article to learn how to tune a license-free dataset to perform better.
...
Code Block
# Remove images whose mean embedding value exceeds the threshold
lowembeddings = []
removelist = []
for idx in EMBEDDINGS_temp.index:
    if EMBEDDINGS_temp.MeanEmb[idx] > 100:
        print(EMBEDDINGS_temp.MeanEmb[idx])
        print('removing: ' + str(path_crops / EMBEDDINGS_temp.Name[idx] / EMBEDDINGS_temp.File[idx]))
        os.remove(str(path_crops / EMBEDDINGS_temp.Name[idx] / EMBEDDINGS_temp.File[idx]))  # remove the image file
        removelist.append(idx)  # remember the row so it can be dropped from the embeddings afterwards
        lowembeddings.append(str(path_crops / EMBEDDINGS_temp.Name[idx] / EMBEDDINGS_temp.File[idx]))

EMBEDDINGS_modified = EMBEDDINGS_temp.drop(removelist)  # remove the corresponding lines from the embeddings

# Write the cleaned-up database back to disk
filename_csv = path_crops / 'modified.csv'
filename_json = path_crops / 'modified.json'
EMBEDDINGS_modified.to_csv(Path(Path.cwd() / filename_csv), index=False)
EMBEDDINGS_modified.to_json(Path(Path.cwd() / filename_json))
print(time.time() - time1)
Prepare the Dataset and Create an "Only Faces Dataset"
The model expects 224x224 sized images. We are also looking for facial embeddings, so we first extract the faces from our dataset and resize them to 224x224. You can use any facial detection algorithm; a good example is MTCNN. As we will later use OpenCV for facial detection, we will also use an OpenCV variant with a Haar cascade classifier here. You can find and download the classifier here.
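As an illustration, the following sketch shows how such a face-extraction step could look with the OpenCV Haar cascade; the classifier file name and the 224x224 output size follow the description above, while all paths and variable names are only assumptions:
Code Block
import cv2

# Hypothetical paths; adjust them to where you stored the classifier and your images
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')

img = cv2.imread('dataset/some_celebrity/image_0001.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)

for i, (x, y, w, h) in enumerate(faces):
    crop = img[y:y + h, x:x + w]                                           # cut out the detected face
    crop = cv2.resize(crop, (224, 224), interpolation=cv2.INTER_AREA)      # resize to the model input size
    cv2.imwrite('crops/image_0001_face{}.jpg'.format(i), crop)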
...
Now we have a dataset of cropped faces with dimensions of 224x224. This dataset can also be used as the representative_dataset() for the quantization method mentioned earlier.
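For reference, a representative_dataset() generator over this crops folder could look roughly like the sketch below; the folder path, sample count, and file pattern are assumptions, and the same pre-processing you use for inference should be applied before yielding, if required:
Code Block
import cv2
import numpy as np
from pathlib import Path

def representative_dataset():
    # Yield a few hundred face crops so the TFLite converter can calibrate the int8 value ranges
    for image_path in sorted(Path('crops').rglob('*.jpg'))[:300]:
        img = cv2.imread(str(image_path))
        img = cv2.resize(img, (224, 224))
        sample = np.expand_dims(img.astype(np.float32), axis=0)  # add batch dimension
        yield [sample]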
Create Embeddings of Each Image in Your Database
For the final part of the preparation, we need to create an embedding for each of the 10k facial images. We can use the quantized model we created earlier so that the embeddings are calculated with the same model we will use later for the live stream analysis.
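A condensed sketch of this step could look as follows: it runs the quantized TFLite interpreter over every cropped face and stores name, file, and embedding. The model file name, folder layout, and column names are assumptions chosen to mirror the snippets elsewhere in this guide:
Code Block
import cv2
import numpy as np
import pandas as pd
import tflite_runtime.interpreter as tflite
from pathlib import Path

interpreter = tflite.Interpreter(model_path='facenet_quantized.tflite')  # hypothetical file name
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

rows = []
for image_path in Path('crops').rglob('*.jpg'):
    img = cv2.imread(str(image_path))
    img = cv2.resize(img, (224, 224)).astype(np.float32)  # cast to int8 instead if the model expects int8 inputs
    sample = np.expand_dims(img, axis=0)                  # add batch dimension; apply preprocess_input() here as well
    interpreter.set_tensor(input_details[0]['index'], sample)
    interpreter.invoke()
    embedding = interpreter.get_tensor(output_details[0]['index'])[0]
    rows.append({'Name': image_path.parent.name, 'File': image_path.name, 'Embedding': embedding.tolist()})

embeddings = pd.DataFrame(rows)
embeddings.to_csv('embeddings.csv', index=False)
embeddings.to_json('embeddings.json')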
...
- Boot your device.
- Connect a monitor, mouse, and keyboard.
- Open a console and get your IP address with the Linux command:
Code Block
ip a
- Use scp (over ssh) to copy the data to your device. Go into the folder where your files are stored, then use the scp command to copy the files to the device; ip_address is the address you retrieved.
Code Block
host$ scp -r <folder_with_your_files> user@ip_address:./
This will copy the entire folder (due to -r) to the device's home directory.
Live Stream Analysis
...
- Load the embeddings from JSON
- Load the model
- Load the cascade classifier
- Define the pre-processing function
- Split your data
- Set the video pipeline
Load the Embeddings from JSON
We use pandas to read the CSV on the PC; however, pandas is more difficult to integrate via Yocto Linux on the embedded system, while the JSON library is already available:
Code Block
import json

# Read the embeddings database and rebuild the embedding, name, and file lists
f = open((embeddingpath + embeddingsfile), 'r')
ImportedData = json.load(f)
dataE = [np.array(ImportedData['Embedding'][str(i)]) for i in range(len(ImportedData['Name']))]
dataN = [np.array(ImportedData['Name'][str(i)]) for i in range(len(ImportedData['Name']))]
dataF = [np.array(ImportedData['File'][str(i)]) for i in range(len(ImportedData['Name']))]
Read JSON
Load the Model
Code Block
try:
    interpreter = tf.lite.Interpreter(model_path)
except ValueError as e:
    print("Error: Modelfile could not be found. Check if you are in the correct workdirectory. Errormessage: " + str(e))
    # Depending on the version of TF running, check where lite is set:
    if tf.__version__.startswith('1.'):
        print('lite in dir(tf.contrib)? ' + str('lite' in dir(tf.contrib)))
    elif tf.__version__.startswith('2.'):
        print('lite in dir(tf)? ' + str('lite' in dir(tf)))

interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
Load the Model
Point to the Cascade Classifier
Code Block
#face_cascade = cv2.CascadeClassifier(cascaderpath + 'haarcascade_frontalface_alt.xml')
face_cascade = cv2.CascadeClassifier(cascaderpath + 'lbpcascade_frontalface_improved.xml')
...
We use the local binary pattern (LBP) classifier. If you stay with OpenCV, you can also choose the Haar classifier, which gives better detection performance but demands more resources. For an embedded device, we recommend the LBP version; however, an optimized Haar cascade can also run smoothly on our system.
Set Pre-Processing Function
The pre-processing function used by rcmalli performs channel-wise mean subtraction (mean centering). On your PC, you can simply import this function. However, if you are planning to run this on an embedded device, we recommend writing the function out so that there is less trouble including it in your board support package (BSP).
...
Code Block
def preprocess_input(x, data_format):  # Choose the same version as in "2-Create embeddings database.py or jupyter"
    x_temp = np.copy(x)
    if data_format is None:
        data_format = tf.keras.backend.image_data_format()
    assert data_format in {'channels_last', 'channels_first'}

    if data_format == 'channels_first':
        x_temp = x_temp[:, ::-1, ...]
        x_temp[:, 0, :, :] -= 91.4953
        x_temp[:, 1, :, :] -= 103.8827
        x_temp[:, 2, :, :] -= 131.0912
    else:
        x_temp = x_temp[..., ::-1]
        x_temp[..., 0] -= 91.4953
        x_temp[..., 1] -= 103.8827
        x_temp[..., 2] -= 131.0912

    return x_temp
Pre-processing
Split Your Data
To speed up the comparison of our embedding against the celebrity embeddings, we can split the data into several chunks and compare them in parallel threads. The results from those threads then have to be collected.
Code Block
def splitDataFrameIntoSmaller(df, chunkSize):
    listOfDf = list()
    numberChunks = len(df) // chunkSize + 1
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf

def faceembedding(YourFace, CelebDaten):
    Dist = []
    for i in range(len(CelebDaten.File)):
        Celebs = np.array(CelebDaten.Embedding[i])
        Dist.append(np.linalg.norm(YourFace - Celebs))
    return Dist

def faceembeddingNP(YourFace, CelebDaten):
    Dist = []
    for i in range(len(CelebDaten)):
        Celebs = np.array(CelebDaten[i])
        Dist.append(np.linalg.norm(YourFace - Celebs))
    return Dist

# Split data for threading
# -----------------------------------------------------------------------------
celeb_embeddings = splitDataFrameIntoSmaller(dataE, int(np.ceil(len(dataE)/4)))
Split Embeddings
Setting the Video Pipeline
If you are not on an embedded system, you can read in your webcam stream via OpenCV.
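On a PC, this can be as simple as the following sketch (camera index 0 is an assumption):
Code Block
import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()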
...
Code Block
videodev = 'video0'
buildinfo = cv2.getBuildInformation()
if buildinfo.find("GStreamer") < 0:
    print('no GStreamer support in OpenCV')
    exit(0)

# This snippet sits inside a setup function in the full script, hence the early return
path = os.path.join('/sys/bus/i2c/devices', '2-0010', 'driver')
if not os.path.exists(path):
    return None

cmd = 'media-ctl -V "31:0[fmt:SGRBG8_1X8/1280x800 (4,4)/1280x800]"'
ret = subprocess.call(cmd, shell=True)
cmd = 'media-ctl -V "22:0[fmt:SGRBG8_1X8/1280x800]"'
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -d0 -v width=1280,height=800,pixelformat=GRBG'  # set size and format
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -d0 -c vertical_flip=1'  # if the image is flipped
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -c horizontal_blanking=2500'
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -c digital_gain_red=1400'  # color corrections, if needed
ret = subprocess.call(cmd, shell=True)
cmd = 'v4l2-ctl -c digital_gain_blue=1700'  # color corrections, if needed
ret = subprocess.call(cmd, shell=True)

pipeline = 'v4l2src device=/dev/{video} ! appsink'.format(video=videodev)
Gist File1
Start the Live Stream
Call the Video Pipeline
We can now call the video pipeline with OpenCV and have a constant video stream. As the video stream from the MIPI camera is in the Bayer format, it has to be converted to RGB. You can also use a GStreamer pipeline for this; however, we have seen that OpenCV is much faster.
Code Block
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

while(True):
    # CAPTURE FRAME BY FRAME
    ret, frame = cap.read()
    frame = cv2.cvtColor(frame, cv2.COLOR_BAYER_GB2RGB)
    cv2.namedWindow('frame', cv2.WND_PROP_FULLSCREEN)
    cv2.setWindowProperty('frame', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
    cv2.imshow('frame', frame)
Get Live Stream
Find Faces in the Live Stream
Each frame is then analyzed for faces:
Code Block
# DETECT FACE IN VIDEO CONTINUOUSLY
faces_detected = face_cascade.detectMultiScale(frame, scaleFactor=1.2, minNeighbors=5)  #, Size(50,50))
for (x, y, w, h) in faces_detected:
    # p is a pixel padding around the detected face, defined earlier in the full script
    rechteck = cv2.rectangle(frame, (x-p, y-p+2), (x+w+p, y+h+p+2), (0, 255, 0), 2)
    #rechteck = cv2.rectangle(frame, (x-p, y-p+2), (x+int(np.ceil(height))+p, y+int(np.ceil(height))+p+2), (0, 0, 100), 2)
    cv2.imshow('frame', rechteck)
Analyzed Faces
After Button Press
Finding and Cropping the Most Centered Face
As soon as a button is pressed, we find the most centered face and crop it to 224x224:
Code Block
# DETECT KEY INPUT - ESC OR FIND MOST CENTERED FACE
key = cv2.waitKey(1)
if key == 27:  # Esc key
    cap.release()
    cv2.destroyAllWindows()
    break
if key == 32:  # Space key
    mittleres_Gesicht_X = ()
    mitte = ()
    if len(faces_detected) != 0:  # only if the cascader detected a face, otherwise error
        start1 = time()
        # FIND MOST MIDDLE FACE
        for (x, y, w, h) in faces_detected:
            mitte = np.append(mitte, (x + w/2))
        mittleres_Gesicht_X = (np.abs(mitte - framemitte)).argmin()
        print('detect middle face ', time()-start1)
        # FRAME THE DETECTED FACE
        start2 = time()
        #print(faces_detected[mittleres_Gesicht_X])
        (x, y, w, h) = faces_detected[mittleres_Gesicht_X]
        img = frame[y-p+2:y+h+p-2, x-p+2:x+w+p-2]  # use only the detected face; crop it +2 to remove the frame

        # CHECK IF IMAGE EMPTY (OUT OF IMAGE = EMPTY)
        if len(img) != 0:  # if the face is out of the frame, img=[] would throw an error
            print('detect face ', time()-start2)

            # CROP IMAGE
            start3 = time()
            if img.shape > (width, height):  # downsampling
                img_small = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)   # resize to the desired dimensions
            elif img.shape < (width, height):  # upsampling
                img_small = cv2.resize(img, (width, height), interpolation=cv2.INTER_CUBIC)  # resize to the desired dimensions
            cv2.imshow('frame', img_small)
            cv2.waitKey(1)  # hit any key
            end3 = time()
            print('face crop', end3-start3)
Face Crop
Further Pre-Processing of the Found Face
After we crop the face, we have to apply the same pre-processing that was used on the training data and then feed the result to the model to create the embedding:
Code Block
# IMAGE PREPROCESSING
start4 = time()
if inputtype == 'int':
    samples = np.expand_dims(img_small, axis=0)
    samples = preprocess_input(samples, data_format=None, version=3).astype('int8')  # data_format = None, 'channels_last', 'channels_first'. If None, it is determined automatically from the backend
else:
    pixels = img_small.astype('float32')
    samples = np.expand_dims(pixels, axis=0)
    samples = preprocess_input(samples, data_format=None, version=2)  # data_format = None, 'channels_last', 'channels_first'. If None, it is determined automatically from the backend
# now using the tflite model
print('preprocess data for model', time()-start4)

# CREATE FACE EMBEDDINGS
if Loadtype == 'armNN':
    prep = time()
    input_tensors = ann.make_input_tensors([input_binding_info], [samples])
    # Get output binding information for an output layer by using the layer name.
    output_binding_info = parser.GetNetworkOutputBindingInfo(0, 'model/output')
    output_tensors = ann.make_output_tensors([output_binding_info])
    runtime.EnqueueWorkload(0, input_tensors, output_tensors)
    print('ANN preparation ', time()-prep)
    start42 = time()
    EMBEDDINGS = ann.workload_tensors_to_ndarray(output_tensors)
elif Loadtype == 'TL':
    prep = time()
    input_shape = input_details[0]['shape']
    input_data = samples
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    print('ANN preparation ', time()-prep)
    start42 = time()
    EMBEDDINGS = interpreter.get_tensor(output_details[0]['index'])
print('create face embeddings', time()-start42)
Preprocess 2
Compare Embeddings
With the resulting embedding, we can now compare it to the existing celebrity embeddings. We do this using threading, which uses all four cores of our device and speeds up the process. This is one of the most demanding tasks in this demo.
Code Block
# READ CELEB EMBEDDINGS AND COMPARE
start_EU = time()
EuDist = []
# One worker per chunk keeps all four cores busy
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    ergebniss_1 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[0]))
    ergebniss_2 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[1]))
    ergebniss_3 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[2]))
    ergebniss_4 = executor.submit(faceembeddingNP, EMBEDDINGS, np.array(celeb_embeddings[3]))

    # result() blocks until the corresponding chunk has been compared
    EuDist.extend(ergebniss_1.result())
    EuDist.extend(ergebniss_2.result())
    EuDist.extend(ergebniss_3.result())
    EuDist.extend(ergebniss_4.result())
print('Create_EuDist', time()-start_EU)

start_Min = time()
idx = np.argpartition(EuDist, 5)  # idx[:5] hold the five closest matches (not sorted)
folder_idx = dataN[idx[0]]
image_idx = dataF[idx[0]]
print('find minimum for facematch', time()-start_Min)
Get Minimum
Plot Results
Finally, we stitch our face together with the best matches and plot it. You could also implement a GUI here. For simplicity and better understanding, we have done a more "raw" version:
...
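A bare-bones version of this stitching could look like the sketch below; the variable names (img_small, folder_idx, image_idx, path_crops, width, height) follow the snippets above, and loading the matched image from the crops folder is an assumption:
Code Block
import cv2
import numpy as np
from pathlib import Path

# Load the best-matching celebrity crop found above and show it next to the captured face
match_img = cv2.imread(str(Path(path_crops) / str(folder_idx) / str(image_idx)))
match_img = cv2.resize(match_img, (width, height))

stitched = np.hstack((img_small, match_img))  # your face on the left, the best match on the right
cv2.imshow('frame', stitched)
cv2.waitKey(0)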
This finishes the demo description. On the platform itself, this demo is implemented with a GUI and the use of object-oriented programming. However, the basics are exactly the same.
Porting to Embedded Hardware
If you use a phyBOARD-Pollux kit, the needed software from NXP is already included in the BSP. NXP created eIQ, which facilitates the connection between the onboard NPU and the peripheral components. At its core, this is done with a tuned Google NNAPI, which is capable of understanding TensorFlow Lite and PyTorch models. Therefore, after converting your model to TFLite and quantizing it, eIQ takes over.
If you have created your own AI model, you only have to copy the model and the other files required by your application onto the board. Here we suggest using the ssh protocol.
If you want to include your AI application or specific libraries into our BSP using Yocto Linux, that is of course also possible.
...
Quantize Your Deep Learning Model to Run on an NPU
In this section, we explain which steps you have to take to transform and quantize your model with different TensorFlow versions. We are only looking into post-training quantization. The phyBOARD-Pollux incorporates the i.MX 8M Plus, which features a dedicated neural network accelerator IP from VeriSilicon (Vivante VIP8000).
As the neural processing unit (NPU) from NXP needs a fully int8-quantized model, we have to look into the full int8 quantization of a TensorFlow Lite or PyTorch model. Both libraries are supported by the eIQ library from NXP. This manual only covers the TensorFlow variant. A general overview of how to do the post-training quantization can be found on the TensorFlow website.
Why does the NPU utilize int8 when most ANNs are trained in float32?
Floating-point operations are more complex than integer operations (arithmetic, overflow handling, etc.). Quantizing to int8 therefore allows the NPU to use much simpler and smaller arithmetic units instead of the larger floating-point units; a short numeric example of the int8 mapping follows the list below.
...
- Lower power consumption
- Less heat development
- More calculation units fit on the chip and can work in parallel, which decreases inference time
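To make the int8 mapping concrete, here is a small numeric sketch of the affine quantization scheme used by TFLite (float ≈ scale * (int8 - zero_point)); the scale and zero-point values are made up for the example:
Code Block
import numpy as np

scale, zero_point = 0.05, -3  # example quantization parameters
x = np.array([-1.2, 0.0, 0.7, 5.9], dtype=np.float32)

# Quantize: float32 -> int8
q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize: int8 -> approximate float32
x_hat = scale * (q.astype(np.float32) - zero_point)
print(q)      # [-27  -3  11 115]
print(x_hat)  # values close to x, within the quantization error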
Post-training Quantization with TensorFlow Version 2.x
Tip:
Before you begin, make sure that you have met all of the Prerequisites listed at the beginning of this guide.
After you have created and trained a model via tf.keras, there are three possible ways of quantizing the model.
Method One - Directly Quantizing a Trained Model
The trained TensorFlow model has to be converted into a TFLite model and can be directly quantized as described in the following code block. For the trained model, we explicitly used the updated tf.keras_vggface model based on the work of rcmalli. The transformation starts at line 28.
...
However, if we do not set the inference_input_type and inference_output_type, our model changes to:
...
The effect is that you can determine which input data type the model accepts and returns. This can be important if you work with an embedded camera, such as the one included with your phyBOARD-Pollux AI kit. The MIPI camera returns 8-bit values, so if you want to spare a conversion to float32, an int8 input can be useful. Be aware that if you use a model without prediction layers to obtain embeddings, an int8 output will result in very poor performance. We recommend an output of float32. This shows that each problem needs a specific solution.
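For reference, the relevant converter settings in recent TF 2.x versions could look like this sketch; the model variable, the representative_dataset function, and the output file name are assumptions carried over from the earlier examples:
Code Block
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration data, see above
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# int8 camera frames in, float32 embeddings out (as recommended above)
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.float32

tflite_model = converter.convert()
open('facenet_quantized.tflite', 'wb').write(tflite_model)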
Method Two and Three - Quantize a Saved Model from *.h5 or *.pb Files
If you already have your model, you most likely have it saved somewhere either as a Keras .h5 file or a TensorFlow protocol buffer .pb. We can quickly save our model using TF2.3:
...
The conversion and quantization are very similar to Method One. The only difference is how we load the model into the converter. Either load the model (see code block below) or continue as in Method One.
Code Block
# Load the h5 model
pretrained_model = tf.keras.models.load_model('my_model.h5')
# Then use the converter as before
converter = tf.lite.TFLiteConverter.from_keras_model(pretrained_model)
...
Code Block
# Or load the pb file from the model folder
# TensorFlow version > 2.x
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # the folder contains the pb file and an assets and variables folder
Converting from .pb
Converting with TensorFlow Versions Below 2.0
It is possible to convert a model written in TensorFlow version < 1.15.3 using Keras. However, not all options are available for TFLite conversion and quantization. The best way is to save the model with the TensorFlow version it was created in (for example, rcmalli keras-vggface was trained in TF 1.13.2). We suggest not using the "saving and freeze graph" method to create a .pb file, as the .pb files differ between TF1 and TF2; TFLiteConverter.from_saved_model then does not work, creating several problems for quantization.
A better approach is to follow Method Two or Three using Keras:
Code Block
import keras
...
pretrained_model.save('my_model.h5')
Then, convert and quantize your model with a TensorFlow version of 1.15.3 or newer, which has many functions that were added in preparation for TF2. We suggest using the latest version, which will result in the same models that were presented earlier.
Next Steps
PHYTEC provides several guides when it comes to customizing your software:
...
All PHYTEC manuals, as well as other information, can be found at https://www.phytec.de/produkte/development-kits/phyboard-pollux-ki-kit/#downloads/
Further Reading
How to Perform Face Recognition With VGGFace2 in Keras - Jason Brownlee
...
Tutorials from the TensorFlow website.
Revision History
Date | Version Numbers | Changes in this Manual |
---|---|---|
26.11.2020 | Manual L-1015e.A0 | Preliminary Edition |
29.08.2022 | Manual L-1015e.A1 | Upgraded to Regular Version |
...