Back

Record mouse and keyboard for automation scripts with Python

December 2, 2023 6 minute read
Home office desk with computers
Source: Pexels

In this article, we'll take a look at how to record mouse clicks and keyboard input with pynput then convert that to a PyAutoGUI automation script for playback.

Why build a mouse and keyboard recorder?

The short answer is to automate boring, time-consuming and repetitive tasks and let Python do them instead while you go enjoy a coffee ☕

PyAutoGUI is excellent for click and type automation tasks, but one of the weaknesses I found with it, is that it's difficult to 'record' a task and get the xy coordinates for the mouse clicks. There is an option to take screenshots and locate images within the screen but I could never get this to work accurately - mouse xy coordinates are much more reliable.

The documentation features a useful program that will constantly print out the position of the mouse cursor:

#! python3
import pyautogui, sys
print('Press Ctrl-C to quit.')
try:
    while True:
        x, y = pyautogui.position()
        positionStr = 'X: ' + str(x).rjust(4) + ' Y: ' + str(y).rjust(4)
        print(positionStr, end='')
        print('\b' * len(positionStr), end='', flush=True)
except KeyboardInterrupt:
    print('\n')

But then you'd have to find all the coordinates and script that up seperately, a tedious task!

Previously I explored Creating a screen and mouse jiggler with Python which was great for keeping the screen active.

Taking this a step further, actually recording the coordinates and also keyboard input then outputting that as a script, would be far better for automation tasks.

I checked out a few existing tools like record-and-play-pynput and pyautogui-mouse-record but none really satisfied what I was looking for, but they did give me a good start and inspiration.

How to run the recorder

You'll need to install a few Python packages first.

python -m pip install pynput pyautogui

Now let's go through step-by-step how to use this mouse and keyboard recorder.

  • Run python record.py to start the recording
  • To end the recording:
    • Hold right click for 2 seconds then release to end the recording for mouse.
    • Press 'ESC' to end the recording for keyboard.
    • Both are needed to finish recording.
    • The recorded mouse and keyboard actions will be saved as 'recording.json'
  • Run python convert.py to convert 'recording.json' into a PyAutoGUI script
    • The conversion will be saved as 'play.py'
  • Run python play.py to play back the actions 😄

All of the code can be found below or in the GitHub repo. Also, at the end there is a video demo of the recorder in action.

Record mouse and keyboard

The first step is to record the mouse and keyboard input. To do this, we are using pynput to listen for on press and on click, then storing those events as a dictionary in the recording list. Once both listeners are terminated, we store this in a file recording.json

record.py
"""
Records mouse and keyboard and outputs the actions
to a JSON file recording.json 

To begin recording:
- Run `python record.py`

To end recording:
- Hold right click for 2 seconds then release to end the recording for mouse.
- Press 'ESC' to end the recording for keyboard.
- Both are needed to finish recording.
"""
import time
import json
from pynput import mouse, keyboard

print("Hold right click for 2 seconds then release to end the recording for mouse")
print("Click 'ESC' to end the recording for keyboard")
print("Both are needed to finish recording")

recording = [] 
count = 0

def on_press(key):
    try:
        json_object = {
            'action':'pressed_key', 
            'key':key.char, 
            '_time': time.time()
        }
    except AttributeError:
        if key == keyboard.Key.esc:
            print("Keyboard recording ended.")
            return False

        json_object = {
            'action':'pressed_key', 
            'key':str(key), 
            '_time': time.time()
        }
        
    recording.append(json_object)


def on_release(key):
    try:
        json_object = {
            'action':'released_key', 
            'key':key.char, 
            '_time': time.time()
        }
    except AttributeError:
        json_object = {
            'action':'released_key', 
            'key':str(key), 
            '_time': time.time()
        }

    recording.append(json_object)
        

def on_move(x, y):
    if len(recording) >= 1:
        if (recording[-1]['action'] == "pressed" and \
            recording[-1]['button'] == 'Button.left') or \
            (recording[-1]['action'] == "moved" and \
            time.time() - recording[-1]['_time'] > 0.02):
            json_object = {
                'action':'moved', 
                'x':x, 
                'y':y, 
                '_time':time.time()
            }

            recording.append(json_object)


def on_click(x, y, button, pressed):
    json_object = {
        'action':'clicked' if pressed else 'unclicked', 
        'button':str(button), 
        'x':x, 
        'y':y, 
        '_time':time.time()
    }

    recording.append(json_object)

    if len(recording) > 1:
        if recording[-1]['action'] == 'unclicked' and \
           recording[-1]['button'] == 'Button.right' and \
           recording[-1]['_time'] - recording[-2]['_time'] > 2:
            with open('recording.json', 'w') as f:
                json.dump(recording, f)
            print("Mouse recording ended.")
            return False


def on_scroll(x, y, dx, dy):
    json_object = {
        'action': 'scroll', 
        'vertical_direction': int(dy), 
        'horizontal_direction': int(dx), 
        'x':x, 
        'y':y, 
        '_time': time.time()
    }

    recording.append(json_object)


def start_recording():
    keyboard_listener = keyboard.Listener(
        on_press=on_press,
        on_release=on_release)

    mouse_listener = mouse.Listener(
            on_click=on_click,
            on_scroll=on_scroll,
            on_move=on_move)

    keyboard_listener.start()
    mouse_listener.start()
    keyboard_listener.join()
    mouse_listener.join()


if __name__ == "__main__":
    start_recording()
    

Convert JSON output to PyAutoGUI script

Now we have the recording.json file, we can use that to convert it into a Python script. We are excluding mouse release and scroll events as these don't really help for the purposes of conversion.

convert.py
"""
Converts the recording.json file to a Python script 
'play.py' to use with PyAutoGUI.

The 'play.py' script may require editing and adapting 
before use.

Always review 'play.py' before running with PyAutoGUI!
"""
import json

key_mappings = {
    "cmd": "win",
    "alt_l": "alt",
    "alt_r": "alt",
    "ctrl_l": "ctrl",
    "ctrl_r": "ctrl"
}


def read_json_file():
    """
    Takes the JSON output 'recording.json'

    Excludes released and scrolling events to 
    keep things simple.
    """
    with open('recording.json') as f:
        recording = json.load(f)

    def excluded_actions(object):
        return "released" not in object["action"] and \
               "scroll" not in object["action"]

    recording = list(filter(excluded_actions, recording))

    return recording


def convert_to_pyautogui_script(recording):
    """
    Converts to a Python template script 'play.py' to 
    use with PyAutoGUI.

    Converts the:

    - Mouse clicks
    - Keyboard input
    - Time between actions calculated
    """
    if not recording: 
        return
    
    output = open("play.py", "w")
    output.write("import time\n")
    output.write("import pyautogui\n\n")
    
    for i, step in enumerate(recording):
        print(step)

        not_first_element = (i - 1) > 0
        if not_first_element:
            ## compare time to previous time for the 'sleep' with a 10% buffer
            pause_in_seconds = (step["_time"] - recording[i - 1]["_time"]) * 1.1 

            output.write(f"time.sleep({pause_in_seconds})\n\n")
        else:
            output.write("time.sleep(1)\n\n")

        if step["action"] == "pressed_key":
            key = step["key"].replace("Key.", "") if "Key." in step["key"] else step["key"]

            if key in key_mappings.keys():
                key = key_mappings[key]

            output.write(f"pyautogui.press('{key}')\n")
        
        if step["action"] == "clicked":
            output.write(f"pyautogui.moveTo({step['x']}, {step['y']})\n")

            if step["button"] == "Button.right":
                output.write("pyautogui.mouseDown(button='right')\n")
            else:
                output.write("pyautogui.mouseDown()\n")

        if step["action"] == "unclicked":
            output.write(f"pyautogui.moveTo({step['x']}, {step['y']})\n")

            if step["button"] == "Button.right":
                output.write("pyautogui.mouseUp(button='right')\n")
            else:
                output.write("pyautogui.mouseUp()\n")

    print("Recording converted. Saved to 'play.py'")


if __name__ == "__main__":
    recording = read_json_file()
    convert_to_pyautogui_script(recording)

As some of the keys from pynput don't correspond directly to PyAutoGUI, the key_mappings dictionary helps out with this. If you come across any more, you can add to this dictionary taking the pynput key and mapping it to the relevant PyAutoGUI keyboard keys.

Play the automation script

Once the conversion ends, play.py will contain a PyAutoGUI script that will look something like:

play.py
import time
import pyautogui

time.sleep(1)

pyautogui.press('win')
time.sleep(1)

pyautogui.press('f')
time.sleep(0.22220540046691897)

pyautogui.press('i')
time.sleep(0.10727632045745851)

pyautogui.press('r')
time.sleep(0.08800437450408936)

pyautogui.press('e')
time.sleep(0.5824827909469605)

pyautogui.press('f')
time.sleep(0.11989445686340333)

pyautogui.press('o')
time.sleep(0.22220461368560793)

pyautogui.press('x')
time.sleep(2.674463224411011)

pyautogui.moveTo(206, 219)
pyautogui.mouseDown()
time.sleep(0.07921419143676758)

pyautogui.moveTo(206, 219)
pyautogui.mouseUp()
time.sleep(5.592976307868958)

pyautogui.moveTo(522, 68)
pyautogui.mouseDown()
time.sleep(0.11439170837402345)

Here is a quick end-to-end video demo recording, converting then playing back an automation process - an example of opening Firefox, navigating to W3Schools, searching for Python, copying some code, then pasting it into Visual Studio Code. This uses left click, right click and keyboard input so applicable to a real-world scenario.

How to change video quality?
How to change video playback speed?

Final cut

Okay this was another fun Python automation article, now you know how to create a mouse and keyboard recorder with Python, and have a solid start to building more advanced robotic process automation (RPA) solutions with PyAutoGUI. You can refer to the documentation for more guidance on using PyAutoGUI and think about what else you might like to build 😄

Although there is functionality for controlling the mouse with pynput I still prefer to have a PyAutoGUI output script.

This program can be modified and adapted further to your needs. You could read in some data with pandas and then introduce a for loop to repeat an automation process for multiple inputs during playback.

If you enjoyed this article be sure to check out other articles on the site including:

Finally, if you have any questions or if you decide to use or extend this program, please leave a comment below. I'd love to know what you use it for and how it's helped you out 👍