Stats

anonymize_email(email_address)

Lowercases the email address, extracts its domain (everything after the first "@", or the whole string if there is no "@"), and returns a bcrypt hash of the full address together with the domain. The bcrypt salt is derived from the domain, so the same address always produces the same hash.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| email_address | | the email address | required |

Returns:

Type Description

a 2-tuple of the hashed email address and the domain, or (None, None) if email_address is None

Source code in ckanext/query_dois/lib/stats.py
def anonymize_email(email_address):
    """
    Lowercase the email address and extract its domain, then return a secure hash of
    the full address together with the domain. Bcrypt is used to hash the address,
    using a salt derived from the domain.

    :param email_address: the email address
    :returns: a 2-tuple of the hashed email address and the domain, or (None, None) if
        email_address is None
    """
    if email_address is None:
        return None, None

    email_address = email_address.lower()
    # figure out the domain from the email address
    try:
        domain = email_address[email_address.index('@') + 1 :]
    except ValueError:
        # no @ found, just use the whole string
        domain = email_address

    # create a custom salt by base64 encoding the domain and then trimming the result to 22
    # characters (which is bcrypt's required salt length). Note that zfill left-pads the domain
    # with "0" characters to ensure it's at least 18 characters in length. This guarantees that
    # the base64 encoded result is at least 22 characters long: 18 bytes encode to exactly 24
    # characters with no "=" padding.
    salt = b'$2b$12$' + base64.b64encode(domain.zfill(18).encode('utf-8'))[:22]
    return bcrypt.hashpw(email_address.encode('utf-8'), salt), domain
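
The salt derivation can be tried in isolation, without bcrypt installed. The following is a minimal sketch using only the standard library; `derive_salt` is a hypothetical helper name that does not exist in `stats.py`, and it reproduces only the domain extraction and salt construction steps shown above.

```python
import base64


def derive_salt(email_address: str) -> bytes:
    """Hypothetical helper mirroring the salt derivation in anonymize_email."""
    email_address = email_address.lower()
    # everything after the first "@" is the domain; with no "@", use the whole string
    _, sep, rest = email_address.partition("@")
    domain = rest if sep else email_address
    # zfill left-pads with "0" characters up to a minimum length of 18
    padded = domain.zfill(18)
    # "$2b$12$" is the bcrypt version and cost prefix, followed by 22 salt characters
    return b"$2b$12$" + base64.b64encode(padded.encode("utf-8"))[:22]


if __name__ == "__main__":
    print(derive_salt("Someone@Example.com"))
    print(derive_salt("other@example.com"))
```

Because the salt depends only on the lowercased domain, two addresses at the same domain share a salt, and the same address always yields the same hash.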

record_stat(query_doi, action, email_address=None, domain=None, identifier=None)

Creates a new QueryDOIStat object and saves it to the database.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query_doi | | the QueryDOI object against which the stat should be stored | required |
| action | | the action that occurred to trigger this stat (for example: "download") | required |
| email_address | | the email address of the user performing the action | None |
| domain | | an alternate domain name if email not specified | None |
| identifier | | an alternate identifier if email not specified | None |

Returns:

Type Description

a new QueryDOIStat object

Source code in ckanext/query_dois/lib/stats.py
def record_stat(query_doi, action, email_address=None, domain=None, identifier=None):
    """
    Creates a new QueryDOIStat object and saves it to the database.

    :param query_doi: the QueryDOI object against which the stat should be stored
    :param action: the action that occurred to trigger this stat (for example:
        "download")
    :param email_address: the email address of the user performing the action
    :param domain: an alternate domain name if email not specified
    :param identifier: an alternate identifier if email not specified
    :returns: a new QueryDOIStat object
    """
    if email_address:
        identifier, domain = anonymize_email(email_address)
    if identifier is None:
        # just a random uuid if nothing else is specified, so we don't end up grouping
        # many unrelated users together under the identifier of "None"
        identifier = uuid.uuid4().hex
    stat = QueryDOIStat(
        doi=query_doi.doi,
        action=action,
        domain=domain,
        identifier=identifier,
        timestamp=datetime.now(),
    )
    stat.save()
    return stat
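
The precedence between `email_address`, `domain` and `identifier` can be illustrated without a database. This is a rough sketch under stated assumptions: `resolve_identity` and `fake_anonymize` are hypothetical names, and `fake_anonymize` uses SHA-256 purely as a stand-in for the bcrypt hashing done by `anonymize_email`.

```python
import hashlib
import uuid


def fake_anonymize(email_address):
    """Stand-in for anonymize_email (SHA-256 instead of bcrypt, illustration only)."""
    email_address = email_address.lower()
    _, sep, rest = email_address.partition("@")
    domain = rest if sep else email_address
    return hashlib.sha256(email_address.encode("utf-8")).hexdigest(), domain


def resolve_identity(email_address=None, domain=None, identifier=None):
    """Hypothetical helper reproducing the precedence rules used in record_stat."""
    if email_address:
        # a non-empty email address overrides any domain or identifier passed in
        identifier, domain = fake_anonymize(email_address)
    if identifier is None:
        # random fallback so unrelated anonymous users are not grouped together
        identifier = uuid.uuid4().hex
    return identifier, domain


if __name__ == "__main__":
    print(resolve_identity(email_address="someone@example.com", domain="ignored"))
    print(resolve_identity(identifier="api-client-1", domain="example.org"))
    print(resolve_identity())
```

Note that when no email address, identifier or domain is supplied, the identifier is a fresh random value on every call and the domain stays `None`.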