<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.7.19 (Ruby 3.0.2) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-illyes-rep-purpose-00" category="info" consensus="true" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.23.2 -->
  <front>
    <title abbrev="REP purpose">Robots Exclusion Protocol User Agent Purpose Extension</title>
    <seriesInfo name="Internet-Draft" value="draft-illyes-rep-purpose-00"/>
    <author fullname="Gary Illyes">
      <organization>Google LLC.</organization>
      <address>
        <email>garyillyes@google.com</email>
      </address>
    </author>
    <date year="2024" month="October" day="18"/>
    <keyword>robotstxt</keyword>
    <abstract>
      <?line 37?>

<t>The Robots Exclusion Protocol defined in <xref target="RFC9309"/> specifies the user-agent
rule for targeting automatic clients either by prefix matching their
self-defined product token or by a global rule * that matches all clients.</t>
      <t>This document extends <xref target="RFC9309"/> by defining a new rule for targeting
automatic clients based on the clients' purpose for accessing the service.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>
        The latest revision of this draft can be found at <eref target="https://garyillyes.github.io/ietf-rep-purpose/draft-illyes-rep-purpose.html"/>.
        Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-illyes-rep-purpose/"/>.
      </t>
      <t>Source for this draft and an issue tracker can be found at
        <eref target="https://github.com/garyillyes/ietf-rep-purpose"/>.</t>
    </note>
  </front>
  <middle>
    <?line 46?>

<section anchor="introduction">
      <name>Introduction</name>
      <t>(fill in)</t>
    </section>
    <section anchor="specification">
      <name>Specification</name>
      <t>This document defines <tt>user-agent-purpose</tt>, a new rule that takes one of a predefined set of
values. The values are registered with IANA at ...
The syntax of the rule is given below in Augmented Backus-Naur Form (ABNF), as described
in <xref target="RFC5234"/>.</t>
      <artwork><![CDATA[
purpose = *WS "user-agent-purpose" *WS ":" *WS purpose-token NL
purpose-token = "EXAMPLE-PURPOSE-1" / "EXAMPLE-PURPOSE-2" /
                "EXAMPLE-PURPOSE-3" ; but check IANA for full list
NL = %x0D / %x0A / %x0D.0A
WS = %x20 / %x09
]]></artwork>
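      <t>As a non-normative illustration, the grammar above can be sketched as a small
Python line parser. The function name <tt>parse_purpose_line</tt> and the
<tt>PURPOSE_TOKENS</tt> set are hypothetical; the set only mirrors the placeholder
tokens of the ABNF, and a real parser would presumably load the registered values
instead. Returning <tt>None</tt> for an unrecognized token is one way to discard it.</t>
      <sourcecode type="python"><![CDATA[
import re

# Hypothetical placeholder tokens taken from the ABNF above; the real
# values are expected to come from the IANA registry.
PURPOSE_TOKENS = {
    "example-purpose-1",
    "example-purpose-2",
    "example-purpose-3",
}

# *WS "user-agent-purpose" *WS ":" *WS purpose-token, WS = SP / HTAB.
# ABNF string literals are case-insensitive, hence re.IGNORECASE.
_PURPOSE_LINE = re.compile(
    r"^[ \t]*user-agent-purpose[ \t]*:[ \t]*([A-Za-z0-9-]+)$",
    re.IGNORECASE,
)


def parse_purpose_line(line):
    """Return the lowercased purpose token, or None if no match."""
    # Strip the NL terminator (CR, LF, or CRLF) before matching.
    match = _PURPOSE_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    token = match.group(1).lower()
    # Unrecognized tokens are discarded in this sketch.
    return token if token in PURPOSE_TOKENS else None
]]></sourcecode>
      <t>For instance, <tt>parse_purpose_line("User-Agent-Purpose: EXAMPLE-PURPOSE-1")</tt>
yields the lowercased token, while a plain <tt>User-Agent: FooBot</tt> line yields
<tt>None</tt>.</t>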
      <section anchor="user-agent-purpose">
        <name>user-agent-purpose</name>
        <t>The <tt>user-agent-purpose</tt> rule is semantically equivalent to the
<tt>user-agent</tt> rule defined in Section 2.2.1 of <xref target="RFC9309"/>. Like the
<tt>user-agent</tt> rule, <tt>user-agent-purpose</tt> starts a rule
group.</t>
      </section>
      <section anchor="user-agent-purpose-tokens">
        <name>user-agent-purpose tokens</name>
        <t>The <tt>user-agent-purpose</tt> token <bcp14>MUST</bcp14> be a substring of the
identification string that the automatic client sends to the service.
For example, in the case of HTTP <xref target="RFC9110"/>, the purpose token <bcp14>MUST</bcp14> be
a substring of the User-Agent header field, alongside the product token.
The following User-Agent HTTP request header field carries the
purpose token next to the product token:</t>
        <artwork><![CDATA[
User-Agent: Mozilla/5.0 (compatible; ExampleBot/0.1; ExamplePurpose; https://www.example.com/bot.html)
]]></artwork>
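        <t>A service that wants to confirm that a client announces a given purpose can test
for the substring directly. The following Python sketch is non-normative; the
function name <tt>announces_purpose</tt> is hypothetical, and comparing without
regard to case is an assumption of this sketch rather than a requirement stated
for the identification string.</t>
        <sourcecode type="python"><![CDATA[
def announces_purpose(user_agent, purpose_token):
    """Report whether purpose_token is a substring of user_agent."""
    # Assumption: compare case-insensitively.
    return purpose_token.lower() in user_agent.lower()


# The User-Agent value from the example above.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (compatible; ExampleBot/0.1; ExamplePurpose; "
    "https://www.example.com/bot.html)"
)
]]></sourcecode>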
        <t>The purpose token <bcp14>MUST</bcp14> be one of the tokens registered with IANA.
Parsers <bcp14>MAY</bcp14> discard unrecognized tokens. Crawlers <bcp14>MUST</bcp14> use
case-insensitive matching to find the group that matches their purpose
token and obey the rules of that group. If there is a group that
matches the product token of the automatic client, the client <bcp14>SHOULD</bcp14>
obey that group. If no matching group exists, crawlers <bcp14>MUST</bcp14> obey the
group with a <tt>user-agent</tt> line with the "*" value, if present.
If more than one group matches the <tt>user-agent-purpose</tt> token,
the matching groups' rules <bcp14>MUST</bcp14> be combined into one group and parsed
according to Section X.</t>
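        <t>The selection steps in this section can be sketched in Python as follows. This is a
non-normative illustration: the <tt>Group</tt> class and the <tt>select_rules</tt>
function are hypothetical, rules are modeled as opaque tuples, and the order of
precedence used here (purpose match first, then product token, then "*") is only
one possible reading of the text, adopted as an assumption.</t>
        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass, field


@dataclass
class Group:
    """One robots.txt group: its starter lines and its rules."""
    user_agents: list = field(default_factory=list)  # user-agent
    purposes: list = field(default_factory=list)  # user-agent-purpose
    rules: list = field(default_factory=list)  # e.g. ("disallow", "/")


def select_rules(groups, product_token, purpose_token):
    """Return the combined rules that apply to one crawler."""
    product = product_token.lower()
    purpose = purpose_token.lower()

    def combined(matches):
        # Merge the rules of every matching group, in file order.
        return [rule for group in matches for rule in group.rules]

    # Case-insensitive match on the purpose token.
    by_purpose = [g for g in groups
                  if purpose in (p.lower() for p in g.purposes)]
    if by_purpose:
        return combined(by_purpose)

    # Assumption: product token groups are consulted second.
    by_product = [g for g in groups
                  if product in (u.lower() for u in g.user_agents)]
    if by_product:
        return combined(by_product)

    # Fall back to the global "*" group, if present.
    wildcard = [g for g in groups if "*" in g.user_agents]
    return combined(wildcard)
]]></sourcecode>
        <t>An empty result means that no rule applies to the crawler, which can happen
either because the matching group has no rules or because no group matched at all.</t>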
      </section>
    </section>
    <section anchor="conventions-and-definitions">
      <name>Conventions and Definitions</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
      <?line -18?>

</section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>The security considerations of this extension are the same as those of <xref target="RFC9309"/>.</t>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>The vocabulary used for purpose tokens is registered at IANA-URL.</t>
    </section>
    <section anchor="examples">
      <name>Examples</name>
      <artwork><![CDATA[
# robots.txt with purpose
# FooBot and all bots that are crawling for EXAMPLE-PURPOSE-1 are disallowed.
User-Agent: FooBot
User-Agent-Purpose: EXAMPLE-PURPOSE-1
Disallow: /
# EXAMPLE-PURPOSE-2 crawlers are allowed.
User-Agent-Purpose: EXAMPLE-PURPOSE-2
]]></artwork>
    </section>
  </middle>
  <back>
    <references anchor="sec-combined-references">
      <name>References</name>
      <references anchor="sec-normative-references">
        <name>Normative References</name>
        <reference anchor="RFC9110">
          <front>
            <title>HTTP Semantics</title>
            <author fullname="R. Fielding" initials="R." role="editor" surname="Fielding"/>
            <author fullname="M. Nottingham" initials="M." role="editor" surname="Nottingham"/>
            <author fullname="J. Reschke" initials="J." role="editor" surname="Reschke"/>
            <date month="June" year="2022"/>
            <abstract>
              <t>The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document describes the overall architecture of HTTP, establishes common terminology, and defines aspects of the protocol that are shared by all versions. In this definition are core protocol elements, extensibility mechanisms, and the "http" and "https" Uniform Resource Identifier (URI) schemes.</t>
              <t>This document updates RFC 3864 and obsoletes RFCs 2818, 7231, 7232, 7233, 7235, 7538, 7615, 7694, and portions of 7230.</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="97"/>
          <seriesInfo name="RFC" value="9110"/>
          <seriesInfo name="DOI" value="10.17487/RFC9110"/>
        </reference>
        <reference anchor="RFC9309">
          <front>
            <title>Robots Exclusion Protocol</title>
            <author fullname="M. Koster" initials="M." surname="Koster"/>
            <author fullname="G. Illyes" initials="G." surname="Illyes"/>
            <author fullname="H. Zeller" initials="H." surname="Zeller"/>
            <author fullname="L. Sassman" initials="L." surname="Sassman"/>
            <date month="September" year="2022"/>
            <abstract>
              <t>This document specifies and extends the "Robots Exclusion Protocol" method originally defined by Martijn Koster in 1994 for service owners to control how content served by their services may be accessed, if at all, by automatic clients known as crawlers. Specifically, it adds definition language for the protocol, instructions for handling errors, and instructions for caching.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="9309"/>
          <seriesInfo name="DOI" value="10.17487/RFC9309"/>
        </reference>
        <reference anchor="RFC2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner"/>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba"/>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </references>
      <references anchor="sec-informative-references">
        <name>Informative References</name>
        <reference anchor="RFC5234">
          <front>
            <title>Augmented BNF for Syntax Specifications: ABNF</title>
            <author fullname="D. Crocker" initials="D." role="editor" surname="Crocker"/>
            <author fullname="P. Overell" initials="P." surname="Overell"/>
            <date month="January" year="2008"/>
            <abstract>
              <t>Internet technical specifications often need to define a formal syntax. Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications. The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power. The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges. This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="68"/>
          <seriesInfo name="RFC" value="5234"/>
          <seriesInfo name="DOI" value="10.17487/RFC5234"/>
        </reference>
      </references>
    </references>
    <?line 138?>

<section numbered="false" anchor="acknowledgments">
      <name>Acknowledgments</name>
      <t>TODO acknowledge.</t>
    </section>
  </back>

</rfc>
